CN109918270B - Multi-server system, error detection method, system, electronic device and storage medium - Google Patents

Publication number
CN109918270B
CN109918270B CN201910221795.1A CN201910221795A CN109918270B CN 109918270 B CN109918270 B CN 109918270B CN 201910221795 A CN201910221795 A CN 201910221795A CN 109918270 B CN109918270 B CN 109918270B
Authority
CN
China
Prior art keywords
server
debugging
abnormal
target
logic device
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910221795.1A
Other languages
Chinese (zh)
Other versions
CN109918270A (en)
Inventor
杨志民
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN201910221795.1A
Publication of CN109918270A
Application granted
Publication of CN109918270B
Anticipated expiration

Landscapes

  • Debugging And Monitoring (AREA)

Abstract

The application discloses a multi-server system, a debugging method, a debugging system, an electronic device and a computer-readable storage medium. The multi-server system comprises a plurality of servers, each of which comprises: a logic device connected to the debugging port corresponding to the server; and a BMC connected to the logic device; wherein the logic devices of the servers are connected to one another through a communication link. In this multi-server system, a logic device is added to each server, each debugging port is connected to the BMC through the logic device, and the logic devices in the system are interconnected. When the debugging port corresponding to a certain server fails, the exception debugging result of that server can be output through another, normal debugging port via the communication link between the logic devices, which realizes a redundant design for exception debugging in the multi-server system.

Description

Multi-server system, error detection method, system, electronic device and storage medium
Technical Field
The present application relates to the field of computer technologies, and more particularly, to a multi-server system, a debugging method, a debugging system, an electronic device, and a computer-readable storage medium.
Background
In the prior-art multi-server system shown in fig. 1, each server corresponds to a single debugging port. After a BMC (Baseboard Management Controller) chip performs an exception debugging operation on the server, the result is output through the debugging port corresponding to that server. With this scheme, when the debugging port is abnormal, the exception debugging result cannot be output, and the user can no longer monitor the system or collect error messages through the debugging port.
Therefore, how to implement redundancy design for exception debugging in a multi-server system is a technical problem to be solved by those skilled in the art.
Disclosure of Invention
An object of the present application is to provide a multi-server system, a debugging method, a debugging system, an electronic device, and a computer-readable storage medium, which implement a redundant design for exception debugging in the multi-server system.
To achieve the above object, the present application provides a multi-server system, comprising a plurality of servers, each of the servers comprising:
the logic device is connected with the debugging port corresponding to the server;
the BMC is connected with the logic device;
wherein, the logic devices in each server are connected through a communication link.
Wherein the logic device comprises a CPLD.
The servers correspond to the debugging ports one to one.
In order to achieve the above object, the present application provides a debug method applied to a logic device in the multi-server system, including:
when an exception debugging command is received, judging, according to the command, whether the target server requiring exception debugging is the local server, that is, the server in which the logic device resides;
if not, forwarding the exception debugging command to a target BMC in the target server through a target logic device in the target server to obtain an exception debugging result;
and sending the exception debugging result to the debugging port corresponding to the local server.
If the target server requiring exception debugging is the local server, the method further comprises:
sending the exception debugging command to the BMC in the local server to obtain an exception debugging result;
and sending the exception debugging result to the debugging port corresponding to the local server.
In order to achieve the above object, the present application provides a debug system applied to a logic device in the multi-server system, including:
the judging module is used for judging whether a target server needing to be subjected to the abnormal debugging is the server according to the abnormal debugging command when the abnormal debugging command is received;
the forwarding module is used for forwarding the abnormal debugging command to a target BMC in the target server through a target logic device in the target server to obtain an abnormal debugging result when the target server needing abnormal debugging is not the server;
and the abnormal debugging result sending module is used for sending the abnormal debugging result to the debugging port corresponding to the server.
The system further includes:
a sending module, used for sending the exception debugging command to the BMC in the local server to obtain an exception debugging result, and for starting the working process of the exception debugging result sending module, when the target server requiring exception debugging is the local server.
To achieve the above object, the present application provides an electronic device including:
a memory for storing a computer program;
a processor for implementing the steps of the error detection method when executing the computer program.
To achieve the above object, the present application provides a computer-readable storage medium having a computer program stored thereon, which, when being executed by a processor, implements the steps of the error detection method as described above.
According to the above scheme, the multi-server system provided by the present application includes a plurality of servers, each of the servers includes: the logic device is connected with the debugging port corresponding to the server; the BMC is connected with the logic device; wherein, the logic devices in each server are connected through a communication link.
According to the multi-server system, the logic device is additionally arranged in each server, each debugging port is connected with the BMC through the logic device, and each logic device in the system is connected. When the debugging port corresponding to a certain server fails, the abnormal debugging result of the server can be output through other normal debugging ports through the communication link between the logic devices, and the redundant design of the abnormal debugging under the multi-server system is realized. The application also discloses a debugging method, a debugging system, an electronic device and a computer readable storage medium, which can also realize the technical effects.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. Obviously, the drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort. The accompanying drawings, which provide a further understanding of the disclosure and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure without limiting it. In the drawings:
FIG. 1 is a block diagram of a multi-server system of the prior art;
FIG. 2 is a block diagram illustrating a multi-server system in accordance with an exemplary embodiment;
FIG. 3 is a flow diagram illustrating a method of debugging according to an exemplary embodiment;
FIG. 4 is a block diagram illustrating a debug system in accordance with an exemplary embodiment;
FIG. 5 is a block diagram of an electronic device shown in accordance with an example embodiment.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
An embodiment of the application discloses a multi-server system including a plurality of servers, each of which includes:
the logic device is connected with the debugging port corresponding to the server;
the BMC is connected with the logic device;
wherein, the logic devices in each server are connected through a communication link.
Taking a multi-server system with 2 servers as an example, as shown in fig. 2, each server includes a BMC 100 and a logic device 200; the BMC and the logic device are connected through a communication link and can transmit exception debugging commands, exception debugging results, and the like. This embodiment does not limit the kind of logic device; it may be, for example, a CPLD (Complex Programmable Logic Device).
The debugging ports are UART (Universal Asynchronous Receiver/Transmitter) ports. Each server has a corresponding debugging port, through which a user sends exception debugging commands to the server, monitors the running state of the server, and collects error information about the server's operation; the exception debugging results are output through the debugging port.
The debugging port is directly connected to the logic device in the server. A user's exception debugging command passes through the debugging port and the logic device in turn to reach the BMC in the server; the BMC performs the detection and generates an exception debugging result, which is output through the logic device and the debugging port in turn.
The logic devices in different servers are connected through communication links, and the logic devices relay the data transmitted over the debugging ports. A logic device can perform shunt control on a received exception debugging command: when the debugging object of the command is the local server, the command is sent to the BMC in the local server; when the debugging object is another, target server, the command is sent to the target logic device in the target server, and the target logic device forwards it to the target BMC in the target server.
In this way, when one debugging port is abnormal, that is, when exception debugging of the corresponding server cannot be performed through that port, a user can perform exception debugging on that server through the normal debugging port corresponding to another server.
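The shunt control described above can be sketched as a single routing decision. This is a minimal illustration, not the patented implementation: the patent does not specify how a command identifies its target, so the `DebugCommand` fields, the `shunt` function, and the string labels for the next hop are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DebugCommand:
    # Hypothetical fields: the patent does not define a command format.
    target_server: str  # server that should be debugged
    payload: str        # the debug request itself


def shunt(local_server: str, command: DebugCommand) -> str:
    """Return the next hop for a command received by local_server's logic device."""
    if command.target_server == local_server:
        # Debugging object is the local server: hand the command to the local BMC.
        return f"bmc@{local_server}"
    # Otherwise pass it over the inter-device link to the target's logic device,
    # which is then responsible for handing it to the target BMC.
    return f"logic_device@{command.target_server}"
```

A command addressed to the receiving server stays local, while any other command is sent across the communication link between the logic devices.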
It should be noted that, in the present embodiment, the corresponding relationship between the server and the debug port is not limited, and may be a one-to-many relationship, that is, one server corresponds to a plurality of debug ports, for example, the server a corresponds to the debug port 1 and the debug port 2, and the user may perform the exception debug on the server a through the debug port 1 or through the debug port 2. When the debugging port 1 fails, a user can perform exception debugging on the server A through the debugging port 2, and when the debugging port 1 and the debugging port 2 both fail, the user can also perform exception debugging on the server A through other debugging ports by using communication connection between the logic devices.
Of course, a many-to-one relationship may also be adopted, that is, a plurality of servers correspond to one debug port. For example, server B and server C both correspond to debug port 3, and a user may perform exception debugging on server B and server C through debug port 3. When debug port 3 fails, the user can perform exception debugging on server B and server C through other debug ports (such as debug port 1 or debug port 2) by using the communication connection between the logic devices.
Preferably, in order to simplify the internal design of the multi-server system, the servers correspond to the debug ports one to one. For example, server D corresponds to debug port 4, and the user can perform exception debugging on server D through debug port 4. When the debug port 4 fails, the user can perform exception debugging on the server D through other debug ports (such as the debug port 1, the debug port 2, or the debug port 3) by using the communication connection between the logic devices.
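The three correspondences above (one-to-many, many-to-one, one-to-one) lead to the same redundancy property: any healthy debug port can reach any server, either directly or over the link between logic devices. The sketch below illustrates this under stated assumptions. `EXAMPLE_PORTS` merges the servers A to D and ports 1 to 4 from the examples into one hypothetical system, and `reachable_ports` is an illustrative helper, not part of the patent.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical combined layout built from the examples in the text:
# server A has ports 1 and 2, servers B and C share port 3, server D has port 4.
EXAMPLE_PORTS: Dict[int, Set[str]] = {1: {"A"}, 2: {"A"}, 3: {"B", "C"}, 4: {"D"}}


def reachable_ports(
    port_to_servers: Dict[int, Set[str]], failed_ports: Set[int], target: str
) -> Tuple[List[int], List[int]]:
    """Split the healthy ports into those attached to the target server
    and those that reach it through the link between logic devices."""
    direct: List[int] = []
    via_link: List[int] = []
    for port in sorted(port_to_servers):
        if port in failed_ports:
            continue  # a failed port cannot carry commands or results
        if target in port_to_servers[port]:
            direct.append(port)
        else:
            via_link.append(port)
    return direct, via_link
```

With ports 1 and 2 both failed, `reachable_ports(EXAMPLE_PORTS, {1, 2}, "A")` leaves no direct port for server A but still lists ports 3 and 4 as usable through the inter-device link.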
According to the multi-server system provided by the embodiment of the application, the logic device is additionally arranged in each server, each debugging port is connected with the BMC through the logic device, and each logic device in the system is connected. When the debugging port corresponding to a certain server fails, the abnormal debugging result of the server can be output through other normal debugging ports through the communication link between the logic devices, and the redundant design of the abnormal debugging under the multi-server system is realized.
The embodiment of the application discloses a debugging method, which realizes redundancy design of abnormal debugging under a multi-server system.
Referring to fig. 3, a flowchart of a debug method according to an exemplary embodiment is shown, as shown in fig. 3, including:
S101: when an exception debugging command is received, judging, according to the command, whether the target server requiring exception debugging is the local server; if yes, proceeding to S102; if not, proceeding to S103.
The execution subject of this embodiment is a logic device in the multi-server system of the previous embodiment. The user sends an exception debugging command through a normal debugging port, and the command reaches the logic device connected to that debugging port. When the logic device receives the exception debugging command, it performs shunting according to the debugging object of the command: when the debugging object is the local server, the process proceeds to S102; when the debugging object is another, target server, the process proceeds to S103.
S102: sending the exception debugging command to the BMC in the local server to obtain an exception debugging result.
In this step, the debugging object is the local server, and the logic device sends the exception debugging command to the BMC in the local server to detect the running state of the local server and obtain an exception debugging result.
S103: forwarding the exception debugging command to a target BMC in the target server through a target logic device in the target server to obtain an exception debugging result.
In this step, the debugging object is another, target server. The logic device sends the exception debugging command to the target logic device in the target server, and the target logic device forwards the command to the target BMC in the target server to detect the running state of the target server and obtain an exception debugging result. The exception debugging result is returned through the target logic device to the logic device in the local server.
S104: and sending the abnormal debugging result to a debugging port corresponding to the server.
In this step, when the logic device obtains the exception debug result, it is output through the debug port connected to the logic device, that is, through the debug port through which the user sends the exception debug command.
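Steps S101 to S104 can be traced end to end with a small in-memory simulation. This is a sketch under stated assumptions rather than the actual firmware: the `BMC` and `LogicDevice` classes, their method names, and the result string format are hypothetical, the `peers` dictionary stands in for the communication link, and the `port_output` list stands in for the debug port.

```python
class BMC:
    """Hypothetical stand-in for a server's BMC."""

    def __init__(self, server: str) -> None:
        self.server = server

    def run_debug(self, payload: str) -> str:
        # Illustrative result format only.
        return f"{self.server}:{payload}:ok"


class LogicDevice:
    """Hypothetical stand-in for the per-server logic device (e.g. a CPLD)."""

    def __init__(self, server: str, bmc: BMC) -> None:
        self.server = server
        self.bmc = bmc
        self.peers = {}        # server name -> LogicDevice (the communication link)
        self.port_output = []  # what is written to this device's debug port

    def connect(self, other: "LogicDevice") -> None:
        self.peers[other.server] = other
        other.peers[self.server] = self

    def handle_from_peer(self, payload: str) -> str:
        # Target side of S103: forward to the target BMC, return the result over the link.
        return self.bmc.run_debug(payload)

    def handle_from_port(self, target: str, payload: str) -> str:
        if target == self.server:                                  # S101: local server?
            result = self.bmc.run_debug(payload)                   # S102
        else:
            result = self.peers[target].handle_from_peer(payload)  # S103
        self.port_output.append(result)                            # S104
        return result
```

If server A's debug port has failed, a command entered on server B's port with target `"A"` is handled by A's BMC, and the result appears only in B's `port_output`, which corresponds to the port the user sent the command through.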
According to the embodiment of the application, a logic device is added in each server, each debugging port is connected with the BMC through the logic device, and each logic device in the system is connected. When the debugging port corresponding to a certain server fails, the abnormal debugging result of the server can be output through other normal debugging ports through a communication link between the logic devices, so that the redundant design of abnormal debugging under a multi-server system is realized.
In the following, a debugging system provided by an embodiment of the present application is introduced; the debugging system described below and the debugging method described above may be referred to each other.
Referring to fig. 4, a block diagram of a debug system is shown according to an exemplary embodiment, as shown in fig. 4, including:
a determining module 401, configured to determine, when receiving the exception debugging command, whether a target server that needs to perform exception debugging is a local server according to the exception debugging command;
a forwarding module 402, configured to forward the exception debugging command to a target BMC in a target server through a target logic device in the target server to obtain an exception debugging result when a target server that needs to perform exception debugging is not the local server;
an exception debugging result sending module 403, configured to send the exception debugging result to the debugging port corresponding to the server.
On the basis of the above embodiment, as a preferred implementation, the system further includes:
a sending module, configured to send the exception debugging command to the BMC in the local server to obtain an exception debugging result, and to start the working process of the exception debugging result sending module, when the target server that needs to perform exception debugging is the local server.
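The module decomposition can be pictured as four small functions wired into one command handler. The following is a rough sketch assuming each module is a plain callable. The factory `make_debug_system` and its parameters (`local_bmc`, `forward`, `port_write`) are hypothetical names introduced for illustration; only the four module roles come from the text.

```python
from typing import Callable


def make_debug_system(
    local_server: str,
    local_bmc: Callable[[str], str],      # hypothetical: runs a debug request on the local BMC
    forward: Callable[[str, str], str],   # hypothetical: sends a request via the target logic device
    port_write: Callable[[str], None],    # hypothetical: writes to the local debug port
) -> Callable[[str, str], str]:
    def judging_module(target: str) -> bool:
        return target == local_server

    def forwarding_module(target: str, payload: str) -> str:
        return forward(target, payload)

    def sending_module(payload: str) -> str:
        return local_bmc(payload)

    def result_sending_module(result: str) -> None:
        port_write(result)

    def on_command(target: str, payload: str) -> str:
        if judging_module(target):
            result = sending_module(payload)
        else:
            result = forwarding_module(target, payload)
        # Both branches finish by starting the result sending module.
        result_sending_module(result)
        return result

    return on_command
```

Passing the BMC access, the link, and the port as callables keeps the judging logic independent of the hardware, so the handler can be exercised with simple stand-ins such as lambdas and a list's `append` method.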
According to the embodiment of the application, a logic device is added in each server, each debugging port is connected with the BMC through the logic device, and each logic device in the system is connected. When the debugging port corresponding to a certain server fails, the abnormal debugging result of the server can be output through other normal debugging ports through the communication link between the logic devices, and the redundant design of the abnormal debugging under the multi-server system is realized.
With respect to the system in the above embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated herein.
The present application further provides an electronic device. Referring to fig. 5, which is a structure diagram of an electronic device 500 provided in an embodiment of the present application, the electronic device 500 may include a processor 11 and a memory 12. The electronic device 500 may also include one or more of a multimedia component 13, an input/output (I/O) interface 14, and a communication component 15.
The processor 11 is configured to control the overall operation of the electronic device 500, so as to complete all or part of the steps of the debugging method. The memory 12 is used to store various types of data to support operation of the electronic device 500, such as instructions for any application or method operating on the electronic device 500, and application-related data, such as contact data, messages, pictures, audio, video, and so forth. The memory 12 may be implemented by any type of volatile or non-volatile memory device or a combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic memory, flash memory, a magnetic disk or an optical disk. The multimedia component 13 may include a screen and an audio component. The screen may be, for example, a touch screen, and the audio component is used for outputting and/or inputting audio signals. For example, the audio component may include a microphone for receiving external audio signals. The received audio signal may further be stored in the memory 12 or transmitted via the communication component 15. The audio component also includes at least one speaker for outputting audio signals. The I/O interface 14 provides an interface between the processor 11 and other interface modules, such as a keyboard, a mouse, or buttons. These buttons may be virtual buttons or physical buttons. The communication component 15 is used for wired or wireless communication between the electronic device 500 and other devices. The wireless communication may be, for example, Wi-Fi, Bluetooth, Near Field Communication (NFC), 2G, 3G or 4G, or a combination of one or more of them, so the corresponding communication component 15 may include a Wi-Fi module, a Bluetooth module, and an NFC module.
In an exemplary embodiment, the electronic device 500 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components for performing the above-described error detection method.
In another exemplary embodiment, a computer-readable storage medium is also provided that includes program instructions that, when executed by a processor, implement the steps of the above-described debug method. For example, the computer readable storage medium may be the memory 12 comprising program instructions executable by the processor 11 of the electronic device 500 to perform the error detection method described above.
The embodiments are described in a progressive manner in the specification, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system disclosed by the embodiment, the description is relatively simple because the system corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the method part for description. It should be noted that, for those skilled in the art, without departing from the principle of the present application, the present application can also make several improvements and modifications, and those improvements and modifications also fall into the protection scope of the claims of the present application.
It is further noted that, in the present specification, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a … …" does not exclude the presence of another identical element in a process, method, article, or apparatus that comprises the element.

Claims (6)

1. A debugging method is applied to a logic device in a multi-server system and comprises the following steps:
when an abnormal debugging command is received, judging whether a target server needing abnormal debugging is the server or not according to the abnormal debugging command;
if not, forwarding the abnormal debugging command to a target BMC in the target server through a target logic device in the target server to obtain an abnormal debugging result;
and sending the abnormal debugging result to a debugging port corresponding to the server.
2. The debugging method of claim 1, wherein if the target server to be exception debugged is the local server, the method further comprises:
sending the abnormal debugging command to the BMC in the server to obtain an abnormal debugging result;
and sending the abnormal debugging result to a debugging port corresponding to the server.
3. An error detection system applied to a logic device in a multi-server system, comprising:
the judging module is used for judging whether a target server needing to be subjected to the abnormal debugging is the server according to the abnormal debugging command when the abnormal debugging command is received;
the forwarding module is used for forwarding the abnormal debugging command to a target BMC in the target server through a target logic device in the target server to obtain an abnormal debugging result when the target server needing abnormal debugging is not the server;
and the exception debugging result sending module is used for sending the exception debugging result to a debugging port corresponding to the server.
4. The debug system according to claim 3, further comprising:
and the sending module is used for sending the abnormal debugging command to the BMC in the server to obtain an abnormal debugging result and starting a working process of the abnormal debugging result sending module when a target server needing abnormal debugging is the server.
5. An electronic device, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the error detection method according to claim 1 or 2 when executing the computer program.
6. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the error detection method according to claim 1 or 2.
CN201910221795.1A 2019-03-22 2019-03-22 Multi-server system, error detection method, system, electronic device and storage medium Active CN109918270B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910221795.1A CN109918270B (en) 2019-03-22 2019-03-22 Multi-server system, error detection method, system, electronic device and storage medium

Publications (2)

Publication Number Publication Date
CN109918270A 2019-06-21
CN109918270B 2023-01-10

Family

ID=66966242

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910221795.1A Active CN109918270B (en) 2019-03-22 2019-03-22 Multi-server system, error detection method, system, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN109918270B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106354594A (en) * 2016-08-26 2017-01-25 浪潮(北京)电子信息产业有限公司 Fault-tolerance method and device of multi-controller communication, and NTB facility
CN109144584A (en) * 2018-07-27 2019-01-04 浪潮(北京)电子信息产业有限公司 A kind of programmable logic device and its starting method, system and storage medium

Also Published As

Publication number Publication date
CN109918270A (en) 2019-06-21

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant