CN115529261A - Multi-BMC communication method, device, equipment and storage medium - Google Patents

Info

Publication number
CN115529261A
Authority
CN
China
Prior art keywords
bmc
slave
communication
heartbeat request
master
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211059411.9A
Other languages
Chinese (zh)
Inventor
韩利
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202211059411.9A
Publication of CN115529261A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/10 Active monitoring, e.g. heartbeat, ping or trace-route
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3055 Monitoring arrangements for monitoring the status of the computing system or of the computing system component, e.g. monitoring if the computing system is on, off, available, not available
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3058 Monitoring arrangements for monitoring environmental properties or parameters of the computing system or of the computing system component, e.g. monitoring of power, currents, temperature, humidity, position, vibrations
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters

Abstract

The application relates to a multi-BMC communication method, device, equipment and storage medium. In the main technical scheme, a master BMC sends heartbeat request information to a plurality of slave BMCs in turn and determines, based on the response information each slave BMC returns for the heartbeat request information, whether communication between the master BMC and that slave BMC has been established. When communication has been established and then becomes abnormal, the connection is not dropped immediately; instead, heartbeat request information is sent to the slave BMC a preset number of times, which debounces the communication state. The communication state of the master BMC and the slave BMC is then determined from the response information returned for each of these heartbeat requests. The method and device can improve the communication stability of multiple BMCs, thereby ensuring the validity and timeliness of data interaction between them.

Description

Multi-BMC communication method, device, equipment and storage medium
Technical Field
The present application relates to the field of computer communication technologies, and in particular, to a multi-BMC communication method, apparatus, device, and storage medium.
Background
Servers are central to data storage, network management and computation, and their stable operation is a prerequisite for the whole system running well. Given the complexity of current business scenarios and the rapid development of new technologies, a system with a single Baseboard Management Controller (BMC) can no longer handle monitoring and management adequately, and designs in which multiple BMCs cooperate have emerged in response.
The multiple BMCs cooperate with each other to monitor and manage the whole server, so that the problem of overhigh load of a single BMC can be avoided, and the overall management and monitoring efficiency is effectively improved. However, when multiple BMCs cooperate with each other, the communication stability between the BMCs is poor, which affects data interaction between the multiple BMCs, and thus stable operation of the server cannot be guaranteed.
Disclosure of Invention
Based on this, the application provides a multi-BMC communication method, device, equipment and storage medium to improve the stability of multi-BMC communication.
In a first aspect, a multi-BMC communication method is provided, and the method includes:
the master BMC sends heartbeat request information to a plurality of slave BMCs in turn;
determining whether the master BMC and the slave BMC have established communication based on response information returned by the slave BMC according to the heartbeat request information;
when communication between the master BMC and the slave BMC has been established and then becomes abnormal, sending heartbeat request information to the slave BMC a preset number of times;
and determining the communication state between the master BMC and the slave BMC based on the response information returned by the slave BMC for each heartbeat request information received.
According to an implementation manner in the embodiment of the present application, determining the communication state between the master BMC and the slave BMC based on the response information returned by the slave BMC according to the heartbeat request information received each time includes:
determining, according to the response information, heartbeat request results between the master BMC and the slave BMC, where each heartbeat request result is either request failure or request success;
and determining that the master BMC and the slave BMC are communicating normally when the number of request failures is smaller than the number of request successes and the result for the last heartbeat request information is request success, or when the result for every heartbeat request information is request success.
According to an implementation manner in the embodiment of the present application, determining the communication state between the master BMC and the slave BMC based on the response information returned by the slave BMC according to the heartbeat request information received each time includes:
determining that the communication between the master BMC and the slave BMC is abnormal when the number of request failures is greater than the number of request successes, or when the result for every heartbeat request information is request failure.
According to one implementation manner in the embodiment of the present application, the method further includes:
when the communication between the master BMC and the slave BMC is abnormal, sending heartbeat request information to the slave BMC at preset time intervals;
determining whether the master BMC and the slave BMC have established communication based on response information returned by the slave BMC according to the heartbeat request information;
if communication between the master BMC and the slave BMC is not established within a preset time range, determining the slave BMC to be an abnormal slave BMC;
and sending alarm information about the abnormal slave BMC to the user.
According to one implementation manner in the embodiment of the present application, the method further includes: after the alarm information is sent to the user, sending heartbeat request information to the plurality of slave BMCs in turn, so that communication is established automatically once the user has repaired the abnormal slave BMC.
According to one implementation manner in the embodiment of the present application, the response information includes status bit information, and determining whether the master BMC and the slave BMC have established communication based on the response information returned by the slave BMC according to the heartbeat request information includes:
and judging whether the master BMC and the slave BMC establish communication or not based on the status bit information returned by the slave BMC according to the heartbeat request information.
According to one implementation manner in the embodiment of the present application, the method further includes:
and when the state bit information is abnormal, sending heartbeat request information to the slave BMC.
In a second aspect, a multi-BMC communication device is provided, the device including:
the sending module is used for having the master BMC send heartbeat request information to the plurality of slave BMCs in turn;
the judging module is used for determining whether the master BMC and the slave BMC have established communication based on response information returned by the slave BMC according to the heartbeat request information;
the sending module is also used for sending heartbeat request information to the slave BMC a preset number of times when communication between the master BMC and the slave BMC has been established and then becomes abnormal;
and the determining module is used for determining the communication state of the master BMC and the slave BMC based on the response information returned by the slave BMC for each heartbeat request information received.
In a third aspect, a computer device is provided, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores computer instructions executable by the at least one processor to enable the at least one processor to perform the method referred to in the first aspect above.
In a fourth aspect, a computer-readable storage medium is provided, on which computer instructions are stored, wherein the computer instructions are configured to cause a computer to perform the method according to the first aspect.
According to the technical content provided by the embodiment of the application, a master BMC sends heartbeat request information to a plurality of slave BMCs in turn, and whether communication has been established between the master BMC and a slave BMC is determined based on the response information the slave BMC returns for the heartbeat request information. When communication becomes abnormal after the master BMC and the slave BMC have established communication, the connection is not dropped immediately; instead, the master BMC sends heartbeat request information to the slave BMC a preset number of times, which debounces the communication state. The communication state of the master BMC and the slave BMC is then determined from the response information returned for each of these heartbeat requests. This improves the stability of communication among the multiple BMCs and thereby ensures the validity and timeliness of data interaction between them.
Drawings
FIG. 1 is a flow diagram illustrating a method for multi-BMC communication in one embodiment;
FIG. 2 is a flow chart illustrating a method for multi-BMC communication in another embodiment;
FIG. 3 is a block diagram of a multi-BMC communication device in one embodiment;
FIG. 4 is a schematic block diagram of a computer apparatus in one embodiment.
Detailed Description
The present application will be described in further detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The embodiment of the application provides a multi-BMC communication method, device and equipment and a computer storage medium. First, a multi-BMC communication method provided in an embodiment of the present application is described below.
Fig. 1 shows a flowchart of a multi-BMC communication method according to an embodiment of the present application. As shown in fig. 1, the method may include the steps of:
S110, the master BMC sends heartbeat request information to the plurality of slave BMCs in turn.
The multi-BMC server system can effectively reduce the load of a single BMC, and improve the management efficiency and the running stability of the server. The multi-BMC server system comprises a master BMC and a plurality of slave BMCs, wherein each slave BMC is connected with a plurality of components, and the components can be network cards, power supplies and the like. Each slave BMC monitors the health condition of the components connected with the slave BMC, and the master BMC monitors the working condition of the slave BMC to acquire the health condition of the components. The multiple BMCs jointly undertake the monitoring function of the health states of different components of the whole server system, and can manage and monitor the operation and the health states of all the components more accurately and finely.
After the system is powered on, the master BMC sends heartbeat request information to the plurality of slave BMCs by polling. The heartbeat request information specifies the information to be acquired, such as temperature, voltage, current, alarms and health state. After receiving the heartbeat request information, the slave BMC responds to it, generates response information, and returns the response information to the master BMC.
S120, determining whether the master BMC and the slave BMC have established communication based on the response information returned by the slave BMC according to the heartbeat request information.
If the master BMC and the slave BMC have established communication, the response information contains the content requested by the heartbeat request information. If they have not established communication, either the response information contains state information indicating that execution of the heartbeat command failed, or the slave BMC does not respond to the heartbeat request information at all.
The response information returned by the slave BMC therefore shows whether communication between the master BMC and the slave BMC has been established.
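One polling round of S110 and S120 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the transport callable, the dictionary-shaped response, and the "status" and "data" keys are all hypothetical stand-ins for whatever message format the BMCs actually use.

```python
from typing import Callable, Dict, Iterable, Optional

# Hypothetical transport: takes a slave identifier and returns the decoded
# response, or None when the slave BMC does not answer at all.
Transport = Callable[[str], Optional[dict]]


def heartbeat_succeeded(response: Optional[dict]) -> bool:
    """Return True only when the reply carries the requested content."""
    if response is None:
        # No reply to the heartbeat request: communication is not established.
        return False
    if response.get("status") == "failed":
        # The slave reported that executing the heartbeat command failed.
        return False
    # Assumed convention: the requested readings travel under the "data" key.
    return "data" in response


def poll_once(slaves: Iterable[str], send_heartbeat: Transport) -> Dict[str, bool]:
    """Send one heartbeat to each slave in turn and record the outcome."""
    return {slave: heartbeat_succeeded(send_heartbeat(slave)) for slave in slaves}
```

Passing the transport in as a parameter keeps the classification logic independent of the physical link between the BMCs, which the text does not specify.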
S130, when communication between the master BMC and the slave BMC has been established and then becomes abnormal, sending heartbeat request information to the slave BMC a preset number of times.
Once the master BMC and the slave BMC have established communication, the master BMC initiates a heartbeat request at intervals, and the slave BMC replies promptly after receiving each request. After receiving the response information, the master BMC may repeat S120 to determine whether the current communication is normal. If the communication is normal, the master BMC sends time data, configuration information and the like to the slave BMC to keep the slave BMC synchronized with the master BMC.
If the communication between the master BMC and the slave BMC is abnormal, the master BMC does not immediately disconnect from the slave BMC. To prevent a transient communication abnormality from disrupting data interaction across the whole system, the master BMC sends heartbeat request information to the slave BMC a preset number of times. In practice, a preset number of 8 to 12 works well in experiments, with 10 giving the best results.
And S140, determining the communication state of the master BMC and the slave BMC based on response information returned by the slave BMC according to the heartbeat request information received each time.
Each time the master BMC sends heartbeat request information to the slave BMC, the slave BMC either returns response information or does not respond. From these responses the master BMC can determine whether its communication with the slave BMC is normal or abnormal.
It can be seen that, in the embodiment of the present application, the master BMC sends heartbeat request information to a plurality of slave BMCs in turn, and whether communication has been established between the master BMC and a slave BMC is determined based on the response information the slave BMC returns for the heartbeat request information. When communication has been established and then becomes abnormal, the connection is not dropped immediately; instead, heartbeat request information is sent to the slave BMC a preset number of times, which debounces the communication state. The communication state between the master BMC and the slave BMC is then determined from the response information returned for each of these heartbeat requests, which improves the stability of communication among the multiple BMCs and thus ensures the validity and timeliness of data interaction between them.
The steps in the above-described process flow are described in detail below. First, the above-mentioned S140, namely, "determining the communication status between the master BMC and the slave BMC based on the response information returned by the slave BMC according to the heartbeat request information received each time" will be described in detail with reference to the embodiments.
As an implementation manner, heartbeat request results between the master BMC and the slave BMC are determined according to the response information, where each heartbeat request result is either request failure or request success.
The master BMC and the slave BMC are determined to be communicating normally when the number of request failures is smaller than the number of request successes and the result for the last heartbeat request information is request success, or when the result for every heartbeat request information is request success.
If the response information is empty, or contains state information indicating that execution of the heartbeat command failed, the heartbeat request result is request failure. If the response information contains the content requested by the heartbeat request information, the heartbeat request result is request success.
Taking a preset number of 10 as an example: when the number of request failures is smaller than the number of request successes and the result for the last heartbeat request information is request success, that is, more than 50% of the requests succeed and the last heartbeat request succeeds, the master BMC and the slave BMC are determined to be communicating normally.
Once communication is determined to be normal, the master BMC keeps the current data interaction going.
As an implementation manner, the master BMC and the slave BMC are determined to be communicating abnormally when the number of request failures is greater than the number of request successes, or when the result for every heartbeat request information is request failure.
That is, if fewer than 50% of the requests succeed, or if all 10 heartbeat requests fail, communication between the master BMC and the slave BMC is determined to be abnormal.
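The two rules above can be written as a single classification function. The sketch below is illustrative only; the names are hypothetical. Note that the text does not say what happens when failures equal successes, or when successes are in the majority but the last request fails, so the sketch reports those cases as undetermined rather than guessing.

```python
from enum import Enum
from typing import Sequence


class LinkState(Enum):
    NORMAL = "normal"
    ABNORMAL = "abnormal"
    # Not covered by the rules in the text; an assumption of this sketch.
    UNDETERMINED = "undetermined"


def classify_link(results: Sequence[bool]) -> LinkState:
    """Classify the link from the ordered results of the retry heartbeats.

    Each entry is True for request success and False for request failure.
    """
    if not results:
        raise ValueError("at least one heartbeat result is required")
    successes = sum(results)
    failures = len(results) - successes
    if failures == 0:
        return LinkState.NORMAL      # every request succeeded
    if successes == 0:
        return LinkState.ABNORMAL    # every request failed
    if failures < successes and results[-1]:
        return LinkState.NORMAL      # majority succeeded and the last one did
    if failures > successes:
        return LinkState.ABNORMAL    # majority failed
    return LinkState.UNDETERMINED
```

With a preset number of 10, seven successes ending in a success yield NORMAL, while three successes yield ABNORMAL regardless of the last result.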
As an implementation manner, if fewer than 50% of the requests succeed, or all of the preset number of heartbeat requests fail, heartbeat request information is sent to the slave BMC at preset time intervals;
determining whether the master BMC and the slave BMC have established communication based on response information returned by the slave BMC according to the heartbeat request information;
if communication between the master BMC and the slave BMC is not established within a preset time range, the slave BMC is determined to be an abnormal slave BMC;
and alarm information about the abnormal slave BMC is sent to the user.
After communication between the master BMC and the slave BMC becomes abnormal, the master BMC keeps sending heartbeat request information to the slave BMC at intervals of a preset time period, and keeps determining from the returned response information whether communication has been established, until communication is normally established again. The preset time period may be 1 second, 2 seconds, 3 seconds, and the like; the smaller the value, the better.
The preset time range can be set to 5 minutes. If communication between the master BMC and the slave BMC has not been established after more than 5 minutes, the slave BMC is determined to be an abnormal slave BMC, and alarm information about the abnormal slave BMC is generated according to the abnormal situation. The communication abnormality alarm mechanism of the master BMC is triggered and the alarm information is reported to the user, so that the user can find and repair the abnormality in time, restore normal operation of the server system as soon as possible, and keep the server system stable.
As an implementation manner, after the alarm information is sent to the user, heartbeat request information is sent to the plurality of slave BMCs in turn, so that communication is established automatically once the user has repaired the abnormal slave BMC.
After the master BMC triggers the communication abnormality alarm, it does not stop sending heartbeat requests. It keeps polling with heartbeat request information and repeats S110 and S120 until communication is normally established, at which point the master BMC clears the alarm and informs the user.
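The retry-then-alarm behaviour can be sketched as a bounded loop. This is an illustration under assumptions: `probe` and `raise_alarm` are hypothetical callables standing in for the heartbeat exchange and the alarm mechanism, and the clock and sleep functions are injectable only so the loop can be exercised without real waiting. The defaults mirror the 1 second interval and 5 minute range mentioned in the text.

```python
import time
from typing import Callable


def reconnect_or_alarm(
    probe: Callable[[], bool],
    raise_alarm: Callable[[], None],
    interval_s: float = 1.0,
    timeout_s: float = 300.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Probe the slave at fixed intervals until it answers or time runs out.

    Returns True if communication was re-established within the time range;
    otherwise raises the alarm once and returns False.
    """
    deadline = clock() + timeout_s
    while clock() < deadline:
        if probe():
            return True
        sleep(interval_s)
    # The preset time range elapsed without a successful heartbeat.
    raise_alarm()
    return False
```

A False return does not end monitoring: as described above, the caller would go back to polling all slave BMCs so that the link recovers automatically after repair.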
The above step S120, that is, determining whether the master BMC and the slave BMC establish communication based on the response information returned by the slave BMC according to the heartbeat request information, will be described in detail with reference to the embodiment.
As an implementation manner, whether the master BMC and the slave BMC establish communication is determined based on status bit information returned by the slave BMC according to the heartbeat request information.
The response information includes status bit information including a field value for indicating success or failure in execution of the heartbeat command.
If the status bit information is 0x00, the master BMC and the slave BMC have established communication and data interaction proceeds normally. If the status bit information is an abnormal value such as 0xd5 or 0x83, the master BMC and the slave BMC have not established communication; the master BMC sends heartbeat request information to the slave BMC again and checks the state of the slave BMC from the status bit information in the returned response information, until the master BMC and the slave BMC communicate normally. Likewise, if the master BMC receives no response information from the slave BMC, it sends the heartbeat request information to the slave BMC again.
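The status check can be sketched as below. The text gives the values (0x00 for success, values such as 0xd5 and 0x83 for failure) but not the layout of the response, so the position of the status field as the first byte is an assumption of this sketch, as are the function and constant names.

```python
from typing import Optional

STATUS_OK = 0x00  # value the text gives for a successfully executed heartbeat


def communication_established(response: Optional[bytes]) -> bool:
    """Return True when the response reports a successful heartbeat command.

    Assumption: the status field is the first byte of the raw response.
    """
    if not response:
        # No response, or an empty one: treat as not established and retry.
        return False
    return response[0] == STATUS_OK
```

Any non-zero status, including 0xd5 and 0x83, is treated the same way here: communication is considered not established and the heartbeat is sent again.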
According to the embodiment of the application, when communication between the master BMC and the slave BMC becomes abnormal, the master BMC sends heartbeat request information to the slave BMC multiple times to determine the communication state, so that a transient abnormality does not break the connection and the communication state is debounced. If communication remains abnormal beyond the preset time range, alarm information is sent to the user as a reminder to carry out maintenance promptly. Because the master BMC sends heartbeat request information to the slave BMCs by polling, it can re-establish the connection promptly once a slave BMC recovers. This guarantees the validity and timeliness of data interaction among the multiple BMCs, allows the health state of the server to be monitored stably and services to be managed in real time, and thereby keeps the whole server running stably.
With reference to the implementation manner in the foregoing embodiment, a preferred method flow provided by the embodiment of the present application is described below with reference to fig. 2 by way of example. As shown in fig. 2, the method may include the steps of:
S201, the master BMC sends heartbeat request information to the plurality of slave BMCs in turn.
S202, determining whether the master BMC and the slave BMC have established communication based on the status bit information returned by the slave BMC according to the heartbeat request information.
If the master BMC and the slave BMC have established communication, S203 is executed; otherwise, S204 is executed.
S203, sending the time data and the configuration information to the slave BMC.
S204, sending heartbeat request information to the slave BMC a preset number of times.
S205, determining the heartbeat request results between the master BMC and the slave BMC based on the response information returned by the slave BMC for each heartbeat request information received.
S206, determining that the communication between the master BMC and the slave BMC is normal when the number of request failures is smaller than the number of request successes and the result for the last heartbeat request information is request success, or when the result for every heartbeat request information is request success.
When the communication between the master BMC and the slave BMC is normal, S203 is executed.
S207, determining that the communication between the master BMC and the slave BMC is abnormal when the number of request failures is greater than the number of request successes, or when the result for every heartbeat request information is request failure.
When the communication between the master BMC and the slave BMC is abnormal, S208 is executed.
S208, sending heartbeat request information to the slave BMC at preset time intervals.
S202 is performed again based on the response information returned by the slave BMC according to the heartbeat request information. If the master BMC and the slave BMC have established communication, S203 is executed; otherwise, S209 is executed.
S209, if communication between the master BMC and the slave BMC is not established within the preset time range, determining the slave BMC to be an abnormal slave BMC.
S210, sending alarm information about the abnormal slave BMC to the user.
After the alarm information is sent to the user, execution continues from S201.
The technical effects achieved by the above-mentioned S201-S210 are the same as the technical effects achieved by the S110-S140, and are not described herein again.
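The branching among S201 to S210 can be summarised as a transition table. This is a reading aid rather than the patent's implementation: the outcome labels are hypothetical, and the entry that keeps S208 retrying until the preset time range expires is an interpretation of the flow, since the text states only the established and timed-out branches.

```python
from typing import Dict, Tuple

# (current step, outcome) -> next step. Outcome labels are illustrative only.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("S201", "sent"): "S202",
    ("S202", "established"): "S203",
    ("S202", "not_established"): "S204",
    ("S204", "sent"): "S205",
    ("S205", "normal"): "S203",           # decision made in S206
    ("S205", "abnormal"): "S208",         # decision made in S207
    ("S208", "established"): "S203",
    ("S208", "not_established"): "S208",  # assumption: retry until timeout
    ("S208", "timeout"): "S209",
    ("S209", "flagged"): "S210",
    ("S210", "alarm_sent"): "S201",
}


def next_step(step: str, outcome: str) -> str:
    """Look up the step that follows `step` for the given outcome."""
    try:
        return TRANSITIONS[(step, outcome)]
    except KeyError:
        raise ValueError(f"no transition from {step} on {outcome!r}") from None
```

Keeping the flow in a table makes the loop back to S201 after an alarm explicit: monitoring never terminates, it only changes state.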
It should be understood that although the various steps in the flow charts of fig. 1-2 are shown in order as indicated by the arrows, the steps are not necessarily performed in order as indicated by the arrows. The steps are not performed in a strict order unless explicitly stated in the application, and may be performed in other orders. Moreover, at least some of the steps in fig. 1-2 may include multiple sub-steps or multiple stages that are not necessarily performed at the same time, but may be performed at different times, and the order of performing the sub-steps or stages is not necessarily sequential, but may be performed alternately or alternatingly with other steps or at least some of the sub-steps or stages of other steps.
Fig. 3 is a schematic structural diagram of a multi-BMC communication device according to an embodiment of the present disclosure, configured to execute the method flows shown in fig. 1 and fig. 2. As shown in fig. 3, the apparatus may include: a sending module 310, a judging module 320 and a determining module 330. The main functions of each component module are as follows:
a sending module 310, configured to send heartbeat request information to a plurality of slave BMCs in turn by the master BMC;
the determining module 320 is configured to determine whether the master BMC and the slave BMC establish communication based on response information returned by the slave BMC according to the heartbeat request information;
the sending module 310 is further configured to send heartbeat request information of a preset number of times to the slave BMC when the master BMC establishes communication with the slave BMC and communication is abnormal;
a determining module 330, configured to determine a communication state between the master BMC and the slave BMC based on response information returned by the slave BMC according to the heartbeat request information received each time.
As an implementation manner, the determining module 330 is specifically configured to determine, according to the response information, heartbeat request results between the master BMC and the slave BMC, where each heartbeat request result is either a request failure or a request success;
and, when the number of request failures is smaller than the number of request successes and the result corresponding to the last heartbeat request information is a request success, or when the result corresponding to every heartbeat request information is a request success, to determine that the master BMC and the slave BMC communicate normally.
As an implementation manner, the determining module 330 is specifically configured to determine that communication between the master BMC and the slave BMC is abnormal when the number of request failures is greater than the number of request successes, or when the result corresponding to every heartbeat request information is a request failure.
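The decision rule applied by the determining module can be illustrated with a short sketch. This is not the applicant's implementation; the names `LinkState` and `classify_link` are hypothetical, and the `UNDETERMINED` outcome is an assumption added for cases the text does not cover (for example, equal numbers of successes and failures, or more successes than failures with a failed last request).

```python
from enum import Enum
from typing import Sequence


class LinkState(Enum):
    NORMAL = "normal"
    ABNORMAL = "abnormal"
    UNDETERMINED = "undetermined"  # assumption: case not covered by the text


def classify_link(results: Sequence[bool]) -> LinkState:
    """Classify the link from an ordered list of heartbeat results.

    True means the request succeeded, False means it failed.
    """
    if not results:
        return LinkState.UNDETERMINED
    successes = sum(results)
    failures = len(results) - successes
    # Normal: every request succeeded, or failures < successes and the
    # last request succeeded.
    if failures == 0 or (failures < successes and results[-1]):
        return LinkState.NORMAL
    # Abnormal: every request failed, or failures > successes.
    if successes == 0 or failures > successes:
        return LinkState.ABNORMAL
    return LinkState.UNDETERMINED
```

With three heartbeats, `[True, False, True]` would be classified as normal and `[False, True, False]` as abnormal, while `[True, True, False]` falls outside both stated conditions.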
As an implementation manner, the sending module 310 is further configured to: send heartbeat request information to the slave BMC at a preset time interval when communication between the master BMC and the slave BMC is abnormal;
judge whether the master BMC and the slave BMC have established communication based on response information returned by the slave BMC according to the heartbeat request information;
if communication between the master BMC and the slave BMC is not established within a preset time range, determine the slave BMC to be an abnormal slave BMC;
and send alarm information about the abnormal slave BMC to the user.
As an implementation manner, the sending module 310 is further configured to send heartbeat request information to the plurality of slave BMCs in turn after the alarm information has been sent to the user, so that communication is established automatically once the user has repaired the abnormal slave BMC.
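The interval-based probing and alarm behaviour described above might look roughly like the following sketch. All names (`probe_until_recovered`, `send_heartbeat`, `alert`, `slave_id`) are hypothetical, and the transport and the alarm channel are left as caller-supplied callbacks because the text does not specify them.

```python
import time
from typing import Callable


def probe_until_recovered(
    send_heartbeat: Callable[[], bool],
    interval_s: float,
    timeout_s: float,
    alert: Callable[[str], None],
    slave_id: str,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Probe one slave at a fixed interval; alert if it stays unreachable.

    Returns True if communication is re-established before timeout_s elapses.
    """
    deadline = clock() + timeout_s
    while clock() < deadline:
        if send_heartbeat():
            return True  # the slave answered: communication is established
        sleep(interval_s)
    # The preset time range elapsed without a response: flag the slave.
    alert(f"slave BMC {slave_id} is abnormal: no response for {timeout_s}s")
    return False
```

Because the master resumes round-robin polling after the alarm, a repaired slave would be picked up by the normal polling loop without any extra step; the sketch therefore only reports the failure and returns.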
As an implementation manner, the response information includes status bit information, and the judging module 320 is specifically configured to:
judge whether the master BMC and the slave BMC have established communication based on the status bit information returned by the slave BMC according to the heartbeat request information.
As an implementation manner, the sending module 310 is further configured to send heartbeat request information to the slave BMC when the status bit information is abnormal.
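As a rough illustration of judging establishment from a status bit, the sketch below assumes a purely hypothetical encoding in which the first byte of the response carries the status and bit 0 set means "communication established". The text does not define the actual layout of the status bit information, so `STATUS_OK_MASK` and `is_established` are illustrative names only.

```python
from typing import Optional

# Hypothetical encoding: bit 0 of the first response byte set means
# "communication established". The source does not specify the layout.
STATUS_OK_MASK = 0x01


def is_established(response: Optional[bytes]) -> bool:
    """Return True if the response carries a set status bit.

    A missing or empty response is treated as "not established".
    """
    if not response:
        return False
    return bool(response[0] & STATUS_OK_MASK)
```

When `is_established` returns False, the master would send heartbeat request information to that slave again, as the last implementation manner above describes.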
It will be understood that it is not necessary for any method or article of manufacture to achieve all of the above-described advantages in connection with the practice of the present application.
The same and similar parts among the various embodiments described above can be referred to each other, and each embodiment is described with emphasis on differences from other embodiments. In particular, as for the apparatus embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
According to an embodiment of the present application, a computer device and a computer-readable storage medium are also provided.
Fig. 4 shows a block diagram of a computer device according to an embodiment of the present application. The computer device is intended to represent various forms of digital computers or mobile devices. Digital computers may include desktop computers, laptop computers, workstations, personal digital assistants, servers, mainframe computers, and other suitable computers. Mobile devices may include tablets, smartphones, wearable devices, and the like.
As shown in fig. 4, the apparatus 400 includes a computing unit 401, a ROM 402, a RAM 403, a bus 404, and an input/output (I/O) interface 405, the computing unit 401, the ROM 402, and the RAM 403 being connected to each other via the bus 404. An input/output (I/O) interface 405 is also connected to bus 404.
The computing unit 401 may perform the following steps in an embodiment of the method according to the present application, according to computer instructions stored in a Read Only Memory (ROM) 402 or loaded from a storage unit 408 into a Random Access Memory (RAM) 403:
the master BMC sends heartbeat request information to a plurality of slave BMCs in turn; whether the master BMC and a slave BMC have established communication is judged based on response information returned by the slave BMC according to the heartbeat request information; when communication between the master BMC and the slave BMC has been established and the communication is abnormal, heartbeat request information is sent to the slave BMC a preset number of times; and the communication state of the master BMC and the slave BMC is determined based on the response information returned by the slave BMC for each heartbeat request information received.
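One polling pass of this flow can be sketched as follows. The sketch takes one possible reading of "established and abnormal", namely a slave that answered in an earlier pass but fails the current heartbeat; that reading, and the names `poll_round`, `send_heartbeat`, `established` and `retries`, are assumptions rather than details taken from the application.

```python
from typing import Callable, Dict, Iterable, List, Set


def poll_round(
    slaves: Iterable[str],
    send_heartbeat: Callable[[str], bool],
    established: Set[str],
    retries: int,
) -> Dict[str, List[bool]]:
    """Run one round-robin pass and return follow-up results per suspect slave."""
    suspect_results: Dict[str, List[bool]] = {}
    for slave in slaves:
        if send_heartbeat(slave):
            established.add(slave)  # response received: link is established
            continue
        if slave in established:
            # Established earlier but failing now: send the preset number
            # of follow-up heartbeats and record each outcome in order.
            suspect_results[slave] = [send_heartbeat(slave) for _ in range(retries)]
    return suspect_results
```

The ordered result list returned for each suspect slave is the input from which the communication state is then determined, by comparing the counts of successes and failures.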
As an implementation manner, determining the communication state between the master BMC and the slave BMC based on the response information returned by the slave BMC for each heartbeat request information received includes: determining heartbeat request results between the master BMC and the slave BMC according to the response information, where each heartbeat request result is either a request failure or a request success; and, when the number of request failures is smaller than the number of request successes and the result corresponding to the last heartbeat request information is a request success, or when the result corresponding to every heartbeat request information is a request success, determining that the master BMC and the slave BMC communicate normally.
As an implementation manner, determining the communication state between the master BMC and the slave BMC based on the response information returned by the slave BMC for each heartbeat request information received includes: when the number of request failures is greater than the number of request successes, or when the result corresponding to every heartbeat request information is a request failure, determining that communication between the master BMC and the slave BMC is abnormal.
As an implementation manner, the method further includes: when communication between the master BMC and the slave BMC is abnormal, sending heartbeat request information to the slave BMC at a preset time interval; judging whether the master BMC and the slave BMC have established communication based on response information returned by the slave BMC according to the heartbeat request information; if communication between the master BMC and the slave BMC is not established within a preset time range, determining the slave BMC to be an abnormal slave BMC; and sending alarm information about the abnormal slave BMC to a user.
As an implementation manner, the method further includes: after the alarm information has been sent to the user, sending heartbeat request information to the plurality of slave BMCs in turn, so that communication is established automatically once the user has repaired the abnormal slave BMC.
As an implementation manner, the response information includes status bit information, and judging whether the master BMC and the slave BMC have established communication based on the response information returned by the slave BMC according to the heartbeat request information includes: judging whether the master BMC and the slave BMC have established communication based on the status bit information returned by the slave BMC according to the heartbeat request information.
As an implementation manner, the method further includes: when the status bit information is abnormal, sending heartbeat request information to the slave BMC.
Computing unit 401 may be a variety of general and/or special purpose processing components with processing and computing capabilities. The computing unit 401 may include, but is not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. In some embodiments, the methods provided by embodiments of the present application may be implemented as a computer software program tangibly embodied in a computer-readable storage medium, such as storage unit 408.
The RAM 403 may also store various programs and data required for the operation of the device 400. Part or all of the computer program may be loaded and/or installed onto the device 400 via the ROM 402 and/or the communication unit 409.
An input unit 406, an output unit 407, a storage unit 408 and a communication unit 409 in the device 400 may be connected to the I/O interface 405. The input unit 406 may be, for example, a keyboard, a mouse, a touch screen, a microphone, or the like; the output unit 407 may be, for example, a display, a speaker, an indicator lamp, or the like. The device 400 is capable of exchanging information, data, etc. with other devices via the communication unit 409.
It should be noted that the device may also include other components necessary to achieve proper operation. It may also contain only the components necessary to implement the solution of the present application and not necessarily all of the components shown in the figures.
Various implementations of the systems and techniques described here can be implemented in digital electronic circuitry, integrated circuitry, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof.
Computer instructions for implementing the methods of the present application may be written in any combination of one or more programming languages. These computer instructions may be provided to the computing unit 401 such that the computer instructions, when executed by the computing unit 401 such as a processor, cause the steps involved in embodiments of the method of the present application to be performed.
The computer-readable storage medium provided herein may be a tangible medium that may contain, or store, computer instructions for performing the steps involved in the method embodiments of the present application. The computer readable storage medium may include, but is not limited to, storage media in the form of electronic, magnetic, optical, electromagnetic, and the like.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (10)

1. A multi-BMC communication method, the method comprising:
sending, by a master BMC, heartbeat request information to a plurality of slave BMCs in turn;
judging whether the master BMC and a slave BMC have established communication based on response information returned by the slave BMC according to the heartbeat request information;
when communication between the master BMC and the slave BMC has been established and the communication is abnormal, sending heartbeat request information to the slave BMC a preset number of times;
and determining the communication state of the master BMC and the slave BMC based on the response information returned by the slave BMC for each heartbeat request information received.
2. The method of claim 1, wherein determining the communication state of the master BMC and the slave BMC based on the response information returned by the slave BMC for each heartbeat request information received comprises:
determining heartbeat request results of the master BMC and the slave BMC according to the response information, wherein the heartbeat request results comprise request failure and request success;
and when the number of request failures is smaller than the number of request successes and the heartbeat request result corresponding to the last heartbeat request information is request success, or when the heartbeat request result corresponding to every heartbeat request information is request success, determining that the master BMC and the slave BMC are in normal communication.
3. The method of claim 2, wherein determining the communication status of the master BMC and the slave BMC based on the response information returned by the slave BMC from each received heartbeat request message comprises:
when the number of request failures is greater than the number of request successes, or when the heartbeat request result corresponding to every heartbeat request information is request failure, determining that communication between the master BMC and the slave BMC is abnormal.
4. The method of claim 3, further comprising:
when communication between the master BMC and the slave BMC is abnormal, sending heartbeat request information to the slave BMC at a preset time interval;
judging whether the master BMC and the slave BMC have established communication based on response information returned by the slave BMC according to the heartbeat request information;
if communication between the master BMC and the slave BMC is not established within a preset time range, determining the slave BMC to be an abnormal slave BMC;
and sending alarm information about the abnormal slave BMC to a user.
5. The method of claim 4, further comprising: after the alarm information has been sent to the user, sending heartbeat request information to the plurality of slave BMCs in turn, so that communication is established automatically once the user has repaired the abnormal slave BMC.
6. The method of claim 1, wherein the response information comprises status bit information; the step of judging whether the master BMC and the slave BMC establish communication based on the response information returned by the slave BMC according to the heartbeat request information comprises the following steps:
and judging whether the master BMC and the slave BMC establish communication or not based on the status bit information returned by the slave BMC according to the heartbeat request information.
7. The method of claim 6, further comprising:
and when the status bit information is abnormal, sending heartbeat request information to the slave BMC.
8. A multi-BMC communication device, the device comprising:
a sending module, configured to cause a master BMC to send heartbeat request information to a plurality of slave BMCs in turn;
a judging module, configured to judge whether the master BMC and a slave BMC have established communication based on response information returned by the slave BMC according to the heartbeat request information;
the sending module being further configured to send heartbeat request information to the slave BMC a preset number of times when communication between the master BMC and the slave BMC has been established and the communication is abnormal;
and a determining module, configured to determine the communication state of the master BMC and the slave BMC based on the response information returned by the slave BMC for each heartbeat request information received.
9. A computer device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores computer instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.
10. A computer-readable storage medium having computer instructions stored thereon for causing a computer to perform the method of any one of claims 1 to 7.
CN202211059411.9A 2022-08-31 2022-08-31 Multi-BMC communication method, device, equipment and storage medium Pending CN115529261A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211059411.9A CN115529261A (en) 2022-08-31 2022-08-31 Multi-BMC communication method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211059411.9A CN115529261A (en) 2022-08-31 2022-08-31 Multi-BMC communication method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115529261A true CN115529261A (en) 2022-12-27

Family

ID=84697104

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211059411.9A Pending CN115529261A (en) 2022-08-31 2022-08-31 Multi-BMC communication method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115529261A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101631011A (en) * 2008-07-16 2010-01-20 中国科学院声学研究所 Hotspare method and system suitable for device for processing and forwarding IP media stream in real time
CN102571452A (en) * 2012-02-20 2012-07-11 华为技术有限公司 Multi-node management method and system
US20160099886A1 (en) * 2014-10-07 2016-04-07 Dell Products, L.P. Master baseboard management controller election and replacement sub-system enabling decentralized resource management control
CN109471770A (en) * 2018-09-11 2019-03-15 华为技术有限公司 A kind of method for managing system and device
US20200099584A1 (en) * 2018-09-21 2020-03-26 Cisco Technology, Inc. Autonomous datacenter management plane
CN113905055A (en) * 2021-09-11 2022-01-07 苏州浪潮智能科技有限公司 Method, device, equipment and readable medium for synchronous data transmission between BMCs
CN114116280A (en) * 2021-11-11 2022-03-01 苏州浪潮智能科技有限公司 Interactive BMC self-recovery method, system, terminal and storage medium

Similar Documents

Publication Publication Date Title
CN107480014B (en) High-availability equipment switching method and device
US9917741B2 (en) Method and system for processing network activity data
CN109150662B (en) Message transmission method, distributed system, device, medium, and unmanned vehicle
CN112437001B (en) Method and device for guaranteeing reliable delivery and consumption of messages
CN112527567A (en) System disaster tolerance method, device, equipment and storage medium
CN111045811A (en) Task allocation method and device, electronic equipment and storage medium
CN109257396B (en) Distributed lock scheduling method and device
CN111541762B (en) Data processing method, management server, device and storage medium
CN115964153A (en) Asynchronous task processing method, device, equipment and storage medium
WO2018137114A1 (en) Device monitoring method, server, and monitoring system
CN112631756A (en) Distributed regulation and control method and device applied to space flight measurement and control software
CN115529261A (en) Multi-BMC communication method, device, equipment and storage medium
CN115098294B (en) Abnormal event processing method, electronic equipment and management terminal
CN113900855B (en) Active hot start method, system and device for abnormal state of switch
CN110752972A (en) Network card state monitoring method, device, equipment and medium
CN109995597A (en) A kind of network equipment failure processing method and processing device
CN115396296A (en) Service processing method and device, electronic equipment and computer readable storage medium
CN113986135B (en) Method, device, equipment and storage medium for processing request
TW201344403A (en) Power supply management method
CN110659184B (en) Health state checking method, device and system
CN117395263B (en) Data synchronization method, device, equipment and storage medium
CN114296979A (en) Method and device for detecting abnormal state of Internet of things equipment
CN116016265B (en) Message all-link monitoring method, device, system, equipment and storage medium
CN117609194A (en) Cloud database processing method and device, electronic equipment and storage medium
CN115174447B (en) Network communication method, device, system, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination