CN113254247A - Server BMC I2C exception recovery method and related device - Google Patents


Info

Publication number: CN113254247A
Authority: CN (China)
Legal status: Pending (as listed by Google Patents; the legal status is an assumption, not a legal conclusion)
Application number: CN202110528848.1A
Other languages: Chinese (zh)
Inventors: 曲勇 (Qu Yong), 李永 (Li Yong)
Current assignee: Shandong Yingxin Computer Technology Co Ltd (the listed assignees may be inaccurate)
Original assignee: Shandong Yingxin Computer Technology Co Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0745 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in an input/output transactions management context
    • G06F11/0751 Error or fault detection not based on redundancy
    • G06F11/0754 Error or fault detection not based on redundancy by exceeding limits
    • G06F11/0757 Error or fault detection not based on redundancy by exceeding limits by exceeding a time limit, i.e. time-out, e.g. watchdogs
    • G06F11/0766 Error or fault reporting or storing
    • G06F11/0793 Remedial or corrective actions

Abstract

The application relates to a server BMC I2C exception recovery method. The method uses a counter to measure how long the I2C state machine stays in each state, and it comprises the following steps:

S1, acquiring, in time sequence, the current I2C state machine state of each I2C bus.

S2, comparing the current state with the last state. If they differ, execute S3; if they are the same, execute S4.

S3, updating the last state to the current state, resetting the counter, and returning to S1.

S4, determining whether the current state is the Idle state. If so, execute S3; if not, execute S5.

S5, determining whether the counter has timed out. If so, execute S6; otherwise, return to S1.

S6, recording an exception log, resetting the BMC I2C, and returning to S1.

The method can effectively and comprehensively detect and repair I2C hang faults.

Description

Server BMC I2C exception recovery method and related device
Technical Field
The application relates to the field of I2C exception handling, and in particular to a server BMC I2C exception recovery method and a related device.
Background
The I2C bus is the most commonly used serial bus in server BMC embedded systems, and the BMC monitors other hardware devices over I2C.

The I2C protocol offers good compatibility, a low pin count and a simple chip implementation. Simple as the protocol is, many problems arise in practice, and the most common is a hung I2C bus.

I2C is a multi-master, multi-slave, synchronous serial bus made up of a clock line SCL and a data line SDA. According to the I2C specification, the SCL and SDA pins of every device on the bus are bidirectional open-drain outputs, pulled up to a logic high level by pull-up resistors on the bus. In general, SDA changes only while SCL is low and is held stable while SCL is high. The master signals a start condition by taking SDA from high to low while SCL is high. It signals a stop condition by taking SDA from low to high while SCL is high.

If any device on the bus pulls SDA or SCL low and never releases it, communication on the entire bus stalls. This is called an I2C bus hang. Bus deadlocks are mostly caused by an abnormal reset of the master or of a slave. I2C deadlock self-recovery logic can therefore be triggered by detecting that SDA or SCL is stuck low. The usual remedy is for the master to drive nine clock pulses on SCL, or to reset the I2C controller directly.

However, exception handling in the hardware link or in the driver can also leave the I2C master in an abnormal state. Unlike a bus hang, SDA and SCL are not held low in this scenario. The I2C exception therefore cannot be detected, the hang self-recovery logic is never triggered, and the master has to be reset manually.
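The conventional SDA/SCL-based recovery mentioned above is shown here for context. It is not part of the claimed method, only a minimal illustration of the nine-clock bus-clear technique. The three callables (`read_sda`, `pulse_scl`, `send_stop`) are hypothetical GPIO hooks; a real BMC would map them to its own pin-control interface.

```python
from typing import Callable


def clear_stuck_bus(read_sda: Callable[[], int],
                    pulse_scl: Callable[[], None],
                    send_stop: Callable[[], None],
                    max_pulses: int = 9) -> bool:
    """Nine-clock I2C bus clear (illustrative sketch).

    read_sda, pulse_scl and send_stop are HYPOTHETICAL platform hooks.
    Returns True if the slave released SDA, False if SDA is still low.
    """
    for _ in range(max_pulses):
        if read_sda() == 1:      # SDA released: the pull-up brought it high
            break
        pulse_scl()              # one SCL pulse lets a stuck slave shift out a bit
    if read_sda() == 0:
        return False             # still held low: a controller reset is needed
    send_stop()                  # a STOP returns every device to a known idle state
    return True
```

If SDA is still low after nine pulses, this technique has run out of options. The patent's state-machine approach addresses a different case, where neither line is stuck low at all.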
Disclosure of Invention
To solve, or at least partially solve, the above technical problem, one aspect of the present application provides a server BMC I2C exception recovery method. The method uses a counter to measure the state duration of the I2C state machine and comprises the following steps:
S1, acquiring, in time sequence, the current I2C state machine state of each I2C bus;
S2, comparing whether the current I2C state machine state is the same as the last I2C state machine state; if not, executing S3, and if so, executing S4;
S3, updating the last I2C state machine state to the current I2C state machine state, resetting the counter, and returning to step S1;
S4, determining whether the current I2C state machine state is the Idle state; if so, executing step S3, and if not, executing step S5;
S5, determining whether the counter has timed out; if so, executing step S6, and otherwise returning to step S1;
S6, recording an exception log, resetting the BMC I2C, and returning to step S1.
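The loop S1 to S6 can be sketched per bus as below. This is a minimal Python model under stated assumptions, not the patented firmware:

- The counter is modelled as elapsed time since its last reset.
- State names are plain strings.
- `reset_i2c` is a hypothetical hook standing in for whatever controller reset the BMC provides.

S2 and S4 collapse into one condition here, because a changed state and a persisting Idle state both lead to S3.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

IDLE = "Idle"


@dataclass
class BusMonitor:
    """Steps S1-S6 for one I2C bus (illustrative sketch)."""
    bus: int
    timeout: float                        # must exceed the longest normal non-Idle state
    reset_i2c: Callable[[int], None]      # HYPOTHETICAL controller-reset hook
    last_state: Optional[str] = None
    count_start: float = 0.0              # counter modelled as "time since last reset"
    log: List[Tuple[int, str, float]] = field(default_factory=list)

    def sample(self, state: str, now: float) -> bool:
        """Handle one S1 sample; return True if S6 (log + reset) fired."""
        # S2: state changed -> S3;  S4: state unchanged but Idle -> S3
        if state != self.last_state or state == IDLE:
            self.last_state = state       # S3: update the last state
            self.count_start = now        #     and reset the counter
            return False
        # S5: unchanged non-Idle state; has the counter timed out?
        if now - self.count_start > self.timeout:
            self.log.append((self.bus, state, now))  # S6: bus, stuck state, time
            self.reset_i2c(self.bus)                 #     reset the BMC I2C
            return True
        return False                      # back to S1
```

After S6, the first sample taken once the controller has been reset is expected to read Idle, which differs from the stuck state. That sample resets the counter through S2 and S3. A persisting Idle state never reaches S5, so it cannot raise a false alarm.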
Further, the counter uses a clock independent of the BMC I2C as its reference clock. This clock runs at a higher frequency than the I2C clock, so the counter can time even the shortest state of the I2C state machine.
Further, the I2C status register of the BMC contains bits indicating the I2C state machine state. The values of these bits correspond one-to-one to the states of the I2C state machine, and the state is obtained by parsing them.
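The parsing step can be sketched as follows. The register layout is entirely hypothetical: a 4-bit state field at bits 19:16, with the code table shown. The patent only says that the bit values map one-to-one to states, so the shift, mask and table must come from the BMC controller's datasheet.

```python
from typing import Dict

# HYPOTHETICAL layout: 4-bit state field at bits [19:16] of the per-bus
# I2C status register. Real BMC SoCs differ; use the controller datasheet.
STATE_SHIFT = 16
STATE_MASK = 0xF
STATE_CODES: Dict[int, str] = {
    0x0: "Idle", 0x1: "Header", 0x2: "Ack_Header", 0x3: "Rcv_Data",
    0x4: "Xmit_Data", 0x5: "Ack_Data", 0x6: "Wait_Ack", 0x7: "Stop",
}


def decode_state(status_reg: int) -> str:
    """Extract the state-machine bits and map them one-to-one to a state name."""
    code = (status_reg >> STATE_SHIFT) & STATE_MASK
    try:
        return STATE_CODES[code]
    except KeyError:
        raise ValueError(f"unknown I2C state code {code:#x}") from None
```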
Further, the I2C state machine state is acquired in one of two ways: by polling the BMC's I2C status registers for all I2C buses, or by periodically sampling the I2C status register of each I2C bus.
Further, the polling period (or the sampling period) is shorter than the minimum duration of any I2C state machine state under normal operation.
Further, a timeout threshold is set, and the count of the counter is compared with it to determine whether the counter has timed out. The timeout threshold is larger than the maximum duration of any non-Idle I2C state machine state under normal operation.
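The two timing constraints can be illustrated with a worked example: the poll period must be below the shortest normal state, and the timeout must be above the longest normal non-Idle state. The sketch below derives both bounds from the SCL frequency. The state lengths are assumptions for illustration, not taken from the patent:

- The shortest state is a one-bit acknowledge phase.
- The longest non-Idle state is an 8-bit byte phase.
- `stretch_margin` is a hypothetical safety factor for slaves that stretch the clock.

A real controller's Stop hold time and clock-stretching limits should be taken from its datasheet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimingPlan:
    bit_time_us: float    # duration of one SCL bit
    min_state_us: float   # ASSUMED shortest normal state: a one-bit ACK phase
    max_state_us: float   # ASSUMED longest non-Idle state: 8-bit byte phase with margin


def plan(scl_hz: float, stretch_margin: float = 4.0) -> TimingPlan:
    """Derive both bounds from the SCL frequency (illustrative assumptions only)."""
    bit_us = 1e6 / scl_hz
    return TimingPlan(bit_time_us=bit_us,
                      min_state_us=bit_us,
                      max_state_us=8 * bit_us * stretch_margin)


def settings_ok(p: TimingPlan, poll_period_us: float, timeout_us: float) -> bool:
    """Poll faster than the shortest state; time out later than the longest non-Idle state."""
    return poll_period_us < p.min_state_us and timeout_us > p.max_state_us
```

At 100 kHz, one bit lasts 10 µs under these assumptions. The poll period must therefore be under 10 µs, and with the default margin the timeout must be above 320 µs.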
Further, the recorded exception log includes the I2C bus on which the exception occurred and the time at which it occurred.
Further, the contents of the exception log are analyzed to calculate the average frequency of exceptions on each I2C bus. A frequency threshold is set and each bus's average frequency is compared with it. If the threshold is exceeded, alarm information notifying the user is generated.
In another aspect, the present application provides a BMC with I2C exception self-recovery, comprising:
an acquisition module, configured to acquire data from the BMC I2C status registers in time sequence;
a first judgment module, configured to determine whether the I2C state machine state has changed;
a counting module, configured to time the duration of non-Idle states according to the result of the first judgment module;
a second judgment module, configured to determine whether the time recorded by the counting module has exceeded the timeout;
an analysis module, configured to parse the bits representing the I2C state machine state in the BMC I2C status register to obtain that state, according to the mapping between the I2C state machine states used by the BMC and those bits; the analysis module decodes the state in which the I2C state machine timed out;
and an execution module, which records an exception log and resets the BMC I2C when the second judgment module detects a timeout; the recorded exception log includes the I2C bus on which the exception occurred, the I2C state machine state in which it was stuck (as decoded by the analysis module), and the time of the exception.
Preferably, the BMC with I2C exception self-recovery provided by the present application further includes a statistics module, which reads the exception log and calculates the average frequency of exceptions on each I2C bus;
and a comparison module, which compares the average frequency with a set frequency threshold; if the threshold is exceeded, a communication module sends alarm information.
Compared with the prior art, the technical solution provided by the embodiments of the present application has the following advantages.

The server BMC I2C exception recovery method provided by the embodiments counts the duration of each I2C state machine state with a counter. The current state is acquired by reading the BMC's I2C status register in time sequence. Comparing it with the last acquired state shows whether the last state has ended; if it has, the counter is reset and starts timing the duration of the current state.

For non-Idle states, a timeout threshold determines whether a state has lasted too long. If any non-Idle state times out, the corresponding I2C bus is considered hung: an exception log is recorded and the BMC I2C is reset to clear the hang, so the server BMC I2C recovers automatically. For a persisting Idle state, the counter is reset every time Idle is acquired again. It can therefore never time out in Idle, which avoids false alarms.

The prior art decides whether I2C is hung from the level of the data line SDA or the clock line SCL. The present application instead decides from the duration of non-Idle states of the I2C state machine. It can therefore also detect I2C hangs in which the I2C master is left in an abnormal state by a hardware link or driver exception. The range of detectable exceptions is wider, and recovery is better for I2C exceptions that measuring the SDA or SCL level cannot identify as hangs.

Because exception detection stays active after the reset, it also acts as a self-check of whether the reset actually repaired the exception. In general, a reset repairs I2C exceptions caused by driver problems well, but it cannot repair those caused by hardware link problems.

In addition, the present application logs each exception on each I2C bus together with the time it occurred, so the user can see clearly how exceptions are distributed across the buses. By analyzing the log, it also calculates the average exception frequency of each I2C bus. If a frequency threshold is exceeded, it generates alarm information to notify the user. This lets the user intervene in time for exceptions that the self-reset does not fix, particularly I2C exceptions caused by hardware link problems that a reset cannot solve.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without inventive exercise.
FIG. 1 is a schematic diagram of an I2C bus architecture;
FIG. 2 is a waveform diagram of the data line SDA and the clock line SCL during I2C bus communication;
FIG. 3 is a state transition diagram of an I2C state machine that implements I2C communication according to the I2C protocol;
FIG. 4 is a flowchart of a server BMC I2C exception recovery method according to an embodiment of the present application;
FIG. 5 is a flowchart of analyzing the contents of an exception log according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a BMC with I2C exception self-recovery provided in this embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The embodiments of the present application will be described below with reference to the drawings.
First, some background on I2C is described.
Referring to FIG. 1, the I2C serial bus consists of two signal lines: a bidirectional data line SDA and a bidirectional clock line SCL. The data interface of each device on the I2C bus is connected to SDA, and its clock interface to SCL. Both SDA and SCL are bidirectional open-drain lines, pulled up to a logic high level by pull-up resistors on the bus. A logic low level is produced when the data or clock interface of a master or slave device pulls the line down.
The master device on the I2C bus is responsible for generating the start and stop conditions, sending address commands and generating the clock signal; the devices the master addresses are slave devices. Both master and slave devices can send and receive data.
Referring to FIG. 2, an I2C transfer proceeds as follows:

1. The transfer begins with a start condition: SDA goes from high to low while SCL is high.
2. The start condition is followed by 8 bits of slave addressing data, consisting of 7 address bits and one read/write bit. The read/write bit is 0 for a write (master to slave) and 1 for a read (slave to master). The slave whose address matches the 7 address bits becomes the other participant in the communication.
3. The addressing data is followed by the payload data. Every 8 bits of data, including the addressing data, must be followed by an acknowledge bit, which is sampled while SCL is high.
4. The transfer ends with a stop condition: SDA goes from low to high while SCL is high.

Between the start and stop conditions, the data on SDA must remain stable while SCL is high, and it is valid during that time. SDA may change only while SCL is low, and the data is not valid then.
The start condition is a high-to-low transition on SDA while SCL is high; the stop condition is a low-to-high transition on SDA while SCL is high.
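To make this framing concrete, the sketch below lists the symbolic bus events of a master write. It is an illustration only: the function name `write_frame` is hypothetical, bit timing is ignored, and the slave is assumed to acknowledge every byte.

```python
from typing import List


def write_frame(addr7: int, data: bytes) -> List[str]:
    """Symbolic event sequence for a master write (illustrative sketch).

    START, address byte (7 address bits + R/W = 0), ACK,
    then each data byte followed by an ACK slot, then STOP.
    """
    if not 0 <= addr7 < 0x80:
        raise ValueError("7-bit address out of range")
    addr_byte = (addr7 << 1) | 0        # R/W bit is the LSB: 0 = write, 1 = read
    seq = ["START", f"{addr_byte:08b}", "ACK"]
    for b in data:
        seq += [f"{b:08b}", "ACK"]      # every 8 data bits are followed by an ACK bit
    seq.append("STOP")
    return seq
```

Each byte therefore occupies nine SCL clocks: 8 data bits plus the acknowledge. This is why an 8-bit data state and a one-bit acknowledge state alternate in the I2C state machine.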
Because the I2C bus uses a bidirectional open-drain structure, a device that pulls SDA or SCL low prevents every other device from pulling it high, and the I2C bus hangs. In use, however, exception handling in the hardware link or the driver can also leave the I2C master in an abnormal state. Unlike a hang caused by a line held low, SDA and SCL are not low in this scenario. It is therefore often impossible to detect the fault from the level of SDA or SCL, and hang self-recovery logic based on that check is never triggered.
The I2C state machine controls the state transitions of the I2C bus during data transfer so that the whole process follows the I2C protocol. Referring to FIG. 3, one possible set of I2C state machine states is: Header (header state), Ack_Header (header acknowledge state), Rcv_Data (receive data state), Xmit_Data (transmit data state), Ack_Data (data acknowledge state), Wait_Ack (wait-for-acknowledge state), Stop (stop state) and Idle (idle state).

In the Idle state, the bus is idle, and a reset (Reset) also returns the machine to Idle.

When a start condition is detected (Detect_start = 1), the I2C state machine enters the Header state and shifts the 8-bit header. When 8 header bits have been transferred (Bit_cnt = 8), it enters the Ack_Header state and samples SDA to check for an acknowledge. Without an acknowledge, it enters the Stop state and, after waiting for a period of time, returns to Idle. With an acknowledge, it enters Xmit_Data or Rcv_Data according to the read/write bit in the header.

In the Xmit_Data state:
- Detecting a start condition returns the machine to Header, where a header is received on SDA and its address is checked for a match.
- Detecting a stop condition (Detect_stop = 1) leads to Stop.
- After each 8 bits are transmitted, the machine enters Wait_Ack to wait for the receiver's acknowledge. It returns to Xmit_Data if the acknowledge bit is 0 and ends the transfer if it is 1.

In the Rcv_Data state:
- Detecting a start condition likewise returns the machine to Header.
- Detecting a stop condition leads to Stop.
- After each 8 bits are received, the machine enters Ack_Data and sends an acknowledge to the transmitter.
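The transitions described above can be condensed into a next-state function. This is a simplified sketch of one reading of FIG. 3, not a reproduction of it. Where possible the inputs follow the signals named in the text (Detect_start, Detect_stop, Bit_cnt); `ack`, `rw`, `stop_done` and `reset` are hypothetical inputs. Three transitions are assumptions where the text is silent or ambiguous:

- R/W = 0 selects Xmit_Data (the master writes).
- A NACK in Wait_Ack goes to Stop.
- Ack_Data returns to Rcv_Data.

```python
from dataclasses import dataclass
from enum import Enum


class S(Enum):
    IDLE = "Idle"
    HEADER = "Header"
    ACK_HEADER = "Ack_Header"
    XMIT_DATA = "Xmit_Data"
    WAIT_ACK = "Wait_Ack"
    RCV_DATA = "Rcv_Data"
    ACK_DATA = "Ack_Data"
    STOP = "Stop"


@dataclass
class Inputs:
    reset: bool = False         # HYPOTHETICAL: controller reset
    detect_start: bool = False  # Detect_start
    detect_stop: bool = False   # Detect_stop
    bit_cnt: int = 0            # Bit_cnt: bits shifted in the current byte
    ack: int = 1                # HYPOTHETICAL: sampled SDA in an ACK slot, 0 = ACK, 1 = NACK
    rw: int = 0                 # HYPOTHETICAL: R/W bit from the header, 0 = write, 1 = read
    stop_done: bool = False     # HYPOTHETICAL: Stop hold time has elapsed


def next_state(s: S, i: Inputs) -> S:
    if i.reset:
        return S.IDLE
    if s is S.IDLE:
        return S.HEADER if i.detect_start else S.IDLE
    if s is S.HEADER:
        return S.ACK_HEADER if i.bit_cnt == 8 else S.HEADER
    if s is S.ACK_HEADER:
        if i.ack == 1:
            return S.STOP                       # no acknowledge: stop
        return S.RCV_DATA if i.rw else S.XMIT_DATA
    if s in (S.XMIT_DATA, S.RCV_DATA):
        if i.detect_start:
            return S.HEADER                     # repeated START
        if i.detect_stop:
            return S.STOP
        if i.bit_cnt == 8:
            return S.WAIT_ACK if s is S.XMIT_DATA else S.ACK_DATA
        return s
    if s is S.WAIT_ACK:
        return S.XMIT_DATA if i.ack == 0 else S.STOP  # ASSUMED: NACK ends via Stop
    if s is S.ACK_DATA:
        return S.RCV_DATA                       # ASSUMED: resume receiving
    if s is S.STOP:
        return S.IDLE if i.stop_done else S.STOP
    raise ValueError(f"unhandled state {s}")
```

Every state except Idle has an exit driven by bounded bus activity. A state machine that keeps returning the same non-Idle state is therefore the symptom the recovery method times.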
Transmitting 8 bits of data and generating the corresponding acknowledge both take a bounded time, and the Stop state returns to Idle once its hold time has elapsed. Under normal conditions, therefore, every state of the I2C state machine except Idle has a bounded duration. If any non-Idle state lasts longer than that bound, I2C has malfunctioned and communication on the I2C bus is affected.
Example 1
Referring to FIG. 4, the present invention provides a server BMC I2C exception recovery method in which a counter counts the duration of each I2C state machine state. The counter uses a clock independent of the BMC I2C as its reference clock, with a frequency higher than the BMC I2C clock frequency, so that it can time even the short acknowledge (ACK) states. The method comprises the following steps:
S1, acquiring, in time sequence, the current I2C state machine state of each I2C bus. The I2C status register of the BMC contains bits representing the I2C state machine state. Their values correspond one-to-one to the states, and parsing them yields the current state. In a specific implementation, the current state of each I2C bus can be acquired in time sequence in either of the following ways:
1. Polling the I2C status registers of all I2C buses in the BMC. The polling period of each I2C bus is shorter than the minimum duration of any I2C state machine state under normal operation, so that no state is missed because the polling period is too long. Polling consumes few BMC hardware resources. However, the more status registers are polled, the longer the polling period, and so the longer the interval between two samples of the same bus. With too many I2C buses, this interval can exceed a state's duration and the state is missed. Therefore, when there are too many I2C buses, the buses are polled in groups. This reduces the number of buses per poll, keeps the polling period short enough, and ensures no I2C state machine state is missed.
2. Periodically sampling the I2C status register of each I2C bus in the BMC. The sampling period of each I2C bus is shorter than the minimum duration of any I2C state machine state under normal operation, so every state can be captured. Periodic sampling costs more hardware than polling. However, each I2C status register is sampled independently with a fixed period, so the period cannot grow too long when many I2C buses are connected, and no state is missed for that reason.
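The grouped polling in item 1 can be sized as below. This is a minimal sketch under two assumptions not stated in the patent:

- Reading one status register costs a fixed `read_ns`.
- Each group is served by its own poller, so groups do not share a period.

Within a group, the round-robin period is the group size multiplied by `read_ns`, and it must stay strictly below the shortest normal state.

```python
from typing import List, Sequence


def group_buses(buses: Sequence[int], read_ns: int, min_state_ns: int) -> List[List[int]]:
    """Split buses into polling groups (illustrative sketch).

    Largest group size n satisfies n * read_ns < min_state_ns (strict),
    so every state lasting at least min_state_ns is sampled at least once.
    """
    max_per_group = (min_state_ns - 1) // read_ns
    if max_per_group < 1:
        raise ValueError("a single register read already exceeds the shortest state")
    return [list(buses[i:i + max_per_group])
            for i in range(0, len(buses), max_per_group)]
```

For example, with a 2 µs read and a 10 µs shortest state, at most four buses fit in one group: 4 × 2 µs = 8 µs, which is under 10 µs.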
S2, comparing whether the current I2C state machine state is the same as the last I2C state machine state; if not, executing S3, and if so, executing S4. In implementation, the comparison is made separately for each I2C bus.
S3, updating the last I2C state machine state to the current state, resetting the counter, and returning to step S1. In a specific implementation, a current state that differs from the last one means the state machine has changed state and the last state has ended. The counter stops timing the last state and is reset to time the duration of the current state.
S4, determining whether the current I2C state machine state is the Idle state; if so, executing step S3, and if not, executing step S5. In a specific implementation, the Idle state may last indefinitely, so it must be excluded when the state has not changed. The polling period of the present application is shorter than the shortest I2C state machine state, so the counter cannot time out between two consecutive Idle samples. Resetting the counter at every Idle sample therefore guarantees that a long Idle period never causes a timeout.
S5, determining whether the counter has timed out; if so, executing step S6, and otherwise returning to step S1. In a specific implementation, a timeout threshold is set and the count of the counter is compared with it. The threshold is larger than the maximum duration of any non-Idle I2C state machine state under normal operation.
From the I2C basics above, transmitting 8 bits of data and generating a 1-bit acknowledge take a bounded time, and the Stop state automatically returns to Idle after its hold time. Under normal conditions, every non-Idle state therefore has a bounded duration. If the I2C bus hangs and the state machine stays in one state, the counter is never reset. Once its count exceeds the timeout threshold over a number of cycles, the counter timeout identifies an I2C bus whose state machine is stuck.
S6, recording an exception log, resetting the BMC I2C automatically to repair the I2C exception, and returning to step S1. In a specific implementation, the recorded log includes the I2C bus on which the exception occurred, the I2C state machine state in which it was stuck, and the time of the exception.
The recorded exception log is analyzed as shown in FIG. 5:

1. A frequency threshold is initialized.
2. The log contents are read.
3. The average exception frequency of each I2C bus over a fixed time interval is calculated from the times and counts of exceptions.
4. Each bus's average frequency is compared with the threshold, and alarm information notifying the user is generated if it is exceeded.

The method keeps running after the BMC I2C is reset. If the reset did not repair the problem, the faulty I2C bus gets stuck again on its next communication, and the method again records an exception log and resets it. This raises the average exception frequency in the log. The resulting alarm lets the user learn about, and fix, the I2C exceptions that the reset could not handle.
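The log analysis of FIG. 5 can be sketched as a per-bus rate check. The following are assumptions for illustration; the patent does not fix these formats or units:

- Log entries are `(bus, stuck_state, timestamp_in_seconds)` tuples.
- The fixed interval is a half-open window.
- The threshold is expressed in exceptions per hour.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def buses_to_alarm(log: Iterable[Tuple[int, str, float]],
                   window_start: float, window_end: float,
                   threshold_per_hour: float) -> Dict[int, float]:
    """Average exception rate per bus in [window_start, window_end) (sketch).

    Returns only the buses whose rate exceeds the threshold, i.e. the
    buses for which alarm information should be sent to the user.
    """
    hours = (window_end - window_start) / 3600.0
    if hours <= 0:
        raise ValueError("empty time window")
    counts = Counter(bus for bus, _state, t in log if window_start <= t < window_end)
    rates = {bus: n / hours for bus, n in counts.items()}
    return {bus: r for bus, r in rates.items() if r > threshold_per_hour}
```

A bus that a reset cannot repair keeps re-entering the log, so its rate climbs above the threshold. A one-off driver glitch stays below it.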
In a specific implementation, an I2C exception alarm module can be built into the BMC backend. The BMC sends alarm information to this module, and the user can view the exceptions through the BMC backend.
Example 2
Referring to fig. 6, the present application further provides an I2C abnormal self-recovery BMC, where the I2C abnormal self-recovery BMC of the present application can implement the server BMC I2C abnormal recovery method, including:
and the acquisition module is used for acquiring the data of the BMC I2C status register according to a time sequence. In a specific implementation process, the collection module collects data of all BMC I2C state registers in a polling manner, and a period for the collection module to perform polling collection is less than the duration of the shortest state in the states of the I2C state machine, so that the shortest state of the I2C state machine can be collected. Or the acquisition module acquires the data of each BMC I2C state register periodically, and the period of each I2C state register acquired by the acquisition module is less than the shortest state duration of the I2C state machine, so that the shortest state of the I2C state machine can be acquired.
A first judgment module, used to determine whether the state of the I2C state machine has changed. In a specific implementation, the first judgment module compares the data currently acquired by the acquisition module with the data acquired last time: if they differ, the previous I2C state machine state has ended; if they are the same, the previous state is still persisting.
The first judgment module also determines whether a state that is still persisting is the Idle state.
A counting module, used to time the duration of non-Idle states according to the result of the first judgment module. In a specific implementation, the counting module uses a reference clock whose frequency is higher than that of the I2C clock and counts on each rising or falling edge of the reference clock; it times any non-Idle state of the I2C state machine that is still persisting. For a persisting Idle state, the counting module is reset at every poll, that is, every time the persisting Idle state is sampled. Because of these resets, the count accumulated during Idle always stays below the configured timeout threshold.
A second judgment module, which determines whether the time recorded by the counting module has timed out. In a specific implementation, a timeout threshold is set and the second judgment module compares the time recorded by the counting module with it; if the recorded time is greater than the timeout threshold, the I2C bus is judged to be abnormal.
An analysis module, used to parse the bits of the BMC I2C status register that represent the I2C state machine state, thereby obtaining that state. Parsing follows the mapping, specific to the BMC in use, between the I2C state machine states and the state bits of the I2C status register. When the second judgment module determines that the I2C bus is abnormal, the analysis module reads the value of the I2C status register and decodes the corresponding I2C state machine state through this mapping.
An execution module, which records an exception log and resets the BMC I2C to repair the exception when the second judgment module determines that a timeout has occurred. In a specific implementation, the exception log records the I2C bus on which the exception occurred, the I2C state machine state in which the bus was stuck, and the time of the exception.
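The analysis and execution modules above can be sketched together as follows. This is a minimal illustration under stated assumptions: the state bits are assumed (hypothetically) to occupy bits 19 to 22 of the status register with the encodings in `STATE_NAMES`, and `reset_bus` is a hypothetical hook standing in for whatever controller reset the BMC actually provides. The real bit layout must be taken from the BMC's I2C controller documentation.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical register layout; real positions and encodings are BMC-specific.
STATE_SHIFT = 19
STATE_MASK = 0xF
STATE_NAMES = {0x0: "IDLE", 0x1: "START", 0x2: "TX_ADDR",
               0x3: "TX_DATA", 0x4: "RX_DATA", 0x5: "STOP"}


def parse_state(status_reg: int) -> str:
    """Analysis module: decode the state-machine bits of a status register value."""
    code = (status_reg >> STATE_SHIFT) & STATE_MASK
    return STATE_NAMES.get(code, f"UNKNOWN(0x{code:X})")


@dataclass
class ExceptionEntry:
    bus: int          # I2C bus on which the exception occurred
    stuck_state: str  # state-machine state the bus was stuck in
    timestamp: float  # time the exception was detected


@dataclass
class ExecutionModule:
    reset_bus: Callable[[int], None]  # hypothetical hook that resets one BMC I2C controller
    clock: Callable[[], float] = time.time
    log: List[ExceptionEntry] = field(default_factory=list)

    def handle_timeout(self, bus: int, status_reg: int) -> ExceptionEntry:
        """Record the exception (bus, stuck state, time), then reset the bus."""
        entry = ExceptionEntry(bus, parse_state(status_reg), self.clock())
        self.log.append(entry)  # logged before the reset, so it is kept even if the reset fails
        self.reset_bus(bus)
        return entry


resets = []
ex = ExecutionModule(reset_bus=resets.append, clock=lambda: 1000.0)
entry = ex.handle_timeout(bus=3, status_reg=0x3 << STATE_SHIFT)
print(entry)   # ExceptionEntry(bus=3, stuck_state='TX_DATA', timestamp=1000.0)
print(resets)  # [3]
```

Injecting `reset_bus` and `clock` keeps the sketch testable without hardware; on a real BMC these would wrap the driver's reset path and the system clock.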
Furthermore, the I2C exception self-recovering BMC provided by the present application further includes a statistics module, which reads the contents of the exception log, counts, based on the recorded occurrence times, the number of exceptions logged within a set time interval, and from this calculates the average exception frequency of each I2C bus within that interval.
A frequency threshold is set manually according to the set time interval and stored in the BMC.
A comparison module obtains the frequency threshold and compares the average frequency with it; if the average frequency exceeds the threshold, a communication module sends alarm information.
Example 3
The present application further provides a computer-readable storage medium storing at least one instruction; the BMC, connected to the storage medium, executes the instruction to implement the server BMC I2C exception recovery method.
Example 4
A processor, which configures an I2C state machine to implement I2C communication, reads the instructions in a storage medium, and executes them to implement the server BMC I2C exception recovery method.
In the server BMC I2C exception recovery method provided by the embodiments of the present application, a counter measures how long the I2C state machine stays in each state. The current I2C state machine state is obtained by sampling the BMC's I2C status register in time sequence, and comparing it with the previous state determines whether the previous state has ended. If it has ended, the counter is reset and begins timing the duration of the current state. For non-Idle states, a timeout threshold is used to decide whether the state has lasted too long; if any non-Idle state times out, the corresponding I2C bus is considered hung, an exception log is recorded, and the BMC I2C is reset to clear the hang, so that the server BMC I2C recovers automatically from the exception. For a persisting Idle state, the counter is reset every time the Idle state is sampled again, so the count can never time out while the bus is idle and false exception alarms are avoided.
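The per-bus detection loop described above can be sketched as a small monitor fed with one sample at a time. This is an illustrative sketch, not the applicant's code: the state name `IDLE`, the tick-based timing, and the `config_ok` helper are assumptions. One design choice is also added here that the method does not specify: after reporting a timeout the monitor restarts its count, so a bus that stays stuck is reported once per timeout period rather than on every poll.

```python
from dataclasses import dataclass
from typing import Dict, Optional

IDLE = "IDLE"  # hypothetical name for the Idle state


def config_ok(poll_ticks: int, timeout_ticks: int,
              min_dur: Dict[str, int], max_non_idle_dur: Dict[str, int]) -> bool:
    """Poll faster than the shortest normal state, and time out only after
    the longest normal non-Idle state."""
    return (poll_ticks < min(min_dur.values())
            and timeout_ticks > max(max_non_idle_dur.values()))


@dataclass
class BusMonitor:
    """Duration monitor for one I2C bus; times are reference-clock ticks."""
    timeout_ticks: int
    last_state: Optional[str] = None
    start_tick: int = 0

    def observe(self, state: str, now_tick: int) -> bool:
        """Feed one sample; return True if the bus should be treated as hung."""
        if state != self.last_state or state == IDLE:
            # State changed, or Idle persists: update the state, restart the count.
            self.last_state = state
            self.start_tick = now_tick
            return False
        # Same non-Idle state as the previous sample: check the elapsed time.
        if now_tick - self.start_tick > self.timeout_ticks:
            self.start_tick = now_tick  # caller logs and resets the bus
            return True
        return False


mon = BusMonitor(timeout_ticks=100)
samples = [(0, "START"), (10, "TX_DATA"), (60, "TX_DATA"), (120, "TX_DATA"), (130, IDLE)]
for tick, state in samples:
    if mon.observe(state, tick):
        print(f"bus hung in {state} at tick {tick}")  # prints once, at tick 120
```

Because every Idle sample restarts the count, an idle bus can be sampled indefinitely without ever reaching the timeout.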
In the prior art, whether an I2C bus is hung is judged from the level of the data line SDA or the clock line SCL. By contrast, the present application judges whether the I2C bus is hung from the duration of the non-Idle states of the I2C state machine. This also detects hangs in which the I2C master itself enters an abnormal state because of a hardware link fault or a driver fault, so the range of detectable exceptions is wider, and recovery is better for I2C exceptions that cannot be identified as hangs by measuring the SDA or SCL level.
After the reset repair, the I2C exception detection process remains in effect, so it also checks whether the reset actually repaired the exception. In general, a reset is effective for I2C exceptions caused by driver problems but cannot repair I2C exceptions caused by hardware link problems.
In addition, the present application uses the log to record the exceptions of each I2C bus and their times of occurrence, so the user can clearly see how exceptions occur on each bus. The application also computes the average exception frequency of each I2C bus from the log contents and, if it exceeds the frequency threshold, generates alarm information to notify the user. Problems that self-reset repair does not solve are thus brought to the user's attention for timely intervention; this matters most for I2C exceptions caused by hardware link problems, which a reset cannot fix.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
In the embodiments provided in the present invention, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The foregoing are merely exemplary embodiments of the present invention, which enable those skilled in the art to understand or practice the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A server BMC I2C exception recovery method, characterized in that a counter is used to count the duration of each I2C state machine state, the steps comprising:
S1, acquiring the current I2C state machine state of each I2C bus in time sequence;
S2, comparing whether the current I2C state machine state is the same as the last I2C state machine state; if not, executing step S3, and if so, executing step S4;
S3, updating the last I2C state machine state to the current I2C state machine state, resetting the counter, and executing step S1;
S4, judging whether the current I2C state machine state is the Idle state; if so, executing step S3, and if not, executing step S5;
S5, judging whether the counter has timed out; if so, executing step S6, otherwise executing step S1;
S6, recording an exception log, resetting the BMC I2C, and returning to step S1.
2. The server BMC I2C exception recovery method of claim 1, wherein the counter uses a BMC I2C independent clock as a reference clock.
3. The server BMC I2C exception recovery method of claim 1, wherein the I2C status register of the BMC includes bits indicating the state of the I2C state machine, the value of these bits corresponds to the state of the I2C state machine, and the state of the I2C state machine is obtained by parsing this bit value.
4. The server BMC I2C exception recovery method of claim 1, wherein the I2C state machine state is obtained by polling, or periodically sampling, the BMC I2C status register of each I2C bus.
5. The server BMC I2C exception recovery method of claim 4, wherein the polling period for polling each I2C bus, or the sampling period for periodically sampling each I2C bus, is less than the minimum duration of any state of the I2C state machine in normal operation.
6. The server BMC I2C exception recovery method of claim 1, wherein a timeout threshold is set and the counter value is compared with the timeout threshold to determine whether the counter has timed out, the timeout threshold being greater than the maximum duration of any non-Idle state of the I2C state machine in normal operation.
7. The server BMC I2C exception recovery method of claim 1, wherein recording the exception log comprises recording the I2C bus on which the exception occurred and the time at which the exception occurred.
8. The server BMC I2C exception recovery method of claim 7, further comprising: computing statistics over the contents of the exception log to calculate the average exception frequency of each I2C bus; setting a frequency threshold; comparing whether the average exception frequency of each I2C bus exceeds the frequency threshold; and, if it does, generating alarm information to notify a user.
9. An I2C exception self-healing BMC, comprising:
the acquisition module is used for acquiring data of the BMC I2C status register according to a time sequence;
the first judgment module is used for judging whether the state of the I2C state machine changes or not;
the counting module is used for timing the duration of non-Idle states according to the result of the first judgment module;
the second judgment module judges whether the time recorded by the counting module has timed out;
the analysis module is used for analyzing the bit which represents the state of the I2C state machine in the BMC I2C state register to obtain the state of the I2C state machine, and the analysis process is carried out according to the mapping relation between the state of the I2C state machine adopted by the BMC and the bit which represents the state of the state machine in the I2C state register;
and the execution module records an exception log and resets the BMC I2C when the second judgment module determines that a timeout has occurred.
10. The I2C exception self-healing BMC of claim 9, further comprising a statistics module, wherein the statistics module obtains the contents of the exception log and calculates the average exception frequency of each I2C bus;
and a comparison module, which compares the average frequency with a set frequency threshold; if the average frequency exceeds the frequency threshold, a communication module sends alarm information.
Application CN202110528848.1A, priority date 2021-05-14, filing date 2021-05-14: Server BMC I2C exception recovery method and related device. Status: Pending; published as CN113254247A.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110528848.1A CN113254247A (en) 2021-05-14 2021-05-14 Server BMC I2C exception recovery method and related device

Publications (1)

Publication Number Publication Date
CN113254247A 2021-08-13

Family

ID=77181965

Country Status (1)

Country Link
CN (1) CN113254247A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210813