CN111324486A - Method and system for repairing hanging die of expander chip and related device - Google Patents

Info

Publication number
CN111324486A
Authority
CN
China
Prior art keywords
cpld
register
expander chip
expander
preset value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010071007.8A
Other languages
Chinese (zh)
Inventor
陈树成
张猛
王军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202010071007.8A
Publication of CN111324486A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1415 Saving, restoring, recovering or retrying at system level
    • G06F11/1441 Resetting or repowering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/78 Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7807 System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Computing Systems (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The application provides a method for recovering an expander chip from a hang, comprising: when the CPLD detects that the expander chip has hung, the CPLD writes a preset value into a CPLD register; the CPLD starts timing a preset duration and power-cycles the expander chip when the timing ends; the BMC reads the CPLD register and determines whether the preset value is present; if so, the CPLD register is erased after the expander chip has been powered on again. The expander chip can thus be restarted automatically, each detected preset value reflects the current state of the expander chip, and recovery after a hang becomes more efficient. The application also provides a system for recovering an expander chip from a hang, a computer-readable storage medium and a server, which have the same beneficial effects.

Description

Method and system for repairing hanging die of expander chip and related device
Technical Field
The application relates to the field of servers, and in particular to a method and a system for recovering an expander chip from a hang, and to related devices.
Background
A SAS expander is an embedded system chip that operates by executing code burned into the chip. If the program handles too much service load, or the code itself is defective, the expander may hang and stop working normally. While the expander is hung, the upper-level system cannot identify any of the hard disks attached to the expander. The user only sees that the storage capacity of the storage system has shrunk and cannot see why, which makes it hard for customers to help diagnose and resolve the problem.
Disclosure of Invention
The application aims to provide a method and a system for recovering an expander chip from a hang, a computer-readable storage medium and a server, which can automatically reset and recover the expander chip in time.
To solve the above technical problem, the application provides a method for recovering an expander chip from a hang, with the following technical scheme:
when the CPLD detects that the expander chip has hung, the CPLD writes a preset value into a CPLD register;
the CPLD starts timing a preset duration, and power-cycles the expander chip when the timing ends;
the BMC reads the CPLD register and determines whether the preset value is present in the CPLD register;
and if so, the CPLD register is erased after the expander chip has been powered on again.
The method further comprises:
the CPLD determines whether the expander chip has hung according to an output signal of the expander chip;
when the output signal is a square wave, the expander chip is determined to be working normally;
and when the output signal stays continuously high or continuously low, the expander chip is determined to have hung.
The BMC reading the CPLD register comprises:
the BMC reads the CPLD register at a preset period.
If the preset value is present in the CPLD register, the method further comprises:
generating alarm information and recording a log.
The CPLD generating the preset value in the CPLD register comprises:
generating the preset value in a register at a preset position among the CPLD registers.
The method further comprises:
determining the cause of the expander chip hang according to the log.
The application also provides a system for recovering an expander chip from a hang, comprising:
a CPLD, configured to write a preset value into a CPLD register when it detects that the expander chip has hung, to time a preset duration, and to power-cycle the expander chip when the timing ends;
a BMC, configured to read the CPLD register and determine whether the preset value is present in the CPLD register, and, when the preset value is present, to erase the CPLD register after the expander chip has been powered on again.
The BMC comprises:
a period detection unit, configured to read the CPLD register at a preset period.
The present application also provides a computer-readable storage medium having stored thereon a computer program which, when being executed by a processor, carries out the steps of the repair method as described above.
The present application further provides a server, which includes a memory and a processor, where the memory stores a computer program, and the processor implements the steps of the repair method described above when calling the computer program in the memory.
The application provides a method for recovering an expander chip from a hang, comprising: when the CPLD detects that the expander chip has hung, the CPLD writes a preset value into a CPLD register; the CPLD starts timing a preset duration and power-cycles the expander chip when the timing ends; the BMC reads the CPLD register and determines whether the preset value is present in the CPLD register; and if so, the CPLD register is erased after the expander chip has been powered on again.
In this method, after the expander chip is detected to have hung, the CPLD power-cycles it after a set time, so the expander chip restarts automatically. Meanwhile the CPLD writes a preset value into the CPLD register, and the BMC erases the CPLD register once it has seen that value, which ensures that each detected preset value reflects the current state of the expander chip and improves recovery efficiency after a hang. The application also provides a system for recovering an expander chip from a hang, a computer-readable storage medium and a server, which have the same beneficial effects and are not described again here.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
Fig. 1 is a flowchart of a method for recovering an expander chip from a hang, provided in an embodiment of the present application;
Fig. 2 is a schematic structural diagram of a system for recovering an expander chip from a hang, provided in an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to Fig. 1, Fig. 1 is a flowchart of a method for recovering an expander chip from a hang, provided in an embodiment of the present application. The method includes:
S101: when the CPLD detects that the expander chip has hung, the CPLD writes a preset value into a CPLD register;
In general, the expander chip may hang because of a software bug or excessive service load. How the hang is detected is not limited here. Preferably, the CPLD determines whether the expander chip has hung from an output signal of the expander chip: when the output signal is a square wave, the expander chip is determined to be working normally; when the output signal stays continuously high or continuously low, the expander chip is determined to have hung.
While the expander chip is operating normally, it continuously outputs a square wave of a fixed frequency (also called a heartbeat signal, or simply a heartbeat) on one of its pins, and the CPLD can confirm that the expander chip is operating normally by checking this signal. If the expander hangs, the thread responsible cannot obtain processor time on the chip and so cannot toggle the pin level. The expander then stops driving alternating high and low levels, the square wave disappears, and the pin shows a sustained high level or a sustained low level externally.
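The heartbeat check described above can be modelled in a few lines. This is only a behavioural sketch in Python, not CPLD logic: the function names and the idea of passing a window of sampled pin levels are assumptions made for illustration, and a real CPLD would implement the equivalent in hardware description code.

```python
from typing import Sequence


def heartbeat_present(samples: Sequence[int]) -> bool:
    """Return True if the sampled pin level changes at least once in the window.

    `samples` is a hypothetical list of 0/1 pin levels captured over the
    observation window. A square wave produces at least one transition.
    """
    return any(a != b for a, b in zip(samples, samples[1:]))


def expander_hung(samples: Sequence[int]) -> bool:
    """A window with no transition (stuck high or stuck low) is treated as a hang."""
    return not heartbeat_present(samples)
```

The window must span more than one heartbeat period; otherwise a healthy square wave could be sampled entirely within one half-cycle and be misread as a stuck level.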
Once the expander chip is detected to have hung, the CPLD generates a preset value in its own CPLD register; generating the preset value mainly means modifying the value held in that register. Note that the preset value can be generated in a register at a preset position among the CPLD registers, that is, the CPLD and the BMC agree on which CPLD register indicates the expander chip status. They can also agree on a special value that indicates a hang: for example, the 1st register of the CPLD holding 0xA5 indicates that the expander chip has hung, while the default value 0xFF indicates that the expander is normal. When the CPLD finds that the expander heartbeat signal has been lost, it writes 0xA5 to that register.
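A small Python model may make the register convention concrete. The values 0xA5 and 0xFF come from the example in the description; the register index `STATUS_REG`, the class `CpldRegisters` and the helper names are hypothetical and chosen only for this sketch.

```python
HANG_FLAG = 0xA5   # value from the description's example: expander has hung
NORMAL = 0xFF      # default value from the description: expander is normal
STATUS_REG = 0x01  # hypothetical index of the agreed status register


class CpldRegisters:
    """Toy model of a bank of 8-bit CPLD registers (an assumption, not real hardware)."""

    def __init__(self, count: int = 4) -> None:
        # Every register starts at the default value.
        self._regs = [NORMAL] * count

    def read(self, index: int) -> int:
        return self._regs[index]

    def write(self, index: int, value: int) -> None:
        if not 0 <= value <= 0xFF:
            raise ValueError("register values are 8-bit")
        self._regs[index] = value


def mark_expander_hung(regs: CpldRegisters) -> None:
    """What the CPLD does when the heartbeat is lost: write the agreed flag."""
    regs.write(STATUS_REG, HANG_FLAG)


def expander_flagged(regs: CpldRegisters) -> bool:
    """What the BMC checks: does the agreed register hold the agreed flag?"""
    return regs.read(STATUS_REG) == HANG_FLAG
```

Using a distinctive pattern such as 0xA5 rather than a single bit makes it less likely that an uninitialised or stuck register is mistaken for a hang indication.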
S102: the CPLD starts timing a preset duration, and power-cycles the expander chip when the timing ends;
This step mainly powers the expander chip on again. The CPLD is able to monitor signals, and during development the CPLD and the expander can agree on the heartbeat frequency, for example 1 Hz. The CPLD then checks the expander heartbeat once per second, and if the heartbeat is absent it can determine that the expander is in the hung state. Expander software generally has some fault tolerance, and the impact of a short expander hang on user services can be tolerated. The CPLD can therefore wait for one minute; if the heartbeat is still absent after that, the expander has hung and cannot recover by itself. The CPLD can then power the expander chip down and up again, after which the expander can work normally.
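The wait-then-power-cycle behaviour can be sketched as a small per-second state machine. The class name `HangTimer`, the `tick` interface and the `power_cycle` callback are hypothetical. The sketch also assumes that a heartbeat returning during the grace period resets the count, which is one reading of "cannot repair the problem by itself"; the source does not state this explicitly.

```python
from typing import Callable


class HangTimer:
    """Counts consecutive seconds without a heartbeat and power-cycles at the limit."""

    def __init__(self, grace_seconds: int, power_cycle: Callable[[], None]) -> None:
        self.grace_seconds = grace_seconds  # e.g. 60 for the one-minute example
        self.power_cycle = power_cycle      # hypothetical hook that toggles chip power
        self.missing_seconds = 0

    def tick(self, heartbeat_seen: bool) -> bool:
        """Call once per second. Returns True if a power cycle was triggered."""
        if heartbeat_seen:
            # Assumption: the expander recovered by itself, so stop counting.
            self.missing_seconds = 0
            return False
        self.missing_seconds += 1
        if self.missing_seconds >= self.grace_seconds:
            self.power_cycle()
            self.missing_seconds = 0
            return True
        return False
```

With `grace_seconds=60` and a 1 Hz check, this matches the one-minute wait given as an example above.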
S103: the BMC reads the CPLD register and determines whether the preset value is present in the CPLD register; if so, proceed to S104;
In this step, the BMC reads the CPLD register and determines whether the preset value is present. If it is, the expander chip has hung. The frequency at which the BMC reads the CPLD register is not limited here; the register may be read at a preset period, for example once per second.
S104: the CPLD register is erased after the expander chip has been powered on again.
Even after the CPLD has power-cycled the expander chip, the preset value remains in the CPLD register, so the register still shows the expander as hung. If the expander hangs again at this point, the BMC cannot tell that a new hang has occurred and cannot log it accurately. Therefore, after the BMC records the log, it erases the CPLD register, restoring it to the default value, so that subsequent hangs can be handled.
In particular, the BMC may also generate alarm information and record a log.
The BMC accesses the CPLD register over IIC. If the register does not hold the preset value, the expander is working normally and has not hung, and the BMC takes no action. If the register holds 0xA5, the expander has hung. In that case the BMC displays an alarm on the user interface, and the alarm can identify the specific expander so that the problem is located quickly. The BMC also records a log; if the BMC polls the CPLD frequently, that is, with a short period, the time of the hang can be recorded to the second. Combined with the log recorded by the expander before the hang, this helps R&D staff quickly locate the cause of the hang and make later improvements.
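One BMC polling pass (read, alarm and log, then erase) might look like the following sketch. The IIC access is abstracted behind two hypothetical callables, `read_reg` and `write_reg`, because the source does not name a bus API; the log format and the `expander_id` parameter are likewise assumptions.

```python
import time
from typing import Callable, List

HANG_FLAG = 0xA5  # value from the description's example
NORMAL = 0xFF     # default value from the description


def bmc_poll_once(
    read_reg: Callable[[], int],
    write_reg: Callable[[int], None],
    log: List[str],
    expander_id: str,
    now: Callable[[], float] = time.time,
) -> bool:
    """Run one polling pass. Returns True if a hang was found and handled."""
    if read_reg() != HANG_FLAG:
        # Expander is normal: the BMC takes no action.
        return False
    # Record which expander hung and when, with one-second resolution.
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now()))
    log.append(f"{stamp} ALARM expander {expander_id} hang detected")
    # Restore the default value so that the next hang can be detected.
    write_reg(NORMAL)
    return True
```

Calling `bmc_poll_once` once per second from the BMC's main loop corresponds to the preset-period read described in S103.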
Note that steps S103 and S104 have no strict ordering relative to steps S101 and S102. In practice, the BMC reads the CPLD register continuously, whether or not the expander chip has hung. This ensures that the CPLD register is erased promptly once a hang has occurred, so that detection of the next hang is not affected. The BMC is usually also responsible for recording the system log, and if the preset value in the CPLD is not erased in time, system logging is affected.
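The reason the register must be erased can be illustrated with a short simulation. It is a sketch under stated assumptions: each list element stands for one BMC polling interval, `True` means the CPLD detected a new hang in that interval, and the BMC is assumed to log only when the flag newly appears. The function name is hypothetical.

```python
from typing import Sequence

HANG_FLAG = 0xA5  # value from the description's example
NORMAL = 0xFF     # default value from the description


def count_logged_hangs(hang_events: Sequence[bool], erase_after_log: bool) -> int:
    """Count how many hangs the BMC logs over a sequence of polling intervals."""
    reg = NORMAL
    previously_flagged = False
    logged = 0
    for new_hang in hang_events:
        if new_hang:
            reg = HANG_FLAG  # CPLD writes the flag on detection
        flagged = reg == HANG_FLAG
        if flagged and not previously_flagged:
            logged += 1  # BMC logs a newly observed hang
            if erase_after_log:
                reg = NORMAL  # BMC restores the default value
                flagged = False
        previously_flagged = flagged
    return logged
```

For the event sequence `[False, True, False, False, True, False]`, the function returns 2 with erasing enabled and 1 without it: a stale flag hides the second hang from the BMC.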
In this embodiment, after the expander chip is detected to have hung, the CPLD power-cycles it after a set time, so the expander chip restarts automatically. Meanwhile the CPLD writes the preset value into the CPLD register, and the BMC erases the CPLD register once it has seen that value, which ensures that each detected preset value reflects the current state of the expander chip and improves recovery efficiency after a hang.
The method provided by the present application is described below from the perspective of the CPLD and of the BMC, respectively.
For the CPLD: once the expander chip hangs, it can no longer send the heartbeat signal and the pin stays high or stays low. Because the CPLD no longer receives the square wave, it determines that the expander chip has hung and writes the preset value into the CPLD register. It also starts timing at the moment the hang is confirmed, with the timing duration kept within the fault-tolerant range. When the timing ends, the CPLD powers the expander chip on again so that it resumes normal operation.
For the BMC: when the expander chip hangs and cannot output the heartbeat signal, the CPLD writes the preset value into the CPLD register, and the BMC confirms the hang when it detects that value in the register. After the CPLD timing ends, the expander chip returns to normal. Having confirmed the hang, the BMC needs to clear the preset value from the CPLD register. The BMC does not have to wait until the CPLD timing ends before erasing the register: once the CPLD timing has started, the power cycle will inevitably follow, so the start of timing can be treated as equivalent to the chip having been powered on again. In addition, the BMC can record logs so that technical staff can determine the cause of the expander chip hang from them. The BMC may also raise an alarm to notify technical staff that the expander chip has hung. Alarms are especially necessary when the expander chip hangs frequently.
The following introduces a system for recovering an expander chip from a hang, provided in the embodiments of the present application; the system described below and the method described above may be read with reference to each other.
Referring to Fig. 2, Fig. 2 is a schematic structural diagram of a system for recovering an expander chip from a hang, provided in an embodiment of the present application. The system includes:
a CPLD, configured to write a preset value into a CPLD register when it detects that the expander chip has hung, to time a preset duration, and to power-cycle the expander chip when the timing ends;
a BMC, configured to read the CPLD register and determine whether the preset value is present in the CPLD register, and, when the preset value is present, to erase the CPLD register after the expander chip has been powered on again.
Typically, the BMC and the CPLD are connected via an IIC bus.
Based on the above embodiment, as a preferred embodiment, the BMC may include:
a period detection unit, configured to read the CPLD register at a preset period.
The present application also provides a computer-readable storage medium having stored thereon a computer program which, when executed, may implement the steps provided by the above-described embodiments. The storage medium may include any medium capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The application also provides a server, which may include a memory and a processor, where the memory stores a computer program, and the processor may implement the steps provided by the foregoing embodiments when calling the computer program in the memory. Of course, the server may also include various network interfaces, power supplies, and the like.
The embodiments are described in a progressive manner in the specification, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system provided by the embodiment, the description is relatively simple because the system corresponds to the method provided by the embodiment, and the relevant points can be referred to the method part for description.
The principles and embodiments of the present application are explained herein using specific examples, which are provided only to help understand the method and the core idea of the present application. It should be noted that, for those skilled in the art, it is possible to make several improvements and modifications to the present application without departing from the principle of the present application, and such improvements and modifications also fall within the scope of the claims of the present application.
It is further noted that, in the present specification, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

Claims (10)

1. A method for recovering an expander chip from a hang, characterized by comprising the following steps:
when the CPLD detects that the expander chip has hung, the CPLD writes a preset value into a CPLD register;
the CPLD starts timing a preset duration, and power-cycles the expander chip when the timing ends;
the BMC reads the CPLD register and determines whether the preset value is present in the CPLD register;
and if so, the BMC erases the CPLD register after the expander chip has been powered on again.
2. The repair method according to claim 1, further comprising:
the CPLD determines whether the expander chip has hung according to an output signal of the expander chip;
when the output signal is a square wave, the expander chip is determined to be working normally;
and when the output signal stays continuously high or continuously low, the expander chip is determined to have hung.
3. The repair method of claim 1, wherein the BMC reading the CPLD register comprises:
the BMC reads the CPLD register at a preset period.
4. The repair method according to claim 1, wherein if the preset value is present in the CPLD register, the method further comprises:
generating alarm information and recording a log.
5. The repair method of claim 1, wherein the CPLD generating the preset value in the CPLD register comprises:
generating the preset value in a register at a preset position among the CPLD registers.
6. The repair method according to claim 4, further comprising:
determining the cause of the expander chip hang according to the log.
7. A system for recovering an expander chip from a hang, comprising:
a CPLD, configured to write a preset value into a CPLD register when it detects that the expander chip has hung, to time a preset duration, and to power-cycle the expander chip when the timing ends;
a BMC, configured to read the CPLD register and determine whether the preset value is present in the CPLD register, and, when the preset value is present, to erase the CPLD register after the expander chip has been powered on again.
8. The repair system of claim 7, wherein the BMC comprises:
a period detection unit, configured to read the CPLD register at a preset period.
9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 6.
10. A server, comprising a memory and a processor, the memory having a computer program stored therein, wherein the processor implements the steps of the method of any one of claims 1-6 when calling the computer program in the memory.
CN202010071007.8A 2020-01-21 2020-01-21 Method and system for repairing hanging die of expander chip and related device Pending CN111324486A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010071007.8A CN111324486A (en) 2020-01-21 2020-01-21 Method and system for repairing hanging die of expander chip and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010071007.8A CN111324486A (en) 2020-01-21 2020-01-21 Method and system for repairing hanging die of expander chip and related device

Publications (1)

Publication Number Publication Date
CN111324486A true CN111324486A (en) 2020-06-23

Family

ID=71171027

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010071007.8A Pending CN111324486A (en) 2020-01-21 2020-01-21 Method and system for repairing hanging die of expander chip and related device

Country Status (1)

Country Link
CN (1) CN111324486A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111949431A (en) * 2020-08-27 2020-11-17 英业达科技有限公司 Fatal error providing method and fatal error identification method for system-on-chip product
CN117290142A (en) * 2023-09-27 2023-12-26 镁佳(武汉)科技有限公司 Inter-core heartbeat interaction method, system, device, computer equipment and storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109284207A (en) * 2018-08-30 2019-01-29 紫光华山信息技术有限公司 Hard disc failure processing method, device, server and computer-readable medium

Similar Documents

Publication Publication Date Title
CN101576860B (en) Detection method and detection system of linux or windows operation system
CN102591591B (en) Disk detection system, disk detection method and network store system
US9946600B2 (en) Method of detecting power reset of a server, a baseboard management controller, and a server
CN111324486A (en) Method and system for repairing hanging die of expander chip and related device
TW201426546A (en) Techniques for switching threads within routines
CN101826367A (en) Method and device for monitoring reliability of semiconductor storage device
CN111048138A (en) Hard disk fault detection method and related device
CN103984618A (en) Method for monitoring hard disk activity state of LINUX server
CN109741786A (en) A kind of solid state hard disk monitoring method, device and equipment
CN116383012B (en) Method and device for acquiring boot log and method for transmitting boot log
CN111813748B (en) File system mounting method and device, electronic equipment and storage medium
CN111124774A (en) Method and related device for testing stability of server in starting process
CN113656235B (en) Method, device, system and medium for controlling and testing power consumption of whole server
TW200532433A (en) Device and method for automatically detecting and announcing error on booting a motherboard
CN101071396A (en) Method for setting system reset reason monitoring information and monitoring method
CN113626233B (en) Method, device and equipment for automatically detecting BIOS watchdog function
CN111858239B (en) Server hard disk monitoring method, device, equipment and medium
CN111858532A (en) Solid state disk log export method, system and device and readable storage medium
CN109885328B (en) BIOS updating method and system and related components
WO2022239118A1 (en) History management device, control method, and computer-readable medium
CN113821387B (en) KVM function keep-alive test method, device, equipment and medium
JP3620984B2 (en) Computer automatic schedule control system, recording medium therefor, and computer automatic schedule control method
CN117762732A (en) Method, script, device, equipment and medium for monitoring memory read-write
CN112506726A (en) System AC test method, device and system based on Feiteng processor
CN114265555A (en) Method, device and medium for cleaning disk data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200623