CN111858122A - Fault detection method, device, equipment and storage medium of storage link

Info

Publication number
CN111858122A
Authority
CN
China
Prior art keywords
link
storage
fault
judging
hard disk
Legal status
Withdrawn
Application number
CN202010746811.1A
Other languages
Chinese (zh)
Inventor
韩廷卯
Current Assignee
Beijing Inspur Data Technology Co Ltd
Original Assignee
Beijing Inspur Data Technology Co Ltd
Application filed by Beijing Inspur Data Technology Co Ltd
Priority to CN202010746811.1A
Publication of CN111858122A
Withdrawn

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F11/00 Error detection; Error correction; Monitoring
            • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
              • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
                • G06F11/0706 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
                  • G06F11/0727 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a storage system, e.g. in a DASD or network based storage system
                • G06F11/0751 Error or fault detection not based on redundancy
                  • G06F11/0754 Error or fault detection not based on redundancy by exceeding limits
                    • G06F11/076 Error or fault detection not based on redundancy by exceeding limits by exceeding a count or rate limit, e.g. word- or bit count limit
                • G06F11/079 Root cause analysis, i.e. error or fault diagnosis
                • G06F11/0793 Remedial or corrective actions

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a fault detection method for a storage link, comprising the following steps: acquiring the link physical state of a storage link, the storage link being the link through which the storage main system accesses a target hard disk via a target controller; determining whether the link physical state is an offline state; if so, determining that the storage link is a failed link and triggering a link switching operation; if not, determining, based on the link layer and the application layer, whether the storage link is a failed link; if so, determining that the storage link is a failed link and triggering a link switching operation; if not, ending the flow. Because the link is checked at three layers (the physical layer, the link layer and the application layer), fault detection accuracy is improved and the performance loss and data loss risk caused by link faults are reduced. The invention also discloses a fault detection apparatus, an electronic device and a storage medium for the storage link, which achieve the same technical effects.

Description

Fault detection method, device, equipment and storage medium of storage link
Technical Field
The present invention relates to the field of storage technologies, and in particular, to a method, an apparatus, a device, and a storage medium for detecting a failure of a storage link.
Background
Storage devices are subject to high reliability requirements, so their day-to-day operation and maintenance management is crucial. Increasingly intelligent management measures are applied to storage devices, such as monitoring their running state and isolating faults, which prevents more serious consequences, reduces labor costs, and improves accuracy and timeliness. In a storage device with a dual-controller redundancy design, the storage main system's accesses to back-end hard disks are evenly distributed across the two controllers, but accesses to any given hard disk generally go through one fixed, preselected link. If that link becomes unstable and IO (Input/Output) operations continue on it, IO performance degrades and data may be lost. How to accurately detect a failed link between the storage main system and a back-end hard disk is therefore a problem to be solved by those skilled in the art.
Disclosure of Invention
The invention aims to provide a method, an apparatus, a device and a storage medium for detecting faults in a storage link, so as to accurately detect a failed link between a storage main system and a back-end hard disk.
In order to achieve the above object, the present invention provides a method for detecting a failure of a storage link, including:
acquiring the link physical state of a storage link, where the storage link is the link through which the storage main system accesses a target hard disk via a target controller;
determining whether the link physical state is an offline state;
if so, determining that the storage link is a failed link and triggering a link switching operation; if not, determining, based on a link layer and an application layer, whether the storage link is a failed link; if it is, determining that the storage link is a failed link and triggering a link switching operation; if it is not, ending the flow.
Determining whether the storage link is a failed link based on the link layer and the application layer includes:
determining whether a CRC (cyclic redundancy check) error count corresponding to the storage link exceeds a predetermined threshold;
if so, determining that the storage link is a failed link and triggering a link switching operation;
if not, determining whether the hard disk login state corresponding to the storage link is login failure; if it is, determining that the storage link is a failed link and triggering a link switching operation; otherwise, ending the flow.
Acquiring the link physical state of the storage link includes:
acquiring the link physical state of the storage link periodically, with a preset time length as the period.
Triggering a link switching operation includes:
switching the storage link to a link that accesses the target hard disk through a controller other than the target controller.
After determining that the storage link is a failed link, the method further includes:
generating alarm information indicating that the storage link is a failed link.
Generating the alarm information indicating that the storage link is a failed link includes:
determining failure cause information of the storage link;
and generating failure alarm information for the storage link using the failure cause information.
To achieve the above object, the present invention further provides a fault detection apparatus for a storage link, including:
a physical state acquisition module, configured to acquire the link physical state of a storage link, where the storage link is the link through which the storage main system accesses a target hard disk via a target controller;
a first judgment module, configured to judge whether the link physical state is an offline state and, if so, trigger a determination module;
a second judgment module, configured to judge, when the link physical state is not an offline state, whether the storage link is a failed link based on a link layer and an application layer and, if so, trigger the determination module;
the determination module, configured to determine that the storage link is a failed link;
and an operation execution module, configured to execute a link switching operation on the failed link.
The second judgment module includes:
a first judgment unit, configured to judge whether a CRC error count corresponding to the storage link exceeds a predetermined threshold and, if so, trigger the determination module and the operation execution module;
and a second judgment unit, configured to judge, when the CRC error count does not exceed the predetermined threshold, whether the hard disk login state corresponding to the storage link is login failure and, if so, trigger the determination module and the operation execution module.
To achieve the above object, the present invention further provides an electronic device comprising:
a memory for storing a computer program;
a processor, configured to implement the steps of the above fault detection method for a storage link when executing the computer program.
To achieve the above object, the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above fault detection method for a storage link.
According to the above scheme, the fault detection method for a storage link provided by the embodiments of the invention includes: acquiring the link physical state of a storage link, the storage link being the link through which the storage main system accesses a target hard disk via a target controller; determining whether the link physical state is an offline state; if so, determining that the storage link is a failed link and triggering a link switching operation; if not, determining, based on the link layer and the application layer, whether the storage link is a failed link; if so, determining that it is a failed link and triggering a link switching operation; if not, ending the flow. Because the link is checked at three layers (the physical layer, the link layer and the application layer), fault detection accuracy is improved and the performance loss and data loss risk caused by link faults are reduced. The invention also discloses a fault detection apparatus, an electronic device and a storage medium for the storage link.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flow chart of a fault detection method for a storage link according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of a link fault monitoring process according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a fault detection apparatus for a storage link according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art from these embodiments without creative effort fall within the protection scope of the present invention.
The embodiments of the invention disclose a method, an apparatus, a device and a storage medium for detecting faults in a storage link, used to accurately detect a failed link between a storage main system and a back-end hard disk.
Referring to Fig. 1, a schematic flow chart of a fault detection method for a storage link according to an embodiment of the present invention, the method specifically includes the following steps:
S101, acquiring the link physical state of a storage link, where the storage link is the link through which the storage main system accesses the target hard disk via the target controller;
Specifically, in a storage device with a dual-controller redundancy design, the storage main system's accesses to back-end hard disks are evenly distributed across the two controllers. If the controllers are controller 1 and controller 2, the storage main system can reach a hard disk over two storage links: a first storage link through controller 1 and a second storage link through controller 2. When this scheme accurately detects that the storage link in use is a failed link, a link switching operation can therefore be executed to move access to the hard disk onto the other storage link.
It should be noted that the link physical state of the storage link may be acquired periodically, with a predetermined time length as the period. For example, with the detection period set to 5 minutes, the fault detection method of this scheme is executed every 5 minutes, so that a failed link is found promptly and the performance loss and data risk of performing IO operations over it are avoided. The predetermined time length representing the detection period can be customized according to actual requirements; 5 minutes is only an example.
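The periodic acquisition described above can be sketched as a fixed-rate polling loop. This is a minimal Python sketch under stated assumptions: `run_periodic_detection` and `detect_once` are hypothetical names, and the clock and sleep functions are injectable only so the loop can be exercised without waiting. The patent does not specify how the period is scheduled.

```python
import time
from typing import Callable, Optional

DETECTION_PERIOD_S = 5 * 60  # the 5-minute example period; configurable

def run_periodic_detection(
    detect_once: Callable[[], None],
    period_s: float = DETECTION_PERIOD_S,
    max_rounds: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Run detect_once() at a fixed rate of one round per period_s seconds.

    Runs forever when max_rounds is None; returns the number of rounds run.
    """
    rounds = 0
    next_run = clock()
    while max_rounds is None or rounds < max_rounds:
        detect_once()  # one pass of the three-layer fault detection
        rounds += 1
        next_run += period_s
        delay = next_run - clock()
        # Skip the sleep after the final round, and never sleep a negative time.
        if delay > 0 and (max_rounds is None or rounds < max_rounds):
            sleep(delay)
    return rounds
```

Scheduling against `next_run` rather than sleeping a fixed 300 s after each round keeps the rounds on a 5-minute grid even when a detection pass takes a while; if a pass overruns the period, the next one starts immediately.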
S102, determining whether the link physical state is an offline state; if not, executing S103; if so, executing S104;
To detect link faults accurately, this application provides a three-layer fault detection scheme: the storage link is checked in sequence at the physical layer, the link layer and the application layer. At the physical layer, the link physical state of the storage link is checked. If the link is offline, it is directly marked as failed and link switching is triggered, without executing the subsequent detection steps; if it is not offline, the subsequent detection steps are executed to detect faults in the storage link accurately.
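The physical-layer check is the cheapest of the three and short-circuits the rest. The patent does not say how the link physical state is read; as one assumption, on a Linux host the SCSI layer exposes a per-device state string in `/sys/block/<dev>/device/state` (for example `running`, `offline` or `transport-offline`), and with multipath each controller path typically appears as its own block device. The sketch below treats the two offline values, or a vanished device node, as an offline link; the function names are hypothetical.

```python
from pathlib import Path

# Assumption: Linux SCSI device states that mean the path is down. Other values
# such as "running", "blocked" or "quiesce" are treated as not offline here.
OFFLINE_STATES = {"offline", "transport-offline"}

def read_physical_state(state_file: Path) -> str:
    """Return the reported device state; a vanished device node counts as offline."""
    try:
        return state_file.read_text().strip()
    except FileNotFoundError:
        return "offline"

def physical_layer_failed(state_file: Path) -> bool:
    """Layer 1: True means mark the link failed and skip the link/application checks."""
    return read_physical_state(state_file) in OFFLINE_STATES
```

A call would look like `physical_layer_failed(Path("/sys/block/sdb/device/state"))`, where `sdb` stands in for whichever device represents the monitored path.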
S103, determining, based on the link layer and the application layer, whether the storage link is a failed link; if so, executing S104; if not, ending the flow;
It should be noted that determining whether the storage link is a failed link based on the link layer and the application layer may specifically include the following steps:
determining whether a CRC error count corresponding to the storage link exceeds a predetermined threshold;
if so, determining that the storage link is a failed link and triggering a link switching operation;
if not, determining whether the hard disk login state corresponding to the storage link is login failure; if it is, determining that the storage link is a failed link and triggering a link switching operation; otherwise, ending the flow.
Fig. 2 is a schematic diagram of a link fault monitoring process disclosed in an embodiment of the present invention. As shown, during periodic link fault detection, the first check is whether the link physical state at the physical layer is offline; if so, the link is marked as failed and a link switching operation is triggered. At the link layer, the check is whether the link-layer CRC error count exceeds a predetermined threshold. The CRC error count is the number of data transmission errors between a hard disk and the motherboard, so each hard disk has its own CRC error count, and the CRC error count corresponding to a storage link is that of the hard disk the link leads to. The application sets a predetermined threshold for the CRC error count in advance; if the count exceeds the threshold, the storage link is considered failed, it is marked as a failed link, and a link switching operation is triggered. For example, if 2 errors occur within 1 second and the predetermined threshold is 3 errors per second, the storage link is not considered failed. If the link layer does not find the storage link faulty, the check continues at the application layer: for example, the login state of the hard disk is checked, and if it is a failed state, the link is marked as failed and link switching is triggered.
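The link-layer check can be sketched as two steps: read a cumulative per-disk CRC counter, then turn two successive samples into an error rate and compare it with the threshold. As an assumption (the patent does not name a source), the counter here is SMART attribute 199 (`UDMA_CRC_Error_Count`) as printed in the attribute table of `smartctl -A` for a SATA disk; `parse_crc_count` and `crc_rate_exceeded` are hypothetical helpers.

```python
from typing import Optional

def parse_crc_count(smartctl_attributes: str) -> Optional[int]:
    """Return the raw value of SMART attribute 199 from `smartctl -A` text, if present.

    Assumption: the standard ATA attribute table layout, where RAW_VALUE is the
    tenth column. The counter is cumulative over the disk's lifetime.
    """
    for line in smartctl_attributes.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[0] == "199":
            return int(fields[9])
    return None

def crc_rate_exceeded(prev_count: int, curr_count: int,
                      elapsed_s: float, threshold_per_s: float) -> bool:
    """Layer 2: does the CRC error rate between two samples exceed the threshold?"""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    rate = (curr_count - prev_count) / elapsed_s
    return rate > threshold_per_s
```

With the figures from the worked example, `crc_rate_exceeded(prev, prev + 2, 1.0, 3)` returns `False`: 2 errors per second does not exceed a threshold of 3 per second, so detection moves on to the application layer.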
S104, determining that the storage link is a failed link, and triggering a link switching operation.
Specifically, once the storage link is determined to be a failed link, the triggered link switching operation switches the storage link to a link that accesses the target hard disk through a controller other than the target controller. As described above, with controller 1 and controller 2, the storage main system can reach the target hard disk over a first storage link through controller 1 and a second storage link through controller 2. If the current storage link is the first storage link, the link switching operation switches it to the second storage link, after which the storage main system accesses the target hard disk over the second storage link.
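A sketch of the switching step for the dual-controller case, under stated assumptions: `DiskPaths` is a hypothetical in-memory model of the links to one disk, and raising an error when no healthy link remains is also an assumption, since the patent does not say what happens if both links fail.

```python
from dataclasses import dataclass, field
from typing import List, Set

@dataclass
class DiskPaths:
    """Hypothetical model of the links from the storage main system to one disk."""
    disk_id: str
    controllers: List[str]      # one link per controller, e.g. controller1, controller2
    active: str                 # controller whose link currently carries IO
    failed: Set[str] = field(default_factory=set)

    def switch_link(self) -> str:
        """Mark the active link failed and move IO to another controller's link."""
        self.failed.add(self.active)
        for ctrl in self.controllers:
            if ctrl not in self.failed:
                self.active = ctrl
                return ctrl
        # Assumption: the patent does not cover the case where every link has failed.
        raise RuntimeError(f"no healthy link left to {self.disk_id}")
```

In a real system the `failed` set would also need to be cleared once an administrator repairs a link; that recovery step is outside what the patent describes.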
In summary, the application provides a three-layer fault detection scheme: the first layer checks the physical-layer state, the second compares the link-layer error count against a threshold, and the third checks the logical state of the hard disk. A failed link is detected layer by layer; as soon as any layer meets its fault condition, the link is marked as failed and IO is quickly switched to the other link. This avoids the performance degradation and data risk of continuing IO on the original link and improves the availability of the whole storage system.
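The layer-by-layer decision can be expressed as a single function that stops at the first layer whose fault condition holds and returns that layer's cause, which can also serve as the failure cause information for alarms. A minimal sketch: `LinkSample`, its field names and the `"offline"` string are hypothetical, and the CRC figure is assumed to already be a per-second rate.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LinkSample:
    """Hypothetical snapshot of one storage link (main system -> controller -> disk)."""
    physical_state: str        # e.g. "online" or "offline"
    crc_errors_per_s: float    # link-layer CRC error rate of the disk on this link
    disk_login_ok: bool        # application-layer hard disk login result

def detect_fault(s: LinkSample, crc_threshold_per_s: float) -> Optional[str]:
    """Return the cause from the first failing layer, or None if all layers pass."""
    if s.physical_state == "offline":              # layer 1: physical layer
        return "link physical state is offline"
    if s.crc_errors_per_s > crc_threshold_per_s:   # layer 2: link layer
        return "CRC error count exceeds threshold"
    if not s.disk_login_ok:                        # layer 3: application layer
        return "hard disk login failed"
    return None
```

A caller would run `detect_fault` once per detection period and trigger link switching whenever it returns a cause rather than `None`.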
Building on the foregoing embodiment, in this embodiment the method further includes, after determining that the storage link is a failed link: generating alarm information indicating that the storage link is a failed link. To generate this alarm information, the failure cause information of the storage link is determined first, and the failure alarm information for the storage link is then generated using that failure cause information.
It can be understood that once a storage link is determined to be a failed link, alarm information needs to be generated and reported so that administrators learn the state of each link in time and can repair the failed link promptly. The failure alarm information in this application may include details of the storage link, the time the failure was discovered, the failure cause information, and so on. The link details identify which controller the link passes through and which hard disk it accesses; the failure discovery time is when the storage link was determined to be a failed link; and the failure cause information is derived from the condition that triggered the determination. For example, if the link was judged failed from its physical-layer link physical state, the cause is that the link physical state is offline; if it was judged failed from the link-layer CRC error count, the cause is that the CRC error count is too high; and if it was judged failed from the application-layer hard disk login state, the cause is that the hard disk login state is a failure state.
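The alarm record described above (link details, discovery time, failure cause) might look like the following. This is a sketch only: the class, field names and cause strings are hypothetical, and the three causes mirror the three layers of the detection scheme.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

# Hypothetical cause texts, one per detection layer.
FAILURE_CAUSES = {
    "physical": "link physical state is offline",
    "link": "CRC error count exceeds the predetermined threshold",
    "application": "hard disk login state is login failure",
}

@dataclass(frozen=True)
class LinkFaultAlarm:
    controller: str          # which controller the failed link passes through
    disk: str                # which hard disk the failed link accesses
    discovered_at: datetime  # when the link was determined to be failed
    cause: str               # failure cause information

    def message(self) -> str:
        return (f"[{self.discovered_at.isoformat()}] link {self.controller} -> "
                f"{self.disk} failed: {self.cause}")

def make_alarm(controller: str, disk: str, failed_layer: str,
               now: Optional[datetime] = None) -> LinkFaultAlarm:
    """Build the alarm from the layer whose fault condition was met."""
    when = now if now is not None else datetime.now(timezone.utc)
    return LinkFaultAlarm(controller, disk, when, FAILURE_CAUSES[failed_layer])
```

Keying the cause on the failing layer keeps the alarm text consistent with whichever check actually fired, which is what lets an administrator target the repair.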
In summary, this scheme for monitoring link faults between the storage main system and the back-end hard disks draws its criteria from the physical layer, the link layer and the application layer, which improves fault detection accuracy and reduces the performance loss and data risk caused by link faults. In addition, because fault alarm information is reported, the administrator can learn the cause of each failed link in detail from the alarm and repair it in a targeted and therefore faster way.
The fault detection apparatus provided by the embodiments of the present invention is introduced below; the apparatus described below and the method described above may be cross-referenced.
Referring to Fig. 3, a fault detection apparatus for a storage link according to an embodiment of the present invention includes:
a physical state acquisition module 100, configured to acquire the link physical state of a storage link, where the storage link is the link through which the storage main system accesses the target hard disk via the target controller;
a first judgment module 200, configured to judge whether the link physical state is an offline state and, if so, trigger the determination module;
a second judgment module 300, configured to judge, when the link physical state is not an offline state, whether the storage link is a failed link based on a link layer and an application layer and, if so, trigger the determination module;
a determination module 400, configured to determine that the storage link is a failed link;
and an operation execution module 500, configured to execute a link switching operation on the failed link.
The second judgment module includes:
a first judgment unit, configured to judge whether a CRC error count corresponding to the storage link exceeds a predetermined threshold and, if so, trigger the determination module and the operation execution module;
and a second judgment unit, configured to judge, when the CRC error count does not exceed the predetermined threshold, whether the hard disk login state corresponding to the storage link is login failure and, if so, trigger the determination module and the operation execution module.
The physical state acquisition module is specifically configured to acquire the link physical state of the storage link periodically, with a preset time length as the period.
The operation execution module is specifically configured to switch the storage link to a link that accesses the target hard disk through a controller other than the target controller.
The apparatus further includes:
an alarm information generation module, configured to generate alarm information indicating that the storage link is a failed link.
The alarm information generation module is specifically configured to determine the failure cause information of the storage link and to generate failure alarm information for the storage link using the failure cause information.
Referring to Fig. 4, an electronic device disclosed in an embodiment of the present invention includes:
a memory 11, configured to store a computer program;
a processor 12, configured to implement the steps of the fault detection method for a storage link described in the above embodiment when executing the computer program.
In this embodiment, the device may be a PC (Personal Computer) or a terminal device such as a smartphone, a tablet computer, a palmtop computer or a portable computer.
The device may include a memory 11, a processor 12, and a bus 13.
The memory 11 includes at least one type of readable storage medium, such as a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), a magnetic memory, a magnetic disk or an optical disk. In some embodiments the memory 11 may be an internal storage unit of the device, for example a hard disk of the device. In other embodiments the memory 11 may be an external storage device of the device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a flash card provided on the device. The memory 11 may also include both an internal storage unit and an external storage device of the device. The memory 11 may be used not only to store application software installed on the device and various kinds of data, such as the program code of the fault detection method, but also to temporarily store data that has been or will be output.
In some embodiments, the processor 12 may be a Central Processing Unit (CPU), a controller, a microcontroller, a microprocessor or another data processing chip, used to run the program code stored in the memory 11 or to process data, for example to execute the program code of the fault detection method.
The bus 13 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 4, but this does not indicate only one bus or one type of bus.
Further, the device may include a network interface 14, which may optionally include a wired interface and/or a wireless interface (e.g., a Wi-Fi interface or a Bluetooth interface) and is generally used to establish a communication connection between the device and other electronic devices.
Optionally, the device may further include a user interface, which may include a display and an input unit such as a keyboard, and optionally a standard wired interface and a wireless interface. In some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be called a display screen or display unit, is used to display information processed in the device and to display a visual user interface.
Fig. 4 shows only the device with the components 11-14, and it will be understood by those skilled in the art that the structure shown in fig. 4 does not constitute a limitation of the device, and may comprise fewer or more components than those shown, or some components may be combined, or a different arrangement of components.
An embodiment of the present invention further discloses a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the fault detection method for a storage link according to the above method embodiment.
The storage medium may include various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.
The embodiments in this description are described progressively; each focuses on its differences from the others, and for the same or similar parts the embodiments may refer to one another.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method for failure detection of a storage link, comprising:
acquiring a link physical state of a storage link, the storage link being a link through which a storage main system accesses a target hard disk via a target controller;
determining whether the link physical state is an offline state;
if so, determining that the storage link is a failed link and triggering a link switching operation; if not, determining, based on a link layer and an application layer, whether the storage link is a failed link; if it is, determining that the storage link is a failed link and triggering a link switching operation; otherwise, ending the flow.
2. The method of claim 1, wherein determining, based on the link layer and the application layer, whether the storage link is a failed link comprises:
determining whether a CRC error count corresponding to the storage link exceeds a preset threshold;
if so, determining that the storage link is a failed link and triggering a link switching operation;
if not, determining whether the hard disk login state corresponding to the storage link is login failure; if the hard disk login state is login failure, determining that the storage link is a failed link and triggering a link switching operation; otherwise, ending the flow.
3. The method of claim 1, wherein acquiring the link physical state of the storage link comprises:
acquiring the link physical state of the storage link periodically, with a preset time length as the period.
4. The method of claim 1, wherein triggering a link switching operation comprises:
switching the storage link to a link that accesses the target hard disk through a controller other than the target controller.
5. The method according to any one of claims 1 to 4, wherein after determining that the storage link is a failed link, the method further comprises:
generating alarm information indicating that the storage link is a failed link.
6. The method according to claim 5, wherein generating the alarm information indicating that the storage link is a failed link comprises:
determining failure cause information of the storage link;
generating failure alarm information corresponding to the storage link by using the failure cause information.
7. An apparatus for failure detection of a storage link, comprising:
a physical state acquisition module, configured to acquire a link physical state of a storage link, the storage link being a link through which a storage main system accesses a target hard disk via a target controller;
a first judgment module, configured to judge whether the link physical state is an offline state and, if so, to trigger a determination module;
a second judgment module, configured to judge, when the link physical state is not an offline state, whether the storage link is a failed link based on a link layer and an application layer and, if so, to trigger the determination module;
the determination module, configured to determine that the storage link is a failed link; and
an operation execution module, configured to execute a link switching operation on the failed link.
8. The fault detection apparatus of claim 7, wherein the second judgment module comprises:
a first judgment unit, configured to judge whether a CRC error count corresponding to the storage link exceeds a preset threshold and, if so, to trigger the determination module and the operation execution module; and
a second judgment unit, configured to judge, when the CRC error count does not exceed the preset threshold, whether the hard disk login state corresponding to the storage link is login failure and, if so, to trigger the determination module and the operation execution module.
9. An electronic device, comprising:
a memory, configured to store a computer program; and
a processor, configured to implement the steps of the method for failure detection of a storage link according to any one of claims 1 to 6 when executing the computer program.
10. A computer-readable storage medium, characterized in that a computer program is stored thereon which, when executed by a processor, implements the steps of the method for failure detection of a storage link according to any one of claims 1 to 6.
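The decision order of claims 1 and 2 can be illustrated with a minimal Python sketch. The order is: physical state first, then the link-layer CRC count, then the application-layer login state. The `StorageLink` fields, the `"online"`/`"offline"` values, the threshold and the `switch_link` callback are all hypothetical. The claims do not say how these values are read or what the threshold is, and "exceeds" is read here as strictly greater.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class StorageLink:
    """Hypothetical model of one path: storage main system -> target controller -> target hard disk."""
    controller: str
    disk: str
    physical_state: str   # assumed values: "online" / "offline"
    crc_error_count: int  # assumed to come from link-layer error counters
    disk_login_ok: bool   # assumed result of the hard disk login (application layer)


def is_failed_link(link: StorageLink, crc_threshold: int) -> bool:
    """Physical state first; the upper-layer checks run only if the link is not offline."""
    if link.physical_state == "offline":
        return True
    if link.crc_error_count > crc_threshold:  # link layer: CRC count above the preset threshold
        return True
    return not link.disk_login_ok             # application layer: hard disk login failure


def detect_and_switch(link: StorageLink, crc_threshold: int,
                      switch_link: Callable[[StorageLink], None]) -> bool:
    """Trigger the (hypothetical) link switching callback when the link is judged failed."""
    if is_failed_link(link, crc_threshold):
        switch_link(link)
        return True
    return False  # no fault found: the flow ends
```

Passing `switch_link` in as a callback keeps the judgment logic separate from whatever switching mechanism a real storage system would use.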
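Claim 3 acquires the link physical state once per preset time length. A minimal sketch of such a polling loop follows, assuming a hypothetical `read_state` callable. The `sleep` function is injectable so the loop can be exercised without real delays. Stopping early on the first `"offline"` sample is a design choice of this sketch, not something the claim specifies.

```python
import time
from typing import Callable, List


def poll_link_state(read_state: Callable[[], str], period_s: float, max_polls: int,
                    sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Acquire the link physical state once per period and return the samples.

    Stops as soon as an "offline" state is seen so that fault handling can
    start immediately (an assumption of this sketch).
    """
    samples: List[str] = []
    for i in range(max_polls):
        state = read_state()
        samples.append(state)
        if state == "offline":
            break
        if i < max_polls - 1:
            sleep(period_s)  # wait one preset period before the next acquisition
    return samples
```

A production loop would more likely run indefinitely in a background thread or timer. The bounded `max_polls` only keeps the sketch testable.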
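The switching operation of claim 4 redirects access to the target hard disk through a controller other than the target controller. The sketch below shows only the selection step. The `paths` topology table (disk to the controllers that can reach it) and the `healthy` map are hypothetical inputs; the claim does not describe how alternative controllers are discovered.

```python
from typing import Dict, List, Optional


def select_alternate_controller(disk: str, failed_controller: str,
                                paths: Dict[str, List[str]],
                                healthy: Dict[str, bool]) -> Optional[str]:
    """Return a usable controller other than the failed one that can reach `disk`, or None.

    `paths` maps disk -> controllers with a path to it (hypothetical topology table);
    `healthy` maps controller -> whether it is currently usable.
    """
    for ctrl in paths.get(disk, []):
        if ctrl != failed_controller and healthy.get(ctrl, False):
            return ctrl
    return None  # no alternative path: switching cannot proceed
```

Returning `None` instead of raising leaves the caller free to decide what to do when no redundant path exists.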
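Claims 5 and 6 generate alarm information from the failure cause of the storage link. A minimal sketch follows. `LinkObservation`, the cause strings and the alarm message format are all hypothetical. The cause is determined in the same order as the detection flow, so the reported cause is the check that first flagged the link.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LinkObservation:
    """Hypothetical snapshot of the values inspected by the detection flow."""
    physical_state: str
    crc_error_count: int
    disk_login_ok: bool


def failure_cause(obs: LinkObservation, crc_threshold: int) -> Optional[str]:
    """Return failure cause information, or None if no fault is found."""
    # Same order as the detection flow: physical state, link layer, application layer.
    if obs.physical_state == "offline":
        return "link physical state is offline"
    if obs.crc_error_count > crc_threshold:
        return f"CRC error count {obs.crc_error_count} exceeds threshold {crc_threshold}"
    if not obs.disk_login_ok:
        return "hard disk login failed"
    return None


def build_alarm(link_id: str, cause: str) -> str:
    """Compose alarm text from the cause (the message format is an assumption)."""
    return f"ALARM: storage link {link_id} is a failed link ({cause})"
```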
CN202010746811.1A 2020-07-29 2020-07-29 Fault detection method, device, equipment and storage medium of storage link Withdrawn CN111858122A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010746811.1A CN111858122A (en) 2020-07-29 2020-07-29 Fault detection method, device, equipment and storage medium of storage link

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010746811.1A CN111858122A (en) 2020-07-29 2020-07-29 Fault detection method, device, equipment and storage medium of storage link

Publications (1)

Publication Number Publication Date
CN111858122A true CN111858122A (en) 2020-10-30

Family

ID=72945391

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010746811.1A Withdrawn CN111858122A (en) 2020-07-29 2020-07-29 Fault detection method, device, equipment and storage medium of storage link

Country Status (1)

Country Link
CN (1) CN111858122A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113300953A (en) * 2021-07-27 2021-08-24 苏州浪潮智能科技有限公司 Management method, system and related device for multipath failover group
CN113672415A (en) * 2021-07-09 2021-11-19 济南浪潮数据技术有限公司 Disk fault processing method, device, equipment and storage medium
CN113868000A (en) * 2021-09-03 2021-12-31 苏州浪潮智能科技有限公司 Link fault repairing method, system and related components
CN115291814A (en) * 2022-10-09 2022-11-04 深圳市安信达存储技术有限公司 Embedded memory core data storage method, embedded memory chip and memory system
CN115333970A (en) * 2022-07-22 2022-11-11 苏州浪潮智能科技有限公司 Method and device for evaluating connection stability of equipment, computer equipment and storage medium
CN116909494A (en) * 2023-09-12 2023-10-20 苏州浪潮智能科技有限公司 Storage switching method and device of server and server system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101378577A (en) * 2008-09-27 2009-03-04 华为技术有限公司 Method and system for detecting link failure
CN107688547A (en) * 2017-08-23 2018-02-13 郑州云海信息技术有限公司 A kind of method and system of controller active-standby switch
CN109933478A (en) * 2017-12-19 2019-06-25 杭州华为数字技术有限公司 A kind of fault handling method and storage system of storage system
CN110798347A (en) * 2019-10-25 2020-02-14 北京浪潮数据技术有限公司 Service state detection method, device, equipment and storage medium

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113672415A (en) * 2021-07-09 2021-11-19 济南浪潮数据技术有限公司 Disk fault processing method, device, equipment and storage medium
CN113300953A (en) * 2021-07-27 2021-08-24 苏州浪潮智能科技有限公司 Management method, system and related device for multipath failover group
WO2023005037A1 (en) * 2021-07-27 2023-02-02 苏州浪潮智能科技有限公司 Multi-path failover group management method, system and related device
CN113868000A (en) * 2021-09-03 2021-12-31 苏州浪潮智能科技有限公司 Link fault repairing method, system and related components
CN113868000B (en) * 2021-09-03 2023-07-18 苏州浪潮智能科技有限公司 Link fault repairing method, system and related components
CN115333970A (en) * 2022-07-22 2022-11-11 苏州浪潮智能科技有限公司 Method and device for evaluating connection stability of equipment, computer equipment and storage medium
CN115333970B (en) * 2022-07-22 2023-08-11 苏州浪潮智能科技有限公司 Device connection stability evaluation method and device, computer device and storage medium
CN115291814A (en) * 2022-10-09 2022-11-04 深圳市安信达存储技术有限公司 Embedded memory core data storage method, embedded memory chip and memory system
CN116909494A (en) * 2023-09-12 2023-10-20 苏州浪潮智能科技有限公司 Storage switching method and device of server and server system
CN116909494B (en) * 2023-09-12 2024-01-26 苏州浪潮智能科技有限公司 Storage switching method and device of server and server system

Similar Documents

Publication Publication Date Title
CN111858122A (en) Fault detection method, device, equipment and storage medium of storage link
CN109558282B (en) PCIE link detection method, system, electronic equipment and storage medium
CN112286709B (en) Diagnosis method, diagnosis device and diagnosis equipment for server hardware faults
CN108872762B (en) Electronic equipment leakage detection method and device, electronic equipment and storage medium
CN110247725B (en) Line fault troubleshooting method and device for OTN (optical transport network) and terminal equipment
CN106610712B (en) Substrate management controller resetting system and method
CN112380089A (en) Data center monitoring and early warning method and system
CN115858311A (en) Operation and maintenance monitoring method and device, electronic equipment and readable storage medium
CN106155826B (en) For the method and system of mistake to be detected and handled in bus structures
CN113608908B (en) Server fault processing method, system, equipment and readable storage medium
CN113672306B (en) Server component self-checking abnormity recovery method, device, system and medium
CN113832663B (en) Control chip fault recording method and device and control chip fault reading method
CN113868058A (en) Peripheral component high-speed interconnection equipment fault detection method and device and server
CN111124818B (en) Monitoring method, device and equipment for Expander
CN114564334B (en) MRPC data processing method, system and related components
CN113822478A (en) Equipment detection method and device, electronic equipment and storage medium
CN110704219B (en) Hardware fault reporting method and device and computer storage medium
CN113590203A (en) Failure processing method and system for substrate management controller, storage medium and single chip microcomputer
CN115658373B (en) Server-based memory processing method and device, processor and electronic equipment
CN111309532A (en) PCIE equipment abnormity detection method, system, electronic equipment and storage medium
CN110798347A (en) Service state detection method, device, equipment and storage medium
CN110633176A (en) Working system switching method, cube star and switching device
CN112670952B (en) Control method and equipment for generator set and readable storage medium
CN113986142B (en) Disk fault monitoring method, device, computer equipment and storage medium
CN114356617B (en) Error injection testing method, device, system and computing equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20201030