CN107438010A - Fault protection method, first and second processors, network storage device and system - Google Patents


Info

Publication number
CN107438010A
Authority
CN
China
Prior art keywords
processor
information acquisition
message
fault information
request message
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610356375.0A
Other languages
Chinese (zh)
Inventor
王京
杨长领
黄安
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZTE Corp
Original Assignee
ZTE Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZTE Corp
Priority to CN201610356375.0A
Publication of CN107438010A
Legal status: Pending (current)


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H04L41/0654 Management of faults, events, alarms or notifications using network fault recovery
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1095 Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Hardware Redundancy (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention provides a fault protection method, first and second processors, a network storage device and a system. A processor receives a reset request message from its mutually redundant peer processor. If the processor is performing fault information acquisition at that moment, it is not reset; if it is not performing fault information acquisition, it is reset according to the reset request message. This prevents fault information acquisition from being unexpectedly interrupted, so the fault information can be collected completely and the fault site is effectively preserved, which facilitates subsequent fault analysis.

Description

Fault protection method, first and second processors, network storage device and system
Technical field
The present invention relates to the field of computer storage technology, and in particular to a fault protection method, first and second processors, a network storage device and a system.
Background
Current network storage devices usually consist of two processors. The benefit of this dual-controller architecture is that when one processor hangs or crashes, the other processor can continue to provide service, which eliminates system failures caused by a single point of failure. There are two main dual-controller modes: Active-Standby (main-standby mode) and Active-Active (dual-active mode). In Active-Standby mode, only one processor provides external service at any given time while the other remains entirely on standby and provides no service, so resource utilization is low. In Active-Active mode, both processors provide service simultaneously and can share the load, giving relatively higher resource utilization and service-processing capacity. The network storage device described in the present invention preferably uses the Active-Active mode.
After the system is powered on, a heartbeat mechanism is established between the two processors: each processor periodically sends heartbeat messages to the other, which lets each side determine that its peer is working normally. When one of the processors fails, the heartbeat mechanism breaks down, i.e., the failed processor stops sending heartbeat messages to the other processor, and the failed processor often starts fault information acquisition. Under the existing mechanism, if the other processor does not receive a heartbeat message within a certain period, it resets or powers off the peer processor to isolate the faulty one, so that the system can keep providing stable service. However, this makes the information on the reset or powered-off processor difficult to collect and preserve, which makes subsequent fault analysis harder.
Summary of the invention
Embodiments of the present invention provide a fault protection method, first and second processors, a network storage device and a system, which solve the problem in the prior art that a processor is reset or powered off as soon as it is judged to have failed, so that information on the reset or powered-off processor is difficult to collect and preserve.
To solve the above technical problem, an embodiment of the present invention provides a fault protection method, including:
the first processor receives a reset request message from the second processor, the first processor and the second processor being backups of each other;
if fault information acquisition is being performed, the first processor is not reset;
if fault information acquisition is not being performed, the first processor is reset according to the reset request message.
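The decision rule claimed above fits in a few lines. The following is a minimal sketch, assuming the first processor exposes a boolean that says whether fault information acquisition is in progress; the names `Action` and `on_reset_request` are hypothetical and not taken from the patent.

```python
from enum import Enum


class Action(Enum):
    IGNORE = "ignore"  # keep running: fault collection must not be interrupted
    RESET = "reset"    # restart or power off as requested


def on_reset_request(collecting_fault_info: bool) -> Action:
    """First-processor reaction to a reset request from its peer (hypothetical API)."""
    if collecting_fault_info:
        return Action.IGNORE  # do not reset while collecting fault information
    return Action.RESET       # otherwise honour the reset request


print(on_reset_request(True))   # Action.IGNORE
print(on_reset_request(False))  # Action.RESET
```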
In addition, an embodiment of the present invention further provides a fault protection method, including: the second processor receives heartbeat messages from the first processor at fixed time intervals, the second processor and the first processor being backups of each other;
after no heartbeat message sent by the first processor has been received within a first preset time, the second processor sends a reset request message to the first processor, where: if the first processor is performing fault information acquisition, the first processor is not reset; if the first processor is not performing fault information acquisition, the first processor is reset according to the reset request message.
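On the second processor's side, the claim amounts to a heartbeat watchdog. The sketch below is one possible reading, assuming time is passed in explicitly (so it can be tested without real clocks) and that the reset request is sent once per outage; the class and method names are hypothetical. The description also allows re-sending the request repeatedly, which this sketch does not do.

```python
class HeartbeatWatchdog:
    """Second-processor side: detect a missing heartbeat from the first processor.

    first_preset is the 'first preset time' in seconds (hypothetical units).
    """

    def __init__(self, first_preset: float, now: float = 0.0) -> None:
        self.first_preset = first_preset
        self.last_heartbeat = now
        self.reset_request_sent = False

    def on_heartbeat(self, now: float) -> None:
        self.last_heartbeat = now
        self.reset_request_sent = False  # peer is alive again; re-arm the watchdog

    def poll(self, now: float) -> bool:
        """Return True exactly when a reset request should be sent now."""
        silent_for = now - self.last_heartbeat
        if silent_for > self.first_preset and not self.reset_request_sent:
            self.reset_request_sent = True
            return True
        return False


wd = HeartbeatWatchdog(first_preset=3.0)
wd.on_heartbeat(1.0)
print(wd.poll(3.5))  # False: silent for only 2.5 s
print(wd.poll(4.5))  # True: silent for 3.5 s, send the reset request
print(wd.poll(5.0))  # False: already sent for this outage
```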
In addition, an embodiment of the present invention further provides a first processor, including:
a message receiving module, configured to receive a reset request message from the second processor, the first processor and the second processor being backups of each other;
a fault information acquisition module, configured to perform fault information acquisition;
a reset module, configured to reset the processor according to the reset request message when fault information acquisition is not being performed.
In addition, an embodiment of the present invention further provides a second processor, including:
a heartbeat receiving module, configured to receive heartbeat messages from the first processor at fixed time intervals, the second processor and the first processor being backups of each other;
a message sending module, configured to send a reset request message to its mutually redundant processor after no heartbeat message sent by the first processor has been received within the first preset time, where: if the first processor is performing fault information acquisition, the first processor is not reset; if the first processor is not performing fault information acquisition, the first processor is reset according to the reset request message.
In addition, an embodiment of the present invention further provides a network storage device, including a first processor and a second processor that are backups of each other;
the first processor includes a first controller and a reset circuit; the first controller is configured to receive the reset request message sent by the second processor and to perform fault information acquisition; the reset circuit is configured to reset the first processor according to the reset request message when fault information acquisition is not being performed;
the second processor includes a second controller, configured to receive heartbeat messages sent by the first controller at fixed time intervals, and to send the reset request message to the first processor after no heartbeat message from the first controller has been received within the first preset time.
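To see how the two processors of the device interact, here is a small tick-based simulation. It is a sketch under stated assumptions: the first preset time is counted in heartbeat intervals, the second controller re-sends the reset request every interval once the threshold is reached, and a failure immediately starts fault collection. All names are hypothetical.

```python
class FirstProcessor:
    """First processor: first controller plus a reset circuit."""

    def __init__(self) -> None:
        self.alive = True           # sends heartbeats while True
        self.reset_enabled = True   # whether the reset circuit may act
        self.was_reset = False

    def fail(self) -> None:
        # On failure: stop heartbeating, start fault collection, block resets.
        self.alive = False
        self.reset_enabled = False

    def finish_collection(self) -> None:
        self.reset_enabled = True

    def receive_reset_request(self) -> None:
        # The reset circuit acts only when no fault collection is running.
        if self.reset_enabled:
            self.was_reset = True


class SecondProcessor:
    """Second processor: second controller that watches the heartbeat."""

    def __init__(self, peer: FirstProcessor, first_preset: int) -> None:
        self.peer = peer
        self.first_preset = first_preset  # measured in heartbeat intervals
        self.missed = 0

    def tick(self) -> None:
        """Run once per heartbeat interval."""
        if self.peer.alive:
            self.missed = 0  # heartbeat received this interval
            return
        self.missed += 1
        if self.missed >= self.first_preset:
            self.peer.receive_reset_request()  # re-sent every interval


p1 = FirstProcessor()
p2 = SecondProcessor(p1, first_preset=3)
p1.fail()
for _ in range(5):
    p2.tick()
print(p1.was_reset)  # False: requests arrived during fault collection
p1.finish_collection()
p2.tick()
print(p1.was_reset)  # True: the next request after collection succeeds
```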
In addition, an embodiment of the present invention further provides a network storage system, including at least one data writing device and the above network storage device; the data writing device is configured to write network data into the network storage device.
Beneficial effects of the present invention:
Embodiments of the present invention provide a fault protection method, first and second processors, a network storage device and a system. A processor receives a reset request message from its mutually redundant processor; if the processor is performing fault information acquisition at that moment, it is not reset, and if it is not performing fault information acquisition, it is reset according to the reset request message. This prevents the processor from being unexpectedly interrupted while acquiring fault information, so the fault information can be collected completely and the fault site is effectively preserved, facilitating subsequent fault analysis.
Brief description of the drawings
Fig. 1 is a flowchart of a fault protection method according to an embodiment of the present invention;
Fig. 2 is a flowchart of a fault protection method according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a first processor according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a second processor according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a processor according to an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of a network storage device according to an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of a network storage system according to an embodiment of the present invention;
Fig. 8 is a flowchart of a fault protection method according to an embodiment of the present invention;
Fig. 9 is a schematic structural diagram of a network storage device according to an embodiment of the present invention.
Detailed description of embodiments
The inventive concept is as follows: in a dual-controller system, if one processor fails, it stops sending heartbeat messages to the other processor and starts fault information acquisition. During fault information acquisition, resetting or powering off that processor is prohibited, so that fault information acquisition completes smoothly and subsequent fault analysis is facilitated.
Specific implementations of the embodiments of the present invention are further described below with reference to the accompanying drawings.
First embodiment
This embodiment provides a fault protection method which, referring to Fig. 1, includes:
S101: receiving a reset request message from the second processor, the first processor and the second processor being backups of each other;
S102: if fault information acquisition is being performed, not resetting the first processor;
S103: if fault information acquisition is not being performed, resetting the first processor according to the reset request message.
In a dual-controller system, the two processors back each other up and each executes its own services. In Active-Active mode, although both processors are working, they do different work: one processor is the OWNER (master end) of a given task, and the other processor is not the OWNER of that task but can only be the OWNER of other tasks. The OWNER concept is introduced in dual-controller systems to prevent both processors from issuing service messages at the same time. OWNER records a processor id (identity) and is an attribute of a LUN (Logical Unit Number). During service processing, the processor id corresponding to a host IO (input/output) is specified, which determines which processor handles that IO. If an IO is sent to the non-master end, i.e., the non-OWNER end, the SCSI (Small Computer System Interface) target module forwards it to the OWNER end for processing.
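The OWNER attribute and the forwarding of IO from the non-OWNER end can be sketched as a routing function. This is an illustration only, assuming processors are identified by small integer ids and that forwarding is modelled as a flag; `Lun` and `route_io` are hypothetical names, not an actual SCSI target API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Lun:
    number: int
    owner: int  # processor id recorded as the OWNER attribute of this LUN


def route_io(lun: Lun, receiving_processor: int) -> tuple[int, bool]:
    """Return (processor that handles the IO, whether it was forwarded).

    The OWNER handles its own IO directly; IO that lands on the non-OWNER
    end is forwarded to the OWNER, as the SCSI target module does.
    """
    if receiving_processor == lun.owner:
        return receiving_processor, False
    return lun.owner, True


lun0 = Lun(number=0, owner=1)
print(route_io(lun0, 1))  # (1, False)
print(route_io(lun0, 2))  # (1, True)
```

Keeping a single owner per LUN is what prevents both processors from servicing the same LUN concurrently while the heartbeat is healthy.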
To ensure that the processors in a dual-controller system work normally, a heartbeat mechanism is usually established between the two processors, i.e., the first processor sends heartbeat messages to the second processor at fixed time intervals. A heartbeat mechanism periodically sends a heartbeat message so that the peer learns that the sender is still working normally, which ensures that the connection between the two remains valid. A heartbeat message is simply a short message the first processor periodically sends to the second processor to say "I am still working normally". In code, this means sending a fixed message to the second processor at a fixed interval, typically on the order of seconds, and the second processor replies with a fixed message after receiving it. Both sides may also actively send fixed messages to each other. In other words, heartbeat exchange can take either of two forms: both processors send heartbeat messages to each other at fixed time intervals, or one processor sends heartbeat messages at fixed time intervals and the other replies to each one it receives; both forms are feasible. If one processor does not receive a heartbeat message from the other within a certain time, a fault is considered to have occurred. The heartbeat message is so called because it is sent once every fixed interval, like a heartbeat, to tell the second processor that the first processor is still alive. In practice this serves to keep a long-lived connection. The content of the heartbeat message is not specifically prescribed; it is usually a very small packet, or even an empty packet containing only a header. While heartbeat messages are exchanged normally, each processor knows that its peer is working normally, which avoids the situation in which both processors handle the same service as OWNER at the same time.
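Since the heartbeat content is not prescribed and may be a header-only packet, the sketch below shows one hypothetical 8-byte header and the request/reply form of the exchange. The field layout (magic, sender id, reply flag, sequence number) and the magic value are assumptions for illustration; only Python's standard `struct` module is used.

```python
import struct

# Hypothetical header-only heartbeat, network byte order:
# magic (2 bytes), sender id (1 byte), flags (1 byte, bit 0 = reply), sequence (4 bytes).
HEADER = struct.Struct("!HBBI")
MAGIC = 0x4842  # arbitrary illustrative value ("HB")


def make_heartbeat(sender: int, seq: int, reply: bool = False) -> bytes:
    return HEADER.pack(MAGIC, sender, 1 if reply else 0, seq)


def parse_heartbeat(data: bytes) -> tuple[int, int, bool]:
    magic, sender, flags, seq = HEADER.unpack(data)
    if magic != MAGIC:
        raise ValueError("not a heartbeat packet")
    return sender, seq, bool(flags & 1)


def reply_to(data: bytes, me: int) -> bytes:
    """Request/reply form: echo the sequence number back with the reply flag set."""
    _, seq, _ = parse_heartbeat(data)
    return make_heartbeat(me, seq, reply=True)


ping = make_heartbeat(sender=1, seq=7)
print(len(ping))                            # 8
print(parse_heartbeat(reply_to(ping, 2)))   # (2, 7, True)
```

In the mutual form, each side would simply call `make_heartbeat` on its own timer and never call `reply_to`.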
However, the two processors in a dual-controller system cannot always keep working normally. If the first processor has a problem, the second processor can continue to provide service; this is the benefit of a dual-controller system. When the first processor fails, it stops sending heartbeat messages to the second processor, so the second processor no longer receives them. After a certain time, i.e., after a first preset time has elapsed, the second processor judges that the first processor has failed and attempts to reset it by sending a reset request message to the first processor; here, reset includes restart or power-off. Note that the second processor may stop receiving heartbeat messages in the following situations:
First, the first processor has failed. This is the most common case: the first processor loses the ability to work normally, typically because of a system crash, a hang or a deadlock, or situations such as a null pointer in the kernel. The kernel of the first processor can then no longer work normally and stops sending heartbeat messages to the second processor. At this point the first processor typically starts fault information acquisition to facilitate later troubleshooting; fault information acquisition can include operations such as kdump (kernel crash dump) and collecting and packaging system logs, and these operations are relatively time-consuming.
Second, the heartbeat mechanism has failed. In this case the two mutually redundant processors may not have failed at all, but because heartbeat contact between them is lost, each processor may conclude that it is the only processor in the system. Both processors may then handle the same service as OWNER, causing data in the storage device to become inconsistent or lost. A heartbeat mechanism failure may be only a heartbeat link failure, the first processor being unable to send heartbeat messages, or the second processor being unable to receive them; all of these count as heartbeat mechanism failures.
When the first processor fails, it starts fault information acquisition. Taking kdump as an example: kdump is a Linux kernel-panic capture mechanism based on kexec (a fast-reboot component), a tool and service for dumping memory contents when the system crashes, hangs or deadlocks. Put simply, once the system crashes, the normal kernel can no longer work; kdump then boots a separate capture kernel to capture the current runtime information, and this kernel saves all runtime state and data in memory into a core dump file for analyzing the cause of the crash. This process usually takes several minutes.
In this case, the first processor is performing fault information acquisition, and the second processor has received no heartbeat message from the first processor within the first preset time. The second processor therefore judges that the first processor has failed, cannot work normally and needs to be reset, so it sends a reset request message to the first processor in an attempt to reset it. When the first processor receives the reset request message, if it is currently performing fault information acquisition, no reset operation is performed on it. This ensures that the fault information acquisition of the first processor is not interrupted and that complete fault information can be collected.
Of course, if the first processor is not performing fault information acquisition when it receives the reset request message, either because fault information acquisition has completed or because the first processor has not failed and the heartbeat mechanism has failed instead, the first processor can be reset directly, so that it is put back into operation as soon as possible and working efficiency is improved.
The first processor includes a reset register. Resetting the first processor is implemented by means such as a reset circuit, and the reset register indicates whether the first processor may be reset. Optionally, when fault information acquisition is being performed, the reset register is set to a disabled state; while the reset register is disabled, no reset operation is performed on the first processor even if a reset request message is received. When fault information acquisition is not being performed, the reset register can be set to an available state; with the reset register available, a reset operation is performed on the first processor when a reset request message is received. That is, the first processor is reset according to the state of the reset register. Specifically, when the first processor starts fault information acquisition, it first calls a driver interface to disable the reset register; more precisely, it calls a BSP (Board Support Package) callback function to close the reset register, i.e., sets the reset register value to 0, which disables the reset link and prevents the first processor from being reset. While fault information acquisition is in progress, the reset register is kept disabled, so even a reset request message from the second processor cannot make the reset register reset the first processor. While the first processor performs fault information acquisition, the second processor may send reset request messages to the first processor repeatedly, and the first processor is then reset directly according to a reset request message once fault information acquisition completes. Alternatively, the second processor may send the reset request message only once, when it judges that the first processor has failed, and not resend it afterwards; the first processor then resets itself after completing fault information acquisition.
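The register handling above maps naturally onto a context manager that disables resets on entry and re-enables them on exit. This is a sketch only: `ResetRegister` stands in for the hardware register and `bsp_set_reset_register` is a hypothetical placeholder for the BSP callback, not a real board API. Re-enabling in `finally` even when collection raises is a design choice of this sketch; the description instead relies on a forced reset from the peer when collection goes wrong.

```python
from contextlib import contextmanager


class ResetRegister:
    """Stand-in for the hardware reset register: 1 = available, 0 = disabled."""

    def __init__(self) -> None:
        self.value = 1


def bsp_set_reset_register(reg: ResetRegister, value: int) -> None:
    """Hypothetical BSP callback; on real hardware this would write the register."""
    reg.value = value


class FirstProcessor:
    def __init__(self) -> None:
        self.reg = ResetRegister()
        self.resets = 0

    def on_reset_request(self) -> bool:
        """Reset only if the register is available; return True if a reset happened."""
        if self.reg.value == 0:
            return False
        self.resets += 1
        return True

    @contextmanager
    def fault_collection(self):
        bsp_set_reset_register(self.reg, 0)      # disable the reset link first
        try:
            yield
        finally:
            bsp_set_reset_register(self.reg, 1)  # make resets possible again


p = FirstProcessor()
with p.fault_collection():
    print(p.on_reset_request())  # False: register disabled during collection
print(p.on_reset_request())      # True
```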
Fault information acquisition does not always proceed smoothly; the acquisition itself may go wrong. Optionally, therefore, when a preset condition is met, the second processor sends a forced reset message to the first processor. After receiving the forced reset message, the first processor sets the reset register to the available state and is then reset according to the state of the reset register. The difference between a forced reset message and a reset request message is that a reset request message only attempts to reset the first processor and is not mandatory: the first processor cannot be reset while the reset register is disabled. A forced reset message, regardless of the current state of the reset register, directly sets the reset register to the available state and then resets the first processor. This is a remedy for the fact that the first processor cannot be reset while it performs fault information acquisition: when fault information acquisition itself fails, the second processor can send a forced reset message so that the reset register resets the first processor, preventing fault information acquisition from running indefinitely.
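The difference between the advisory reset request and the mandatory forced reset can be shown in one message handler. A minimal sketch, assuming the reset register is modelled as a boolean and messages as an enum; `Msg`, `handle` and `begin_fault_collection` are hypothetical names.

```python
from enum import Enum


class Msg(Enum):
    RESET_REQUEST = "reset_request"  # advisory: honoured only if the register is available
    FORCED_RESET = "forced_reset"    # mandatory: overrides the register state


class FirstProcessor:
    def __init__(self) -> None:
        self.reset_register_available = True
        self.resets = 0

    def begin_fault_collection(self) -> None:
        self.reset_register_available = False  # block ordinary reset requests

    def handle(self, msg: Msg) -> bool:
        """Process a message from the peer; return True if a reset happened."""
        if msg is Msg.FORCED_RESET:
            # Whatever its current state, the register is made available first.
            self.reset_register_available = True
        if not self.reset_register_available:
            return False
        self.resets += 1
        return True


p = FirstProcessor()
p.begin_fault_collection()
print(p.handle(Msg.RESET_REQUEST))  # False
print(p.handle(Msg.FORCED_RESET))   # True
```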
The preset condition may include: the second processor has sent the reset request message to the first processor and a second preset time has elapsed since then; or the second processor has sent the reset request message to the first processor, a second preset time has elapsed since then, and the first processor is still performing fault information acquisition. The second preset time is obtained by theoretical calculation or measurement and typically represents the maximum time required for fault information acquisition. If the time since the second processor sent the reset request message exceeds the maximum time needed for fault information acquisition, fault information acquisition can be considered to have failed, and the first processor can then be forcibly reset, i.e., a forced reset message is sent to it. Specifically, the preset condition can be implemented by setting a fault information acquisition timer in the second processor whose duration is not less than the maximum time required for fault information acquisition, i.e., the second preset time. During this period, the second processor monitors the state of the first processor in real time, including whether it is performing fault information acquisition and whether it has been reset. If the timer expires, either fault information acquisition has gone wrong or the reset of the first processor after acquisition completed has gone wrong; the second processor then sends a forced reset message to the first processor to force it to reset. This preserves the fault information as completely as possible while avoiding an excessive waste of time that would affect normal data processing.
When monitoring the state of the first processor, the second processor can monitor the state of the reset register. If the reset register is disabled, the first processor is in the process of collecting fault information, and a reset request message cannot cause it to reset. If the reset register is available, the first processor is not performing fault information acquisition (including the case where acquisition has completed), and the second processor can send a reset request message directly to reset the first processor. If the reset register is still disabled after the timer expires, the second processor sends a forced reset message to the first processor, which forces the reset register into the available state and resets the first processor.
When the first processor fails, it performs fault information acquisition and may then receive a reset request message from the second processor. If fault information acquisition is in progress at that moment, the first processor is not reset; if not, the first processor is reset according to the reset request message. This prevents the first processor from being reset during fault information acquisition, which would interrupt the acquisition, so the first processor can collect complete fault information as far as possible, facilitating subsequent fault analysis and localization of the first processor.
The first processor and the second processor in this embodiment can have essentially the same structure, and their roles are interchangeable. In short, whichever of the two processors fails performs fault information acquisition; during fault information acquisition it prevents the other processor from resetting it, and after acquisition completes it can be reset.
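The second processor's monitoring rule, combined with the second-preset-time timer, can be written as a small decision function. This sketch follows the variant in which a forced reset is sent only if the register is still disabled when the timer expires; the names `Decision` and `choose_action`, and the use of seconds, are assumptions.

```python
from enum import Enum


class Decision(Enum):
    WAIT = "wait"                    # collection in progress, timer still running
    RESET_REQUEST = "reset_request"  # register available: an ordinary request works
    FORCED_RESET = "forced_reset"    # timer expired while the register is still disabled


def choose_action(register_available: bool, elapsed: float, second_preset: float) -> Decision:
    """Second-processor policy after the first reset request was sent.

    elapsed: seconds since the first reset request was sent.
    second_preset: 'second preset time', no smaller than the longest
    expected fault-collection time.
    """
    if register_available:
        return Decision.RESET_REQUEST
    if elapsed >= second_preset:
        return Decision.FORCED_RESET
    return Decision.WAIT


print(choose_action(False, 10, 300))   # Decision.WAIT
print(choose_action(True, 10, 300))    # Decision.RESET_REQUEST
print(choose_action(False, 300, 300))  # Decision.FORCED_RESET
```

Under the other variant of the preset condition, the expiry check would simply move ahead of the register check, so an expired timer always triggers a forced reset.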
Second embodiment
This embodiment provides a fault protection method which, referring to Fig. 2, includes:
S201: the second processor receives heartbeat messages from the first processor at fixed time intervals, the second processor and the first processor being backups of each other;
S202: after no heartbeat message sent by the first processor has been received within the first preset time, sending a reset request message to the first processor, where: if the first processor is performing fault information acquisition, the first processor is not reset; if the first processor is not performing fault information acquisition, the first processor is reset according to the reset request message.
In a dual-controller system, the two processors back each other up and each executes its own services. In Active-Active mode, although both processors are working, they do different work: one processor is the OWNER of a given task, and the other processor is not the OWNER of that task but can only be the OWNER of other tasks. The OWNER concept is introduced in dual-controller systems to prevent both processors from issuing service messages at the same time. OWNER records a processor id (identity) and is an attribute of a LUN (Logical Unit Number). During service processing, the processor id corresponding to a host IO (input/output) is specified, which determines which processor handles that IO. If an IO is sent to the non-master end, i.e., the non-OWNER end, the SCSI (Small Computer System Interface) target module forwards it to the OWNER end for processing.
To ensure that the processors in a dual-controller system work normally, a heartbeat mechanism is usually established between the two processors, i.e., the second processor receives heartbeat messages sent by the first processor at fixed time intervals. While heartbeat messages are exchanged normally, each processor knows that its peer is working normally, which avoids both processors handling the same service as OWNER at the same time.
However, the two processors in a dual-controller system cannot always keep working normally. If one processor has a problem, the other can continue to provide service; this is the benefit of a dual-controller system. When the second processor has not received heartbeat messages from the first processor for the first preset time, it judges that the first processor has failed and attempts to reset it, i.e., sends a reset request message to the first processor.
After the second processor sends the reset request message, the first processor would normally be reset; however, if the first processor is performing fault information acquisition at that moment, it is not reset. While the first processor performs fault information acquisition, the second processor may send reset request messages repeatedly, and the first processor is reset directly according to a reset request message once acquisition completes. Alternatively, the second processor may send the reset request message only once, when it judges that the first processor has failed, and not resend it afterwards; the first processor then resets itself after completing fault information acquisition.
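The two strategies for reset requests that arrive during fault collection (repeated requests versus a single request followed by a self-reset) can be contrasted in one sketch. The switch `self_reset_after_collection` and the "pending" flag are hypothetical modelling choices, not elements named in the patent.

```python
class FirstProcessor:
    """Handles reset requests that arrive while fault information is being collected."""

    def __init__(self, self_reset_after_collection: bool) -> None:
        self.self_reset = self_reset_after_collection  # True: peer sends only once
        self.collecting = False
        self.pending = False   # a request arrived during collection
        self.resets = 0

    def start_collection(self) -> None:
        self.collecting = True

    def on_reset_request(self) -> None:
        if self.collecting:
            self.pending = True  # remember the request, but do not reset yet
            return
        self._reset()

    def finish_collection(self) -> None:
        self.collecting = False
        if self.self_reset and self.pending:
            self._reset()  # send-once strategy: reset ourselves now

    def _reset(self) -> None:
        self.pending = False
        self.resets += 1


once = FirstProcessor(self_reset_after_collection=True)
once.start_collection()
once.on_reset_request()
once.finish_collection()
print(once.resets)  # 1: reset itself after collection

again = FirstProcessor(self_reset_after_collection=False)
again.start_collection()
again.on_reset_request()
again.finish_collection()
print(again.resets)  # 0: waits for the peer's next request
again.on_reset_request()
print(again.resets)  # 1
```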
During fault information acquisition is performed, be not each time fault information acquisition be all smoothly, Fault information acquisition may also go wrong in itself;So, when meeting preparatory condition, second processor to First processor sends forced resetting message;After first processor receives forced resetting message, reset is posted Storage is arranged to upstate, and then, first processor is resetted according to the state of reseting register. For forced resetting message compared with reset request message, its difference is exactly that reset request message is attempt to reset first Processor, without mandatory, first processor can not be answered when reseting register is disabled status Position;No matter forced resetting message is what the state of now reseting register is, directly reseting register is set Upstate is set to, then first processor is resetted;This is that first processor progress fault message is adopted During collection, it is impossible to a kind of remedial measure resetted to it, when fault information acquisition breaks down in itself, The form of forced resetting message can also be sent by second processor, makes reseting register to first processor Resetted, avoid that fault information acquisition is undying to go on.
The preset condition may be: (1) the second preset time has elapsed since the second processor sent the reset request message to the first processor; or (2) the second preset time has elapsed since the second processor sent the reset request message, and the first processor is still performing fault information acquisition. The second preset time is typically no less than the maximum time the first processor needs to perform fault information acquisition. If that maximum time has passed since the reset request message was sent, fault information acquisition can be considered to have failed, and a reset of the first processor can be forced, i.e., the second processor sends a forced reset message to the first processor. Specifically, the preset condition can be implemented with a timer on the second processor whose duration, the second preset time, is no less than the maximum time required for fault information acquisition. During this period, the second processor monitors the state of the first processor in real time, including whether it is performing fault information acquisition and whether it has been reset. If the timer expires, either fault information acquisition has gone wrong or the first processor failed to reset after acquisition completed. The second processor then sends a forced reset message that forcibly resets the first processor. This preserves as much fault information as possible while avoiding excessive delay that would disrupt normal data processing.
When monitoring the state of the first processor, the second processor can monitor the state of the reset register. If the reset register is disabled, the first processor is collecting fault information, and a reset request message cannot reset it. If the reset register is enabled, the first processor is not performing fault information acquisition (including the case where acquisition has completed), and the second processor can send a reset request message directly to reset the first processor. If the reset register is still disabled after the timer expires, the second processor sends a forced reset message to the first processor, which forces the reset register into the enabled state, and the first processor is reset.
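The decision the second processor makes while supervising the reset register can be written as a small pure function. This is a hedged sketch: `next_action`, `Action` and the parameter names are hypothetical, `t2_s` stands for the second preset time, and the `require_still_collecting` flag switches between preset conditions (1) and (2) described above.

```python
from enum import Enum


class Action(Enum):
    WAIT = "wait"
    SEND_REQUEST = "send reset request"
    FORCE_RESET = "send forced reset"


def next_action(elapsed_s: float, register_enabled: bool, t2_s: float,
                require_still_collecting: bool = True) -> Action:
    """Hypothetical decision rule for the second processor.

    elapsed_s: time since the reset request message was sent.
    register_enabled: the first processor's reset-register state as monitored.
    t2_s: second preset time (>= worst-case fault information acquisition time).
    require_still_collecting: False selects condition (1), True condition (2).
    """
    timed_out = elapsed_s >= t2_s
    if timed_out and (not require_still_collecting or not register_enabled):
        # Preset condition met: acquisition, or the reset after it, is stuck.
        return Action.FORCE_RESET
    if register_enabled:
        # Not collecting, or collection finished: an ordinary request suffices.
        return Action.SEND_REQUEST
    # Register disabled and timer still running: let the collection finish.
    return Action.WAIT
```

Under condition (2), an expired timer with an enabled register still yields an ordinary reset request, because the register state shows that collection is no longer running.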
The second processor receives heartbeat messages sent by the first processor at a fixed time interval. If no heartbeat message has been received from the first processor for the first preset time, the second processor sends a reset request message to the first processor, and when the preset condition is met, it sends a forced reset message. This prevents the first processor from being reset during fault information acquisition, which would interrupt the acquisition, so that the first processor can collect complete fault information as far as possible and subsequent fault analysis and localization of the first processor is made easier.
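Heartbeat-loss detection on the second processor can be sketched as follows. The class `HeartbeatMonitor` and its methods are hypothetical, time is passed in explicitly rather than read from a clock so the logic is easy to test, and `t1_s` stands for the first preset time. This sketch follows the "send once" variant: the reset request is reported only once per outage.

```python
class HeartbeatMonitor:
    """Hypothetical receiver-side heartbeat check on the second processor."""

    def __init__(self, t1_s: float, now_s: float = 0.0) -> None:
        self.t1_s = t1_s            # first preset time
        self.last_seen_s = now_s    # time of the last heartbeat received
        self.request_sent = False

    def on_heartbeat(self, now_s: float) -> None:
        # Peer is alive: remember when, and re-arm the detector.
        self.last_seen_s = now_s
        self.request_sent = False

    def poll(self, now_s: float) -> bool:
        """Return True exactly once when the peer has been silent for t1_s."""
        if not self.request_sent and now_s - self.last_seen_s >= self.t1_s:
            self.request_sent = True   # caller sends the reset request message
            return True
        return False
```

With a 1 s heartbeat interval and `t1_s=3.0`, heartbeats at 1.0 and 2.0 followed by silence make `poll(4.0)` return `False` and `poll(5.0)` return `True`; a later `poll(6.0)` returns `False` because the request has already been sent.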
The second processor and the first processor in this embodiment may have essentially the same structure, and their roles are interchangeable. In short, whichever of the two processors fails to receive the other's heartbeat message within a certain time sends a reset request message to the other.
Third embodiment
This embodiment provides a first processor which, as shown in Fig. 3, includes:
Message reception module 101, configured to receive a reset request message from a second processor, where the first processor 10 and the second processor back each other up;
Fault information acquisition module 102, for performing fault information acquisition;
Reset module 103, configured to reset the processor according to the reset request message when fault information acquisition is not being performed.
In a dual-controller system, the two processors back each other up and each runs its own services. In Active-Active mode both processors are working, but on different things: one processor is the OWNER (master end) of a given task, and the other processor is not the OWNER of that task and can only be the OWNER of other tasks. The OWNER concept was introduced in dual-controller systems to prevent the same service message from being issued by both processors at once. OWNER records the id (identity) of a processor and is an attribute of a LUN (Logical Unit Number). During service processing, the processor id corresponding to a host IO (input/output) is specified, which determines the processor that handles it. If the IO has been sent to the non-master end, i.e., the non-OWNER end, the SCSI (Small Computer System Interface) Target module forwards it to the OWNER end for processing.
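The OWNER rule can be illustrated by a routing function. This is a sketch under assumptions: `Lun`, `owner_id` and `route_io` are hypothetical names, and the SCSI Target forwarding step is reduced to returning the id of the controller that should process the IO.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Lun:
    number: int
    owner_id: int   # id of the controller that is OWNER of this LUN


def route_io(lun: Lun, receiving_ctrl_id: int) -> Tuple[str, int]:
    """Return (how, controller id) for a host IO arriving at receiving_ctrl_id."""
    if receiving_ctrl_id == lun.owner_id:
        # OWNER end: process locally.
        return ("local", receiving_ctrl_id)
    # Non-OWNER end: the SCSI Target layer forwards the IO to the OWNER.
    return ("forwarded", lun.owner_id)
```

Because every IO for a LUN ends up on the single controller recorded as its OWNER, the two controllers never process the same IO concurrently, which is the purpose of the OWNER attribute.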
To ensure that the processors in the dual-controller system work normally, a heartbeat mechanism is usually established between the two processors; that is, the first processor 10 further includes a heartbeat sending module 104, configured to send heartbeat messages to the second processor at a fixed time interval. While heartbeat messages are exchanged normally, each processor knows that the other is working, which prevents both processors from handling the same service as OWNER at the same time, and the two processors can distribute services normally.
When the first processor 10 fails, its fault information acquisition module 102 starts fault information acquisition. Taking kdump as an example: kdump is a Linux kernel crash-dump mechanism based on kexec, a tool and service for dumping the memory state when the system crashes, deadlocks, or hangs. Put simply, once the system crashes the normal kernel can no longer work; at that point kdump starts a kernel used to capture the current runtime information, and that kernel collects all the running state and data in memory into a dump core file so that the cause of the crash can be analyzed. This process usually takes several minutes.
The first processor 10 in this embodiment further includes a reset register 105. The first processor 10 itself is reset by reset means such as a reset circuit; the reset register 105 indicates whether the first processor 10 may be reset. Optionally, the reset register 105 is set to the disabled state while the fault information acquisition module 102 is collecting fault information, and to the enabled state when fault information acquisition is not being performed. While the reset register 105 is disabled, no reset operation is performed on the processor even if a reset request message is received; while it is enabled, a received reset request message causes the processor to be reset, i.e., the first processor 10 is reset according to the state of the reset register 105. Specifically, when the fault information acquisition module 102 starts fault information acquisition, it first calls a driver interface to disable the reset register 105, i.e., it calls a BSP (board support package) callback that closes the reset register 105 by setting its value to 0, which disables the reset link and prevents the first processor 10 from being reset. During fault information acquisition, the processor that backs it up may send the reset request message multiple times, and the first processor 10 is reset according to the reset request message as soon as it completes fault information acquisition. Alternatively, the second processor may send the reset request message only once, when it determines that the first processor 10 has failed, and not resend it; the first processor 10 then resets itself after completing fault information acquisition.
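On the collecting side, "write 0 before collection, write 1 when done" maps naturally onto a context manager. This is a hedged sketch: `ResetRegister` and `reset_blocked` are hypothetical stand-ins for the BSP callback and the hardware register, and the actual dump step is left as a placeholder comment.

```python
from contextlib import contextmanager


class ResetRegister:
    """Hypothetical stand-in for the hardware reset register 105."""

    def __init__(self) -> None:
        self.value = 1   # 1 = reset link enabled, 0 = reset link disabled

    def write(self, value: int) -> None:
        self.value = value


@contextmanager
def reset_blocked(reg: ResetRegister):
    # Equivalent of the BSP callback that writes 0 before collection starts.
    reg.write(0)
    try:
        yield reg
    finally:
        # Re-enable so a queued or later reset request can take effect.
        reg.write(1)
```

A caller would wrap the collection in `with reset_blocked(reg):` and then reset itself. One design note: `finally` re-enables the register even if collection raises an error, but a collection that hangs never reaches `finally`, which is exactly why the peer's forced reset message is still needed.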
Fault information acquisition does not always succeed; the acquisition itself may fail. Optionally, therefore, the message reception module 101 is further configured to receive a forced reset message sent by the second processor; upon receiving the forced reset message, the reset register 105 is set to the enabled state, so that the first processor 10 can be reset according to the state of the reset register 105.
When the first processor fails, it performs fault information acquisition and may then receive a reset request message sent by the second processor. If fault information acquisition is in progress, the first processor is not reset; if it is not, the first processor is reset according to the reset request message. This prevents the first processor from being reset in the middle of fault information acquisition, which would interrupt it, so that the first processor can collect complete fault information as far as possible and subsequent fault analysis and localization of the processor is made easier.
The first processor and the second processor in this embodiment may have essentially the same structure, and their roles are interchangeable. In short, whichever of the two processors fails performs fault information acquisition; during the acquisition it prevents the other processor from resetting it, and once the acquisition is complete it can be reset.
Fourth embodiment
This embodiment provides a second processor which, as shown in Fig. 4, includes:
Heartbeat receiving module 202, configured to receive heartbeat messages from the first processor 10 at a fixed time interval, where the first processor 10 and the second processor 20 back each other up;
Message transmission module 201, configured to send a reset request message to the first processor when no heartbeat message from the first processor 10 has been received within the first preset time. The reset request message is handled as follows: if the first processor 10 is performing fault information acquisition, the first processor 10 is not reset; if it is not, the first processor 10 is reset according to the reset request message.
To ensure that the processors in the dual-controller system work normally, a heartbeat mechanism is usually established between the two processors: the heartbeat receiving module 202 receives, at a fixed time interval, the heartbeat messages sent by the processor that backs it up. While heartbeat messages are exchanged normally, each processor knows that the other is working, which prevents both processors from handling the same service as OWNER at the same time.
However, the two processors in a dual-controller system cannot be expected to work normally at all times. If one processor has a problem, the other can continue to provide service; this is the benefit of a dual-controller system. When the heartbeat receiving module 202 of the second processor 20 has not received a heartbeat message from the first processor 10 for the first preset time, the second processor 20 determines that the first processor 10 has failed and attempts to reset it, i.e., the message transmission module 201 sends a reset request message to the first processor 10.
After the message transmission module 201 sends the reset request message to the first processor 10, the first processor 10 should reset; however, if the first processor 10 is performing fault information acquisition at that moment, it cannot be reset. While the first processor 10 performs fault information acquisition, the message transmission module 201 may send the reset request message to it repeatedly, and the first processor 10 is reset according to the reset request message as soon as it completes fault information acquisition. Alternatively, the message may be sent only once, when the second processor 20 determines that the first processor 10 has failed, and not resent afterwards; the first processor 10 then resets itself after completing fault information acquisition.
Fault information acquisition does not always succeed; the acquisition itself may fail. Therefore, when the preset condition is met, the message transmission module 201 sends a forced reset message to the first processor 10. After receiving it, the first processor 10 sets the reset register 105 to the enabled state and is then reset according to the state of the reset register 105. The difference between the two messages is that a reset request message only attempts to reset the first processor 10 and is not mandatory: the first processor 10 cannot be reset while the reset register 105 is disabled. A forced reset message sets the reset register 105 to the enabled state regardless of its current state, after which the first processor 10 is reset. Because the first processor 10 cannot be reset while it is performing fault information acquisition, this serves as a remedial measure: if the acquisition itself fails, the second processor 20 can still send a forced reset message so that the reset register 105 allows the first processor 10 to be reset, preventing fault information acquisition from continuing indefinitely.
The preset condition may be: (1) the second preset time has elapsed since the message transmission module 201 sent the reset request message to the first processor 10; or (2) the second preset time has elapsed since the message transmission module 201 sent the reset request message, and the first processor 10 is still performing fault information acquisition. The second preset time is typically no less than the maximum time the first processor 10 needs to perform fault information acquisition. If that maximum time has passed since the second processor 20 sent the reset request message, fault information acquisition can be considered to have failed, and a reset of the first processor 10 can be forced, i.e., the second processor 20 sends a forced reset message to the first processor 10. Specifically, the preset condition can be implemented with a timer on the second processor 20 whose duration, the second preset time, is no less than the maximum time required for fault information acquisition. During this period, the second processor 20 monitors the state of the first processor 10 in real time, including whether it is performing fault information acquisition and whether it has been reset. If the timer expires, either fault information acquisition has gone wrong or the first processor 10 failed to reset after acquisition completed. The second processor 20 then sends a forced reset message that forcibly resets the first processor 10. This preserves as much fault information as possible while avoiding excessive delay that would disrupt normal data processing.
When monitoring the state of the first processor 10, the second processor 20 can monitor the state of the reset register 105. If the reset register 105 is disabled, the first processor 10 is collecting fault information, and a reset request message cannot reset it. If the reset register 105 is enabled, the first processor 10 is not performing fault information acquisition (including the case where acquisition has completed), and the message transmission module 201 can send a reset request message directly to reset the first processor 10. If the reset register 105 is still disabled after the timer expires, the message transmission module 201 sends a forced reset message to the first processor 10, which forces the reset register 105 into the enabled state, and the first processor 10 is reset.
The second processor receives heartbeat messages sent by the first processor at a fixed time interval. If no heartbeat message has been received from the first processor for the first preset time, the second processor sends a reset request message to the first processor, and when the preset condition is met, it sends a forced reset message. This prevents the first processor from being reset during fault information acquisition, which would interrupt the acquisition, so that the first processor can collect complete fault information as far as possible and subsequent fault analysis and localization of the first processor is made easier.
The second processor 20 and the first processor 10 in this embodiment may have essentially the same structure, and their roles are interchangeable. In short, whichever of the two processors fails to receive the other's heartbeat message within a certain time sends a reset request message to the other.
In addition, this embodiment further provides a network storage device including the first processor 10 and the second processor 20 of the above embodiments.
Fifth embodiment
This embodiment provides a processor which, as shown in Fig. 5, includes:
Message reception module 101, configured to receive a reset request message from another processor 30;
Fault information acquisition module 102, for performing fault information acquisition;
Reset module 103, configured to reset the processor 30 according to the reset request message when fault information acquisition is not being performed;
Heartbeat module 301, configured to exchange heartbeat messages with the other processor 30 at a fixed time interval;
Message transmission module 201, configured to send a reset request message to the other processor 30 when the heartbeat module 301 has not received a heartbeat message from the other processor 30 for the first preset time.
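Because a processor 30 combines the monitored role and the monitoring role, two identical instances can supervise each other. The tick-based simulation below is a sketch under assumptions: `Peer`, `make_pair`, `run`, integer ticks and the repeated-request variant are all illustrative choices, not details taken from the patent, and the reset register is reduced to a `collecting` flag.

```python
class Peer:
    """Hypothetical processor 30: supervises its partner and can be reset."""

    def __init__(self, name: str, t1_s: int) -> None:
        self.name = name
        self.t1_s = t1_s            # first preset time
        self.partner = None         # the other processor 30
        self.last_heard = 0         # tick of the partner's last heartbeat
        self.alive = True
        self.collecting = False     # True while reset register 105 is disabled
        self.resets = 0

    def crash(self) -> None:
        # Failure: heartbeats stop and fault information acquisition starts.
        self.alive = False
        self.collecting = True

    def finish_collection(self) -> None:
        self.collecting = False     # reset register enabled again

    def send_heartbeat(self, now: int) -> None:
        if self.alive:
            self.partner.last_heard = now

    def check_partner(self, now: int) -> None:
        # Heartbeat module 301 plus message transmission module 201.
        if self.alive and now - self.last_heard >= self.t1_s:
            self.partner.on_reset_request(now)

    def on_reset_request(self, now: int) -> None:
        # Message reception module 101 plus reset module 103.
        if self.collecting:
            return                  # ignored while acquisition is running
        self.resets += 1
        self.alive = True
        self.last_heard = now       # restart partner supervision after reboot


def make_pair(t1_s: int):
    a, b = Peer("A", t1_s), Peer("B", t1_s)
    a.partner, b.partner = b, a
    return a, b


def run(a: Peer, b: Peer, ticks: int, events: dict) -> None:
    # events maps a tick to a list of callables, e.g. {5: [a.crash]}.
    for t in range(1, ticks + 1):
        for action in events.get(t, []):
            action()
        for p in (a, b):
            p.send_heartbeat(t)
        for p in (a, b):
            p.check_partner(t)
```

With `t1_s=3`, crashing A at tick 5 makes B send reset requests from tick 7 on; A ignores them until `finish_collection` runs, and the next request resets it. Swapping the roles of A and B gives the mirror-image result.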
In a dual-controller system, the two processors 30 back each other up and each runs its own services. In Active-Active mode both processors 30 are working, but on different things: one processor 30 is the OWNER (master end) of a given task, and the other processor 30 is not the OWNER of that task and can only be the OWNER of other tasks. The OWNER concept was introduced in dual-controller systems to prevent the same service message from being issued by both processors at once. OWNER records the id (identity) of a processor and is an attribute of a LUN (Logical Unit Number). During service processing, the processor id corresponding to a host IO (input/output) is specified, which determines the processor that handles it. If the IO has been sent to the non-master end, i.e., the non-OWNER end, the SCSI (Small Computer System Interface) Target module forwards it to the OWNER end for processing.
To ensure that the processors 30 in the dual-controller system work normally, a heartbeat mechanism is usually established between the two processors 30; that is, the processor 30 further includes a heartbeat module 301, configured to send heartbeat messages to the other processor 30 at a fixed time interval. While heartbeat messages are exchanged normally, each processor 30 knows that the other is working, which prevents both processors 30 from handling the same service as OWNER at the same time, and the two processors 30 can distribute services normally.
When the processor 30 fails, its fault information acquisition module 102 starts fault information acquisition. Taking kdump as an example: kdump is a Linux kernel crash-dump mechanism based on kexec, a tool and service for dumping the memory state when the system crashes, deadlocks, or hangs. Put simply, once the system crashes the normal kernel can no longer work; at that point kdump starts a kernel used to capture the current runtime information, and that kernel collects all the running state and data in memory into a dump core file so that the cause of the crash can be analyzed. This process usually takes several minutes.
The processor 30 in this embodiment further includes a reset register 105. The processor 30 itself is reset by reset means such as a reset circuit; the reset register 105 indicates whether the processor 30 may be reset. Optionally, the reset register 105 is set to the disabled state while the fault information acquisition module 102 is collecting fault information, and to the enabled state when fault information acquisition is not being performed. While the reset register 105 is disabled, no reset operation is performed on the processor 30 even if a reset request message is received; while it is enabled, a received reset request message causes the processor 30 to be reset, i.e., the processor 30 is reset according to the state of the reset register 105. Specifically, when the fault information acquisition module 102 starts fault information acquisition, it first calls a driver interface to disable the reset register 105, i.e., it calls a BSP callback that closes the reset register 105 by setting its value to 0, which disables the reset link and prevents the processor 30 from being reset. During fault information acquisition, the other processor 30 may send the reset request message multiple times, and the processor 30 is reset according to the reset request message as soon as it completes fault information acquisition. Alternatively, the other processor 30 may send the reset request message only once, when it determines that the processor 30 has failed, and not resend it; the processor 30 then resets itself after completing fault information acquisition.
Fault information acquisition does not always succeed; the acquisition itself may fail. Optionally, therefore, the message reception module 101 is further configured to receive a forced reset message sent by the other processor 30; upon receiving the forced reset message, the reset register 105 is set to the enabled state, so that the processor 30 can be reset according to the state of the reset register 105.
Likewise, the processor that fails may be the other processor 30. When the heartbeat module 301 of the processor 30 has not received a heartbeat message from the other processor 30 for the first preset time, the processor 30 determines that the other processor 30 has failed and attempts to reset it, i.e., the message transmission module 201 sends a reset request message to the other processor 30.
After the message transmission module 201 sends the reset request message to the other processor 30, the other processor 30 should reset; however, if it is performing fault information acquisition at that moment, it cannot be reset. While the other processor 30 performs fault information acquisition, the message transmission module 201 may send the reset request message to it repeatedly, and the other processor 30 is reset according to the reset request message as soon as it completes fault information acquisition. Alternatively, the message may be sent only once, when the processor 30 determines that the other processor 30 has failed, and not resent afterwards; the other processor 30 then resets itself after completing fault information acquisition.
Fault information acquisition does not always succeed; the acquisition itself may fail. Therefore, when the preset condition is met, the message transmission module 201 sends a forced reset message to the other processor 30. After receiving it, the other processor 30 sets its reset register 105 to the enabled state and is then reset according to the state of the reset register 105. The difference between the two messages is that a reset request message only attempts to reset the other processor 30 and is not mandatory: the other processor 30 cannot be reset while the reset register 105 is disabled. A forced reset message sets the reset register 105 to the enabled state regardless of its current state, after which the other processor 30 is reset. Because the other processor 30 cannot be reset while it is performing fault information acquisition, this serves as a remedial measure: if the acquisition itself fails, the processor 30 can still send a forced reset message so that the reset register 105 allows the other processor 30 to be reset, preventing fault information acquisition from continuing indefinitely.
The preset condition may be: (1) the second preset time has elapsed since the message transmission module 201 sent the reset request message to the other processor 30; or (2) the second preset time has elapsed since the message transmission module 201 sent the reset request message, and the other processor 30 is still performing fault information acquisition. The second preset time is typically no less than the maximum time the other processor 30 needs to perform fault information acquisition. If that maximum time has passed since the processor 30 sent the reset request message, fault information acquisition can be considered to have failed, and a reset of the other processor 30 can be forced, i.e., the processor 30 sends a forced reset message to the other processor 30. Specifically, the preset condition can be implemented with a timer on the processor 30 whose duration, the second preset time, is no less than the maximum time required for fault information acquisition. During this period, the processor 30 monitors the state of the other processor 30 in real time, including whether it is performing fault information acquisition and whether it has been reset. If the timer expires, either fault information acquisition has gone wrong or the other processor 30 failed to reset after acquisition completed. The processor 30 then sends a forced reset message that forcibly resets the other processor 30. This preserves as much fault information as possible while avoiding excessive delay that would disrupt normal data processing.
When monitoring the state of the other processor 30, the processor 30 can monitor the state of its reset register 105. If the reset register 105 is disabled, the other processor 30 is collecting fault information, and a reset request message cannot reset it. If the reset register 105 is enabled, the other processor 30 is not performing fault information acquisition (including the case where acquisition has completed), and the message transmission module 201 can send a reset request message directly to reset the other processor 30. If the reset register 105 is still disabled after the timer expires, the message transmission module 201 sends a forced reset message to the other processor 30, which forces the reset register 105 into the enabled state, and the other processor 30 is reset.
The processor exchanges heartbeat messages with the other processor at a fixed time interval. When the processor fails, it performs fault information acquisition and may then receive a reset request message from the other processor. If fault information acquisition is in progress, the processor is not reset. This prevents the processor from being reset during fault information acquisition, which would interrupt the acquisition, so that the processor can collect complete fault information as far as possible and subsequent fault analysis and localization of the processor is made easier.
In addition, when the other processor 30 fails, i.e., when the processor 30 has not received a heartbeat message from the other processor 30 for the first preset time, the processor 30 sends a reset request message to the other processor 30. This prevents the other processor 30 from being reset during fault information acquisition, which would interrupt the acquisition, so that the other processor 30 can collect complete fault information as far as possible.
This embodiment further provides a network storage device which, as shown in Fig. 6, includes at least two of the above processors 30, which back each other up.
A network storage system is also provided which, as shown in Fig. 7, includes a network storage device 1 and at least one data transfer device 2. The data transfer device 2 writes network data into the network storage device 1; it may be any of various network devices such as a host or a microcomputer, and every data transfer device 2 is connected to the network storage device.
The processor 30 in this embodiment may be any of the many processors capable of forming a dual-controller system, such as a storage controller. The message reception module 101, fault information acquisition module 102, heartbeat module 301, and message transmission module 201 may be implemented by the controller in the processor 30, and the message reception module 101, heartbeat module 301, message transmission module 201, and so on may be implemented by the same unit. The fault information acquisition module 102 may be implemented as follows: when the system fails, kdump produces, in the processor 30, a kernel used to capture the current runtime information, and that kernel collects all the running state and data in memory into a dump core file for analyzing the cause of the failure. The reset module 103 may be implemented by hardware such as a reset circuit, whose structure may differ between processors 30. The reset register 105 indicates whether the processor 30 may be reset, which is realized by whether the reset register 105 is enabled.
Sixth embodiment
This embodiment provides a fault protection method, taking kdump as an example. Referring to Fig. 8, it comprises:
S801: Storage controller A (hereinafter controller A) and storage controller B (hereinafter controller B), which together form a dual-controller system, power up and establish a heartbeat mechanism;
S802: Controller A fails, which triggers kdump fault information acquisition, and the heartbeat between controller A and controller B starts to be lost;
S803: Controller A calls the BSP callback function to disable the peer reset register, i.e., it sets the reset register value to 0, so that controller A cannot be reset;
S804: Collect exception information and determine whether controller A's kdump has finished; if it has, go to S805, otherwise go to S806;
S805: Set the reset register value to 1, and controller A then returns normally;
S806: Continue collecting exception information, keeping the reset register value at 0;
S807: Controller B detects the heartbeat loss, and once the configured expiry time is reached, controller B attempts to reset controller A;
S808: Controller B determines, through EPLD (Erasable Programmable Logic Device) signal detection, whether controller A's peer reset register value is 0; if it is 0, go to S809, otherwise go to S810;
S809: This indicates that controller A's kdump has not finished. Controller B starts a kdump completion timeout timer whose duration is greater than the theoretical maximum kdump completion time;
S810: This indicates that controller A's kdump has finished. Once controller A has restarted and successfully handshaken with controller B, controller B checks whether the kdump timer exists and, if so, deletes it; the whole flow is then complete;
S811: If the timer in controller B expires, this indicates that the kdump process is abnormal, or that the automatic reset after kdump completion has failed; controller B then initiates a restart of controller A.
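The S801 to S811 flow can be sketched as a small discrete-time simulation. This is a minimal illustration under stated assumptions, not the patented implementation: the names `ControllerA`, `run_controller_b`, `kdump_seconds`, `expiry` and `kdump_max` are hypothetical, time advances in whole seconds, the EPLD read of controller A's reset register is modeled as reading an attribute, and a restart of controller A is assumed to complete its handshake with controller B immediately. `kdump_max` plays the role of the S809 timer and must exceed the theoretical maximum kdump time.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ControllerA:
    """Hypothetical model of storage controller A (the controller that fails)."""
    kdump_seconds: Optional[int]      # None models a kdump that never finishes
    reset_register: int = 1           # 1: reset allowed, 0: reset blocked
    failed_at: Optional[int] = None   # None while A is healthy
    restarts: List[str] = field(default_factory=list)

    @property
    def healthy(self) -> bool:
        return self.failed_at is None  # a healthy A keeps sending heartbeats

    def fail(self, now: int) -> None:
        # S802/S803: kdump starts, heartbeats stop, and the BSP callback
        # clears the reset register so that A cannot be reset.
        self.failed_at = now
        self.reset_register = 0

    def tick(self, now: int) -> None:
        # S804-S806: keep the register at 0 until kdump completes; then set
        # it to 1 and let A restart on its own (S805).
        if (not self.healthy and self.kdump_seconds is not None
                and now - self.failed_at >= self.kdump_seconds):
            self.reset_register = 1
            self.restart("self restart after kdump")

    def restart(self, reason: str) -> None:
        self.restarts.append(reason)
        self.failed_at = None
        self.reset_register = 1


def run_controller_b(a: ControllerA, fail_at: int, expiry: int,
                     kdump_max: int, end: int) -> List[str]:
    """Hypothetical model of controller B's side (S807-S811), one step per second."""
    log: List[str] = []
    last_heartbeat = 0
    kdump_deadline: Optional[int] = None
    for now in range(end + 1):
        if now == fail_at:
            a.fail(now)
        a.tick(now)
        if a.healthy:
            last_heartbeat = now
            if kdump_deadline is not None:
                # S810: A restarted and the handshake succeeded; drop the timer.
                kdump_deadline = None
                log.append(f"{now}: handshake ok, timer deleted")
            continue
        if now - last_heartbeat < expiry:
            continue                      # S807: heartbeat loss not yet expired
        if a.reset_register == 0:         # S808: register read (EPLD in the patent)
            if kdump_deadline is None:    # S809: start the kdump timeout timer
                kdump_deadline = now + kdump_max
                log.append(f"{now}: kdump running, timer set")
            elif now >= kdump_deadline:   # S811: kdump hung, force a restart
                a.restart("forced by B")
                kdump_deadline = None
                log.append(f"{now}: timer expired, B restarts A")
    return log


if __name__ == "__main__":
    ok = ControllerA(kdump_seconds=5)
    print(run_controller_b(ok, fail_at=2, expiry=3, kdump_max=10, end=20))
    # ['4: kdump running, timer set', '7: handshake ok, timer deleted']
    hung = ControllerA(kdump_seconds=None)
    print(run_controller_b(hung, fail_at=2, expiry=3, kdump_max=10, end=20))
    # ['4: kdump running, timer set', '14: timer expired, B restarts A']
```

With a kdump that finishes, B's timer is deleted at the handshake (S810); with a kdump that never finishes, the timer expires and B restarts A (S811).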
Seventh embodiment
This embodiment provides a network storage device comprising a first processor 10 and a second processor 20, which back each other up.
The first processor 10 includes a first controller 41 and a reset circuit 1031. The first controller 41 receives the reset request message sent by the second processor 20 and performs fault information acquisition; the reset circuit 1031 resets the first processor 10 according to the reset request message when fault information acquisition is not being performed.
The second processor 20 includes a second controller 42, which receives the heartbeat messages sent by the first controller 41 at fixed time intervals and, after not receiving a heartbeat message from the first controller 41 within a first preset time, sends a reset request message to the first processor 10.
To ensure that the processors in a dual-controller system work normally, a heartbeat mechanism is usually established between the two processors; that is, the first controller 41 receives heartbeat messages sent by the second controller 42 at fixed time intervals. While heartbeat messages are exchanged normally, each processor knows that the other is working normally, which avoids a situation in which both processors handle the same service as OWNER at the same time.
However, the two processors in a dual-controller system cannot be expected to work normally at all times. If one processor has a problem, the other can continue to provide service; this is the benefit of a dual-controller system. When the second controller 42 of the second processor 20 has not received a heartbeat message from the first controller 41 within the first preset time, the second processor 20 concludes that the first processor 10 has failed and attempts to reset it, i.e., the second controller 42 sends a reset request message to the first controller 41.
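A minimal sketch of the second controller's heartbeat check, assuming a hypothetical `HeartbeatMonitor` class and a caller-supplied `send_reset_request` callback (neither comes from the patent). Timestamps are passed in explicitly so the logic is easy to test; a real controller would more likely read a monotonic clock such as Python's `time.monotonic()`. This sketch sends the reset request only once per heartbeat loss, which is one of the two sending policies the embodiment allows.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class HeartbeatMonitor:
    """Hypothetical heartbeat watchdog running on the second processor."""
    first_preset_time: float                # silence allowed before a reset request
    send_reset_request: Callable[[], None]  # sends the request to the first processor
    last_heartbeat: float = 0.0
    request_sent: bool = False

    def on_heartbeat(self, now: float) -> None:
        # A heartbeat arrived: the peer is alive, so re-arm the watchdog.
        self.last_heartbeat = now
        self.request_sent = False

    def poll(self, now: float) -> bool:
        """Return True if a reset request was sent during this poll."""
        if self.request_sent or now - self.last_heartbeat < self.first_preset_time:
            return False
        self.send_reset_request()
        self.request_sent = True  # send only once until heartbeats resume
        return True
```

Polling repeatedly after the timeout does not resend the request; a fresh heartbeat re-arms the check.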
The first processor 10 in this embodiment also includes a reset register 105. The first processor 10 is actually reset by means such as the reset circuit 1031, whereas the reset register 105 indicates whether the first processor 10 may be reset. Optionally, while the first controller 41 is collecting fault information, the reset register 105 is set to the disabled state, and when fault information acquisition is not being performed, it is set to the available state. While the reset register 105 is disabled, the processor is not reset even if a reset request message is received; a reset register 105 in the available state allows the processor to be reset when a reset request message arrives, i.e., the first processor 10 is reset according to the state of the reset register 105. Specifically, when the fault information acquisition module 102 starts collecting fault information, it first calls a driver interface to disable the reset register 105, i.e., it calls the BSP callback function to close the reset register 105 and set its value to 0, which breaks the reset link and prevents the first processor 10 from being reset. While fault information acquisition is in progress, the peer (backup) processor may send reset request messages repeatedly, and once fault information acquisition is complete the first processor 10 is reset directly according to a reset request message. Alternatively, the second processor 20 may send a reset request message only once, when it determines that the first processor 10 has failed, and send none afterwards, letting the first processor 10 reset itself after completing fault information acquisition.
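The gating behavior of reset register 105 can be sketched as follows. This is a hypothetical model, not a real BSP or driver API: `FirstProcessor` and its methods are illustrative names, the register is a boolean, making it unavailable stands in for the BSP callback writing 0, and incrementing `resets` stands in for firing reset circuit 1031. It shows the policy in which the peer keeps sending reset requests until one takes effect after collection ends.

```python
from dataclasses import dataclass


@dataclass
class FirstProcessor:
    """Hypothetical model of first processor 10 and its reset register 105."""
    reset_register_available: bool = True  # available: a reset request takes effect
    collecting: bool = False               # True while fault information is collected
    resets: int = 0                        # times the (modeled) reset circuit fired

    def start_fault_collection(self) -> None:
        # Stands in for the BSP callback that writes 0 to the reset register.
        self.collecting = True
        self.reset_register_available = False

    def finish_fault_collection(self) -> None:
        # Collection done: make the register available again.
        self.collecting = False
        self.reset_register_available = True

    def on_reset_request(self) -> bool:
        """Reset only if the register allows it; return True if a reset happened."""
        if not self.reset_register_available:
            return False   # request ignored, collection keeps running
        self.resets += 1   # stands in for driving reset circuit 1031
        return True


if __name__ == "__main__":
    p = FirstProcessor()
    p.start_fault_collection()
    print(p.on_reset_request())  # False: ignored during collection
    p.finish_fault_collection()
    print(p.on_reset_request())  # True: reset performed
```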
Fault information acquisition does not always proceed smoothly; the acquisition itself may go wrong. Therefore, when a preset condition is met, the second controller 42 can send a forced reset message to the first controller 41, which receives the forced reset message sent by the second processor 20. On receiving the forced reset message, the first controller 41 sets the reset register 105 to the available state, so that the first processor 10 can be reset according to the state of the reset register 105.
The preset condition may be: the second preset time has elapsed since the second controller 42 sent the reset request message to the first controller 41; or the second preset time has elapsed since the second controller 42 sent the reset request message to the first controller 41 and the first controller 41 is still performing fault information acquisition.
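The two preset conditions can be expressed as a single predicate on the second controller. A sketch only: the function name and parameters are hypothetical, and how the second controller learns that the first controller is still collecting fault information is an assumption here (reading the peer reset register through EPLD signal detection, as in the sixth embodiment, would be one way).

```python
from typing import Optional


def should_send_forced_reset(now: float,
                             reset_request_sent_at: Optional[float],
                             second_preset_time: float,
                             peer_still_collecting: bool,
                             require_still_collecting: bool) -> bool:
    """Decide whether the second controller sends a forced reset message.

    require_still_collecting=False models the first preset condition
    (time elapsed only); True models the second one (time elapsed and the
    first controller is still collecting fault information).
    """
    if reset_request_sent_at is None:
        return False   # no reset request has been sent yet
    if now - reset_request_sent_at < second_preset_time:
        return False   # second preset time has not yet elapsed
    return peer_still_collecting or not require_still_collecting
```

The second variant is the more conservative one: it never forces a reset on a peer that has already finished collecting and is expected to reset itself.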
The first processor exchanges heartbeat messages with the second processor at fixed time intervals. When the processor fails, it performs fault information acquisition and may then receive a reset request message from the other processor. If fault information acquisition is in progress at that moment, the processor is not reset. This prevents the processor from being reset during fault information acquisition, which would interrupt the acquisition, so the processor can collect fault information as completely as possible, facilitating subsequent fault analysis and localization for the first processor.
In addition, a network storage system is also provided, comprising the network storage device and at least one data writing apparatus; the data writing apparatus writes network data into the network storage device. The data writing apparatus may be any of various network devices such as a host or a microcomputer, and each data writing apparatus is connected to the network storage device.
It will be apparent to those skilled in the art that the modules or steps of the present invention described above can be implemented by general-purpose computing devices. They can be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they can be implemented as program code executable by a computing device, so that they can be stored in a storage medium (ROM/RAM, magnetic disk, optical disc) and executed by the computing device; in some cases, the steps shown or described can be performed in an order different from the one given here. Alternatively, they can be made into individual integrated circuit modules, or multiple modules or steps among them can be made into a single integrated circuit module. Therefore, the present invention is not limited to any specific combination of hardware and software.
The above is a further detailed description of the present invention in combination with specific embodiments, and the specific implementation of the present invention shall not be deemed limited to these descriptions. A person of ordinary skill in the technical field to which the present invention belongs may make several simple deductions or substitutions without departing from the inventive concept, and all of them shall be deemed to fall within the protection scope of the present invention.

Claims (14)

1. A fault protection method, comprising:
receiving, by a first processor, a reset request message from a second processor, the first processor and the second processor backing each other up;
if fault information acquisition is being performed, not resetting the first processor;
if fault information acquisition is not being performed, resetting the first processor according to the reset request message.
2. The fault protection method according to claim 1, wherein the first processor comprises a reset register;
when fault information acquisition is being performed, the method further comprises: setting the reset register to a disabled state;
when fault information acquisition is not being performed, the method further comprises: setting the reset register to an available state;
the resetting of the first processor comprises: resetting the first processor according to the state of the reset register.
3. The fault protection method according to claim 2, wherein the method further comprises: receiving a forced reset message sent by the second processor; after the forced reset message is received, setting the reset register to the available state; and resetting the first processor according to the state of the reset register.
4. A fault protection method, comprising:
receiving, by a second processor, heartbeat messages from a first processor at fixed time intervals, the second processor and the first processor backing each other up;
after no heartbeat message sent by the first processor has been received within a first preset time, sending a reset request message to the first processor; wherein, in response to the reset request message: if the first processor is performing fault information acquisition, the first processor is not reset; if the first processor is not performing fault information acquisition, the first processor is reset according to the reset request message.
5. The fault protection method according to claim 4, wherein the method further comprises: when a preset condition is met, sending a forced reset message to the first processor.
6. The fault protection method according to claim 5, wherein the preset condition comprises: a second preset time has elapsed since the reset request message was sent to the first processor;
or: a second preset time has elapsed since the reset request message was sent to the first processor, and the first processor is still performing fault information acquisition.
7. A first processor, comprising:
a message receiving module, configured to receive a reset request message from a second processor, the first processor and the second processor backing each other up;
a fault information acquisition module, configured to perform fault information acquisition;
a reset module, configured to reset the first processor according to the reset request message when fault information acquisition is not being performed.
8. A second processor, comprising:
a heartbeat receiving module, configured to receive heartbeat messages from a first processor at fixed time intervals, the second processor and the first processor backing each other up;
a message sending module, configured to send a reset request message to the first processor after no heartbeat message from the first processor has been received within a first preset time; wherein, in response to the reset request message: if the first processor is performing fault information acquisition, the first processor is not reset; if the first processor is not performing fault information acquisition, the first processor is reset according to the reset request message.
9. A network storage device, comprising a first processor and a second processor, the first processor and the second processor backing each other up;
the first processor comprises a first controller and a reset circuit; the first controller is configured to receive a reset request message sent by the second processor and to perform fault information acquisition; the reset circuit is configured to reset the first processor according to the reset request message when fault information acquisition is not being performed;
the second processor comprises a second controller, configured to receive heartbeat messages sent by the first controller at fixed time intervals and, after no heartbeat message sent by the first controller has been received within a first preset time, to send the reset request message to the first processor.
10. The network storage device according to claim 9, wherein the first processor further comprises a reset register; when the first controller performs fault information acquisition, the reset register is set to a disabled state;
when the first controller is not performing fault information acquisition, the reset register is set to an available state;
the reset circuit is further configured to reset the first processor according to the state of the reset register.
11. The network storage device according to claim 10, wherein the first controller is further configured to receive a forced reset message sent by the second controller and, after receiving the forced reset message, to set the reset register to the available state.
12. The network storage device according to any one of claims 9 to 11, wherein the second controller is further configured to send a forced reset message to the first controller when a preset condition is met.
13. The network storage device according to claim 12, wherein the preset condition comprises: a second preset time has elapsed since the reset request message was sent to the other processor;
or: a second preset time has elapsed since the reset request message was sent to the other processor, and the other processor is still performing fault information acquisition.
14. A network storage system, comprising at least one data writing apparatus and the network storage device according to any one of claims 9 to 13, wherein the data writing apparatus is configured to write network data into the network storage device.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610356375.0A CN107438010A (en) 2016-05-25 2016-05-25 Fault protecting method, first, second processor, network storage equipment and system

Publications (1)

Publication Number Publication Date
CN107438010A 2017-12-05

Family

ID=60454368

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20171205
