CN107438010A - Fault protecting method, first, second processor, network storage equipment and system - Google Patents
- Publication number
- CN107438010A, CN201610356375.0A
- Authority
- CN
- China
- Prior art keywords
- processor
- information acquisition
- message
- fault information
- request message
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0631—Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0654—Management of faults, events, alarms or notifications using network fault recovery
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1095—Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Hardware Redundancy (AREA)
- Debugging And Monitoring (AREA)
Abstract
The invention provides a fault protection method, a first processor, a second processor, a network storage device and a system. A processor receives a reset request message from the processor with which it is mutually redundant. If the processor is performing fault information collection at that moment, it is not reset; if it is not performing fault information collection, it is reset according to the reset request message. This prevents fault information collection from being interrupted unexpectedly, so that the fault information can be collected completely, the fault scene is effectively preserved, and subsequent fault analysis is made easier.
Description
Technical field
The present invention relates to the field of computer storage technology, and in particular to a fault protection method, a first processor, a second processor, a network storage device and a system.
Background
Current network storage devices usually consist of two processors. The benefit of this dual-controller architecture is that when one processor hangs, the other can continue to provide service, which largely eliminates system failures caused by a single point of failure. The dual-controller architecture has two main modes: Active-Standby (primary/standby) and Active-Active (dual active). In Active-Standby mode, only one processor provides external service at any given time, while the other stays entirely in standby and provides no external service; the drawback is low resource utilization. In Active-Active mode, both processors provide external service at the same time and can share load, so resource utilization and traffic handling capacity are higher. The network storage device described in the present invention preferably uses the Active-Active mode.
After the system powers on, a heartbeat mechanism is established between the two processors: each processor periodically sends heartbeat messages to the other so that each can confirm that the other is working normally. When one of the processors fails, the heartbeat mechanism breaks down, that is, the failed processor no longer sends heartbeat messages to the other processor, and the failed processor usually starts collecting fault information. Under the existing mechanism, if the other processor does not receive a heartbeat message from its peer within a certain period, it resets or powers off the peer processor to isolate the failed processor, so that the system can keep providing stable service. However, this makes the information on the reset or powered-off processor difficult to collect and preserve, which in turn makes subsequent fault analysis harder.
Summary of the invention
Embodiments of the present invention provide a fault protection method, a first processor, a second processor, a network storage device and a system, to solve the prior-art problem that a processor is reset or powered off as soon as it is judged to have failed, which makes the information on the reset or powered-off processor difficult to collect and preserve.
To solve the above technical problem, an embodiment of the present invention provides a fault protection method, comprising:
A first processor receives a reset request message from a second processor, the first processor and the second processor being backups of each other;
If fault information collection is being performed, the first processor is not reset;
If fault information collection is not being performed, the first processor is reset according to the reset request message.
In addition, an embodiment of the present invention provides a fault protection method, comprising: a second processor receives heartbeat messages from a first processor at a fixed time interval, the second processor and the first processor being backups of each other;
If no heartbeat message from the first processor is received within a first preset time, the second processor sends a reset request message to the first processor. The reset request message is handled as follows: if the first processor is performing fault information collection, the first processor is not reset; if the first processor is not performing fault information collection, the first processor is reset according to the reset request message.
In addition, an embodiment of the present invention provides a first processor, comprising:
A message receiving module, configured to receive a reset request message from a second processor, the first processor and the second processor being backups of each other;
A fault information collection module, configured to perform fault information collection;
A reset module, configured to reset the processor according to the reset request message when fault information collection is not being performed.
In addition, an embodiment of the present invention provides a second processor, comprising:
A heartbeat receiving module, configured to receive heartbeat messages from a first processor at a fixed time interval, the second processor and the first processor being backups of each other;
A message sending module, configured to send a reset request message to the first processor, which is mutually redundant with the second processor, if no heartbeat message from the first processor is received within a first preset time. The reset request message is handled as follows: if the first processor is performing fault information collection, the first processor is not reset; if the first processor is not performing fault information collection, the first processor is reset according to the reset request message.
In addition, an embodiment of the present invention provides a network storage device, comprising a first processor and a second processor, the first processor and the second processor being backups of each other;
The first processor comprises a first controller and a reset circuit. The first controller is configured to receive the reset request message sent by the second processor and to perform fault information collection. The reset circuit is configured to reset the first processor according to the reset request message when fault information collection is not being performed;
The second processor comprises a second controller, configured to receive the heartbeat messages sent by the first controller at a fixed time interval and, if no heartbeat message from the first controller is received within a first preset time, to send the reset request message to the first processor.
In addition, an embodiment of the present invention provides a network storage system, comprising at least one data writing device and the network storage device described above; the data writing device is configured to write network data into the network storage device.
Beneficial effects of the present invention:
Embodiments of the present invention provide a fault protection method, a first processor, a second processor, a network storage device and a system. A processor receives a reset request message from the processor with which it is mutually redundant. If the processor is performing fault information collection at that moment, it is not reset; if it is not performing fault information collection, it is reset according to the reset request message. This prevents fault information collection from being interrupted unexpectedly, so that the fault information can be collected completely, the fault scene is effectively preserved, and subsequent fault analysis is made easier.
Brief description of the drawings
Fig. 1 is a flowchart of a fault protection method provided by an embodiment of the present invention;
Fig. 2 is a flowchart of a fault protection method provided by an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a first processor provided by an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a second processor provided by an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a processor provided by an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of a network storage device provided by an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of a network storage system provided by an embodiment of the present invention;
Fig. 8 is a flowchart of a fault protection method provided by an embodiment of the present invention;
Fig. 9 is a schematic structural diagram of a network storage device provided by an embodiment of the present invention.
Detailed description
The idea of the invention is as follows: in a dual-controller system, if one of the processors fails, it stops sending heartbeat messages to the other processor and starts collecting fault information. While fault information is being collected, reset and power-off operations on that processor are prohibited so that collection can complete smoothly, which facilitates subsequent fault analysis.
Specific implementations of the embodiments of the present invention are described further below with reference to the accompanying drawings.
First embodiment
This embodiment provides a fault protection method. Referring to Fig. 1, the method comprises:
S101: receive a reset request message from a second processor, the first processor and the second processor being backups of each other;
S102: if fault information collection is being performed, do not reset the first processor;
S103: if fault information collection is not being performed, reset the first processor according to the reset request message.
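Steps S101 to S103 can be sketched as follows. This is only a minimal illustration, not the implementation of the embodiment: the class name `FirstProcessor`, the flag `collecting_fault_info` and the counter `reset_count` are hypothetical names introduced here, and the actual reset is represented by incrementing a counter.

```python
from dataclasses import dataclass


@dataclass
class FirstProcessor:
    """Hypothetical model of the first processor's reset handling."""

    collecting_fault_info: bool = False  # True while fault information collection runs
    reset_count: int = 0                 # stands in for an actual hardware reset

    def on_reset_request(self) -> bool:
        """S101: handle a reset request from the redundant peer.

        Returns True if the processor was reset, False if the request was ignored.
        """
        if self.collecting_fault_info:
            # S102: collection is in progress, so the processor is not reset.
            return False
        # S103: no collection in progress, so reset according to the request.
        self.reset_count += 1
        return True
```

With this sketch, a request that arrives during collection returns `False` and leaves `reset_count` unchanged, while a request that arrives afterwards triggers the reset.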
In a dual-controller system, the two processors back each other up, and each handles its own services. In Active-Active mode, both processors are working, but on different content: if one processor is the OWNER (master end) of a task, the other processor is not the OWNER of that task and can only be the OWNER of other tasks. The OWNER concept is introduced in dual-controller systems to prevent service messages from being issued by both processors simultaneously. The OWNER records the id (identity) of a processor and is an attribute of a LUN (Logical Unit Number). During service processing, the processor id corresponding to a host IO (input/output) is specified, which determines which processor handles that IO. If an IO is sent to the non-master end, that is, the non-OWNER end, the SCSI (Small Computer System Interface) Target module forwards it to the OWNER end for processing.
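The OWNER-based routing described above can be illustrated with a short sketch. The names `Lun` and `route_io` and the string processor ids are assumptions made for illustration; the forwarding step, which the description attributes to the SCSI Target module, is reduced here to a returned flag.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Lun:
    """Hypothetical LUN record; `owner` holds the id of the OWNER processor."""

    lun_id: int
    owner: str


def route_io(lun: Lun, receiving_processor: str) -> Tuple[str, bool]:
    """Decide which processor handles an IO for `lun`.

    Returns (handling_processor, forwarded). The IO is always handled by the
    OWNER; `forwarded` is True when it arrived at the non-OWNER end and has to
    be passed on to the OWNER end.
    """
    forwarded = receiving_processor != lun.owner
    return lun.owner, forwarded
```

For a LUN owned by processor `"A"`, an IO arriving at `"A"` is handled locally, and an IO arriving at `"B"` is forwarded to `"A"`.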
To make sure that the processors in a dual-controller system are working normally, a heartbeat mechanism is usually established between the two processors: the first processor sends heartbeat messages to the second processor at a fixed time interval. A heartbeat mechanism is a technique in which one side periodically sends a heartbeat message so that the other side knows it is still working normally, which confirms that the connection between the two remains valid. A heartbeat message is simply a small message that the first processor periodically sends to the second processor to tell it "I am still working normally." In implementation terms, at a fixed interval (typically on the order of seconds) a fixed message is sent to the second processor, and the second processor replies with a fixed message after receiving it. Alternatively, both sides can actively send fixed messages to each other. In other words, heartbeat messages can be exchanged in two ways: either both processors send heartbeat messages to each other at a fixed time interval, or one processor sends heartbeat messages at a fixed time interval and the other replies with a heartbeat message each time it receives one. Both ways are feasible. If one processor does not receive a heartbeat message from the other within a certain period, a failure is considered to have occurred. The message is called a heartbeat message because, like a heartbeat, it is sent once every fixed interval to tell the second processor that the first processor is still alive. In practice this serves to keep a long-lived connection alive. The content of the heartbeat message is not specially prescribed, but it is usually a very small packet, or an empty packet containing only a header. As long as heartbeat messages are exchanged normally, each processor knows that the other is working normally, which avoids the situation in which both processors handle the same service as OWNER at the same time.
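The receiving side of such a heartbeat mechanism can be sketched as a small timeout check. This is a minimal illustration under stated assumptions: the class name `HeartbeatMonitor`, its method names, and the use of caller-supplied timestamps in seconds are hypothetical and are not taken from the embodiment.

```python
class HeartbeatMonitor:
    """Hypothetical receiver-side heartbeat tracker.

    Timestamps are passed in by the caller (in seconds) so that the logic
    does not depend on a particular clock source.
    """

    def __init__(self, timeout_s: float, now: float = 0.0) -> None:
        self.timeout_s = timeout_s  # the "certain period" without a heartbeat
        self.last_seen = now        # time the last heartbeat was received

    def on_heartbeat(self, now: float) -> None:
        """Record that a heartbeat message arrived at time `now`."""
        self.last_seen = now

    def peer_suspected_failed(self, now: float) -> bool:
        """True if no heartbeat has arrived for longer than the timeout."""
        return now - self.last_seen > self.timeout_s
```

Passing the time in explicitly keeps the check deterministic; a real sender would typically emit a small or header-only packet every interval, as described above.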
However, the two processors in a dual-controller system cannot work normally forever. If the first processor has a problem, the second processor can continue to provide service; this is the benefit of a dual-controller system. When the first processor has a problem, it stops sending heartbeat messages to the second processor, so the second processor no longer receives heartbeat messages from the first processor. After a certain period, that is, after the first preset time has elapsed, the second processor judges that the first processor has failed and attempts to reset it by sending a reset request message to the first processor. A reset includes a restart or a power-off. Note that the second processor may fail to receive heartbeat messages in the following situations:
First, the first processor has failed. This is the most common situation: the first processor has lost the ability to work normally, which generally includes a system crash, hang or deadlock, and can also include cases such as a null pointer in the kernel. The kernel of the first processor can then no longer work normally and stops sending heartbeat messages to the second processor. At this point the first processor starts fault information collection to facilitate later troubleshooting. Fault information collection can include operations such as kdump (kernel dump) and collecting and packaging system logs, and these operations are relatively time-consuming.
Second, the heartbeat mechanism has failed. In this case, neither of the two mutually redundant processors may actually be faulty. However, because heartbeat contact between them is lost, each processor may conclude that it is the only processor in the system, and both may then process the same service as OWNER, causing the data in the storage device to become inconsistent or be lost. A heartbeat mechanism failure may be just a heartbeat link failure, or the first processor being unable to send heartbeat messages, or the second processor being unable to receive them; all of these count as heartbeat mechanism failures.
When the first processor fails, it starts fault information collection. Taking kdump as an example: kdump is a Linux kernel crash capture mechanism based on kexec (a fast-reboot facility). It is a tool and service used to dump memory contents when the system crashes, hangs or deadlocks. For example, once the system crashes, the normal kernel can no longer work; kdump then starts a capture kernel to record the current running information. This kernel collects all running state and data in memory into a dump core (kernel dump) file so that the cause of the crash can be analyzed. This process usually takes several minutes.
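As a small practical aside on the kdump example, the sketch below checks whether a capture kernel is currently loaded. It assumes a Linux kernel that exposes the sysfs file `/sys/kernel/kexec_crash_loaded`, which reads `1` when a crash kernel is loaded and `0` otherwise; the function name `crash_kernel_loaded` is hypothetical, and the check is not part of the described embodiment.

```python
from pathlib import Path
from typing import Optional


def crash_kernel_loaded(
    path: str = "/sys/kernel/kexec_crash_loaded",
) -> Optional[bool]:
    """Report whether a kdump capture kernel is loaded.

    Returns True or False according to the file content, or None if the
    file cannot be read (for example on a system without kexec support).
    """
    try:
        text = Path(path).read_text().strip()
    except OSError:
        return None
    return text == "1"
```

A result of `None` or `False` suggests that a crash on that system would not produce a dump, so there would be no kdump collection to protect.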
In this case, the first processor is performing fault information collection, and the second processor has received no heartbeat message from the first processor within the first preset time. The second processor therefore judges that the first processor has failed, cannot work normally, and needs to be reset, so it sends a reset request message to the first processor in an attempt to reset it. When the first processor receives the reset request message, if it is performing fault information collection at that moment, no reset is performed on it. This ensures that fault information collection on the first processor is not interrupted and that complete fault information can be collected.
Of course, if the first processor is not performing fault information collection when it receives the reset request message, either because collection has already completed or because the first processor has not failed and it is the heartbeat mechanism that has failed, the first processor can be reset directly so that it returns to work as soon as possible, which improves operating efficiency.
The first processor includes a reset register. Resetting the first processor is carried out by means such as a reset circuit, and the reset register indicates whether the first processor may be reset. Optionally, while fault information collection is being performed, the reset register is set to the disabled state; when the reset register is disabled, no reset of the first processor is performed even if a reset request message is received. When fault information collection is not being performed, the reset register can be set to the enabled state; an enabled reset register allows the first processor to be reset when a reset request message is received. In other words, the first processor is reset according to the state of the reset register. Specifically, when the first processor starts fault information collection, it first calls a driver interface to disable the reset register; more precisely, it calls a BSP (Board Support Package) callback function to close the reset register, that is, it sets the reset register value to 0, which disables the reset link and prevents the first processor from being reset. Throughout fault information collection, the reset register is kept in the disabled state, so that even a reset request message from the second processor cannot cause the first processor to be reset. While the first processor is performing fault information collection, the second processor can send the reset request message to the first processor repeatedly, so that the first processor is reset according to the reset request message as soon as collection completes. Alternatively, the second processor can send the message only once, when it judges that the first processor has failed, and not resend it, leaving the first processor to reset itself after collection completes.
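The register handling described above can be sketched with a context manager that keeps the reset register disabled for exactly as long as collection runs. The value 0 for "disabled" follows the description; the value 1 for "enabled" and the names `ResetRegister`, `fault_collection` and `try_reset` are assumptions made for illustration, and a real system would go through the BSP callback rather than a Python attribute.

```python
from contextlib import contextmanager
from typing import Iterator

RESET_DISABLED = 0  # per the description: value 0 disables the reset link
RESET_ENABLED = 1   # assumption: any agreed non-zero value could mean "enabled"


class ResetRegister:
    """Hypothetical stand-in for the hardware reset register."""

    def __init__(self) -> None:
        self.value = RESET_ENABLED


@contextmanager
def fault_collection(reg: ResetRegister) -> Iterator[None]:
    """Disable the reset register while fault information is collected."""
    reg.value = RESET_DISABLED
    try:
        yield
    finally:
        # Collection has finished (or raised): allow resets again.
        reg.value = RESET_ENABLED


def try_reset(reg: ResetRegister) -> bool:
    """Model a reset request: it only takes effect if the register is enabled."""
    return reg.value == RESET_ENABLED
```

Restoring the register in a `finally` clause is a design choice of this sketch: it means a collection routine that raises an error still leaves the processor resettable.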
Fault information collection does not always go smoothly; the collection itself may run into problems. Optionally, therefore, when a preset condition is met, the second processor sends a forced reset message to the first processor. After receiving the forced reset message, the first processor sets the reset register to the enabled state and is then reset according to the state of the reset register. The difference between the two messages is this: a reset request message only attempts to reset the first processor and is not mandatory, so it cannot reset the first processor while the reset register is disabled; a forced reset message sets the reset register to the enabled state regardless of its current state and then resets the first processor. This is a remedy for the fact that the first processor cannot be reset while it is collecting fault information: if the collection itself fails, the second processor can send a forced reset message so that the first processor is reset through the reset register, which prevents fault information collection from running indefinitely.
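The difference between the two message types can be sketched as follows. The class `GuardedProcessor` and its attribute and method names are hypothetical; the reset itself is represented by a flag.

```python
class GuardedProcessor:
    """Hypothetical model of the first processor with a reset register."""

    def __init__(self) -> None:
        self.reset_enabled = True  # state of the reset register
        self.was_reset = False     # stands in for an actual reset

    def start_fault_collection(self) -> None:
        """Disable the reset register when collection begins."""
        self.reset_enabled = False

    def on_reset_request(self) -> bool:
        """Ordinary request: takes effect only if the register is enabled."""
        if not self.reset_enabled:
            return False
        self.was_reset = True
        return True

    def on_forced_reset(self) -> bool:
        """Forced reset: enable the register first, then reset."""
        self.reset_enabled = True
        return self.on_reset_request()
```

In this sketch the forced path reuses the ordinary path after overriding the register, which mirrors the description: the register is set to the enabled state and the processor is then reset according to that state.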
The preset condition can therefore include: a second preset time has elapsed since the second processor sent the reset request message to the first processor; or a second preset time has elapsed since the second processor sent the reset request message to the first processor and the first processor is still performing fault information collection. The second preset time is determined by theoretical calculation or by measurement and generally represents the maximum time required for fault information collection. If the maximum time required for fault information collection has elapsed since the second processor sent the reset request message to the first processor, the collection can be considered to have failed, and a reset can be forced on the first processor by sending it a forced reset message. Specifically, the preset condition can be implemented by setting a fault information collection timer in the second processor, with a duration not less than the maximum time required for fault information collection, that is, the second preset time. During this period, the second processor can monitor the state of the first processor in real time, including whether it is performing fault information collection and whether it has been reset. If the timer expires, either fault information collection has become abnormal, or the reset of the first processor after collection completed has become abnormal. The second processor then sends a forced reset message to the first processor to force a reset. This ensures as far as possible that the fault information is preserved completely, while avoiding excessive waste of time and impact on normal data processing.
When monitoring the state of the first processor, the second processor can monitor the state of the reset register. If the reset register is in the disabled state, the first processor is collecting fault information, and a reset request message cannot reset the first processor at this point. If the reset register is in the enabled state, the first processor is not collecting fault information (which includes the case where collection has completed), and the second processor can send a reset request message directly to reset the first processor. If the reset register is still in the disabled state after the timer expires, the second processor can send a forced reset message to the first processor, which forces the reset register into the enabled state and resets the first processor.
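The three cases above amount to a small decision table on the second processor. The sketch below assumes the second processor can observe the reset register state and its own timer; the function name `choose_message` and the message labels are hypothetical.

```python
from typing import Optional


def choose_message(reset_register_enabled: bool, timer_expired: bool) -> Optional[str]:
    """Pick the message the second processor should send, if any.

    - Register enabled: the peer is not collecting, so an ordinary reset
      request is enough.
    - Register disabled and timer expired: collection has overrun the second
      preset time, so a forced reset is sent.
    - Register disabled and timer still running: keep waiting (None).
    """
    if reset_register_enabled:
        return "RESET_REQUEST"
    if timer_expired:
        return "FORCED_RESET"
    return None
```

The ordinary request is checked first, so a forced reset is only chosen when the register is still disabled at expiry.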
When the first processor fails, it performs fault information collection, and it may then receive a reset request message from the second processor. If fault information collection is in progress at that moment, the first processor is not reset; if it is not in progress, the first processor is reset according to the reset request message. This prevents the first processor from being reset during fault information collection and the collection from being interrupted, so that the first processor can collect fault information that is as complete as possible, which facilitates subsequent fault analysis and localization for the first processor.
The first processor and the second processor in this embodiment can be essentially identical in structure, and their roles are interchangeable. In short, whichever of the two processors fails performs fault information collection; during collection it prevents the other processor from resetting it, and after collection completes it can be reset.
Second embodiment
This embodiment provides a fault protection method. Referring to Fig. 2, the method comprises:
S201: a second processor receives heartbeat messages from a first processor at a fixed time interval, the second processor and the first processor being backups of each other;
S202: if no heartbeat message from the first processor is received within a first preset time, send a reset request message to the first processor. The reset request message is handled as follows: if the first processor is performing fault information collection, the first processor is not reset; if the first processor is not performing fault information collection, the first processor is reset according to the reset request message.
In a dual-controller system, the two processors back each other up, and each handles its own services. In Active-Active mode, both processors are working, but on different content: if one processor is the OWNER of a task, the other processor is not the OWNER of that task and can only be the OWNER of other tasks. The OWNER concept is introduced in dual-controller systems to prevent service messages from being issued by both processors simultaneously. The OWNER records the id (identity) of a processor and is an attribute of a LUN (Logical Unit Number). During service processing, the processor id corresponding to a host IO (input/output) is specified, which determines which processor handles that IO. If an IO is sent to the non-master end, that is, the non-OWNER end, the SCSI (Small Computer System Interface) Target module forwards it to the OWNER end for processing.
To make sure that the processors in a dual-controller system are working normally, a heartbeat mechanism is usually established between the two processors: the second processor receives the heartbeat messages that the first processor sends at a fixed time interval. As long as heartbeat messages are exchanged normally, each processor knows that the other is working normally, which avoids the situation in which both processors handle the same service as OWNER at the same time.
However, the two processors in a dual-controller system cannot work normally forever. If one of the processors has a problem, the other can continue to provide service; this is the benefit of a dual-controller system. Once the first preset time has elapsed without the second processor receiving a heartbeat message from the first processor, the second processor judges that the first processor has failed and attempts to reset it by sending a reset request message to the first processor.
After the second processor has sent the reset request message to the first processor, the first processor would normally be reset. However, if the first processor is performing fault information collection at that moment, it cannot be reset. While the first processor is performing fault information collection, the second processor can send the reset request message to the first processor repeatedly, so that the first processor is reset according to the reset request message as soon as collection completes. Alternatively, the second processor can send the message only once, when it judges that the first processor has failed, and not resend it, leaving the first processor to reset itself after collection completes.
Fault information collection does not always go smoothly; the collection itself may run into problems. Therefore, when a preset condition is met, the second processor sends a forced reset message to the first processor. After receiving the forced reset message, the first processor sets the reset register to the enabled state and is then reset according to the state of the reset register. The difference between the two messages is this: a reset request message only attempts to reset the first processor and is not mandatory, so it cannot reset the first processor while the reset register is disabled; a forced reset message sets the reset register to the enabled state regardless of its current state and then resets the first processor. This is a remedy for the fact that the first processor cannot be reset while it is collecting fault information: if the collection itself fails, the second processor can send a forced reset message so that the first processor is reset through the reset register, which prevents fault information collection from running indefinitely.
Accordingly, the preset condition may include: a second preset time has elapsed since the second processor sent the reset request message to the first processor; or a second preset time has elapsed since the second processor sent the reset request message to the first processor and the first processor is still performing fault information acquisition. The second preset time is generally not less than the maximum time the first processor needs to perform fault information acquisition. If that maximum time has elapsed since the second processor sent the reset request message, the acquisition can be considered to have failed, and the first processor can be forcibly reset, that is, the second processor sends a forced reset message to the first processor. Specifically, the preset condition can be implemented by setting a timer on the second processor whose duration is not less than the maximum time required for fault information acquisition, i.e. the second preset time. While the timer runs, the second processor can monitor the state of the first processor in real time, including whether it is performing fault information acquisition and whether it has been reset. If the timer expires, either the fault information acquisition has encountered an exception, or the reset of the first processor after the acquisition completed has encountered an exception. The second processor can then send a forced reset message to the first processor to force it to reset, which gives the fault information the best chance of being saved in full while avoiding an excessive delay that would affect normal data processing.
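The timer-based preset condition can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the class name `ResetEscalation`, its fields, and the 600-second value are hypothetical, and time is passed in explicitly so the logic can be followed without a real clock.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResetEscalation:
    """Hypothetical timer kept by the second processor."""

    second_preset_time: float  # seconds; assumed >= max acquisition time
    request_sent_at: Optional[float] = None

    def send_reset_request(self, now: float) -> None:
        # Start the timer when the reset request message goes out.
        self.request_sent_at = now

    def should_force_reset(self, now: float, peer_has_reset: bool) -> bool:
        # No escalation before a request was sent or after the peer reset.
        if self.request_sent_at is None or peer_has_reset:
            return False
        return now - self.request_sent_at >= self.second_preset_time


esc = ResetEscalation(second_preset_time=600.0)
esc.send_reset_request(now=0.0)
early = esc.should_force_reset(now=300.0, peer_has_reset=False)
late = esc.should_force_reset(now=600.0, peer_has_reset=False)
done = esc.should_force_reset(now=900.0, peer_has_reset=True)
```

Here `early` is the case where acquisition may still be running normally, `late` is the timer expiry that triggers the forced reset message, and `done` is the case where the peer has already reset and no escalation is needed.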
When monitoring the state of the first processor, the second processor can monitor the state of the reset register. If the reset register is in the disabled state, the first processor is acquiring fault information and a reset request message cannot reset it. If the reset register is in the enabled state, the first processor is not performing fault information acquisition (which includes the case where the acquisition has completed), and the second processor can send a reset request message directly to the first processor to reset it. If the reset register is still in the disabled state after the timer expires, the second processor can send a forced reset message to the first processor, which forces the reset register into the enabled state and resets the first processor.
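The three cases above reduce to a small decision function. The following Python sketch is illustrative only; `Action` and `next_action` are hypothetical names, and how the register state is actually observed is outside its scope.

```python
from enum import Enum


class Action(Enum):
    SEND_RESET_REQUEST = "send reset request"
    WAIT = "wait"
    SEND_FORCED_RESET = "send forced reset"


def next_action(register_enabled: bool, timer_expired: bool) -> Action:
    """Choose what the monitoring processor does next (hypothetical helper)."""
    if register_enabled:
        # Peer is not acquiring fault information: a plain request works.
        return Action.SEND_RESET_REQUEST
    if timer_expired:
        # Register still disabled after the timeout: escalate.
        return Action.SEND_FORCED_RESET
    # Acquisition is in progress and still within its time budget.
    return Action.WAIT
```

For example, `next_action(register_enabled=False, timer_expired=False)` yields `Action.WAIT`, which corresponds to letting the acquisition continue.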
The second processor receives heartbeat messages sent by the first processor at a fixed time interval. When it has not received a heartbeat message from the first processor for a first preset time, it sends a reset request message to the first processor, and when the preset condition is met it sends a forced reset message to the first processor. This prevents the first processor from being reset, and the acquisition from being interrupted, while fault information is being acquired, so the first processor can acquire fault information that is as complete as possible, which facilitates subsequent fault analysis and localization for the first processor.
The second processor and the first processor in this embodiment may have essentially the same structure, and their roles are interchangeable. In short, whichever of the two processors does not receive a heartbeat message from the other within a certain time sends a reset request message to the other.
Third embodiment
This embodiment provides a first processor. Referring to Fig. 3, it includes:
a message receiving module 101, configured to receive a reset request message from a second processor, where the first processor 10 and the second processor back each other up;
a fault information acquisition module 102, configured to perform fault information acquisition;
a reset module 103, configured to reset the processor according to the reset request message when fault information acquisition is not being performed.
In a dual-controller system, the two processors back each other up and each performs its own services. In Active-Active mode both processors are working, but on different content: if one processor is the OWNER (master) of a task, the other processor is not the OWNER of that task and can only be the OWNER of other tasks. The OWNER concept is introduced in dual-controller systems to prevent service messages from being issued by both processors at the same time. The OWNER records the id (identity) of a processor and is an attribute of a LUN (Logical Unit Number). During service processing, the id of the processor corresponding to a host IO (input/output) is specified, which determines which processor handles it. If the IO is sent to the non-master, i.e. non-OWNER, end, the SCSI (Small Computer System Interface) Target module there forwards it to the OWNER end for processing.
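The OWNER-based routing can be sketched in Python as follows. This is a simplified model under stated assumptions: the `Controller` class, the `lun_owner` mapping, and the `submit_io` method are hypothetical, and the forwarding that the text attributes to the SCSI Target module is reduced to a direct method call on the peer.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Controller:
    """Hypothetical controller in a two-controller Active-Active pair."""

    controller_id: str
    peer: Optional["Controller"] = None
    handled: List[Tuple[str, str]] = field(default_factory=list)

    def submit_io(self, lun_owner: Dict[str, str], lun: str, payload: str) -> str:
        """Handle the IO locally if this controller owns the LUN, else forward."""
        owner_id = lun_owner[lun]
        if owner_id == self.controller_id:
            self.handled.append((lun, payload))
            return self.controller_id
        if self.peer is None or self.peer.controller_id != owner_id:
            raise LookupError(f"no reachable OWNER for {lun}")
        # Non-OWNER end: forward to the OWNER end for processing.
        return self.peer.submit_io(lun_owner, lun, payload)


a = Controller("A")
b = Controller("B")
a.peer, b.peer = b, a
lun_owner = {"lun0": "A", "lun1": "B"}  # OWNER attribute per LUN (assumed layout)
handled_by = a.submit_io(lun_owner, "lun1", "write")  # arrives at the non-OWNER
```

Because each LUN has exactly one OWNER in the mapping, an IO is processed by one controller only, whichever controller the host happened to send it to.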
To ensure that the processors in a dual-controller system work normally, a heartbeat mechanism is usually established between the two processors; that is, the first processor 10 further includes a heartbeat sending module 104, configured to send heartbeat messages to the second processor at a fixed time interval. As long as heartbeat messages are exchanged normally, each processor knows that the other is working normally, which avoids the situation where both processors handle the same service as OWNER at the same time, and the two processors can distribute services normally.
When the first processor 10 fails, its fault information acquisition module 102 starts fault information acquisition. Taking kdump as an example, kdump is a kexec-based Linux kernel crash capture mechanism, a tool and service used to dump runtime information from memory when the system crashes, deadlocks or hangs. For example, once the system has crashed, the normal kernel can no longer work; at that point kdump produces a kernel for capturing the current runtime information, and that kernel collects all running state and data in memory at that moment into a dump core (kernel dump) file so that the cause of the crash can be analyzed. This process usually takes several minutes.
The first processor 10 in this embodiment further includes a reset register 105. The first processor 10 is reset by reset means such as a reset circuit, and the reset register 105 indicates whether the first processor 10 may be reset. Optionally, the reset register 105 is set to the disabled state while the fault information acquisition module 102 is acquiring fault information, and to the enabled state when fault information acquisition is not being performed. While the reset register 105 is in the disabled state, no reset operation is performed on the processor even if a reset request message is received. When fault information acquisition is not being performed, the reset register 105 can be set to the enabled state, and in that state a reset operation can be performed on the processor when a reset request message is received; in other words, the first processor 10 is reset according to the state of the reset register 105. Specifically, when the fault information acquisition module 102 performs fault information acquisition, it first calls a driver interface to disable the reset register 105, i.e. it calls a BSP callback function to close the reset register 105 by setting its value to 0, which disables the reset link and prevents the first processor 10 from being reset. While fault information acquisition is in progress, the processor that backs up the first processor 10 may send the reset request message repeatedly, so that the first processor 10 is reset directly according to the reset request message once acquisition completes; alternatively, the second processor may send the message only once, when it determines that the first processor 10 has failed, and not resend it, leaving the first processor 10 to reset itself after completing fault information acquisition.
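The way the reset register brackets the acquisition can be sketched with a Python context manager. The names `ResetRegister`, `reset_disabled` and `collect_fault_info` are hypothetical; the values 0 (disabled) and 1 (enabled) follow the text, while restoring the register in a `finally` clause is a design assumption of this sketch rather than something the source specifies.

```python
from contextlib import contextmanager


class ResetRegister:
    """Hypothetical stand-in for the hardware reset register."""

    def __init__(self):
        self.value = 1  # 1 = enabled, 0 = disabled


@contextmanager
def reset_disabled(register):
    """Keep the register at 0 for the duration of the with-block."""
    register.value = 0
    try:
        yield register
    finally:
        # Assumption: re-enable even if acquisition raised an error.
        register.value = 1


def collect_fault_info(register, collector):
    """Run `collector` (a hypothetical callable) with resets disabled."""
    with reset_disabled(register):
        return collector(register)


reg = ResetRegister()
seen_during = collect_fault_info(reg, lambda r: r.value)
```

If the collector never returns, the `finally` clause never runs and the register stays at 0, which is the situation the forced reset message is meant to resolve.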
Fault information acquisition does not always run smoothly; the acquisition itself may fail. Therefore, optionally, the message receiving module 101 is further configured to receive a forced reset message sent by the second processor; correspondingly, after the forced reset message is received, the reset register 105 is set to the enabled state. The first processor 10 can then be reset according to the state of the reset register 105.
When the first processor fails, it performs fault information acquisition and then receives the reset request message sent by the second processor. If fault information acquisition is in progress at that moment, the first processor is not reset; if it is not, the first processor is reset according to the reset request message. This prevents the first processor from being reset, and the acquisition from being interrupted, while fault information is being acquired, so the first processor can acquire fault information that is as complete as possible, which facilitates subsequent fault analysis and localization for the processor.
The first processor and the second processor in this embodiment may have essentially the same structure, and their roles are interchangeable. In short, whichever of the two processors fails performs fault information acquisition; during the acquisition it prevents the other processor from resetting it, and after the acquisition completes it can be reset.
Fourth embodiment
This embodiment provides a second processor. Referring to Fig. 4, it includes:
a heartbeat receiving module 202, configured to receive heartbeat messages from the first processor 10 at a fixed time interval, where the first processor 10 and the second processor 20 back each other up;
a message sending module 201, configured to send a reset request message to the first processor after no heartbeat message from the first processor 10 has been received within a first preset time. Under the reset request message, the first processor 10 is not reset if it is performing fault information acquisition, and is reset according to the reset request message if it is not performing fault information acquisition.
To ensure that the processors in a dual-controller system work normally, a heartbeat mechanism is usually established between the two processors; that is, the heartbeat receiving module 202 can receive, at a fixed time interval, heartbeat messages sent by the processor that backs it up. As long as heartbeat messages are exchanged normally, each processor knows that the other is working normally, which avoids the situation where both processors handle the same service as OWNER at the same time.
However, the two processors in a dual-controller system cannot be expected to work normally forever. If one of them develops a problem, the other can continue to provide service, which is the benefit of a dual-controller system. When the heartbeat receiving module 202 of the second processor 20 has not received a heartbeat message from the first processor 10 for the first preset time, the second processor 20 determines that the first processor 10 has failed and attempts to reset it, i.e. the message sending module 201 sends a reset request message to the first processor 10.
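The heartbeat-timeout check that triggers the reset request message can be sketched as follows. `HeartbeatMonitor` and its methods are hypothetical names, and timestamps are passed in explicitly as an assumption of the sketch so that no real clock or network is involved.

```python
class HeartbeatMonitor:
    """Hypothetical tracker for the peer's most recent heartbeat."""

    def __init__(self, first_preset_time, now=0.0):
        self.first_preset_time = first_preset_time  # seconds (assumed unit)
        self.last_heartbeat = now

    def on_heartbeat(self, now):
        # Record the arrival time of a heartbeat message.
        self.last_heartbeat = now

    def peer_failed(self, now):
        # The peer is judged failed once the silence exceeds the threshold.
        return now - self.last_heartbeat > self.first_preset_time


monitor = HeartbeatMonitor(first_preset_time=3.0)
monitor.on_heartbeat(now=1.0)
healthy = monitor.peer_failed(now=2.0)  # 1 s of silence
failed = monitor.peer_failed(now=5.0)   # 4 s of silence
```

When `peer_failed` returns true, the monitoring processor would go on to send the reset request message; whether that request takes effect is decided on the peer's side by its reset register.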
After the message sending module 201 sends the reset request message to the first processor 10, the first processor 10 should be reset. However, if the first processor 10 is performing fault information acquisition at that moment, it cannot be reset. While the first processor 10 performs fault information acquisition, the message sending module 201 may send the reset request message to the first processor 10 repeatedly, so that the first processor 10 is reset directly according to the reset request message once it completes the acquisition; alternatively, the message may be sent only once, when the second processor 20 determines that the first processor 10 has failed, and not resent, leaving the first processor 10 to reset itself after completing fault information acquisition.
Fault information acquisition does not always run smoothly; the acquisition itself may fail. Therefore, when a preset condition is met, the message sending module 201 can send a forced reset message to the first processor 10. After receiving the forced reset message, the first processor 10 sets the reset register 105 to the enabled state and is then reset according to the state of the reset register 105. The difference between the forced reset message and the reset request message is that a reset request message only attempts to reset the first processor 10 and is not mandatory: while the reset register 105 is in the disabled state, the first processor 10 cannot be reset. A forced reset message, by contrast, sets the reset register 105 to the enabled state regardless of its current state, and the first processor 10 is then reset. It is a remedial measure for the case where the first processor 10 cannot be reset because it is performing fault information acquisition: when the acquisition itself fails, the second processor 20 can send a forced reset message so that the reset register 105 allows the first processor 10 to be reset, which prevents the acquisition from running indefinitely.
The preset condition may include: a second preset time has elapsed since the message sending module 201 sent the reset request message to the first processor 10; or a second preset time has elapsed since the message sending module 201 sent the reset request message to the first processor 10 and the first processor 10 is still performing fault information acquisition. The second preset time is generally not less than the maximum time the first processor 10 needs to perform fault information acquisition. If that maximum time has elapsed since the second processor 20 sent the reset request message to the first processor 10, the acquisition can be considered to have failed, and the first processor 10 can be forcibly reset, that is, the second processor 20 sends a forced reset message to the first processor 10. Specifically, the preset condition can be implemented by setting a timer on the second processor 20 whose duration is not less than the maximum time required for fault information acquisition, i.e. the second preset time. While the timer runs, the second processor 20 can monitor the state of the first processor 10 in real time, including whether it is performing fault information acquisition and whether it has been reset. If the timer expires, either the fault information acquisition has encountered an exception, or the reset of the first processor 10 after the acquisition completed has encountered an exception. The second processor 20 can then send a forced reset message to the first processor 10 to force it to reset, which gives the fault information the best chance of being saved in full while avoiding an excessive delay that would affect normal data processing.
When monitoring the state of the first processor 10, the second processor 20 can monitor the state of the reset register 105. If the reset register 105 is in the disabled state, the first processor 10 is acquiring fault information and a reset request message cannot reset it. If the reset register 105 is in the enabled state, the first processor 10 is not performing fault information acquisition (which includes the case where the acquisition has completed), and the message sending module 201 can send a reset request message directly to the first processor 10 to reset it. If the reset register 105 is still in the disabled state after the timer expires, the message sending module 201 can send a forced reset message to the first processor 10, which forces the reset register 105 into the enabled state and resets the first processor 10.
The second processor receives heartbeat messages sent by the first processor at a fixed time interval. When it has not received a heartbeat message from the first processor for the first preset time, it sends a reset request message to the first processor, and when the preset condition is met it sends a forced reset message to the first processor. This prevents the first processor from being reset, and the acquisition from being interrupted, while fault information is being acquired, so the first processor can acquire fault information that is as complete as possible, which facilitates subsequent fault analysis and localization for the first processor.
The second processor 20 and the first processor 10 in this embodiment may have essentially the same structure, and their roles are interchangeable. In short, whichever of the two processors does not receive a heartbeat message from the other within a certain time sends a reset request message to the other.
In addition, this embodiment further provides a network storage device that includes the first processor 10 and the second processor 20 of the above embodiments.
Fifth embodiment
This embodiment provides a processor. Referring to Fig. 5, it includes:
a message receiving module 101, configured to receive a reset request message from another processor 30;
a fault information acquisition module 102, configured to perform fault information acquisition;
a reset module 103, configured to reset the processor 30 according to the reset request message when fault information acquisition is not being performed;
a heartbeat module 301, configured to exchange heartbeat messages with the other processor 30 at a fixed time interval;
a message sending module 201, configured to send a reset request message to the other processor 30 when the heartbeat module 301 has not received a heartbeat message from the other processor 30 for a first preset time.
In a dual-controller system, the two processors 30 back each other up and each performs its own services. In Active-Active mode both processors 30 are working, but on different content: if one processor 30 is the OWNER (master) of a task, the other processor 30 is not the OWNER of that task and can only be the OWNER of other tasks. The OWNER concept is introduced in dual-controller systems to prevent service messages from being issued by both processors at the same time. The OWNER records the id (identity) of a processor and is an attribute of a LUN (Logical Unit Number). During service processing, the id of the processor corresponding to a host IO (input/output) is specified, which determines which processor handles it. If the IO is sent to the non-master, i.e. non-OWNER, end, the SCSI (Small Computer System Interface) Target module there forwards it to the OWNER end for processing.
To ensure that the processors 30 in a dual-controller system work normally, a heartbeat mechanism is usually established between the two processors 30; that is, the processor 30 further includes a heartbeat module 301, configured to send heartbeat messages to the other processor 30 at a fixed time interval. As long as heartbeat messages are exchanged normally, each processor 30 knows that the other is working normally, which avoids the situation where both processors 30 handle the same service as OWNER at the same time, and the two processors 30 can distribute services normally.
When the processor 30 fails, its fault information acquisition module 102 starts fault information acquisition. Taking kdump as an example, kdump is a kexec-based Linux kernel crash capture mechanism, a tool and service used to dump runtime information from memory when the system crashes, deadlocks or hangs. For example, once the system has crashed, the normal kernel can no longer work; at that point kdump produces a kernel for capturing the current runtime information, and that kernel collects all running state and data in memory at that moment into a dump core file so that the cause of the crash can be analyzed. This process usually takes several minutes.
The processor 30 in this embodiment further includes a reset register 105. The processor 30 is reset by reset means such as a reset circuit, and the reset register 105 indicates whether the processor 30 may be reset. Optionally, the reset register 105 is set to the disabled state while the fault information acquisition module 102 is acquiring fault information, and to the enabled state when fault information acquisition is not being performed. While the reset register 105 is in the disabled state, no reset operation is performed on the processor 30 even if a reset request message is received. When fault information acquisition is not being performed, the reset register 105 can be set to the enabled state, and in that state a reset operation can be performed on the processor 30 when a reset request message is received; in other words, the processor 30 is reset according to the state of the reset register 105. Specifically, when the fault information acquisition module 102 performs fault information acquisition, it first calls a driver interface to disable the reset register 105, i.e. it calls a BSP callback function to close the reset register 105 by setting its value to 0, which disables the reset link and prevents the processor 30 from being reset. While fault information acquisition is in progress, the other processor 30 may send the reset request message to the processor 30 repeatedly, so that the processor 30 is reset directly according to the reset request message once it completes the acquisition; alternatively, the other processor 30 may send the message only once, when it determines that the processor 30 has failed, and not resend it, leaving the processor 30 to reset itself after completing fault information acquisition.
Fault information acquisition does not always run smoothly; the acquisition itself may fail. Therefore, optionally, the message receiving module 101 is further configured to receive a forced reset message sent by the other processor 30; correspondingly, after the forced reset message is received, the reset register 105 is set to the enabled state. The processor 30 can then be reset according to the state of the reset register 105.
Equally, the processor that fails may be the other processor 30. When the heartbeat module 301 of the processor 30 has not received a heartbeat message from the other processor 30 for the first preset time, the processor 30 determines that the other processor 30 has failed and attempts to reset it, i.e. the message sending module 201 sends a reset request message to the other processor 30.
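Because each processor in this embodiment carries both roles, the two behaviors can be combined in a single class. The following Python sketch is a simplified model under stated assumptions: the `Processor` class, its attributes, and the direct method calls standing in for messages are all hypothetical, and the forced reset path is omitted for brevity.

```python
class Processor:
    """Hypothetical symmetric processor: it monitors its peer and can be reset by it."""

    def __init__(self, name, first_preset_time):
        self.name = name
        self.first_preset_time = first_preset_time
        self.peer = None
        self.last_peer_heartbeat = 0.0
        self.collecting = False  # True while fault information is acquired
        self.resets = 0

    def pair_with(self, other):
        self.peer, other.peer = other, self

    def send_heartbeat(self, now):
        # Delivering a heartbeat updates the peer's record of this processor.
        self.peer.last_peer_heartbeat = now

    def on_reset_request(self):
        # Ignore the request while fault information is being acquired.
        if self.collecting:
            return False
        self.resets += 1
        return True

    def check_peer(self, now):
        """Send a reset request if the peer has been silent for too long."""
        if now - self.last_peer_heartbeat > self.first_preset_time:
            self.peer.on_reset_request()
            return True
        return False


a = Processor("A", first_preset_time=3.0)
b = Processor("B", first_preset_time=3.0)
a.pair_with(b)
b.send_heartbeat(now=1.0)
b.collecting = True                  # B fails and starts acquisition
sent_early = a.check_peer(now=3.0)   # only 2 s of silence so far
sent = a.check_peer(now=5.0)         # 4 s of silence: request sent
resets_during = b.resets             # request ignored while collecting
b.collecting = False                 # acquisition finished
a.check_peer(now=6.0)                # repeated request now takes effect
```

The usage lines follow the repeated-send variant described in the text: the request sent during acquisition has no effect, and a later request resets the peer once acquisition has finished.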
After the message sending module 201 sends the reset request message to the other processor 30, the other processor 30 should be reset. However, if the other processor 30 is performing fault information acquisition at that moment, it cannot be reset. While the other processor 30 performs fault information acquisition, the message sending module 201 may send the reset request message to it repeatedly, so that the other processor 30 is reset directly according to the reset request message once it completes the acquisition; alternatively, the message may be sent only once, when the processor 30 determines that the other processor 30 has failed, and not resent, leaving the other processor 30 to reset itself after completing fault information acquisition.
Fault information acquisition does not always run smoothly; the acquisition itself may fail. Therefore, when a preset condition is met, the message sending module 201 can send a forced reset message to the other processor 30. After receiving the forced reset message, the other processor 30 sets the reset register 105 to the enabled state and is then reset according to the state of the reset register 105. The difference between the forced reset message and the reset request message is that a reset request message only attempts to reset the other processor 30 and is not mandatory: while the reset register 105 is in the disabled state, the other processor 30 cannot be reset. A forced reset message, by contrast, sets the reset register 105 to the enabled state regardless of its current state, and the other processor 30 is then reset. It is a remedial measure for the case where the other processor 30 cannot be reset because it is performing fault information acquisition: when the acquisition itself fails, the processor 30 can send a forced reset message so that the reset register 105 allows the other processor 30 to be reset, which prevents the acquisition from running indefinitely.
The preset condition may include: a preset time has elapsed since the message sending module 201 sent the reset request message to the other processor 30; or a preset time has elapsed since the message sending module 201 sent the reset request message to the other processor 30 and the other processor 30 is still performing fault information acquisition. The preset time is generally not less than the maximum time the other processor 30 needs to perform fault information acquisition. If that maximum time has elapsed since the processor 30 sent the reset request message to the other processor 30, the acquisition can be considered to have failed, and the other processor 30 can be forcibly reset, that is, the processor 30 sends a forced reset message to the other processor 30. Specifically, the preset condition can be implemented by setting a timer on the processor 30 whose duration is not less than the maximum time required for fault information acquisition, i.e. the preset time. While the timer runs, the processor 30 can monitor the state of the other processor 30 in real time, including whether it is performing fault information acquisition and whether it has been reset. If the timer expires, either the fault information acquisition has encountered an exception, or the reset of the other processor 30 after the acquisition completed has encountered an exception. The processor 30 can then send a forced reset message to the other processor 30 to force it to reset, which gives the fault information the best chance of being saved in full while avoiding an excessive delay that would affect normal data processing.
When monitoring the state of the other processor 30, the processor 30 can monitor the state of the reset register 105. If the reset register 105 is in the disabled state, the other processor 30 is acquiring fault information and a reset request message cannot reset it. If the reset register 105 is in the enabled state, the other processor 30 is not performing fault information acquisition (which includes the case where the acquisition has completed), and the message sending module 201 can send a reset request message directly to the other processor 30 to reset it. If the reset register 105 is still in the disabled state after the timer expires, the message sending module 201 can send a forced reset message to the other processor 30, which forces the reset register 105 into the enabled state and resets the other processor 30.
The processor exchanges heartbeat messages with the other processor at a fixed time interval. When the processor fails, it performs fault information acquisition and then receives the reset request message sent by the other processor; if fault information acquisition is in progress at that moment, the processor is not reset. This prevents the processor from being reset, and the acquisition from being interrupted, while fault information is being acquired, so the processor can acquire fault information that is as complete as possible, which facilitates subsequent fault analysis and localization for the processor. Conversely, when the other processor 30 fails, i.e. when the processor 30 has not received a heartbeat message from the other processor 30 for the first preset time, the processor 30 sends a reset request message to the other processor 30. This prevents the other processor 30 from being reset, and its acquisition from being interrupted, while fault information is being acquired, so the other processor 30 can acquire fault information that is as complete as possible.
This embodiment further provides a network storage device. Referring to Fig. 6, it includes at least two of the above processors 30, and the two processors 30 back each other up.
In addition, a network storage system is provided. Referring to Fig. 7, it includes a network storage device 1 and at least one data transmission device 2, where the data transmission device 2 is configured to write network data into the network storage device 1. The data transmission device 2 may be any of various network devices such as a host or a microcomputer, and each data transmission device 2 is connected to the network storage device.
The processor 30 in this embodiment may be any of the many kinds of processor that can implement a dual-controller system, such as a storage controller. The message receiving module 101, fault information acquisition module 102, heartbeat module 301 and message sending module 201 in this embodiment can be implemented by a controller in the processor 30, and the message receiving module 101, heartbeat module 301, message sending module 201 and so on can be implemented by the same unit. The fault information acquisition module 102 can be implemented as follows: when the system of the processor 30 fails, kdump produces a kernel for capturing the current runtime information, and that kernel collects all running state and data in memory at that moment into a dump core file so that the cause of the failure can be analyzed. The reset module 103 can be implemented by a hardware structure such as a reset circuit, which may have a different structure for different processors 30. The reset register 105 indicates whether the processor 30 may be reset, which is implemented by whether the reset register 105 is enabled.
Sixth embodiment
This embodiment provides a fault protection method. Taking kdump as an example and referring to Fig. 8, the method includes:
S801: the dual-controller system composed of storage controller A (hereinafter "controller A") and storage controller B (hereinafter "controller B") is powered on, and a heartbeat mechanism is established;
S802: a fault on controller A triggers kdump fault information acquisition, and the heartbeat between controller A and controller B starts to be lost;
S803: controller A calls the BSP callback function to disable the peer reset register, that is, sets the reset register value to 0, preventing controller A from being reset;
S804: abnormal information is collected, and it is judged whether kdump on controller A has finished; if it has, S805 is performed, otherwise S806 is performed;
S805: the reset register value is set to 1, and controller A then returns normally;
S806: collection of abnormal information continues, and the reset register value is kept at 0;
S807: controller B detects the heartbeat loss and, after the configured number of timeouts is reached, attempts to reset controller A;
S808: controller B uses EPLD (Erasable Programmable Logic Device) signal detection to judge whether the peer reset register value of controller A is 0; if it is 0, S809 is performed, otherwise S810 is performed;
S809: this indicates that kdump on controller A has not finished; a kdump-completion timeout timer is set on controller B, and the duration of this timer must be greater than the theoretical maximum kdump completion time;
S810: this indicates that kdump on controller A has finished; if controller A restarts and its handshake with controller B succeeds, controller B checks whether the kdump timer exists and, if so, deletes it, and the whole flow is complete;
S811: if the timer on controller B expires, this indicates that the kdump process is abnormal or that the automatic reset function after kdump completion is abnormal, and controller B initiates a restart of controller A.
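The controller B side of steps S807 to S811 can be sketched in Python as follows. This is only an illustrative model under stated assumptions, not the patented implementation: the class names `ControllerA` and `ControllerB`, the method names, the returned strings, and the use of a plain deadline value instead of a real timer are all hypothetical, and the EPLD signal detection of S808 is reduced to reading an attribute.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ControllerA:
    """Hypothetical model of controller A's peer reset register."""
    reset_register: int = 1  # 1 = reset allowed, 0 = reset blocked

    def start_kdump(self) -> None:
        self.reset_register = 0  # S803: block resets while kdump runs

    def finish_kdump(self) -> None:
        self.reset_register = 1  # S805: allow resets again


@dataclass
class ControllerB:
    """Hypothetical model of controller B's kdump timeout handling."""
    kdump_timeout: float  # assumed to exceed the theoretical maximum kdump time
    deadline: Optional[float] = None

    def on_heartbeat_lost(self, peer: ControllerA, now: float) -> str:
        # S808: check the peer reset register (stands in for EPLD detection)
        if peer.reset_register == 0:
            # S809: kdump not finished, arm the timeout timer once
            if self.deadline is None:
                self.deadline = now + self.kdump_timeout
            return "wait_for_kdump"
        # S810: kdump finished (or never blocked the reset)
        return "kdump_finished"

    def on_handshake(self) -> None:
        # S810: controller A restarted and shook hands, delete the timer
        self.deadline = None

    def timer_expired(self, now: float) -> bool:
        # S811: True means controller B should restart controller A itself
        return self.deadline is not None and now >= self.deadline
```

In this sketch the caller supplies the current time explicitly, which keeps the timeout logic deterministic and easy to exercise without real clocks.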
Seventh embodiment
This embodiment provides a network storage device that includes first processor 10 and second processor 20, which back each other up.
First processor 10 includes first controller 41 and reset circuit 1031. First controller 41 is used to receive the reset request message sent by second processor 20 and to perform fault information acquisition. Reset circuit 1031 is used to reset first processor 10 according to the reset request message when fault information acquisition is not being performed.
Second processor 20 includes second controller 42, which is used to receive, at a fixed time interval, the heartbeat messages sent by first controller 41, and to send a reset request message to first processor 10 after no heartbeat message from first controller 41 has been received within the first preset time.
To ensure that the processors in a dual-controller system work normally, a heartbeat mechanism is usually established between the two processors; that is, first controller 41 can receive, at a fixed time interval, heartbeat messages sent by second controller 42. While heartbeat messages are being exchanged normally, each processor knows that the other is working normally, which avoids the situation in which both processors handle the same service as OWNER at the same time.
However, the two processors in a dual-controller system cannot keep working normally forever. If one of them fails, the other can continue to provide service, which is the benefit of a dual-controller system. When second controller 42 of second processor 20 has not received a heartbeat message from first controller 41 for the first preset time, second processor 20 judges that first processor 10 has failed and attempts to reset first processor 10; that is, second controller 42 sends a reset request message to first controller 41.
First processor 10 in this embodiment also includes reset register 105. The reset of first processor 10 is carried out by means such as reset circuit 1031, whereas reset register 105 indicates whether first processor 10 may be reset. Optionally, when first controller 41 collects fault information, reset register 105 is set to the disabled state, and when fault information acquisition is not being performed, reset register 105 is set to the enabled state. While reset register 105 is in the disabled state, no reset operation is performed on the processor even if a reset request message is received. When fault information acquisition is not being performed, reset register 105 can be set to the enabled state; in that state, the reset operation can be performed when a reset request message is received, that is, first processor 10 is reset according to the state of reset register 105. Specifically, when fault information acquisition module 102 performs fault information acquisition, it first calls the driver interface to disable reset register 105, that is, it calls the BSP callback function to close reset register 105 and sets the value of reset register 105 to 0, which breaks the reset link and prevents first processor 10 from being reset. While fault information acquisition is in progress, the peer processor that backs up first processor 10 can send the reset request message repeatedly, so that first processor 10 is reset directly according to a reset request message once it completes fault information acquisition. Alternatively, second processor 20 can send the request only once, when it judges that first processor 10 has failed, and not send it again, leaving first processor 10 to reset itself after completing fault information acquisition.
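How reset register 105 gates the reset can be sketched as follows. This is a simplified Python model under stated assumptions: the class `FirstProcessor`, its attribute and method names, and the reset counter are hypothetical, and the real mechanism is a hardware register and reset circuit rather than software state.

```python
class FirstProcessor:
    """Hypothetical model of the reset register gating a peer reset."""

    def __init__(self) -> None:
        self.reset_register = 1  # 1 = enabled (reset allowed), 0 = disabled
        self.reset_count = 0     # counts resets actually performed

    def begin_fault_acquisition(self) -> None:
        # Disable the register first so the reset link is broken.
        self.reset_register = 0

    def end_fault_acquisition(self) -> None:
        # Acquisition finished: resets are allowed again.
        self.reset_register = 1

    def on_reset_request(self) -> bool:
        """Handle a reset request; return True if the reset was performed."""
        if self.reset_register == 0:
            return False  # request ignored while fault information is collected
        self.reset_count += 1
        return True
```

With repeated reset requests from the peer, the first request that arrives after `end_fault_acquisition` is the one that takes effect, which corresponds to the first of the two sending strategies described above.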
Fault information acquisition does not always go smoothly, and the acquisition itself may run into problems. Therefore, when a preset condition is met, second controller 42 can send a forced reset message to first controller 41. First controller 41 receives the forced reset message sent by second processor 20 and, after receiving it, sets reset register 105 to the enabled state. First processor 10 can then be reset according to the state of reset register 105.
The preset condition can be either of the following: the second preset time has elapsed since second controller 42 sent the reset request message to first controller 41; or the second preset time has elapsed since second controller 42 sent the reset request message to first controller 41, and first controller 41 is still performing fault information acquisition.
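The two variants of the preset condition can be written as a single predicate. The function name `should_force_reset` and its parameters are hypothetical, and passing `None` to select the first variant is a convention chosen only for this sketch.

```python
from typing import Optional


def should_force_reset(
    request_sent_at: float,
    now: float,
    second_preset_time: float,
    peer_still_collecting: Optional[bool] = None,
) -> bool:
    """Decide whether the second controller should send a forced reset message.

    peer_still_collecting=None selects the first variant (elapsed time only);
    a boolean selects the second variant (elapsed time AND the first
    controller is still performing fault information acquisition).
    """
    elapsed = (now - request_sent_at) >= second_preset_time
    if peer_still_collecting is None:
        return elapsed
    return elapsed and peer_still_collecting
```

The second variant avoids sending a forced reset when the first controller has already finished collecting and is about to reset on its own.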
The first processor exchanges heartbeat messages with the second processor at a fixed time interval. When the processor fails, it performs fault information acquisition, and it may then receive a reset request message sent by the other processor. If fault information acquisition is in progress at that moment, the processor is not reset. This prevents the processor from being reset during fault information acquisition, which would interrupt the acquisition, so the processor can collect fault information that is as complete as possible, which facilitates subsequent fault analysis and localization for the first processor.
In addition, a network storage system is provided, which includes the network storage device and at least one data writing device. The data writing device is used to write network data into the network storage device; it can be any of various network devices such as a host or a microcomputer, and all of these data writing devices are connected to the network storage device.
Obviously, those skilled in the art will understand that the modules or steps of the invention described above can be implemented with a general-purpose computing device. They can be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they can be implemented as program code executable by a computing device, so that they can be stored in a storage medium (ROM/RAM, magnetic disk, optical disc) and executed by the computing device. In some cases, the steps shown or described can be performed in an order different from the one given here, or they can each be made into a separate integrated circuit module, or multiple modules or steps among them can be made into a single integrated circuit module. The present invention is therefore not restricted to any specific combination of hardware and software.
The above is a further detailed description of the present invention in combination with specific embodiments, and the specific implementation of the present invention should not be regarded as limited to these descriptions. For a person of ordinary skill in the technical field of the present invention, several simple deductions or substitutions can be made without departing from the inventive concept, and all of them should be regarded as falling within the protection scope of the present invention.
Claims (14)
1. A fault protection method, comprising:
a first processor receiving a reset request message from a second processor, the first processor and the second processor backing each other up;
if fault information acquisition is being performed, not resetting the first processor;
if fault information acquisition is not being performed, resetting the first processor according to the reset request message.
2. The fault protection method of claim 1, characterized in that the first processor includes a reset register;
when fault information acquisition is being performed, the method further comprises: setting the reset register to a disabled state;
when fault information acquisition is not being performed, the method further comprises: setting the reset register to an enabled state;
resetting the first processor comprises: resetting the first processor according to the state of the reset register.
3. The fault protection method of claim 2, characterized in that the method further comprises: receiving a forced reset message sent by the second processor; after the forced reset message is received, setting the reset register to the enabled state; and resetting the first processor according to the state of the reset register.
4. A fault protection method, comprising:
a second processor receiving heartbeat messages of a first processor at a fixed time interval, the second processor and the first processor backing each other up;
after no heartbeat message sent by the first processor has been received within a first preset time, sending a reset request message to the first processor; the reset request message includes: if the first processor is performing fault information acquisition, the first processor is not reset; if the first processor is not performing fault information acquisition, the first processor is reset according to the reset request message.
5. The fault protection method of claim 4, characterized in that the method further comprises: when a preset condition is met, sending a forced reset message to the first processor.
6. The fault protection method of claim 5, characterized in that the preset condition includes:
a second preset time has elapsed since the reset request message was sent to the first processor;
or: the second preset time has elapsed since the reset request message was sent to the first processor, and the first processor is still performing fault information acquisition.
7. A first processor, characterized by comprising:
a message receiving module, for receiving a reset request message from a second processor, the first processor and the second processor backing each other up;
a fault information acquisition module, for performing fault information acquisition;
a reset module, for resetting the processor according to the reset request message when fault information acquisition is not being performed.
8. A second processor, characterized by comprising:
a heartbeat receiving module, for receiving heartbeat messages from a first processor at a fixed time interval, the second processor and the first processor backing each other up;
a message sending module, for sending a reset request message to the first processor after no heartbeat message sent by the first processor has been received within a first preset time; the reset request message includes: if the first processor is performing fault information acquisition, the first processor is not reset; if the first processor is not performing fault information acquisition, the first processor is reset according to the reset request message.
9. A network storage device, characterized by comprising a first processor and a second processor, the first processor and the second processor backing each other up;
the first processor includes a first controller and a reset circuit; the first controller is used to receive the reset request message sent by the second processor and to perform fault information acquisition; the reset circuit is used to reset the first processor according to the reset request message when fault information acquisition is not being performed;
the second processor includes a second controller, for receiving, at a fixed time interval, the heartbeat messages sent by the first controller, and for sending the reset request message to the first processor after no heartbeat message sent by the first controller has been received within a first preset time.
10. The network storage device of claim 9, characterized in that the first processor further includes a reset register; when the first controller performs fault information acquisition, the reset register is set to a disabled state;
when the first controller is not performing fault information acquisition, the reset register is set to an enabled state;
the reset circuit is further used to reset the first processor according to the state of the reset register.
11. The network storage device of claim 10, characterized in that the first controller is further used to receive a forced reset message sent by the second controller and, after the forced reset message is received, to set the reset register to the enabled state.
12. The network storage device of any one of claims 9-11, characterized in that the second controller is further used to: when a preset condition is met, send a forced reset message to the first controller.
13. The network storage device of claim 12, characterized in that the preset condition includes: a second preset time has elapsed since the reset request message was sent to the other processor;
or: the second preset time has elapsed since the reset request message was sent to the other processor, and the other processor is still performing fault information acquisition.
14. A network storage system, characterized by comprising at least one data writing device and the network storage device of any one of claims 9-13; the data writing device is used to write network data into the network storage device.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610356375.0A CN107438010A (en) | 2016-05-25 | 2016-05-25 | Fault protecting method, first, second processor, network storage equipment and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610356375.0A CN107438010A (en) | 2016-05-25 | 2016-05-25 | Fault protecting method, first, second processor, network storage equipment and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107438010A (en) | 2017-12-05 |
Family
ID=60454368
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610356375.0A Pending CN107438010A (en) | 2016-05-25 | 2016-05-25 | Fault protecting method, first, second processor, network storage equipment and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107438010A (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6438709B2 (en) * | 1997-09-18 | 2002-08-20 | Intel Corporation | Method for recovering from computer system lockup condition |
CN101025700A (en) * | 2006-02-21 | 2007-08-29 | 中兴通讯股份有限公司 | Abnormal reset system protection method and reset protection system |
CN101149636A (en) * | 2007-10-23 | 2008-03-26 | 华为技术有限公司 | Repositioning system and method |
CN101207408A (en) * | 2006-12-22 | 2008-06-25 | 中兴通讯股份有限公司 | Apparatus and method of synthesis fault detection for main-spare taking turns |
CN101488881A (en) * | 2008-01-17 | 2009-07-22 | 鼎桥通信技术有限公司 | A fault processing method |
CN203643815U (en) * | 2013-12-12 | 2014-06-11 | 东风汽车公司 | Vehicle controller based on safety function |
CN104506364A (en) * | 2014-12-29 | 2015-04-08 | 迈普通信技术股份有限公司 | Master-slave switching method, main control card and network equipment |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109334590A (en) * | 2018-08-31 | 2019-02-15 | 百度在线网络技术(北京)有限公司 | Pilotless automobile chassis control method, apparatus, equipment and storage medium |
CN109334590B (en) * | 2018-08-31 | 2020-05-12 | 百度在线网络技术(北京)有限公司 | Unmanned vehicle chassis control method, device, equipment and storage medium |
CN110198065A (en) * | 2019-06-21 | 2019-09-03 | 深圳市小兔充充科技有限公司 | The detection circuit of charging station and the detection device of charging station |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106789306B (en) | Method and system for detecting, collecting and recovering software fault of communication equipment | |
EP2733611B1 (en) | Internal fault handling method, device and system for virtual machine | |
TWI229796B (en) | Method and system to implement a system event log for system manageability | |
JP6333410B2 (en) | Fault processing method, related apparatus, and computer | |
CN102231681B (en) | High availability cluster computer system and fault treatment method thereof | |
JP2001350651A (en) | Method for isolating failure state | |
CN110581852A (en) | Efficient mimicry defense system and method | |
CN101799776A (en) | Fault processing method of multi-core processor, multi-core processor and communication device | |
CN109308252A (en) | A kind of fault location processing method and processing device | |
CN105959235B (en) | Distributed data processing system and method | |
CN103298013B (en) | A kind of method and device carrying out business recovery | |
CN102364448A (en) | Fault-tolerant method for computer fault management system | |
US20140122421A1 (en) | Information processing apparatus, information processing method and computer-readable storage medium | |
CN110351149A (en) | A kind of method and device for safeguarding network data Forwarding plane | |
CN106155826B (en) | For the method and system of mistake to be detected and handled in bus structures | |
CN100538647C (en) | The processing method for service stream of polycaryon processor and polycaryon processor | |
Araujo et al. | Dependability evaluation of a mhealth system using a mobile cloud infrastructure | |
CN106559288B (en) | A kind of quick fault testing method based on icmp packet | |
CN106681858A (en) | Virtual machine data disaster tolerance method and management device | |
CN107438010A (en) | Fault protecting method, first, second processor, network storage equipment and system | |
CN107291589B (en) | Method for improving system reliability in robot operating system | |
CN104283718A (en) | Network device and hardware fault diagnosis method used for network device | |
CN106502811A (en) | A kind of 1553B bus communications fault handling method | |
CN106502944A (en) | The heartbeat detecting method of computer, PCIE device and PCIE device | |
CN113407374A (en) | Fault processing method and device, fault processing equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | RJ01 | Rejection of invention patent application after publication | Application publication date: 20171205 |