CN107404393A - A kind of method and system for judging link failure - Google Patents

Publication number
CN107404393A
Authority
CN
China
Prior art keywords
sender
timeout
fpga
cpu
recipient
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201610340855.8A
Other languages
Chinese (zh)
Inventor
徐晓丽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Xinwei Telecom Technology Inc
Original Assignee
Beijing Xinwei Telecom Technology Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Xinwei Telecom Technology Inc filed Critical Beijing Xinwei Telecom Technology Inc
Priority to CN201610340855.8A priority Critical patent/CN107404393A/en
Publication of CN107404393A publication Critical patent/CN107404393A/en
Withdrawn legal-status Critical Current

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0677 Localisation of faults

Abstract

The invention discloses a method and system for judging link failure. The method includes: a sender sends, through an FPGA, a detection message carrying a first status bit to a recipient and waits for the recipient's reply message; if the sender receives the reply message through the FPGA within a preset timeout Timeout, it judges whether the first status bit is the preset normal status value; if so, communication is determined to be normal; if not, and the first status bit is the preset abnormal status value in the replies to a preset number of detection messages, the receiver system is determined to be working abnormally; if the sender does not receive the reply message through the FPGA within Timeout, the link is determined to be abnormal. The invention supports fault judgment and fault analysis when a CPU fails, and when the receiver system has failed it helps the detecting side conclude that the fault lies in the receiver system rather than in the link.

Description

A method and system for judging link failure
Technical field
Embodiments of the present invention relate to link failure judgment and link failure analysis, and in particular to a method and system for judging link failure.
Background
With the development of network and information technology, the communication recipient (the core network) must reliably and effectively receive the data returned by the sender (the base station). To ensure this, the sender needs to know in real time whether the receiving system is working normally, so detecting the state of the peer device is increasingly important to the data sender.
Communication faults in a traditional IP network are detected by sending ICMP Echo requests.
However, the path a message takes through the network can switch automatically and its delay changes in real time, so misjudgment or poor real-time behavior can occur when the actual link delay is uncertain. ICMP also cannot analyze faults accurately, and processing its messages consumes system overhead; when there are too many upstream requests in particular, it can seriously reduce CPU efficiency and even congest the system.
Summary of the invention
The present invention provides a method and system for judging link failure, giving both communicating parties a real-time and fast way to judge link failure.
In a first aspect, the invention provides a method for judging link failure, including:
The sender sends, through an FPGA, a detection message carrying a first status bit to the recipient, and waits for the recipient's reply message;
The recipient receives the detection message through an FPGA, updates the first status bit according to the working state of the recipient CPU, and returns to the sender a reply message carrying the first status bit;
If the sender receives the reply message through the FPGA within a preset timeout Timeout, it judges whether the first status bit is the preset normal status value; if so, communication is determined to be normal; if not, and the first status bit is the preset abnormal status value in the replies to a preset number of detection messages, the receiver system is determined to be working abnormally. If the sender does not receive the reply message through the FPGA within Timeout, the link is determined to be abnormal.
Preferably, the step in which the sender sends, through the FPGA, a detection message carrying the first status bit to the recipient and waits for the recipient's reply message specifically includes: the sender sends, through the FPGA, a detection message carrying the first status bit to the recipient, starts timing, and waits for the recipient's reply message;
The case in which the sender receives the reply message through the FPGA within Timeout specifically includes: the sender receives the reply message through the FPGA within the preset timeout Timeout, stops timing to obtain the round-trip delay t1, and sends t1 to the sender CPU, which calculates an updated Timeout and feeds the updated Timeout back to the sender FPGA.
Preferably, sending t1 to the sender CPU to calculate an updated Timeout and feeding the updated Timeout back to the sender FPGA is specifically: the sender CPU determines whether this is the first power-up or the link has changed; if so, t1 is sent to the sender CPU to calculate an updated Timeout, and the updated Timeout is fed back to the sender FPGA; if not, that is, the link is unchanged, the Timeout value in the FPGA is not updated.
Preferably, sending t1 to the sender CPU to calculate an updated Timeout specifically includes: t1 is sent to the sender CPU, and the sender CPU calculates the updated value from t1 and a user-set coefficient δ as Timeout = t1 + t1 × δ.
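As a minimal sketch of this update rule, the Python function below computes the new timeout from a measured round-trip delay. The function name `updated_timeout` and the choice of milliseconds are assumptions made for illustration. The 0 to 5 bound on δ is taken from the worked example later in the description and is not stated in this clause.

```python
def updated_timeout(t1_ms: float, delta: float) -> float:
    """Return Timeout = t1 + t1 * delta for a round-trip delay t1 in milliseconds."""
    if t1_ms < 0:
        raise ValueError("round-trip delay cannot be negative")
    # Assumption: delta is limited to 0..5, the range given in the worked example.
    if not 0 <= delta <= 5:
        raise ValueError("delta must lie in the range 0 to 5")
    return t1_ms + t1_ms * delta


if __name__ == "__main__":
    # A 20 ms round trip with delta = 1.5 allows 50 ms before the link is suspected.
    print(updated_timeout(20.0, 1.5))
```

With δ = 0 the timeout equals the last measured round trip, so any increase in delay would be read as a missing reply; a larger δ trades detection speed for tolerance of delay jitter.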
Preferably, the step in which the sender sends, through the FPGA, a detection message carrying the first status bit to the recipient specifically includes:
The sender periodically sends, at the preset period DetectInterval held in the FPGA, a detection message carrying the first status bit to the recipient.
Preferably, determining that communication is normal specifically includes: sending information that the receiver system is working normally to the sender CPU, then reloading the Timeout timer to its full value and entering the next detection cycle;
Determining that the receiver system is working abnormally is specifically: clearing the TimeOut timer to 0, clearing the DetectInterval timer to 0 at the same time, and sending information that the receiver system is working abnormally to the sender CPU;
Determining that the link is abnormal is specifically: clearing the TimeOut timer to 0, clearing the DetectInterval timer to 0 at the same time, and sending suspected-link-failure information to the sender CPU.
Preferably, after the link is determined to be abnormal, the method further includes: the sender CPU sends ICMP echo request messages to the recipient and locates the link fault from the error messages obtained.
In a second aspect, embodiments of the present invention further provide a system for judging link failure, the system including:
A sender and a recipient, the sender including a first FPGA module, a second FPGA module and a sender CPU, and the recipient including a third FPGA module and a recipient CPU;
The first FPGA module is used to send a detection message carrying a first status bit to the recipient and to wait for the recipient's reply message;
The third FPGA module is used to receive the detection message, update the first status bit according to the working state of the recipient CPU, and return to the sender a reply message carrying the first status bit;
The second FPGA module is used to receive the reply message within the preset timeout Timeout and judge whether the first status bit is the preset normal status value; if so, communication is determined to be normal; if not, and the first status bit is the preset abnormal status value in the replies to a preset number of detection messages, the receiver system is determined to be working abnormally; if the sender does not receive the reply message within Timeout, the link is determined to be abnormal.
Preferably, the first FPGA module is specifically used to: send a detection message carrying the first status bit to the recipient, start timing, and wait for the recipient's reply message;
The second FPGA module is specifically used to: receive the reply message within the preset timeout Timeout, stop timing to obtain the round-trip delay t1, send t1 to the sender CPU, and judge whether the first status bit is the preset normal status value; if so, communication is determined to be normal; if not, and the first status bit is the preset abnormal status value in the replies to a preset number of detection messages, the receiver system is determined to be working abnormally; if the sender does not receive the reply message within Timeout, the link is determined to be abnormal;
The sender CPU is used to calculate an updated Timeout from t1 and feed the updated Timeout back to the second FPGA module.
Preferably, the sender CPU is used to determine whether this is the first power-up or the link has changed; if so, it calculates an updated Timeout from t1 and feeds the updated Timeout back to the sender FPGA; if not, that is, the link is unchanged, the Timeout value in the FPGA is not updated.
Preferably, the sender CPU is specifically used to: calculate the updated value from t1 and a user-set coefficient δ as Timeout = t1 + t1 × δ.
Preferably, the first FPGA module is specifically used to: periodically send, at the preset period DetectInterval, a detection message carrying the first status bit to the recipient.
Preferably, the second FPGA module is specifically used to: receive the reply message within the preset timeout Timeout and judge whether the first status bit is the preset normal status value; if so, send information that the receiver system is working normally to the sender CPU, then reload the Timeout timer to its full value and enter the next detection cycle; if not, and the first status bit is the preset abnormal status value in the replies to a preset number of detection messages, clear the TimeOut timer to 0, clear the DetectInterval timer to 0 at the same time, and send information that the receiver system is working abnormally to the sender CPU; if the sender does not receive the reply message through the FPGA within Timeout, clear the TimeOut timer to 0, clear the DetectInterval timer to 0 at the same time, and send suspected-link-failure information to the sender CPU.
Preferably, the sender CPU is specifically used to: calculate an updated Timeout and feed it back to the second FPGA module; and, after obtaining the suspected-link-failure information, send ICMP echo request messages to the recipient and locate the link fault from the error messages obtained.
The present invention uses FPGA hardware to execute the protocol between the sender and the recipient: the sender initiates a link detection request, and the recipient, after receiving it, replies to confirm the request message. Executing the protocol in hardware effectively reduces CPU load and improves utilization. Fault analysis also remains effective when a CPU fails: when an application deadlocks, interference causes the program to run away, or another CPU fault causes a crash, the detecting side can still judge that the fault is a system fault rather than a link fault.
Brief description of the drawings
Fig. 1 is a flow chart of a method for judging link failure in Embodiment one of the present invention;
Fig. 2 is the first flow chart of a method for judging link failure in Embodiment two of the present invention;
Fig. 3 is the second flow chart of a method for judging link failure in Embodiment two of the present invention;
Fig. 4 is a flow chart of the sender in the example of Embodiment two and Embodiment four of the present invention;
Fig. 5 is a flow chart of the recipient in the example of Embodiment two and Embodiment four of the present invention;
Fig. 6 is a flow chart of the time negotiation interaction in the example of Embodiment two and Embodiment four of the present invention;
Fig. 7 is a structural block diagram of a system for judging link failure in Embodiment three of the present invention;
Fig. 8 is a structural block diagram of a system for judging link failure in Embodiment four of the present invention.
Detailed description
The present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here only explain the present invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present invention rather than the entire structure.
Embodiment one
Fig. 1 is a flow chart of a method for judging link failure provided by Embodiment one of the present invention. This embodiment is applicable to the interaction of two devices in an IP network, and in particular to the interaction between a base station and the core network in an IP network. The method can be performed by an IP network that includes a sender equipped with an FPGA and a recipient equipped with an FPGA, and specifically includes the following steps:
S110: the sender sends, through the FPGA, a detection message carrying a first status bit to the recipient, and waits for the recipient's reply message.
Here the sender includes a sender system with a sender CPU, and a sender FPGA module; the sender may be a base station, comprising a system with a base station CPU and a base station FPGA module. The first status bit represents the working state of the recipient CPU; in this step it holds a default value. Specifically, the sender sends, through the FPGA, a detection message carrying the first status bit to the recipient, starts the timer for the default preset timeout Timeout, and waits for the recipient's reply message.
S120: the recipient receives the detection message through the FPGA, updates the first status bit according to the working state of the recipient CPU, and returns to the sender a reply message carrying the first status bit.
Here the recipient includes a receiver system with a recipient CPU, and a recipient FPGA module; the recipient may be the core network, comprising a system with a core network CPU and a core network FPGA module. The first status bit is updated according to the working state of the core network CPU: if the core network CPU is working normally, the first status bit is set to the preset normal status value; if the core network CPU is working abnormally, the first status bit is set to the preset abnormal status value.
S130: if the sender receives the reply message through the FPGA within the preset timeout Timeout, it judges whether the first status bit is the preset normal status value; if so, communication is determined to be normal; if not, and the first status bit is the preset abnormal status value in the replies to a preset number of detection messages, the receiver system is determined to be working abnormally and information that the receiver system is working abnormally is sent to the sender CPU. If the sender does not receive the reply message through the FPGA within Timeout, the link is determined to be abnormal.
For example: if the sender receives the reply message through the FPGA within the preset timeout Timeout, it judges whether the first status bit is the preset normal status value; if so, communication is determined to be normal and information that the receiver system is working normally is sent to the sender CPU; if not, and the first status bit is the preset abnormal status value in the replies to a preset number of detection messages, the receiver system is determined to be working abnormally, a system fault is declared, and information that the receiver system is working abnormally is sent to the sender CPU; if the sender does not receive the reply message through the FPGA within Timeout, the link is determined to be abnormal and suspected-link-failure information is sent to the sender CPU.
The abnormal branch works as follows. When a reply carries a first status bit that is not the preset normal status value, information that the receiver system is suspected of working abnormally is sent to the sender CPU, and another detection message carrying the first status bit is sent to the recipient through the FPGA. If the first status bit of the next reply message is again the preset abnormal status value, another detection message is sent, and so on until the number of transmissions exceeds the preset number; sending then stops and information that the receiver system is working abnormally is sent to the sender CPU.
Here, "within the preset timeout Timeout" refers to the period that begins when the sender sends the detection message carrying the first status bit in S110 and starts timing.
In the technical scheme of this embodiment, FPGA hardware executes the protocol between the sender and the recipient: the sender initiates a link detection request, and the recipient, after receiving it, replies to confirm the request message. Executing the protocol in hardware effectively reduces CPU load and improves utilization, and fault analysis remains effective when a CPU fails. When an application deadlocks, interference causes the program to run away, or another CPU fault causes a crash, the detecting side can still judge that the fault is a system fault rather than a link fault.
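The three-way decision with repeated probing can be sketched in Python as below. This is an illustration under stated assumptions rather than the patented FPGA logic: the `probe` callable, the `Verdict` enum and the `judge` function are hypothetical names, and the status values (1 for normal, 0 for abnormal) and the limit of three probes are borrowed from the worked example in Embodiment two. The intermediate "suspected abnormal" notification to the sender CPU is omitted for brevity.

```python
from enum import Enum
from typing import Callable, Optional

STATE_NORMAL = 1    # assumed value, taken from the worked example
STATE_ABNORMAL = 0  # assumed value, taken from the worked example


class Verdict(Enum):
    NORMAL = "communication normal"
    RECEIVER_ABNORMAL = "receiver system working abnormally"
    LINK_ABNORMAL = "link abnormal"


def judge(probe: Callable[[], Optional[int]], max_probes: int = 3) -> Verdict:
    """Classify the peer from up to max_probes detection messages.

    probe() stands in for "send one detection message and wait up to Timeout";
    it returns the status bit of the reply, or None if no reply arrived in time.
    """
    for _ in range(max_probes):
        state = probe()
        if state is None:
            # No reply within Timeout: the link itself is suspected.
            return Verdict.LINK_ABNORMAL
        if state == STATE_NORMAL:
            return Verdict.NORMAL
        # Abnormal status bit: probe again before declaring a system fault.
    return Verdict.RECEIVER_ABNORMAL


if __name__ == "__main__":
    replies = iter([STATE_ABNORMAL, STATE_ABNORMAL, STATE_ABNORMAL])
    print(judge(lambda: next(replies)).value)
```

The key point is that a reply with an abnormal status bit proves the link is up, which is what lets the sender separate a receiver fault from a link fault.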
On the basis of the above technical scheme, S110 can preferably be: the sender sends, through the FPGA, a detection message carrying the first status bit to the recipient, starts timing, and waits for the recipient's reply message. Here, starting timing means starting to record the round-trip delay. S120 can preferably be: the sender receives the reply message through the FPGA within the preset timeout Timeout, stops timing to obtain the round-trip delay t1, and sends t1 to the sender CPU, which calculates an updated Timeout and feeds the updated Timeout back to the sender FPGA. The benefit of setting up S110 and S120 in this way is that each time a reply message is received, the sender CPU recalculates Timeout and configures it into the FPGA module, which avoids misjudgment or poor real-time behavior when the actual link delay is uncertain.
On the basis of the above technical scheme, in S110, sending t1 to the sender CPU to calculate an updated Timeout is preferably: t1 is sent to the sender CPU, and the sender CPU calculates the updated value from t1 and a user-set coefficient δ as Timeout = t1 + t1 × δ.
On the basis of the above technical scheme, in S110, sending t1 to the sender CPU to calculate an updated Timeout is preferably: the sender CPU determines whether this is the first power-up or the link has changed; if so, t1 is sent to the sender CPU to calculate an updated Timeout, and the updated Timeout is fed back to the sender FPGA; if not, that is, the link is unchanged, the Timeout value in the FPGA is not updated. The benefit is that each time a reply message is received, the sender CPU decides from whether the link has changed whether to recalculate Timeout and configure it into the FPGA module. This reduces repeated calculation and is more efficient, avoids misjudgment or poor real-time behavior when the actual link delay is uncertain, and yields, whenever the link changes, a delay reference value consistent with the actual link delay.
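The conditional update can be sketched as a small Python class. The class name `TimeoutNegotiator` and the `link_changed` flag are hypothetical: the description does not say how the sender CPU detects a link change, so the sketch simply takes that decision as an input. The 1 s starting value is the default given in the worked example of Embodiment two.

```python
class TimeoutNegotiator:
    """Recompute Timeout only on first power-up or after a link change."""

    def __init__(self, delta: float, default_timeout_s: float = 1.0) -> None:
        self.delta = delta
        self.timeout_s = default_timeout_s  # assumed default, from the worked example
        self._first_reply = True

    def on_reply(self, t1_s: float, link_changed: bool) -> float:
        """Handle one measured round-trip delay and return the Timeout now in force."""
        if self._first_reply or link_changed:
            # Timeout = t1 + t1 * delta, the formula given in the description.
            self.timeout_s = t1_s + t1_s * self.delta
            self._first_reply = False
        # Otherwise the value already configured in the FPGA is left untouched.
        return self.timeout_s


if __name__ == "__main__":
    negotiator = TimeoutNegotiator(delta=1.0)
    print(negotiator.on_reply(0.25, link_changed=False))  # first reply after power-up
    print(negotiator.on_reply(0.40, link_changed=False))  # link unchanged, no update
    print(negotiator.on_reply(0.50, link_changed=True))   # link changed, recomputed
```

Skipping the update while the link is unchanged keeps the CPU out of the steady-state path, which is the efficiency gain the description claims.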
Embodiment two
Fig. 2 is a flow chart of a method for judging link failure provided by Embodiment two of the present invention. Building on the embodiments above, this embodiment further optimizes the sender's sending and receiving steps.
S210: the sender periodically sends, at the preset period DetectInterval held in the FPGA, a detection message carrying the first status bit to the recipient, and waits for the recipient's reply message.
S220: the recipient receives the detection message through the FPGA, updates the first status bit according to the working state of the recipient CPU, and returns to the sender a reply message carrying the first status bit.
S230: if the sender receives the reply message through the FPGA within the preset timeout Timeout, it judges whether the first status bit is the preset normal status value; if so, information that the receiver system is working normally is sent to the sender CPU, the Timeout timer is reloaded to its full value, and the next detection cycle begins; if not, and the first status bit is the preset abnormal status value in the replies to a preset number of detection messages, the TimeOut timer is cleared to 0, the DetectInterval timer is cleared to 0 at the same time, a system fault is declared, and information that the receiver system is working abnormally is sent to the sender CPU. If the sender does not receive the reply message through the FPGA within Timeout, the TimeOut timer is cleared to 0, the DetectInterval timer is cleared to 0 at the same time, and suspected-link-failure information is sent to the sender CPU.
Here, reloading the Timeout timer to its full value prepares for detection in the next detection cycle; for example, when a detection message carrying the first status bit is sent again according to S210, the reloaded Timeout timer is started. Clearing the TimeOut timer to 0 means the Timeout waiting period is treated as used up. Clearing the DetectInterval timer to 0 means the DetectInterval period is treated as used up, so S210 must be executed to send the next cycle's detection message carrying the first status bit to the recipient.
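One way to read the timer handling is as a pair of countdown timers that are either reloaded or forced to expire. The Python sketch below follows that reading; `CountdownTimer`, `apply_outcome` and the outcome labels are hypothetical names, and treating "cleared to 0" as "expired" is an interpretation of the text rather than something the description spells out.

```python
class CountdownTimer:
    """A countdown timer that can be reloaded to full or cleared to zero."""

    def __init__(self, period_s: float) -> None:
        self.period_s = period_s
        self.remaining_s = period_s

    def fill(self) -> None:
        self.remaining_s = self.period_s

    def clear(self) -> None:
        self.remaining_s = 0.0

    @property
    def expired(self) -> bool:
        return self.remaining_s <= 0.0


# Information reported to the sender CPU for each outcome of S230.
REPORTS = {
    "normal": "receiver system working normally",
    "receiver_abnormal": "receiver system working abnormally",
    "no_reply": "suspected link failure",
}


def apply_outcome(outcome: str, timeout_timer: CountdownTimer,
                  interval_timer: CountdownTimer) -> str:
    """Update both timers for one S230 outcome and return the report for the CPU."""
    report = REPORTS[outcome]  # raises KeyError for an unknown outcome
    if outcome == "normal":
        timeout_timer.fill()    # ready for the next detection cycle
    else:
        timeout_timer.clear()   # waiting period treated as used up
        interval_timer.clear()  # next detection message is due immediately
    return report


if __name__ == "__main__":
    timeout_timer, interval_timer = CountdownTimer(1.0), CountdownTimer(1.0)
    print(apply_outcome("no_reply", timeout_timer, interval_timer))
    print(timeout_timer.expired, interval_timer.expired)
```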
As shown in Fig. 4, Fig. 5 and Fig. 6, this embodiment is explained with the following example. On the detecting side the user configures an acceptable delay error coefficient δ and a detection interval DetectInterval. δ is the delay error coefficient, with range 0 to 5 and default 1. DetectInterval is the interval between fault detections, with range 1 ms to 1 s and default 1 s. TimeOut is the wait timeout, with default 1 s.
1) The sender eNode periodically sends, through the FPGA and at intervals of DetectInterval, a DetectRequest message with State=0 to the device under test, starts the TimeOut timer (1 s by default), and then waits for the reply message from the recipient EPC.
2) After the recipient EPC receives the message, it sets the State bit to 1 if the recipient EPC system is normal and to 0 otherwise, and returns a DetectReply message to the initiating sender eNode.
3) After the sender eNode receives the DetectReply through the FPGA, it stops counting, records the round-trip delay t1, and sends t1 and State to the sender CPU.
The sender eNode CPU calculates, from t1 and the user-set δ, a wait timeout TimeOut whose error meets the customer's requirement, and feeds it back to the sender FPGA: TimeOut = t1 + t1 × δ. It performs fault judgment at the same time.
If the sender eNode receives the reply message through the FPGA within the preset timeout Timeout and finds State=1, the receiving device's system is considered normal and TimeOut is reloaded to its full value.
If the sender eNode receives a reply message with State=0 through the FPGA within the preset timeout Timeout, the receiver system is suspected of having crashed. TimeOut is cleared to 0 and DetectInterval is cleared to 0 at the same time, and a second detection of the device under test is initiated immediately. If the reply still carries State=0 after the third detection, the recipient EPC system is considered to have crashed. The FPGA sends a receiving-device failure message to the CPU, and the sender stops returning data to that device or switches the return path to a backup receiving device.
If the sender eNode receives no reply through the FPGA within the preset timeout Timeout, the link is suspected of being unreachable. TimeOut is cleared to 0 and DetectInterval is cleared to 0 at the same time. The CPU is notified, and the sender eNode CPU sends ICMP echo request messages to the recipient and locates the link fault from the corresponding error messages.
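The user-configured parameters of this example can be captured in a small validated structure. The sketch below is illustrative only: the `DetectConfig` class and its field names are hypothetical, while the ranges and defaults are the ones stated in the example (δ from 0 to 5 with default 1, DetectInterval from 1 ms to 1 s with default 1 s, TimeOut default 1 s). The example gives no range for TimeOut, so the sketch only requires it to be positive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectConfig:
    """Parameters configured on the detecting side, in seconds where applicable."""

    delta: float = 1.0              # delay error coefficient
    detect_interval_s: float = 1.0  # interval between detection messages
    timeout_s: float = 1.0          # initial wait timeout

    def __post_init__(self) -> None:
        if not 0 <= self.delta <= 5:
            raise ValueError("delta must lie in the range 0 to 5")
        if not 0.001 <= self.detect_interval_s <= 1.0:
            raise ValueError("DetectInterval must lie between 1 ms and 1 s")
        # Assumption: no range is stated for TimeOut, so only positivity is checked.
        if self.timeout_s <= 0:
            raise ValueError("TimeOut must be positive")


if __name__ == "__main__":
    print(DetectConfig())
    print(DetectConfig(delta=0.5, detect_interval_s=0.01))
```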
In the technical scheme of this embodiment, delay negotiation is carried out in hardware (the FPGA), which effectively reduces CPU load and improves utilization. The time negotiation method determines a wait timeout consistent with the service, which effectively avoids fault misjudgment caused by unstable, non-fixed link delay. Fault analysis also remains effective when a CPU fails: when an application deadlocks, interference causes the program to run away, or another CPU fault causes a crash, the detecting side can still judge that the fault is a system fault rather than a link fault.
On the basis of each of the above embodiments, after the link is determined to be abnormal in S230, the method further includes S240: the sender CPU sends ICMP echo request messages to the recipient and locates the link fault from the error messages obtained. The benefit of S240 is that the fault location on the link can be found; with S210 to S240 the fault location on the link can be diagnosed quickly.
Embodiment three
Fig. 7 is a structural diagram of a system for judging link failure provided by Embodiment three of the present invention. This embodiment is applicable to the interaction of any two devices in an IP network, and in particular to the interaction between a base station and the core network in an IP network. The structure of the system is as follows:
The system includes a sender 310 and a recipient 320. The sender 310 includes a first FPGA module 311, a second FPGA module 312 and a sender CPU 313; the recipient 320 includes a third FPGA module 321 and a recipient CPU 322. The sender CPU 313 is connected to the first FPGA module 311 and the second FPGA module 312, and the recipient CPU 322 is connected to the third FPGA module 321. The first FPGA module 311, the second FPGA module 312 and the third FPGA module 321 communicate with one another over IP network links.
The first FPGA module 311 is used to send a detection message carrying a first status bit to the recipient and to wait for the recipient's reply message.
Here the sender includes a sender system with the sender CPU 313, the first FPGA module 311 and the second FPGA module 312. In hardware, the first FPGA module 311 and the second FPGA module 312 may be the same FPGA chip. The sender may be a base station, comprising a system with a base station CPU and a base station FPGA module. The first status bit represents the working state of the recipient CPU 322; in this step it holds a default value. Specifically, the first FPGA module 311 sends a detection message carrying the first status bit to the recipient, starts the timer for the default preset timeout Timeout, and waits for the recipient's reply message.
The third FPGA module 321 is used to receive the detection message, update the first status bit according to the working state of the recipient CPU, and return to the sender a reply message carrying the first status bit.
Here the recipient 320 includes a receiver system with the recipient CPU 322, and the recipient FPGA module 321; the recipient 320 may be the core network, comprising a system with a core network CPU and a core network FPGA module. The first status bit is updated according to the working state of the core network CPU: if the core network CPU is working normally, the first status bit is set to the preset normal status value; if the core network CPU is working abnormally, the first status bit is set to the preset abnormal status value.
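The recipient-side behavior can be sketched in Python as follows. This is an illustration only: the `DetectMessage` type and the `build_reply` function are hypothetical, the message names DetectRequest and DetectReply and the status values 1 and 0 come from the worked example in Embodiment two, and the `cpu_healthy` flag stands in for whatever mechanism the third FPGA module uses to observe the recipient CPU, which the description does not specify.

```python
from dataclasses import dataclass

STATE_NORMAL = 1    # value used in the worked example of Embodiment two
STATE_ABNORMAL = 0  # value used in the worked example of Embodiment two


@dataclass(frozen=True)
class DetectMessage:
    kind: str   # "DetectRequest" or "DetectReply"
    state: int  # the first status bit


def build_reply(request: DetectMessage, cpu_healthy: bool) -> DetectMessage:
    """Answer a detection request with the recipient CPU's current status."""
    if request.kind != "DetectRequest":
        raise ValueError("only detection requests are answered")
    # The status bit in the request is a default; it is overwritten here.
    state = STATE_NORMAL if cpu_healthy else STATE_ABNORMAL
    return DetectMessage(kind="DetectReply", state=state)


if __name__ == "__main__":
    request = DetectMessage(kind="DetectRequest", state=0)
    print(build_reply(request, cpu_healthy=True))
    print(build_reply(request, cpu_healthy=False))
```

Because the reply is produced without involving the recipient CPU, it can still be sent when that CPU has crashed, which is what makes the abnormal status value observable at the sender.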
The second FPGA module 312 is used to receive the reply message within the preset timeout Timeout and judge whether the first status bit is the preset normal status value; if so, communication is determined to be normal; if not, and the first status bit is the preset abnormal status value in the replies to a preset number of detection messages, the receiver system is determined to be working abnormally; if the sender does not receive the reply message within Timeout, the link is determined to be abnormal. In the second FPGA module 312, the abnormal branch works as follows. When a reply carries a first status bit that is not the preset normal status value, information that the receiver system is suspected of working abnormally is sent to the sender CPU 313, and another detection message carrying the first status bit is sent to the recipient. If the first status bit of the next reply message is again the preset abnormal status value, another detection message is sent, and so on until the number of transmissions exceeds the preset number; sending then stops and information that the receiver system is working abnormally is sent to the sender CPU 313.
Here, "within the preset timeout Timeout" refers to the period that begins when the sender 310 sends the detection message carrying the first status bit through the first FPGA module 311 and starts timing.
For example: the second FPGA module 312 receives the reply message within the preset timeout Timeout and judges whether the first status bit is the preset normal status value; if so, communication is determined to be normal and information that the receiver system is working normally is sent to the sender CPU; if not, and the first status bit is the preset abnormal status value in the replies to a preset number of detection messages, the receiver system is determined to be working abnormally and information that the receiver system is working abnormally is sent to the sender CPU; if the sender does not receive the reply message within Timeout, the link is determined to be abnormal.
In the technical scheme of this embodiment, FPGA hardware executes the protocol between the sender and the recipient: the sender initiates a link detection request, and the recipient, after receiving it, replies to confirm the request message. Executing the protocol in hardware effectively reduces CPU load and improves utilization. Fault analysis also remains effective when a CPU fails: when an application deadlocks, interference causes the program to run away, or another CPU fault causes a crash, the detecting side can still judge that the fault is a system fault rather than a link fault.
On the basis of the above technical solution, the first FPGA module 311 is preferably specifically configured to send a detection message carrying the first status bit to the recipient, start timing, and wait for the recipient's reply message, where starting timing means starting to record the round-trip delay. The second FPGA module 312 is preferably specifically configured to: on receiving the reply message within the preset timeout Timeout, stop timing to obtain the round-trip delay t1, send t1 to the sender CPU, and judge whether the first status bit equals the preset normal status value; if so, send information that the recipient system is working normally to the sender CPU; if not, and the first status bit received is still the preset abnormal status value after the detection message has been sent a preset number of times, send information that the recipient system is working abnormally to the sender CPU; and if the sender receives no reply message within Timeout, send suspected link fault information to the sender CPU.
The sender CPU 313 is configured to calculate an updated Timeout from t1 and to feed the updated Timeout back to the second FPGA module 312. The advantage of arranging the first FPGA module 311, the second FPGA module 312 and the sender CPU 313 in this way is that each time a reply message is received, the sender CPU recalculates Timeout and configures it into the FPGA module, which avoids misjudgment or poor real-time performance when the delay of the physical link is uncertain.
On the basis of the above technical solution, the sender CPU 313 is preferably specifically configured to calculate the updated Timeout from t1 and a user-set coefficient δ as Timeout = t1 + t1 × δ.
Through the user-set parameter δ, a timeout waiting time Timeout that matches the service is determined by time negotiation, which effectively avoids fault misjudgments caused by an unstable or non-fixed link delay.
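As a worked illustration of the update rule Timeout = t1 + t1 × δ, the following Python sketch computes the new timeout from a measured round-trip delay. The function name `updated_timeout` and the choice of milliseconds as the unit are assumptions made for the example; the source does not fix a unit for t1.

```python
def updated_timeout(t1_ms: float, delta: float) -> float:
    """Return Timeout = t1 + t1 * delta for a measured round-trip delay t1."""
    if t1_ms < 0:
        raise ValueError("round-trip delay must be non-negative")
    if delta < 0:
        raise ValueError("delta must be non-negative")
    return t1_ms + t1_ms * delta
```

With δ = 1 the timeout is twice the measured round-trip delay; with δ = 0 it equals the round-trip delay itself, leaving no margin for jitter.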
On the basis of the above technical solution, sending t1 to the sender CPU to calculate the updated Timeout preferably proceeds as follows. The sender CPU 313 determines whether this is the first power-on or whether the link has changed. If so, it calculates the updated Timeout from t1 and feeds the updated Timeout back to the sender's FPGA. If not, that is, when the link is unchanged, it does not update the Timeout value in the FPGA. The benefit is that each time a reply message is received, the sender CPU decides, according to whether the link has changed, whether to recalculate Timeout and reconfigure the FPGA module. This reduces repeated calculation and is more efficient, avoids misjudgment or poor real-time performance when the physical link delay is uncertain, and yields a delay reference value that matches the physical link delay whenever the link changes.
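The conditional update described above can be sketched as follows. This is a minimal Python model under stated assumptions: the class name `TimeoutManager`, the `link_changed` flag and the 1000 ms initial value are hypothetical, and the source does not say how the sender CPU detects a link change, so the flag is simply passed in.

```python
from typing import Optional


class TimeoutManager:
    """Recompute Timeout only on first power-on or after a link change."""

    def __init__(self, delta: float, initial_timeout_ms: float = 1000.0):
        self.delta = delta
        self.timeout_ms = initial_timeout_ms
        self._first_reply_pending = True  # models "first power-on"

    def on_reply(self, t1_ms: float, link_changed: bool) -> Optional[float]:
        """Return the new Timeout to configure into the FPGA, or None.

        None means the link is unchanged and the FPGA keeps its current value.
        """
        if self._first_reply_pending or link_changed:
            self._first_reply_pending = False
            self.timeout_ms = t1_ms + t1_ms * self.delta
            return self.timeout_ms
        return None
```

Returning None for the unchanged case makes the saving explicit: no recalculation is done and nothing is written to the FPGA for that reply.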
Example IV
Fig. 8 is a schematic structural diagram of a system for judging link failure provided by Embodiment 4 of the present invention. On the basis of the above embodiments, this embodiment further optimizes the first FPGA module and the second FPGA module. The system includes a sender 410 and a recipient 420. The sender 410 includes a first FPGA module 411, a second FPGA module 412 and a sender CPU 413, and the recipient 420 includes a third FPGA module 421 and a recipient CPU 422. The sender CPU 413 is connected to the first FPGA module 411 and the second FPGA module 412, and the recipient CPU 422 is connected to the third FPGA module 421. The first FPGA module 411 and the second FPGA module 412 are each connected to the third FPGA module 421 over an IP network link to carry out the interactive communication.
The first FPGA module 411 is specifically configured to periodically send, at the preset period DetectInterval, a detection message carrying the first status bit to the recipient and to wait for the recipient's reply message.
The third FPGA module 421 is configured to receive the detection message, update the first status bit according to the working state of the recipient CPU, and reply to the sender with a reply message carrying the first status bit.
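The recipient-side behaviour can be sketched as follows. This is an illustrative Python model only. The message names DetectRequest and DetectReply and the convention State=1 for normal come from the example later in this embodiment; the `DetectMessage` type, the `build_reply` function and the `cpu_healthy` flag are assumptions, since the source does not say how the third FPGA module observes the recipient CPU's working state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectMessage:
    kind: str   # "DetectRequest" or "DetectReply"
    state: int  # first status bit: 1 = normal, 0 = abnormal


def build_reply(request: DetectMessage, cpu_healthy: bool) -> DetectMessage:
    """Answer a detection message with the recipient's current status bit."""
    if request.kind != "DetectRequest":
        raise ValueError("only DetectRequest messages are answered")
    # The reply is produced even when the CPU is down, which is what lets the
    # sender tell a recipient system fault apart from a link fault.
    return DetectMessage(kind="DetectReply", state=1 if cpu_healthy else 0)
```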
The second FPGA module 412 is specifically configured to: if the reply message is received within the preset timeout Timeout, judge whether the first status bit equals the preset normal status value; if so, send information that the recipient system is working normally to the sender CPU, reload the Timeout timer to its full value, and enter the next detection cycle; if not, and the first status bit received is still the preset abnormal status value after the detection message has been sent a preset number of times, clear the TimeOut timer to 0, clear the DetectInterval timer to 0, and send information that the recipient system is working abnormally to the sender CPU; and if no reply message is received within Timeout, clear the TimeOut timer to 0, clear the DetectInterval timer to 0, and send suspected link fault information to the sender CPU.
Here, reloading the Timeout timer to its full value means that a full Timeout wait can be started again when needed; for example, when the second FPGA module 412 sends another detection message carrying the first status bit, the reloaded Timeout timer is started. Clearing the TimeOut timer to 0 means that the Timeout wait has run to its limit. Clearing the DetectInterval timer to 0 means that the DetectInterval period has been used up, so the second FPGA module 412 must perform the next period's operation of sending a detection message carrying the first status bit to the recipient.
This embodiment is illustrated by the following example. On the detecting side, the user configures an acceptable delay error coefficient δ and a detection time interval DetectInterval. δ: delay error coefficient, range 0 to 5, default value 1. DetectInterval: time interval of fault detection, range 1 ms to 1 s, default value 1 s. TimeOut: waiting timeout interval, default value 1 s.
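The parameter ranges and defaults listed above can be captured in a small validated configuration object. The following Python sketch is illustrative: the class name `DetectConfig` and the field names are assumptions, and since the source gives no range for TimeOut the sketch only requires it to be positive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectConfig:
    delta: float = 1.0              # delay error coefficient, 0 to 5
    detect_interval_s: float = 1.0  # detection interval, 1 ms to 1 s
    timeout_s: float = 1.0          # initial waiting timeout, default 1 s

    def __post_init__(self) -> None:
        if not 0 <= self.delta <= 5:
            raise ValueError("delta must be between 0 and 5")
        if not 0.001 <= self.detect_interval_s <= 1.0:
            raise ValueError("DetectInterval must be between 1 ms and 1 s")
        if self.timeout_s <= 0:
            raise ValueError("TimeOut must be positive")
```

Calling `DetectConfig()` yields the defaults from the text, while an out-of-range value such as `DetectConfig(delta=6)` is rejected at construction time.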
1) The first FPGA module of the sender eNode periodically sends, at the DetectInterval interval, a DetectRequest message with State=0 to the device under test, starts the TimeOut timer (default 1 s), and then waits for the recipient's reply message.
2) After the third FPGA module of the recipient EPC receives the message, it sets State to 1 if the recipient system is normal and to 0 otherwise, and replies with a DetectReply message to the device that initiated the backhaul.
3) After the second FPGA module of the sender eNode, on the backhaul side, receives the DetectReply, it stops counting, records the round-trip delay t1, and sends t1 and State to the CPU of the backhaul side.
The sender eNode calculates, from t1 and the user-set δ, a waiting timeout interval TimeOut whose real-time error meets the customer's requirement, TimeOut = t1 + t1 × δ, and feeds it back to the second FPGA module of the sender eNode. Fault judgment is carried out at the same time.
If the sender eNode receives the reply message through the second FPGA module within the preset timeout Timeout and judges that State=1, the recipient EPC system is considered normal and the TimeOut timer is reloaded to its full value.
If the sender eNode receives, through the second FPGA module and within the preset timeout Timeout, a reply message with State=0, the recipient system is suspected to be down. TimeOut is cleared to 0 and DetectInterval is cleared to 0, and a second detection is initiated to the device under test immediately. If a reply with State=0 is still received after the third detection, the recipient system is considered to be down. A receiving-device fault message is sent to the CPU of the sender eNode, and backhaul to that device is stopped or the backhaul path is switched to a backup receiving device.
If the sender eNode receives no reply through the second FPGA module within the preset timeout Timeout, the link is suspected to be unreachable. TimeOut is cleared to 0 and DetectInterval is cleared to 0. The CPU is notified, and the CPU of the sender eNode initiates an ICMP echo request message to the recipient and locates the link fault according to the corresponding error information.
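The sender-side handling in this example can be summarized as a small event-driven state machine. The Python sketch below is only a model of the described behaviour, not the FPGA implementation. The class name `SenderDetector`, the action strings and the counter are assumptions introduced for illustration; the limit of three detections follows the example above.

```python
from typing import List


class SenderDetector:
    """Model of the sender-side reaction to replies and timeouts."""

    def __init__(self, max_detections: int = 3):
        self.max_detections = max_detections
        self.abnormal_replies = 0  # consecutive replies with State=0

    def on_reply(self, state: int) -> List[str]:
        if state == 1:
            # Recipient system is normal: reload TimeOut, wait for next period.
            self.abnormal_replies = 0
            return ["notify_cpu:recipient_normal", "reload_timeout_timer"]
        self.abnormal_replies += 1
        actions = ["clear_timeout_timer", "clear_detect_interval_timer"]
        if self.abnormal_replies >= self.max_detections:
            # Still State=0 after the last allowed detection: recipient is down.
            self.abnormal_replies = 0
            return actions + ["notify_cpu:recipient_down"]
        # Suspected recipient fault: detect again immediately.
        return actions + ["send_detect_request_now"]

    def on_timeout(self) -> List[str]:
        # No reply within Timeout: the link is suspected to be unreachable.
        self.abnormal_replies = 0
        return [
            "clear_timeout_timer",
            "clear_detect_interval_timer",
            "notify_cpu:suspected_link_fault",
        ]
```

Clearing both timers in the abnormal cases is what lets the next detection start at once instead of waiting out the rest of the DetectInterval period.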
In the technical solution of this embodiment, delay negotiation is carried out in hardware (FPGA), which effectively reduces CPU load and improves utilization. A waiting timeout that matches the specific service is determined by time negotiation, which effectively avoids errors caused by unstable link delay, link switching and subjective judgment. With the negotiated delay, fault judgment can be carried out quickly and in real time. Because the negotiation is done in hardware, fault analysis remains effective when the CPU fails: when an application deadlocks, interference causes a program to run away, or another CPU fault brings the system down, the detecting side can still determine that the fault is a system fault rather than a link fault.
On the basis of each of the above embodiments, the sender CPU 413 is specifically configured to calculate the updated Timeout, feed the updated Timeout back to the second FPGA module, and, after obtaining the suspected link fault information, initiate an ICMP echo request message to the recipient and locate the link fault from the error information obtained. The benefit is that the sender CPU 413 can find the position of the link fault through the ICMP echo request message, so this system for judging link failure can quickly diagnose where the link has failed.
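For reference, the ICMP echo request that the sender CPU initiates has a simple fixed layout: type 8, code 0, a 16-bit Internet checksum, an identifier and a sequence number, followed by optional data. The Python sketch below only builds such a packet and does not send it, because raw sockets usually require elevated privileges. The function names and the identifier, sequence and payload values are assumptions for illustration; the patent does not specify them.

```python
import struct

ICMP_ECHO_REQUEST = 8  # ICMP type for an echo request; code is 0


def internet_checksum(data: bytes) -> int:
    """16-bit ones'-complement checksum used by ICMP."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


def build_echo_request(identifier: int, sequence: int, payload: bytes = b"") -> bytes:
    """Return the bytes of an ICMP echo request with a valid checksum."""
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, checksum, identifier, sequence)
    return header + payload
```

When no echo reply comes back, ICMP error messages returned by intermediate nodes, such as destination unreachable, are the kind of error information that can indicate where along the path the link has failed.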
In summary, the present invention judges link failure in hardware, which effectively reduces CPU load and improves utilization. Even when the system is down, the system state can still be reported to the detecting side, so that the detecting side can determine that the fault is a system fault rather than a transmission link fault. The waiting timeout is determined by time negotiation, which effectively avoids misjudgments caused by unstable link delay, link switching and subjective judgment, and the negotiated delay allows fault judgment to be carried out quickly and in real time. Because the negotiation is done in hardware, fault analysis remains effective when the CPU fails: when an application deadlocks, interference causes a program to run away, or another CPU fault brings the system down, the detecting side can still determine that the fault is a system fault rather than a link fault. In a redundant IP network, when the logical link changes, the present invention can automatically calculate and adjust a waiting timeout Timeout that matches the actual transmission, so that fault judgment is carried out in real time, and fault judgment and fault analysis remain effective when the CPU fails.
Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described here, and that various obvious changes, readjustments and substitutions can be made by those skilled in the art without departing from the protection scope of the present invention. Therefore, although the present invention has been described in detail through the above embodiments, it is not limited to them and may include other equivalent embodiments without departing from the inventive concept; the scope of the present invention is determined by the scope of the appended claims.

Claims (14)

  1. A method for judging link failure, characterized by comprising:
    a sender sending, through an FPGA, a detection message carrying a first status bit to a recipient, and waiting for a reply message from the recipient;
    the recipient receiving the detection message through an FPGA, updating the first status bit according to the working state of a recipient CPU, and replying to the sender with the reply message carrying the first status bit;
    if the sender receives the reply message through the FPGA within a preset timeout Timeout, judging whether the first status bit equals a preset normal status value, and if so, determining that communication is normal; if not, and the first status bit received is still a preset abnormal status value after the detection message has been sent a preset number of times, determining that the recipient system is working abnormally; and if the sender does not receive the reply message through the FPGA within Timeout, determining that the link is abnormal.
  2. The method according to claim 1, characterized in that
    the sender sending, through the FPGA, the detection message carrying the first status bit to the recipient and waiting for the reply message from the recipient specifically comprises: the sender sending, through the FPGA, a detection message carrying the first status bit to the recipient, starting timing, and waiting for the reply message from the recipient;
    the sender receiving the reply message through the FPGA within Timeout specifically comprises: the sender receiving the reply message through the FPGA within the preset timeout Timeout, stopping timing to obtain a round-trip delay t1, sending t1 to a sender CPU to calculate an updated Timeout, and feeding the updated Timeout back to the sender's FPGA.
  3. The method according to claim 2, characterized in that sending t1 to the sender CPU to calculate the updated Timeout and feeding the updated Timeout back to the sender's FPGA is specifically: determining, by the sender CPU, whether this is the first power-on or whether the link has changed; if so, sending t1 to the sender CPU to calculate the updated Timeout and feeding the updated Timeout back to the sender's FPGA; if not, that is, when the link is unchanged, not updating the Timeout value in the FPGA.
  4. The method according to claim 2, characterized in that sending t1 to the sender CPU to calculate the updated Timeout specifically comprises: sending t1 to the sender CPU, and calculating, by the sender CPU, the updated Timeout = t1 + t1 × δ from t1 and a user-set δ.
  5. The method according to claim 1, characterized in that the sender sending, through the FPGA, the detection message carrying the first status bit to the recipient specifically comprises:
    the sender periodically sending, at a preset period DetectInterval in the FPGA, a detection message carrying the first status bit to the recipient.
  6. The method according to claim 1, characterized in that determining that communication is normal specifically comprises: sending information that the recipient system is working normally to the sender CPU, then reloading the Timeout timer to its full value and entering the next detection cycle;
    determining that the recipient system is working abnormally is specifically: clearing the TimeOut timer to 0, clearing the DetectInterval timer to 0, and sending information that the recipient system is working abnormally to the sender CPU;
    determining that the link is abnormal is specifically: clearing the TimeOut timer to 0, clearing the DetectInterval timer to 0, and sending suspected link fault information to the sender CPU.
  7. The method according to claim 1, characterized in that, after determining that the link is abnormal, the method further comprises: the sender CPU initiating an ICMP echo request message to the recipient and locating the link fault from the error information obtained.
  8. A system for judging link failure, characterized by comprising a sender and a recipient, the sender comprising a first FPGA module, a second FPGA module and a sender CPU, and the recipient comprising a third FPGA module and a recipient CPU;
    the first FPGA module being configured to send a detection message carrying a first status bit to the recipient and to wait for a reply message from the recipient;
    the third FPGA module being configured to receive the detection message, update the first status bit according to the working state of the recipient CPU, and reply to the sender with the reply message carrying the first status bit;
    the second FPGA module being configured to: if the reply message is received within a preset timeout Timeout, judge whether the first status bit equals a preset normal status value, and if so, determine that communication is normal; if not, and the first status bit received is still a preset abnormal status value after the detection message has been sent a preset number of times, determine that the recipient system is working abnormally; and if the sender does not receive the reply message within Timeout, determine that the link is abnormal.
  9. The system according to claim 8, characterized in that the first FPGA module is specifically configured to: send a detection message carrying the first status bit to the recipient, start timing, and wait for the reply message from the recipient;
    the second FPGA module is specifically configured to: on receiving the reply message within the preset timeout Timeout, stop timing to obtain a round-trip delay t1, send t1 to the sender CPU, and judge whether the first status bit equals the preset normal status value, and if so, determine that communication is normal; if not, and the first status bit received is still the preset abnormal status value after the detection message has been sent a preset number of times, determine that the recipient system is working abnormally; and if the sender does not receive the reply message within Timeout, determine that the link is abnormal;
    the sender CPU is configured to calculate an updated Timeout from t1 and to feed the updated Timeout back to the second FPGA module.
  10. The system according to claim 9, characterized in that the sender CPU is configured to determine whether this is the first power-on or whether the link has changed; if so, to calculate the updated Timeout from t1 and feed the updated Timeout back to the sender's FPGA; if not, that is, when the link is unchanged, not to update the Timeout value in the FPGA.
  11. The system according to claim 8, characterized in that the sender CPU is specifically configured to calculate the updated Timeout = t1 + t1 × δ from t1 and a user-set δ.
  12. The system according to claim 8, characterized in that
    the first FPGA module is specifically configured to periodically send, at a preset period DetectInterval, a detection message carrying the first status bit to the recipient.
  13. The system according to claim 8, characterized in that
    the second FPGA module is specifically configured to: if the reply message is received within the preset timeout Timeout, judge whether the first status bit equals the preset normal status value; if so, send information that the recipient system is working normally to the sender CPU, then reload the Timeout timer to its full value and enter the next detection cycle; if not, and the first status bit received is still the preset abnormal status value after the detection message has been sent a preset number of times, clear the TimeOut timer to 0, clear the DetectInterval timer to 0, and send information that the recipient system is working abnormally to the sender CPU; and if the sender does not receive the reply message through the FPGA within Timeout, clear the TimeOut timer to 0, clear the DetectInterval timer to 0, and send suspected link fault information to the sender CPU.
  14. The system according to claim 8, characterized in that the sender CPU is specifically configured to: calculate the updated Timeout, feed the updated Timeout back to the second FPGA module, and, after obtaining the suspected link fault information, initiate an ICMP echo request message to the recipient and locate the link fault from the error information obtained.
CN201610340855.8A 2016-05-20 2016-05-20 A kind of method and system for judging link failure Withdrawn CN107404393A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610340855.8A CN107404393A (en) 2016-05-20 2016-05-20 A kind of method and system for judging link failure

Publications (1)

Publication Number Publication Date
CN107404393A true CN107404393A (en) 2017-11-28

Family

ID=60389537

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610340855.8A Withdrawn CN107404393A (en) 2016-05-20 2016-05-20 A kind of method and system for judging link failure

Country Status (1)

Country Link
CN (1) CN107404393A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108908402A (en) * 2018-07-06 2018-11-30 浙江国自机器人技术有限公司 A kind of detection method and system of robot hardware
CN109327333A (en) * 2018-09-30 2019-02-12 潍柴动力股份有限公司 A kind of message stops paying out method and device
CN110138657A (en) * 2019-05-13 2019-08-16 北京东土军悦科技有限公司 Aggregated link switching method, device, equipment and the storage medium of inter-exchange
CN110138657B (en) * 2019-05-13 2021-11-09 北京东土军悦科技有限公司 Aggregation link switching method, device, equipment and storage medium between switches
CN110519096A (en) * 2019-08-29 2019-11-29 西安电子工程研究所 RocketIO communication link detects automatically and restoration methods
CN112688826A (en) * 2019-10-18 2021-04-20 中车株洲电力机车研究所有限公司 Link diagnosis method, terminal device, link diagnosis system, and storage medium
CN112688826B (en) * 2019-10-18 2022-05-20 中车株洲电力机车研究所有限公司 Link diagnosis method, terminal device, link diagnosis system, and storage medium
CN116155774A (en) * 2022-12-20 2023-05-23 中国联合网络通信集团有限公司 Link detection method, device and storage medium
CN116155774B (en) * 2022-12-20 2024-04-16 中国联合网络通信集团有限公司 Link detection method, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20171128