CN107995018A - Fault detection method, LPU and distributed network communication equipment - Google Patents

Info

Publication number: CN107995018A
Authority: CN (China)
Prior art keywords: information, business task, LPU, value, flag bit
Legal status: Pending
Application number: CN201610958144.7A
Other languages: Chinese (zh)
Inventor: 何三波
Current assignee: Maipu Communication Technology Co Ltd
Original assignee: Maipu Communication Technology Co Ltd
Priority date: 2016-10-27
Filing date: 2016-10-27
Publication date: 2018-05-04

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0677 Localisation of faults

Abstract

The invention discloses a fault detection method, an LPU and a distributed network communication device. It belongs to the field of distributed software and is used to detect faults on an LPU. The method performs fault detection on the business tasks of an LPU. It includes three steps. First, before a business task processes a control message, the method records first information, which indicates that the business task is about to enter the processing flow. Second, after the business task has finished processing the control message, it records second information, which indicates that the business task has completed the processing flow. Third, if N consecutive checks detect only the first information and never the second information, the method determines that the business task has failed. The embodiments of the present invention apply to distributed network communication devices.

Description

Fault detection method, LPU and distributed network communication equipment
Technical field
The present invention relates to the field of distributed software, and in particular to a fault detection method, an LPU and a distributed network communication device.
Background
As shown in Fig. 1, a distributed network communication device includes an MPU (master process unit, also called the main control card) and multiple distributed LPUs (line process units, also called line cards). In the prior art, a fault can occur on an LPU while it is processing a control message sent by the MPU. When this happens, the business tasks must be traced by hand, which is a considerable challenge for an ordinary technician. Moreover, in unattended operation a fault on an LPU may interrupt its communication for a long time, and the fault must first be located before it can be recovered.
It is therefore necessary to provide, on the LPU, a mechanism that automatically detects when a business task fails while processing a control message from the MPU.
Summary of the invention
The embodiments of the present invention provide a fault detection method, an LPU and a distributed network communication device for detecting LPU faults.
To achieve the above purpose, the embodiments of the present invention adopt the following technical solutions.
In a first aspect, a fault detection method is provided for performing fault detection on the business tasks of an LPU. The method includes:
before a business task processes a control message, recording first information, the first information indicating that the business task is about to enter the processing flow;
after the business task has finished processing the control message, recording second information, the second information indicating that the business task has completed the processing flow;
performing a set number of checks, and if every one of these consecutive checks finds the first information and the second information inconsistent, determining that the business task has failed.
In a second aspect, an LPU is provided for performing fault detection on multiple business tasks of the LPU. The LPU includes:
a recording unit, configured to record first information before a business task processes a control message, the first information indicating that the business task is about to enter the processing flow;
the recording unit being further configured to record second information after the business task has finished processing the control message, the second information indicating that the business task has completed the processing flow;
a judging unit, configured to perform a set number of checks and, if every one of these consecutive checks finds the first information and the second information inconsistent, determine that the business task has failed.
In a third aspect, a distributed network communication device is provided, including the LPU described in the second aspect and an MPU, where the MPU is configured to send control messages to the LPU.
The embodiments of the present invention provide a fault detection method, an LPU and a distributed network communication device. A business task on the LPU records first information before processing a control message and records second information after it finishes processing that message. Suppose repeated checks detect only the first information and never the second. In that case the business task cannot complete the processing of the control message, so it can be considered faulty, and fault detection on the LPU is achieved.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. The drawings in the following description show only some embodiments of the present invention. A person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a schematic structural diagram of a distributed network communication device according to an embodiment of the present invention;
Fig. 2 is a schematic flowchart of a fault detection method according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of an LPU according to an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
As shown in Fig. 1, an embodiment of the present invention provides a distributed network communication device that includes an MPU 11 and an LPU 12, which communicate through an inter-card communication channel. LPU 12 hosts multiple business modules, and each business module corresponds to one business task. MPU 11 sends a control message to a business module on LPU 12. LPU 12 classifies the message and dispatches it to the corresponding business task for processing. Each business task on LPU 12 sends messages or protocol packets to MPU 11. In addition, a monitor task runs on LPU 12 to perform fault detection on the business tasks of LPU 12. After receiving a fault message sent by LPU 12, MPU 11 handles the fault according to a fault-handling policy. The monitor task is given the highest priority, so it can still run even if a business task enters an infinite loop.
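The text says only that the MPU handles a reported fault "according to a fault-handling policy", with restarting the LPU given later as an example. The following is a minimal Python sketch of one way to model that: a per-task policy table with restart as the default action. `FaultMessage`, `Mpu`, `restart_lpu` and the table itself are hypothetical illustrations, not part of the patent.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class FaultMessage:
    """Hypothetical fault report sent by an LPU to the MPU."""
    lpu_id: int
    task_name: str
    call_trace: List[str] = field(default_factory=list)


def restart_lpu(msg: FaultMessage) -> str:
    # Placeholder action: a real MPU would reset the card over the
    # inter-card channel; here we only describe what would be done.
    return f"restart LPU {msg.lpu_id}"


class Mpu:
    """MPU-side handling of fault messages according to a policy table."""

    def __init__(self, default_action: Callable[[FaultMessage], str] = restart_lpu):
        self.default_action = default_action
        self.policies: Dict[str, Callable[[FaultMessage], str]] = {}
        self.handled: List[str] = []

    def set_policy(self, task_name: str, action: Callable[[FaultMessage], str]) -> None:
        self.policies[task_name] = action

    def on_fault_message(self, msg: FaultMessage) -> str:
        # Use the per-task policy if one is configured, otherwise the default.
        action = self.policies.get(msg.task_name, self.default_action)
        result = action(msg)
        self.handled.append(result)
        return result
```

A per-task table lets a less critical task use a milder action than a card restart. That choice is an assumption and is not described in the source.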
The embodiments of the present invention provide a fault detection method, device and system. A business task on the LPU generates an indication before and after it processes a control message. Suppose repeated checks detect only the indication from before processing and never the one from after it. Then the business task cannot complete the processing of the control message and can be considered faulty, which achieves fault detection on the LPU.
Embodiment 1
This embodiment provides a fault detection method applied to the monitor task on the LPU of the above distributed network communication device. As shown in Fig. 2, the method includes the following steps.
S101: Before a business task processes a control message, record first information, which indicates that the business task is about to enter the processing flow.
After the LPU receives a control message from the MPU, it classifies the message and places it in the receive queue of the corresponding business task.
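The source does not say how control messages are classified. Here is a minimal Python sketch of the dispatch step, assuming each message carries a hypothetical `"module"` field that names its target business module. Each business task owns one FIFO receive queue.

```python
from collections import deque
from typing import Deque, Dict, Iterable


class LpuDispatcher:
    """Classifies control messages from the MPU into per-task receive queues.

    Assumption: each message is a dict whose "module" key names the target
    business module; the patent does not specify the classification rule.
    """

    def __init__(self, task_names: Iterable[str]):
        self.rx_queues: Dict[str, Deque[dict]] = {name: deque() for name in task_names}

    def on_control_message(self, msg: dict) -> bool:
        queue = self.rx_queues.get(msg.get("module"))
        if queue is None:
            return False  # no such business task; dropping is an assumed policy
        queue.append(msg)
        return True
```

Dropping messages for unknown modules is only a placeholder. A real LPU might log them or report them to the MPU instead.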
The corresponding business task fetches the control message from its receive queue and records the first information before processing it. This marks that the task is about to enter the processing flow for that control message. Control-message processing can be implemented in many ways and usually involves several levels of nested function calls, so the details are not described here.
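The core of S101 and S102 is that the two records bracket the handler call. Here is a minimal Python sketch of one iteration of a business task. The function name and the callable hooks are hypothetical; the hooks stand in for whichever recording mechanism (counters or flags) is chosen.

```python
from collections import deque
from typing import Callable, Deque


def process_one_message(rx_queue: Deque[dict],
                        handle: Callable[[dict], None],
                        record_first: Callable[[], None],
                        record_second: Callable[[], None]) -> bool:
    """Take one control message and process it between the two records.

    Returns False if the receive queue is empty.
    """
    if not rx_queue:
        return False
    msg = rx_queue.popleft()
    record_first()    # S101: about to enter the processing flow
    handle(msg)       # may call several levels of functions; may hang
    record_second()   # S102: processing flow completed
    # No try/finally on purpose: if handle() never returns (or aborts),
    # the second record must stay missing so the monitor can notice.
    return True
```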
The first information can take many forms, as long as it shows that the business task is about to enter the processing flow. For example, it can be indicated by the count value of a first counter. Incrementing the first counter by M means the first information has been recorded, where M is a positive integer, preferably 1; the value of M can be set empirically.
The first information can also be indicated by inverting a first flag bit, so that inverting the first flag bit means the first information has been recorded. The first flag bit and the second flag bit can be implemented as global variables.
Those skilled in the art may also conceive of other ways to record the first information, and the present invention is not limited in this respect.
S102: After the business task has finished processing the control message, record second information, which indicates that the business task has completed the processing flow.
Correspondingly, the second information can also take many forms, as long as it shows that the business task has completed the processing flow. Take the case where the count value of the first counter indicates the first information. The second information can then be indicated by the count value of a second counter, and incrementing the second counter by the same M means the second information has been recorded.
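A minimal Python sketch of the counter variant for one business task follows. The class name `CounterRecord` is hypothetical. Both counters start from the same default, and while the task is between the two records the first counter leads the second by exactly M.

```python
class CounterRecord:
    """Counter-based first/second information for one business task."""

    def __init__(self, m: int = 1, default: int = 0):
        if m <= 0:
            raise ValueError("M must be a positive integer")
        self.m = m
        self.first = default    # first counter
        self.second = default   # second counter, same default as the first

    def record_first(self) -> None:
        self.first += self.m    # S101: first information recorded

    def record_second(self) -> None:
        self.second += self.m   # S102: second information recorded

    def in_flow(self) -> bool:
        # True while the task sits between the two records.
        return self.first - self.second == self.m
```

Python integers do not overflow. A C version on an LPU would likely use fixed-width unsigned counters, and unsigned wrap-around keeps the difference `first - second` correct modulo the counter width.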
Likewise, when the first information is indicated by inverting the first flag bit, the second information can be indicated by inverting a second flag bit, so that inverting the second flag bit means the second information has been recorded. Those skilled in the art may also conceive of other ways to record the second information, and the present invention is not limited in this respect.
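The flag variant can be sketched the same way. Below is a minimal Python version with a hypothetical class name `FlagRecord`. Both flags start equal, so they differ only between the first inversion and the second.

```python
class FlagRecord:
    """Flag-based first/second information for one business task."""

    def __init__(self, default: bool = False):
        self.first = default    # first flag bit
        self.second = default   # second flag bit, same default as the first

    def record_first(self) -> None:
        self.first = not self.first     # S101: invert the first flag

    def record_second(self) -> None:
        self.second = not self.second   # S102: invert the second flag

    def in_flow(self) -> bool:
        # The flags differ only between the two inversions.
        return self.first != self.second
```

Unlike the counters, the flags carry no history. After every completed message they return to equality, so the monitor sees only "inside a flow" or "not inside a flow".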
Note that the first information and the second information must be recorded in the same way, and with the same default value. For example, when counters are used, the first counter and the second counter have the same default value. When flag bits are inverted, the first flag bit and the second flag bit have the same default value. Otherwise the two records would disagree before any message is processed, which would cause false alarms or missed faults.
S103: Perform a set number of checks. If every one of these consecutive checks finds the first information and the second information inconsistent, determine that the business task has failed.
Suppose that, after the business task has recorded the first information, a serious fault occurs while it processes the control message. Examples are an out-of-bounds memory access, a failure to obtain a semaphore, or an infinite loop. The task can then never exit the processing flow, so the second information is never recorded: the second counter is not updated, or the second flag bit is not inverted. The monitor task performs N consecutive checks. The business task is considered faulty if all N checks find the first information and the second information inconsistent, which for counters means the first value exceeds the second by M. The value of N can be set empirically.
Specifically, suppose the first information is indicated by incrementing the first counter and the second information by incrementing the second counter. A set number of checks is then performed, and the business task is determined to have failed if every consecutive check finds the first counter's value greater than the second counter's value by M.
Now suppose the first information is indicated by inverting the first flag bit and the second information by inverting the second flag bit. A set number of checks is then performed, and the business task is determined to have failed if every consecutive check finds the first flag bit's value unequal to the second flag bit's value.
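The monitor side of S103 reduces to counting consecutive inconsistent checks. Below is a minimal Python sketch with the hypothetical name `TaskMonitor`. It takes the "inconsistent" predicate as a callable: `first - second == M` for counters, or `first != second` for flags.

```python
from typing import Callable


class TaskMonitor:
    """Monitor-task side of S103 for a single business task.

    `in_flow` returns True when the task's records are inconsistent:
      counters: first - second == M
      flags:    first != second
    The task is judged faulty once this holds on N consecutive checks.
    """

    def __init__(self, in_flow: Callable[[], bool], n: int):
        if n < 1:
            raise ValueError("N must be at least 1")
        self.in_flow = in_flow
        self.n = n
        self.streak = 0  # consecutive checks that found the records inconsistent

    def check(self) -> bool:
        """Called periodically by the (highest-priority) monitor task."""
        if self.in_flow():
            self.streak += 1
        else:
            self.streak = 0  # the task finished its flow since the last check
        return self.streak >= self.n
```

The check period multiplied by N bounds the longest processing time that is tolerated, so N should exceed the slowest legitimate handler. Taken literally, the predicate also matches a healthy task that happens to be processing a different message at every check. A stricter variant, which the source does not describe, would additionally require the first counter to stay unchanged between checks.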
After the monitor task determines that the business task has failed, it can also perform a function-call backtrace (trace) on the business task to locate the function in which the fault occurred. It then either records the function-call hierarchy locally or sends it to the MPU in a fault message. On receiving the fault message from the LPU, the MPU handles the fault according to its fault-handling policy, for example by restarting the LPU.
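The source does not say how the backtrace is taken on the LPU; an embedded implementation would walk the hung task's stack using OS-specific facilities. As a Python analogue, CPython's `sys._current_frames()` maps thread identifiers to their current frames, and `traceback.extract_stack()` turns a frame into the call hierarchy, outermost call first. The helper name `capture_call_trace` is hypothetical.

```python
import sys
import threading
import traceback
from typing import List


def capture_call_trace(thread: threading.Thread) -> List[str]:
    """Return the call hierarchy of a (possibly hung) thread, outermost first.

    Relies on CPython's sys._current_frames(); returns [] if the thread
    has no current frame (e.g. it is not running).
    """
    frame = sys._current_frames().get(thread.ident)
    if frame is None:
        return []
    return [f"{fs.name} ({fs.filename}:{fs.lineno})"
            for fs in traceback.extract_stack(frame)]
```

The resulting list could populate the call-trace field of a fault message sent to the MPU. That message format is an assumption, not something the source specifies.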
In the fault detection method provided by this embodiment, a business task on the LPU records first information before processing a control message and records second information after it finishes processing that message. Suppose repeated checks detect only the first information and never the second. In that case the business task cannot complete the processing of the control message, so it can be considered faulty, and fault detection on the LPU is achieved.
Embodiment 2
This embodiment provides an LPU for performing the above fault detection method. As shown in Fig. 3, the LPU includes:
a recording unit 1201, configured to record first information before a business task processes a control message, the first information indicating that the business task is about to enter the processing flow;
the recording unit 1201 being further configured to record second information after the business task has finished processing the control message, the second information indicating that the business task has completed the processing flow;
a judging unit 1202, configured to perform a set number of checks and, if every one of these consecutive checks finds the first information and the second information recorded by the recording unit 1201 inconsistent, determine that the business task has failed.
Optionally, in one possible design:
the recording unit 1201 is specifically configured to increment a first counter by M, M being a positive integer;
the recording unit 1201 is specifically configured to increment a second counter by M, the default value of the second counter being the same as that of the first counter;
the judging unit 1202 is specifically configured to perform a set number of checks and, if every consecutive check finds the value of the first counter greater than the value of the second counter by M, determine that the business task has failed.
Optionally, in another possible design:
the recording unit 1201 is specifically configured to invert a first flag bit;
the recording unit 1201 is specifically configured to invert a second flag bit, the default value of the second flag bit being the same as that of the first flag bit;
the judging unit 1202 is specifically configured to perform a set number of checks and, if every consecutive check finds the value of the first flag bit unequal to the value of the second flag bit, determine that the business task has failed.
Optionally, in one possible design, as shown in Fig. 3, the LPU further includes:
a tracking unit 1203, configured to perform a function-call backtrace on the business task after the judging unit 1202 determines that the business task has failed, and to record the function-call hierarchy locally or send it to the MPU.
The LPU in this embodiment can perform the above fault detection method, so for the technical effects it achieves, refer to the method embodiment above; they are not repeated here.
Note that the recording unit, judging unit and tracking unit may each be a separately provided processor, or may be integrated into a processor of a controller. Alternatively, they may be stored as program code in the memory of the controller and invoked by a processor of the controller to perform their functions. The processor may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention.
It should be understood that, in the embodiments of the present invention, the sequence numbers of the above processes do not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and the numbering places no limitation on how the embodiments are implemented.
A person of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described with reference to the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the particular application and the design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each particular application, but such implementations should not be considered to go beyond the scope of the present invention.
A person skilled in the art will clearly understand the following. For convenience and brevity of description, the specific working processes of the systems, devices and units described above can be found in the corresponding processes of the foregoing method embodiments, and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, devices and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division into units is only a division by logical function, and an actual implementation may divide them differently. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, devices or units. These connections may be electrical, mechanical or of other forms.
Units described as separate components may or may not be physically separate. Components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the units may be selected as needed to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit. Alternatively, each unit may exist physically on its own, or two or more units may be integrated into one unit.
The functions may be implemented as software functional units and sold or used as independent products, in which case they may be stored in a computer-readable storage medium. On this understanding, the technical solution of the present invention may be embodied as a software product: in whole, in its part that contributes to the prior art, or in part. The computer software product is stored in a storage medium. It includes instructions that cause a computer device (a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods in the embodiments of the present invention. The storage medium is any medium that can store program code. Examples include a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk and an optical disc.
The foregoing descriptions are merely specific embodiments of the present invention, and the protection scope of the present invention is not limited to them. Any variation or replacement that a person skilled in the art can readily conceive of within the technical scope disclosed by the present invention falls within the protection scope of the present invention. The protection scope of the present invention is therefore defined by the claims.

Claims (10)

1. A fault detection method, characterised in that it is used to perform fault detection on business tasks of an LPU, the method comprising:
before a business task processes a control message, recording first information, the first information indicating that the business task is about to enter a processing flow;
after the business task has finished processing the control message, recording second information, the second information indicating that the business task has completed the processing flow;
performing a set number of checks, and if every one of these consecutive checks finds the first information and the second information inconsistent, determining that the business task has failed.
2. The method according to claim 1, characterised in that:
the recording first information comprises: incrementing a first counter by M, M being a positive integer;
the recording second information comprises: incrementing a second counter by M, a default value of the second counter being the same as a default value of the first counter;
the consecutive checks finding the first information and the second information inconsistent comprises: the consecutive checks finding the value of the first counter greater than the value of the second counter by M.
3. The method according to claim 1, characterised in that:
the recording first information comprises: inverting a first flag bit;
the recording second information comprises: inverting a second flag bit, a default value of the second flag bit being the same as a default value of the first flag bit;
the consecutive checks finding the first information and the second information inconsistent comprises: the consecutive checks finding the value of the first flag bit unequal to the value of the second flag bit.
4. The method according to claim 1, characterised in that, after the determining that the business task has failed, the method further comprises:
performing a function-call backtrace on the business task, and recording the function-call hierarchy locally or sending it to an MPU.
5. An LPU, characterised in that it is used to perform fault detection on business tasks of the LPU, the LPU comprising:
a recording unit, configured to record first information before a business task processes a control message, the first information indicating that the business task is about to enter a processing flow;
the recording unit being further configured to record second information after the business task has finished processing the control message, the second information indicating that the business task has completed the processing flow;
a judging unit, configured to perform a set number of checks and, if every one of these consecutive checks finds the first information and the second information inconsistent, determine that the business task has failed.
6. The LPU according to claim 5, characterised in that:
the recording unit is specifically configured to increment a first counter by M, M being a positive integer;
the recording unit is specifically configured to increment a second counter by M, a default value of the second counter being the same as a default value of the first counter;
the judging unit is specifically configured to perform a set number of checks and, if every consecutive check finds the value of the first counter greater than the value of the second counter by M, determine that the business task has failed.
7. The LPU according to claim 5, characterised in that:
the recording unit is specifically configured to invert a first flag bit;
the recording unit is specifically configured to invert a second flag bit, a default value of the second flag bit being the same as a default value of the first flag bit;
the judging unit is specifically configured to perform a set number of checks and, if every consecutive check finds the value of the first flag bit unequal to the value of the second flag bit, determine that the business task has failed.
8. The LPU according to claim 5, characterised in that the LPU further comprises:
a tracking unit, configured to perform a function-call backtrace on the business task after the judging unit determines that the business task has failed, and to record the function-call hierarchy locally or send it to an MPU.
9. A distributed network communication device, comprising the LPU according to any one of claims 5 to 8 and an MPU, wherein
the MPU is configured to send control messages to the LPU.
10. The distributed network communication device according to claim 9, characterised in that the MPU is further configured to handle a fault according to a fault-handling policy after receiving a fault message sent by the LPU.

Priority Applications (1)

Application number: CN201610958144.7A
Priority date: 2016-10-27
Filing date: 2016-10-27
Title: Fault detection method, LPU and distributed network communication equipment

Publications (1)

Publication number: CN107995018A
Publication date: 2018-05-04

Family

ID=62029359

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101291201A (en) * 2007-12-05 2008-10-22 福建星网锐捷网络有限公司 Heart beat information transmission system and method
CN101610212A (en) * 2009-07-27 2009-12-23 迈普通信技术股份有限公司 Realize the method and the integrated circuit board of reliable data plane communication
CN101707536A (en) * 2009-11-25 2010-05-12 成都市华为赛门铁克科技有限公司 Fault detection method, line card and main control card
CN102143014A (en) * 2010-11-03 2011-08-03 华为数字技术有限公司 Single board failure detection method, single board and router
CN104836679A (en) * 2014-07-18 2015-08-12 中兴通讯股份有限公司 Communication abnormity processing method and network element equipment
CN105337765A (en) * 2015-10-10 2016-02-17 上海新炬网络信息技术有限公司 Distributed hadoop cluster fault automatic diagnosis and restoration system
CN105632040A (en) * 2015-12-30 2016-06-01 深圳泓数科技有限公司 Medical self-service printing terminal and printing medium output monitoring method and system thereof
CN105656715A (en) * 2015-12-30 2016-06-08 中国银联股份有限公司 Method and device for monitoring state of network device under cloud computing environment

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180504