CN107204879B - Adaptive failure detection method for distributed systems based on exponential moving average - Google Patents


Info

Publication number: CN107204879B
Authority: CN (China)
Prior art keywords: heartbeat, time, delay, value, delayed
Legal status: Active (the listed status is an assumption and is not a legal conclusion)
Application number: CN201710413817.5A
Other languages: Chinese (zh)
Other versions: CN107204879A (en)
Inventors: 姜晓红, 代长波, 李金昌, 杜定益, 陈广, 吴朝晖
Current assignee: Zhejiang University ZJU (the listed assignees may be inaccurate)
Original assignee: Zhejiang University ZJU
Priority date: 2017-06-05 (an assumption, not a legal conclusion)
Filing date: 2017-06-05
Publication date: 2019-09-20

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00: Arrangements for monitoring or testing data switching networks
    • H04L43/10: Active monitoring, e.g. heartbeat, ping or trace-route
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06: Management of faults, events, alarms or notifications
    • H04L41/0677: Localisation of faults
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00: Arrangements for monitoring or testing data switching networks
    • H04L43/08: Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0852: Delays

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Environmental & Geological Engineering (AREA)
  • Health & Medical Sciences (AREA)
  • Cardiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Measurement And Recording Of Electrical Phenomena And Electrical Characteristics Of The Living Body (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an adaptive failure detection method for distributed systems based on the exponential moving average. The method comprises four steps: time series data collection, heartbeat prediction, diagnostic value output, and fault determination. It can be used for failure detection in distributed systems to discover latent faults in time and reduce the risk of system failure. Using the historical heartbeat sequence, the method outputs a diagnostic value that accumulates dynamically over time and judges whether a node in the system has failed against a threshold set at system initialization. When computing the heartbeat prediction, the influence weight of each historical heartbeat is calculated with an exponential moving average, so that the weight decays exponentially as the heartbeat ages, while a variance ratio reduces the influence weight of historical heartbeats that deviate abruptly.

Description

Adaptive failure detection method for distributed systems based on exponential moving average
Technical field
The invention belongs to the technical field of distributed systems, and in particular relates to an adaptive failure detection method for distributed systems based on the exponential moving average.
Background
With the development of distributed computing technology, distributed systems are being applied to every aspect of daily life. Industries such as e-commerce, cloud storage, network communication, banking, and securities all build their core business on distributed systems in order to provide customers with fast, stable, and secure services. Failure detection is a basic component of a distributed system and one of the necessary means of ensuring that the system runs reliably and stably. As system scale and complexity keep growing, failure detection becomes increasingly difficult.
An adaptive failure detector can dynamically adjust its detection parameters, such as the heartbeat timeout, according to the state of the system or network, and therefore detects failures more effectively than a traditional detector with fixed parameters. Research on adaptive failure detection is relatively mature, and many heartbeat-based adaptive failure detectors have been proposed. They fall broadly into two classes. The first class uses the historical heartbeat sequence to predict the arrival of the next heartbeat with one of various algorithms and sets the detection timeout according to that prediction; its output is binary, either failed or normal. The second class separates the monitoring of failures from their interpretation. Such a detector also relies on heartbeats but outputs an accumulated decision value that changes over time, and the user decides whether a failure has occurred by setting a threshold. The same detector can thus yield different detection behavior for different applications within one system, which gives it greater flexibility.
Summary of the invention
The present invention provides an adaptive failure detection method for distributed systems based on the exponential moving average. The method reduces failure detection time while maintaining detection accuracy, improves detection efficiency, and has broad applicability.
An adaptive failure detection method for distributed systems based on the exponential moving average comprises the following steps:
(1) Heartbeat messages are sent at fixed intervals to the monitored node in the system and the response messages it returns are received, thereby maintaining and updating a heartbeat delay sequence of specified length n, where n is a natural number greater than 1;
(2) Based on the heartbeat delay sequence, the predicted value $EIA_0$ of the next heartbeat delay is calculated at the arrival time of the most recent heartbeat response;
(3) Based on the predicted value $EIA_0$ of the next heartbeat delay, a diagnostic value that grows cumulatively over time is calculated, and fault determination is performed on the monitored node according to that diagnostic value.
The heartbeat delay sequence in step (1) consists of n heartbeat delays $IA_1$ to $IA_n$ arranged chronologically from most recent to oldest. Each heartbeat delay in the sequence equals the arrival time of its corresponding heartbeat response minus the arrival time of the preceding heartbeat response. If the sequence is full, the oldest heartbeat delay is removed when the newest one is stored.
The predicted value $EIA_0$ of the next heartbeat delay in step (2) is calculated as follows:
2.1 For each heartbeat delay $IA_i$ in the heartbeat delay sequence, its influence weight $\varphi_i$ on the next heartbeat delay is calculated using the exponential moving average method;
2.2 The influence weight $\varphi_i$ is adjusted using the variance ratio method to obtain the final influence weight $\theta_i$ of heartbeat delay $IA_i$ on the next heartbeat delay;
2.3 The mean error between the heartbeat delays in the sequence and their predicted values is taken as the safety margin α for the next prediction, and the predicted value $EIA_0$ of the next heartbeat delay is calculated from the final influence weights $\theta_i$.
The exponential moving average method in step 2.1 is calculated by the following expression:
[formula image not reproduced]
where $\lceil\cdot\rceil$ denotes rounding up, and i is a natural number with 1 ≤ i ≤ n.
The adjustment of the influence weight $\varphi_i$ by the variance ratio method in step 2.2 is calculated as follows:
[formula image not reproduced]
where μ and δ are the mean and standard deviation of the heartbeat delay sequence, $v_i = IA_i - \mu$, and $\psi(v_i)$ is the variance ratio corresponding to heartbeat delay $IA_i$.
The safety margin α for the next prediction in step 2.3 is calculated by the following formula:
[formula image not reproduced]
where $EIA_i$ is the predicted value of heartbeat delay $IA_i$.
The predicted value $EIA_0$ of the next heartbeat delay in step 2.3 is calculated by the following formula:
[formula image not reproduced]
The diagnostic value in step (3) is calculated by the following formula:
[formula image not reproduced]
where $T_{last}$ is the arrival time of the most recent heartbeat response and t is the current time.
Compared with existing failure detection techniques for distributed systems, the method of the present invention predicts the next heartbeat delay using the exponential moving average method and, taking this prediction as input, outputs a diagnostic value that accumulates over time. It reduces failure detection time while maintaining detection accuracy and improves detection efficiency. In particular, in environments where the network message loss rate is within tolerable limits, the method has broad applicability, can discover latent system faults in time, and reduces the risk of system failure.
Brief description of the drawings
Fig. 1 is a flow diagram of the distributed system failure detection method of the present invention.
Detailed description of the embodiment
To describe the present invention more concretely, its technical solution is described in detail below with reference to the accompanying drawing and a specific embodiment.
This embodiment abstracts the distributed system as a cluster of two nodes {p, q}, where p is the detecting node and q is the detected node. As shown in Fig. 1, the adaptive failure detection method comprises the following steps:
(1) Time series data collection.
The detecting node p sends a heartbeat message to the detected node q every η time units, and q returns a response message upon receiving it. Node p internally maintains a heartbeat time series of specified length n and updates this local time series whenever it receives a heartbeat response from q.
The time series stores the n most recent heartbeat delays $\{IA_1, IA_2, \ldots, IA_i, \ldots, IA_n\}$, where $IA_1$ is the most recent heartbeat delay and $IA_n$ is the oldest heartbeat delay in the sequence. Let $T_i$ be the arrival time of the i-th heartbeat response; then:
$$IA_i = T_i - T_{i-1}$$
When node p's local heartbeat time series is full, the oldest entry is removed before the newest heartbeat delay is stored, which keeps the data current.
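The collection step can be sketched as a bounded buffer of heartbeat delays. This is a minimal illustration rather than the patent's implementation: the class name `HeartbeatDelaySeries`, its methods, and the use of integer millisecond timestamps are all assumptions made for the example.

```python
from collections import deque


class HeartbeatDelaySeries:
    """Bounded series of heartbeat delays, newest first (IA_1 ... IA_n).

    Hypothetical helper for illustration; not named in the source text.
    """

    def __init__(self, n):
        if n <= 1:
            raise ValueError("n must be a natural number greater than 1")
        # A deque with maxlen=n discards from the opposite end when full.
        self._delays = deque(maxlen=n)
        self._last_arrival = None  # T_last: arrival time of the latest response

    def record_response(self, arrival_time):
        """Store the delay between this response and the previous one."""
        if self._last_arrival is not None:
            # appendleft keeps the newest delay at index 0 (IA_1); on a full
            # deque the oldest delay (IA_n) drops off the right end.
            self._delays.appendleft(arrival_time - self._last_arrival)
        self._last_arrival = arrival_time

    @property
    def last_arrival(self):
        return self._last_arrival

    def delays(self):
        """Return the stored delays as a list, newest first."""
        return list(self._delays)


series = HeartbeatDelaySeries(n=3)
for t_ms in [0, 1000, 2100, 3000, 4200]:  # response arrival times in ms
    series.record_response(t_ms)
print(series.delays())  # [1200, 900, 1100]
```

Four delays are produced from the five arrivals, and because n = 3 the oldest one (1000 ms) has already been evicted.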
(2) Heartbeat prediction.
Heartbeat prediction is further divided into three sub-steps: exponential moving average weight calculation, variance ratio weight adjustment, and calculation of the safety margin α.
2-1 Exponential moving average weight calculation: the influence of a historical heartbeat delay on the next heartbeat delay decays exponentially with time. The closer a point is to the present, the greater its influence; the farther away it is, the smaller its influence. For the heartbeat delay sequence $\{IA_1, IA_2, \ldots, IA_i, \ldots, IA_n\}$, the influence weights are $\{\varphi_1(\beta), \varphi_2(\beta), \ldots, \varphi_i(\beta), \ldots, \varphi_n(\beta)\}$, where β is a constant between 0 and 1 that tunes the weights.
The influence weight $\varphi_i(\beta)$ is defined as:
$$\varphi_i(\beta) = \beta(1-\beta)^{i-1}, \quad 1 \le i \le n$$
It follows that $0 < \varphi_i(\beta) < 1$, and the influence weights of the heartbeat delays in the time series decay exponentially with age.
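The weight definition above translates directly into code. The sketch below assumes β lies strictly between 0 and 1 (which is what makes every weight fall in the open interval (0, 1)); the function name `ema_weights` is chosen for the example.

```python
def ema_weights(n, beta):
    """Return [phi_1, ..., phi_n] with phi_i = beta * (1 - beta) ** (i - 1)."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    return [beta * (1.0 - beta) ** (i - 1) for i in range(1, n + 1)]


weights = ema_weights(n=4, beta=0.5)
print(weights)  # [0.5, 0.25, 0.125, 0.0625]
```

Because the weights form a geometric series, their sum is $1 - (1-\beta)^n$, which is slightly below 1 for a finite window (0.9375 in this example).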
2-2 Variance ratio weight adjustment: the weights $\varphi_i(\beta)$ computed by the exponential moving average method are further optimized. Let μ and δ denote the mean and standard deviation of the heartbeat delays in the time series, that is:
[formula image not reproduced]
The variance ratio $\psi(v_i)$ is then defined as:
[formula image not reproduced]
where $v_i = IA_i - \mu$. That is, the larger the fluctuation of a historical heartbeat delay, the smaller $\psi(v_i)$, and $\psi(v_i) \le 1$.
The final influence weight $\theta_i$ of each historical heartbeat delay is:
$$\theta_i = \varphi_i(\beta)\,\psi(v_i), \quad 1 \le i \le n$$
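The weight adjustment can be illustrated as follows. The defining formula for ψ is not available in this text, so `gaussian_variance_ratio` below is a hypothetical stand-in chosen only because it satisfies the two stated properties: it never exceeds 1, and it shrinks as the deviation from the mean grows. Dividing by n when computing the standard deviation is likewise an assumption.

```python
import math


def gaussian_variance_ratio(v, delta):
    """HYPOTHETICAL psi(v): equals 1 at v = 0 and decreases as |v| grows.

    This is not the patent's formula, only a function with the same
    qualitative behaviour (psi <= 1, smaller for larger fluctuations).
    """
    if delta == 0.0:
        return 1.0  # no spread in the series, so nothing to damp
    return math.exp(-(v * v) / (2.0 * delta * delta))


def final_weights(delays, beta, psi=gaussian_variance_ratio):
    """Return [theta_1, ..., theta_n] with theta_i = phi_i(beta) * psi(v_i)."""
    n = len(delays)
    mu = sum(delays) / n
    # Population standard deviation (divide by n): an assumption.
    delta = math.sqrt(sum((ia - mu) ** 2 for ia in delays) / n)
    thetas = []
    for i, ia in enumerate(delays, start=1):
        phi = beta * (1.0 - beta) ** (i - 1)  # EMA weight from step 2-1
        thetas.append(phi * psi(ia - mu, delta))
    return thetas


delays = [1.0, 1.1, 5.0, 0.9]  # newest first; 5.0 is an abrupt outlier
thetas = final_weights(delays, beta=0.5)
print(thetas)
```

With this stand-in, the outlier at position 3 keeps the smallest fraction of its exponential moving average weight, which is the effect the variance ratio step is meant to achieve.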
2-3 Safety margin α calculation: the mean prediction error of the heartbeat delays in the time series is used as the safety margin for the next prediction, calculated as follows:
[formula image not reproduced]
Finally, the predicted value of the next heartbeat delay is calculated as follows:
[formula image not reproduced]
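Neither formula in step 2-3 is available in this text, so the following is only one plausible reading, offered as a sketch. It assumes that α is the mean absolute error between each stored delay and the prediction that was made for it, and that $EIA_0$ is the θ-weighted average of the stored delays (normalised by the sum of the weights) plus α. Both function names and the sample numbers are hypothetical.

```python
def safety_margin(delays, past_predictions):
    """ASSUMED alpha: mean absolute error between delays and their predictions."""
    pairs = list(zip(delays, past_predictions))
    if not pairs:
        return 0.0
    return sum(abs(ia - eia) for ia, eia in pairs) / len(pairs)


def predict_next_delay(delays, thetas, alpha):
    """ASSUMED EIA_0: theta-weighted average of the delays plus alpha."""
    total = sum(thetas)
    if total == 0.0:
        raise ValueError("weights must not all be zero")
    weighted = sum(th * ia for th, ia in zip(thetas, delays)) / total
    return weighted + alpha


delays = [1.0, 1.2, 0.8]            # IA_1..IA_3, newest first
past_predictions = [1.1, 1.0, 1.0]  # hypothetical EIA_i for each IA_i
thetas = [0.5, 0.25, 0.125]         # final weights from step 2-2
alpha = safety_margin(delays, past_predictions)
eia0 = predict_next_delay(delays, thetas, alpha)
print(alpha, eia0)
```

Adding α pushes the prediction later than the weighted average alone, so a heartbeat that arrives a little late is less likely to be mistaken for a failure.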
(3) Diagnostic value output.
Taking the heartbeat delay prediction $EIA_0$ and the current time t as inputs, the diagnostic value is calculated as follows:
[formula image not reproduced]
where $T_{last}$ is the arrival time of the most recent heartbeat, and the diagnostic value ranges from 0 to +∞.
(4) Fault determination.
Node p computes the diagnostic value at time t. If the diagnostic value is within the threshold set at system initialization, node q is deemed normal; otherwise, node q is deemed to have failed.
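The decision step can be sketched as below. The diagnostic formula itself is not available in this text, so `diagnostic_value` is a hypothetical stand-in: the time elapsed since $T_{last}$ measured in units of the predicted delay $EIA_0$. It was chosen only because it starts at 0 when a heartbeat arrives and grows without bound while none arrive, matching the stated range of 0 to +∞. The threshold value is also an assumption.

```python
def diagnostic_value(t, t_last, eia0):
    """ASSUMED diagnostic: elapsed time since T_last in units of EIA_0."""
    if eia0 <= 0.0:
        raise ValueError("predicted delay must be positive")
    return max(0.0, t - t_last) / eia0


def is_suspected(t, t_last, eia0, threshold):
    """Return True once the diagnostic value exceeds the threshold."""
    return diagnostic_value(t, t_last, eia0) > threshold


T_LAST = 10.0    # arrival time of the most recent heartbeat response
EIA0 = 1.2       # predicted next heartbeat delay
THRESHOLD = 3.0  # hypothetical threshold set at system initialization

print(is_suspected(11.0, T_LAST, EIA0, THRESHOLD))  # False
print(is_suspected(14.0, T_LAST, EIA0, THRESHOLD))  # True
```

Separating the accumulated value from the threshold lets different applications on the same system pick different thresholds, trading detection speed against the risk of false suspicion.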
The above description of the embodiment is intended to help those of ordinary skill in the art understand and apply the invention. Those skilled in the art can readily make various modifications to the above embodiment and apply the general principles described here to other embodiments without creative effort. The present invention is therefore not limited to the above embodiment; improvements and modifications made by those skilled in the art in light of this disclosure shall all fall within the scope of protection of the present invention.

Claims (1)

1. An adaptive failure detection method for distributed systems based on exponential moving average, comprising the following steps:
(1) sending heartbeat messages at fixed intervals to the monitored node in the system and receiving the response messages it returns, thereby maintaining and updating a heartbeat delay sequence of specified length n, n being a natural number greater than 1;
the heartbeat delay sequence consisting of n heartbeat delays $IA_1$ to $IA_n$ arranged chronologically from most recent to oldest, each heartbeat delay in the sequence being equal to the arrival time of its corresponding heartbeat response minus the arrival time of the preceding heartbeat response; if the heartbeat delay sequence is full, the oldest heartbeat delay is removed when the newest heartbeat delay is stored;
(2) calculating, based on the heartbeat delay sequence, the predicted value $EIA_0$ of the next heartbeat delay at the arrival time of the most recent heartbeat response, as follows:
2.1 for each heartbeat delay $IA_i$ in the heartbeat delay sequence, calculating its influence weight $\varphi_i$ on the next heartbeat delay using the exponential moving average method, by the following expression:
[formula image not reproduced]
where $\lceil\cdot\rceil$ denotes rounding up, and i is a natural number with 1 ≤ i ≤ n;
2.2 adjusting the influence weight $\varphi_i$ using the variance ratio method to obtain the final influence weight $\theta_i$ of heartbeat delay $IA_i$ on the next heartbeat delay, by the following expression:
[formula image not reproduced]
where μ and δ are the mean and standard deviation of the heartbeat delay sequence, $v_i = IA_i - \mu$, and $\psi(v_i)$ is the variance ratio corresponding to heartbeat delay $IA_i$;
2.3 taking the mean error between the heartbeat delays in the heartbeat delay sequence and their predicted values as the safety margin α for the next prediction, by the following expression:
[formula image not reproduced]
where $EIA_i$ is the predicted value of heartbeat delay $IA_i$;
and then calculating the predicted value $EIA_0$ of the next heartbeat delay from the final influence weights $\theta_i$ by the following formula:
[formula image not reproduced]
(3) calculating, from the predicted value $EIA_0$ of the next heartbeat delay, a diagnostic value that grows cumulatively over time by the following formula, and performing fault determination on the monitored node according to the diagnostic value:
[formula image not reproduced]
where $T_{last}$ is the arrival time of the most recent heartbeat response and t is the current time.
CN201710413817.5A, priority date 2017-06-05, filing date 2017-06-05: Adaptive failure detection method for distributed systems based on exponential moving average. Status: Active. Publication: CN107204879B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201710413817.5A / CN107204879B (en) | 2017-06-05 | 2017-06-05 | Adaptive failure detection method for distributed systems based on exponential moving average

Publications (2)

Publication Number | Publication Date
CN107204879A (en) | 2017-09-26
CN107204879B (en) | 2019-09-20

Family

ID=59906687

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201710413817.5A (Active, CN107204879B (en)) | Adaptive failure detection method for distributed systems based on exponential moving average | 2017-06-05 | 2017-06-05

Country Status (1)

Country | Link
CN (1) | CN107204879B (en)

Also Published As

Publication number Publication date
CN107204879A (en) 2017-09-26

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant