CN105553766A - Monitoring method of abnormal node dynamic tracking cluster node state - Google Patents

Monitoring method of abnormal node dynamic tracking cluster node state

Info

Publication number
CN105553766A
Authority
CN
China
Prior art keywords
module
abnormal
cluster
node
clustered node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510927404.XA
Other languages
Chinese (zh)
Inventor
崔维力
武新
寇韦韦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
TIANJIN NANKAI UNIVERSITY GENERAL DATA TECHNOLOGIES Co Ltd
Original Assignee
TIANJIN NANKAI UNIVERSITY GENERAL DATA TECHNOLOGIES Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by TIANJIN NANKAI UNIVERSITY GENERAL DATA TECHNOLOGIES Co Ltd
Priority to CN201510927404.XA
Publication of CN105553766A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/10 Active monitoring, e.g. heartbeat, ping or trace-route
    • H04L43/103 Active monitoring, e.g. heartbeat, ping or trace-route with adaptive polling, i.e. dynamically adapting the polling rate

Landscapes

  • Health & Medical Sciences (AREA)
  • Cardiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a method for monitoring cluster node state with dynamic tracking of abnormal nodes. The method involves a cluster node state service module, a cluster node module, a cluster abnormal node detection module, and a cluster abnormal node queue module. The steps are: (A) the cluster node module sends a node report to the cluster node state service module and self-checks for an offline abnormality; (B) the cluster abnormal node detection module performs abnormal node detection on the cluster node module and passes any abnormal node to the cluster abnormal node queue module; (C) the cluster abnormal node queue module sends the abnormal nodes in its queue to the cluster node state service module in order and applies an abnormality judgement; (D) the cluster node state service module receives the abnormality judgement from the cluster abnormal node queue module and completes the monitoring of the cluster node state. The method reduces the cost of monitoring, raises its sensitivity, keeps the monitoring service stable, and is widely applicable and practical.

Description

Monitoring method for cluster node state with dynamic tracking of abnormal nodes
Technical field
The invention belongs to the technical field of distributed databases and specifically relates to a method for monitoring cluster node state with dynamic tracking of abnormal nodes.
Background art
As data volumes grow, large clusters often reach 100 nodes or more. Traditional second-level timed polling, or having the monitored nodes report in sequence, consumes too much of resources such as the network and therefore does not suit the monitoring of large clusters. A new monitoring method is needed that guarantees a certain monitoring sensitivity while consuming fewer resources such as the network. To meet this requirement, active polling can be combined with abnormal node tracking: a longer polling interval avoids frequent consumption of cluster resources, and a tracking queue is set up so that nodes showing an offline abnormality are tracked closely. This tracking runs at second-level granularity to ensure sensitivity.
Many existing schemes monitor by active polling alone. Their biggest shortcoming is that the polling interval is hard to set. If the interval is too long, system sensitivity is too low. If the interval is too short, system overhead is excessive because a large number of sockets must be created repeatedly; consider the impact on the whole system of constantly creating and tearing down TCP connections on a very large cluster of 300 compute nodes. Other schemes use only dynamic tracking of abnormal nodes. Because only abnormal nodes are tracked, system overhead is small and sensitivity is high, but a problem on a node cannot be discovered when nobody opens a socket to it, and this approach can sometimes cause problems such as deadlock.
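The trade-off described above can be made concrete with a little arithmetic. The following is a minimal sketch and not part of the patent: the function name, the 60-second polling interval, the 1-second tracking interval, and the count of three abnormal nodes are illustrative assumptions; only the 300-node cluster size comes from the text. It also assumes that a node under second-level tracking is not polled at the same time.

```python
def probes_per_minute(nodes: int, poll_interval_s: float,
                      tracked: int = 0, track_interval_s: float = 1.0) -> float:
    """Probes issued per minute when untracked nodes are polled at
    poll_interval_s and tracked (abnormal) nodes at track_interval_s."""
    polled = (nodes - tracked) * 60.0 / poll_interval_s
    followed = tracked * 60.0 / track_interval_s
    return polled + followed


# Assumed figures: 300 nodes, 3 of them currently abnormal.
pure_fast_polling = probes_per_minute(300, poll_interval_s=1.0)
combined = probes_per_minute(300, poll_interval_s=60.0, tracked=3)

print(pure_fast_polling)  # 18000.0
print(combined)           # 477.0
```

Under these assumed figures the combined scheme issues far fewer probes per minute than polling every node each second, while the abnormal nodes are still checked every second.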
Summary of the invention
To solve the above technical problem, the invention provides a method for monitoring cluster node state with dynamic tracking of abnormal nodes that is stable in use and greatly improves sensitivity.
The technical scheme that achieves the object of the invention is a method for monitoring cluster node state with dynamic tracking of abnormal nodes, comprising a cluster node state service module, a cluster node module, a cluster abnormal node detection module, and a cluster abnormal node queue module. The specific steps are:
A. The cluster node module sends a node report to the cluster node state service module and self-checks whether it has an offline abnormality;
B. The cluster abnormal node detection module performs abnormal node detection on the cluster node module and confirms the abnormality in combination with the cluster node module's offline self-check result; when an abnormal node is confirmed, it passes that node to the cluster abnormal node queue module;
C. The cluster abnormal node queue module sends the abnormal nodes in its queue to the cluster node state service module in order and applies a long-duration abnormality judgement;
D. The cluster node state service module receives the abnormality judgement from the cluster abnormal node queue module and, as that judgement requires, starts or stops second-level abnormality tracking, thereby completing the monitoring of the cluster node state.
In step D, when the cluster abnormal node queue module detects a long-duration abnormal state, the cluster node state service module starts or continues second-level abnormality tracking; when the cluster abnormal node queue module does not detect a long-duration abnormal state, the cluster node state service module suspends second-level abnormality tracking and at the same time resumes polling of the abnormal node.
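Steps A to D can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the class name, the method names, and the `probe` callback are hypothetical; an abnormality is treated as confirmed when either the detector's probe fails or the node's self-check reports offline; and the long-duration judgement of step C is left out for brevity.

```python
from collections import deque
from typing import Callable, Deque, Dict, Iterable, Set


class NodeStateService:
    """Slow polling for healthy nodes, second-level tracking for abnormal ones."""

    def __init__(self, nodes: Iterable[str], probe: Callable[[str], bool]):
        self.probe = probe                          # hypothetical reachability check
        self.status: Dict[str, str] = {n: "online" for n in nodes}
        self.abnormal_queue: Deque[str] = deque()   # filled in step B
        self.tracked: Set[str] = set()              # nodes under second-level tracking

    def detect(self, node: str, self_check_offline: bool) -> None:
        """Step B: confirm the abnormality (assumed rule: probe OR self-check)."""
        if self_check_offline or not self.probe(node):
            if node not in self.tracked and node not in self.abnormal_queue:
                self.abnormal_queue.append(node)

    def drain_queue(self) -> None:
        """Steps C and D: take abnormal nodes in order and start tracking them."""
        while self.abnormal_queue:
            node = self.abnormal_queue.popleft()
            self.status[node] = "abnormal"
            self.tracked.add(node)

    def track_tick(self) -> None:
        """Intended to run about once per second: probe tracked nodes only."""
        for node in list(self.tracked):
            if self.probe(node):
                self.tracked.discard(node)          # stop tracking, resume polling
                self.status[node] = "online"

    def poll_tick(self) -> None:
        """Intended to run at the long polling interval: probe untracked nodes."""
        for node in self.status:
            if node not in self.tracked:
                self.detect(node, self_check_offline=False)
        self.drain_queue()


down = {"n2"}
svc = NodeStateService(["n1", "n2", "n3"], probe=lambda n: n not in down)
svc.poll_tick()
print(svc.status["n2"], sorted(svc.tracked))   # abnormal ['n2']
down.clear()                                   # the node comes back
svc.track_tick()
print(svc.status["n2"], sorted(svc.tracked))   # online []
```

The two tick methods are kept separate so that a scheduler can call them at different rates, which is the point of combining a long polling interval with second-level tracking.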
There are at least two cluster node state service modules, and a handover module is provided between them.
There are at least two cluster abnormal node detection modules, and each is connected to the cluster abnormal node queue module.
The invention has the following positive effects: it monitors cluster nodes by combining active polling with dynamic tracking of abnormal nodes, which greatly reduces the cost of monitoring, substantially increases its sensitivity, and keeps the monitoring service stable; it is widely applicable and practical.
Brief description of the drawings
To make the content of the invention easier to understand, the invention is described in further detail below with reference to a specific embodiment and the accompanying drawings, in which:
Fig. 1 is a structural block diagram of the invention;
Fig. 2 is a block diagram of the specific steps of the invention;
Fig. 3 is a flow block diagram of the long-duration abnormality judgement of the invention.
Detailed description
(Embodiment 1)
Figs. 1 to 3 show one embodiment of the invention: Fig. 1 is a structural block diagram of the invention, Fig. 2 is a block diagram of its specific steps, and Fig. 3 is a flow block diagram of its long-duration abnormality judgement.
Referring to Figs. 1 to 3, a method for monitoring cluster node state with dynamic tracking of abnormal nodes comprises a cluster node state service module 1, a cluster node module 2, a cluster abnormal node detection module 3, and a cluster abnormal node queue module 4. The specific steps are:
A. The cluster node module sends a node report to the cluster node state service module and self-checks whether it has an offline abnormality;
B. The cluster abnormal node detection module performs abnormal node detection on the cluster node module and confirms the abnormality in combination with the cluster node module's offline self-check result; when an abnormal node is confirmed, it passes that node to the cluster abnormal node queue module;
C. The cluster abnormal node queue module sends the abnormal nodes in its queue to the cluster node state service module in order and applies a long-duration abnormality judgement;
D. The cluster node state service module receives the abnormality judgement from the cluster abnormal node queue module and, as that judgement requires, starts or stops second-level abnormality tracking, thereby completing the monitoring of the cluster node state.
In step D, when the cluster abnormal node queue module detects a long-duration abnormal state, the cluster node state service module starts or continues second-level abnormality tracking; when the cluster abnormal node queue module does not detect a long-duration abnormal state, the cluster node state service module suspends second-level abnormality tracking and at the same time resumes polling of the abnormal node.
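One possible reading of the long-duration abnormality judgement in step D is a time threshold on how long a node has been unreachable. The sketch below assumes exactly that; the text gives no threshold or data layout, so `LONG_ABNORMAL_SECONDS`, `AbnormalEntry`, `judge`, and the action names are all hypothetical.

```python
from dataclasses import dataclass

LONG_ABNORMAL_SECONDS = 5.0  # assumed threshold; the source gives no value


@dataclass
class AbnormalEntry:
    node: str
    first_seen: float        # time at which the abnormality was first observed
    tracking: bool = False   # True while second-level tracking is running


def judge(entry: AbnormalEntry, now: float, reachable: bool) -> str:
    """Return the action for the state service to take on this entry."""
    long_abnormal = (not reachable
                     and now - entry.first_seen >= LONG_ABNORMAL_SECONDS)
    if not long_abnormal:
        entry.tracking = False
        return "resume_polling"      # suspend tracking, fall back to polling
    action = "continue_tracking" if entry.tracking else "start_tracking"
    entry.tracking = True
    return action


entry = AbnormalEntry("n7", first_seen=100.0)
print(judge(entry, now=102.0, reachable=False))  # resume_polling
print(judge(entry, now=106.0, reachable=False))  # start_tracking
print(judge(entry, now=107.0, reachable=False))  # continue_tracking
print(judge(entry, now=108.0, reachable=True))   # resume_polling
```

Passing `now` in as an argument rather than reading a clock inside `judge` keeps the decision deterministic and easy to test.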
There are at least two cluster node state service modules, and a handover module 5 is provided between them.
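The text does not say how the handover module chooses between the state service modules. The following sketch assumes a heartbeat timeout: the active instance is replaced by the first instance whose heartbeat is still fresh. The class name `Handover`, its methods, the 3-second timeout, and the instance names are hypothetical.

```python
import time
from typing import Callable, Dict, List, Sequence


class Handover:
    """Selects the active state service and fails over on a missed heartbeat."""

    def __init__(self, instances: Sequence[str], timeout_s: float = 3.0,
                 clock: Callable[[], float] = time.monotonic):
        if len(instances) < 2:
            raise ValueError("at least two state service instances are required")
        self.clock = clock
        self.timeout_s = timeout_s
        self.order: List[str] = list(instances)
        self.last_beat: Dict[str, float] = {name: clock() for name in self.order}
        self.active = self.order[0]

    def heartbeat(self, name: str) -> None:
        """Record that the named instance is alive."""
        self.last_beat[name] = self.clock()

    def current(self) -> str:
        """Return the active instance, switching if its heartbeat is stale."""
        now = self.clock()
        if now - self.last_beat[self.active] > self.timeout_s:
            for name in self.order:
                if now - self.last_beat[name] <= self.timeout_s:
                    self.active = name
                    break
        return self.active


fake_now = [0.0]                       # injected clock keeps the example deterministic
handover = Handover(["svc-a", "svc-b"], timeout_s=3.0, clock=lambda: fake_now[0])
fake_now[0] = 2.0
handover.heartbeat("svc-b")            # svc-a stays silent
fake_now[0] = 4.0
print(handover.current())              # svc-b
```

If no instance has a fresh heartbeat, this sketch keeps the current active instance rather than guessing, which is one of several reasonable policies.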
There are at least two cluster abnormal node detection modules, and each is connected to the cluster abnormal node queue module.
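With several detection modules feeding one queue module, the same node may be reported more than once. A minimal sketch of a shared queue that accepts concurrent reports and holds each node at most once is shown below; the class name `AbnormalNodeQueue`, its methods, and the use of threads to stand in for detection modules are assumptions, not details from the text.

```python
import queue
import threading
from typing import Optional, Set


class AbnormalNodeQueue:
    """FIFO queue shared by several detectors; a node is held at most once."""

    def __init__(self) -> None:
        self._q: "queue.Queue[str]" = queue.Queue()
        self._pending: Set[str] = set()
        self._lock = threading.Lock()

    def report(self, node: str) -> bool:
        """Enqueue a node; return False if it is already waiting."""
        with self._lock:
            if node in self._pending:
                return False
            self._pending.add(node)
            self._q.put(node)
            return True

    def take(self, timeout: Optional[float] = None) -> str:
        """Remove and return the oldest abnormal node (raises queue.Empty on timeout)."""
        node = self._q.get(timeout=timeout)
        with self._lock:
            self._pending.discard(node)
        return node


shared = AbnormalNodeQueue()


def detector(found):
    for node in found:
        shared.report(node)


threads = [threading.Thread(target=detector, args=(nodes,))
           for nodes in (["n1", "n2"], ["n2", "n3"])]
for t in threads:
    t.start()
for t in threads:
    t.join()

drained = sorted(shared.take(timeout=1) for _ in range(3))
print(drained)  # ['n1', 'n2', 'n3']
```

The arrival order of nodes from different detectors depends on thread scheduling, so the example sorts the drained nodes before printing.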
The invention monitors cluster nodes by combining active polling with dynamic tracking of abnormal nodes, which greatly reduces the cost of monitoring, substantially increases its sensitivity, and keeps the monitoring service stable; it is widely applicable and practical.
Obviously, the above embodiment is merely an example given to explain the invention clearly and does not limit the ways in which the invention may be carried out. A person of ordinary skill in the art can make other changes in different forms on the basis of the above description. It is neither necessary nor possible to list all embodiments exhaustively here. Obvious changes or variations derived from the spirit of the invention still fall within its scope of protection.

Claims (4)

1. A method for monitoring cluster node state with dynamic tracking of abnormal nodes, comprising a cluster node state service module, a cluster node module, a cluster abnormal node detection module, and a cluster abnormal node queue module, characterized in that the specific steps are:
A. The cluster node module sends a node report to the cluster node state service module and self-checks whether it has an offline abnormality;
B. The cluster abnormal node detection module performs abnormal node detection on the cluster node module and confirms the abnormality in combination with the cluster node module's offline self-check result; when an abnormal node is confirmed, it passes that node to the cluster abnormal node queue module;
C. The cluster abnormal node queue module sends the abnormal nodes in its queue to the cluster node state service module in order and applies a long-duration abnormality judgement;
D. The cluster node state service module receives the abnormality judgement from the cluster abnormal node queue module and, as that judgement requires, starts or stops second-level abnormality tracking, thereby completing the monitoring of the cluster node state.
2. The method for monitoring cluster node state with dynamic tracking of abnormal nodes according to claim 1, characterized in that: in step D, when the cluster abnormal node queue module detects a long-duration abnormal state, the cluster node state service module starts or continues second-level abnormality tracking; when the cluster abnormal node queue module does not detect a long-duration abnormal state, the cluster node state service module suspends second-level abnormality tracking and at the same time resumes polling of the abnormal node.
3. The method for monitoring cluster node state with dynamic tracking of abnormal nodes according to claim 2, characterized in that: there are at least two cluster node state service modules, and a handover module is provided between them.
4. The method for monitoring cluster node state with dynamic tracking of abnormal nodes according to claim 3, characterized in that: there are at least two cluster abnormal node detection modules, and each is connected to the cluster abnormal node queue module.
CN201510927404.XA 2015-12-12 2015-12-12 Monitoring method of abnormal node dynamic tracking cluster node state Pending CN105553766A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510927404.XA CN105553766A (en) 2015-12-12 2015-12-12 Monitoring method of abnormal node dynamic tracking cluster node state

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510927404.XA CN105553766A (en) 2015-12-12 2015-12-12 Monitoring method of abnormal node dynamic tracking cluster node state

Publications (1)

Publication Number Publication Date
CN105553766A true CN105553766A (en) 2016-05-04

Family

ID=55832705

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510927404.XA Pending CN105553766A (en) 2015-12-12 2015-12-12 Monitoring method of abnormal node dynamic tracking cluster node state

Country Status (1)

Country Link
CN (1) CN105553766A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101938504A (en) * 2009-06-30 2011-01-05 深圳市融创天下科技发展有限公司 Cluster server intelligent dispatching method and system
CN101640610A (en) * 2009-09-02 2010-02-03 中兴通讯股份有限公司 Method and system for automatic discovery of Ethernet link
CN102404390A (en) * 2011-11-07 2012-04-04 广东电网公司电力科学研究院 Intelligent dynamic load balancing method for high-speed real-time database
KR101572672B1 (en) * 2012-01-05 2015-12-04 한국전자통신연구원 Method for monitoring node failure on communication network and system thereof
CN104639347A (en) * 2013-11-07 2015-05-20 北大方正集团有限公司 Multi-cluster monitoring method and device, and system
CN104460650A (en) * 2014-10-24 2015-03-25 中国科学院遥感与数字地球研究所 Fault diagnosis device and method for remote sensing satellite receiving system

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106130761A (en) * 2016-06-22 2016-11-16 北京百度网讯科技有限公司 The recognition methods of the failed network device of data center and device
CN106130761B (en) * 2016-06-22 2019-06-18 北京百度网讯科技有限公司 The recognition methods of the failed network device of data center and device
CN106817700A (en) * 2017-03-02 2017-06-09 中国人民解放军信息工程大学 Detection of anomaly node method based on multiple integrality remote proving
CN106817700B (en) * 2017-03-02 2019-06-28 中国人民解放军信息工程大学 Detection of anomaly node method based on multiple integrality remote proving
CN107231359A (en) * 2017-06-08 2017-10-03 山东超越数控电子有限公司 A kind of high-availability cluster node security method for monitoring state and system
CN110716842A (en) * 2019-10-09 2020-01-21 北京小米移动软件有限公司 Cluster fault detection method and device
CN110716842B (en) * 2019-10-09 2023-11-21 北京小米移动软件有限公司 Cluster fault detection method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20160504