CN106656682B - Cluster heartbeat detecting method, system and device - Google Patents


Info

Publication number
CN106656682B
Authority
CN
China
Prior art keywords
node
data node
abnormal
back end
query result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201710106792.4A
Other languages
Chinese (zh)
Other versions
CN106656682A (en)
Inventor
林仁杰
王剑雄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wangsu Science and Technology Co Ltd
Original Assignee
Wangsu Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wangsu Science and Technology Co Ltd
Priority to CN201710106792.4A
Publication of CN106656682A
Application granted
Publication of CN106656682B
Expired - Fee Related

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/10 Active monitoring, e.g. heartbeat, ping or trace-route
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/04 Network management architectures or arrangements
    • H04L41/044 Network management architectures or arrangements comprising hierarchical management structures
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0805 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability
    • H04L43/0817 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability by checking functioning

Abstract

The invention discloses a cluster heartbeat detection method, system and device, belonging to the field of computer communication technology. The method comprises the following steps: a monitoring node pre-generates a polygon topology, creates a configuration file according to the polygon topology, and sends the configuration file to the data nodes; each data node periodically performs heartbeat queries on its adjacent data nodes according to the configuration file and reports any abnormal heartbeat query result it generates to the monitoring node; the monitoring node determines the faulty node according to the abnormal heartbeat query results and the total number of data nodes reporting them, wherein the data nodes reporting the abnormal heartbeat query results all have the same adjacent data node. Because the data nodes perform heartbeat detection on one another, the load on the monitoring node is greatly reduced.

Description

Cluster heartbeat detecting method, system and device
Technical field
The present invention relates to the field of computer communication technology, and in particular to a cluster heartbeat detection method, system and device.
Background art
In a large-scale cluster or service monitoring system, heartbeat is an important means of detecting whether the nodes in the cluster are online and working normally.
At present, the typical method of cluster heartbeat detection uses a star heartbeat structure: there is a single monitoring node that is responsible for heartbeat communication with all the other monitored data nodes. Once communication between the monitoring node and a monitored data node is interrupted, the monitoring node marks that data node as a faulty node.
This scheme is very simple, but it has a problem: the monitoring node has to carry out heartbeat communication with a large number of data nodes. When the monitoring node is busy, it may be unable to respond to heartbeat messages from the data nodes, so that data nodes are wrongly judged to have timed out, which in turn may cause the cluster to crash.
Summary of the invention
To solve the problems in the prior art, embodiments of the present invention provide a cluster heartbeat detection method, system and device. The technical solution is as follows:
In one aspect, a cluster heartbeat detection method is provided, comprising the following steps:
A monitoring node pre-generates a polygon topology, creates a configuration file according to the polygon topology, and sends the configuration file to the data nodes;
Each data node periodically performs heartbeat queries on its adjacent data nodes according to the configuration file, and reports any abnormal heartbeat query result it generates to the monitoring node;
The monitoring node determines the faulty node according to the abnormal heartbeat query results and the total number of data nodes reporting the abnormal heartbeat query results, wherein the data nodes reporting the abnormal heartbeat query results all have the same adjacent data node.
Further, the step of creating the configuration file according to the polygon topology specifically comprises:
The monitoring node assigns the ID values of the vertex nodes of the polygon topology to the data nodes;
The monitoring node creates the configuration file according to the ID values assigned to the data nodes and the adjacency relations between the vertex nodes of the polygon topology.
Further, the polygon topology is a honeycomb hexagonal topology.
Further, the configuration file includes at least a heartbeat query interval, a response timeout, a heartbeat query mode, and the ID values of the adjacent data nodes.
Further, the step in which the data node periodically performs heartbeat queries on its adjacent data nodes according to the configuration file and reports any abnormal heartbeat query result it generates to the monitoring node specifically comprises:
At every heartbeat query interval, the data node performs a heartbeat query, in the heartbeat query mode, on each adjacent data node listed in the configuration file;
If the data node's heartbeat query to an adjacent data node listed in the configuration file fails within the response timeout, the data node generates an abnormal heartbeat query result and reports it to the monitoring node.
Further, the abnormal heartbeat query result includes at least the ID value of the data node reporting the abnormal heartbeat query result and the ID value of the abnormal data node whose heartbeat query failed.
Further, the step in which the monitoring node determines the faulty node according to the abnormal heartbeat query results and the total number of data nodes reporting them, wherein the data nodes reporting the abnormal heartbeat query results all have the same adjacent data node, is specifically:
The monitoring node counts, from the abnormal heartbeat query results, the number of distinct adjacent data nodes that have reported the abnormal data node; when this number reaches a set number but is less than the total number of adjacent data nodes of the abnormal data node, the monitoring node initiates a fault query to the abnormal data node to judge whether the abnormal data node is the faulty node;
The monitoring node counts, from the abnormal heartbeat query results, the number of distinct adjacent data nodes that have reported the abnormal data node; when this number equals the total number of adjacent data nodes of the abnormal data node, the monitoring node judges the abnormal data node to be the faulty node.
In another aspect, a heartbeat detection system is provided, comprising a monitoring node and data nodes. The monitoring node includes a topology module, a configuration module and a judgment module; each data node includes a query module and a result generation module; wherein,
The topology module is configured to pre-generate a polygon topology;
The configuration module is configured to create a configuration file according to the polygon topology and send the configuration file to the data nodes;
The query module is configured to periodically perform heartbeat queries on the adjacent data nodes according to the configuration file;
The result generation module is configured to report any abnormal heartbeat query result it generates to the monitoring node;
The judgment module is configured to determine the faulty node according to the abnormal heartbeat query results and the total number of data nodes reporting the abnormal heartbeat query results, wherein the data nodes reporting the abnormal heartbeat query results all have the same adjacent data node.
Further, the configuration module is also configured to assign the ID values of the vertex nodes of the polygon topology to the data nodes, and to create the configuration file according to the ID values assigned to the data nodes and the adjacency relations between the vertex nodes of the polygon topology.
Further, the polygon topology is a honeycomb hexagonal topology.
Further, the configuration file includes at least a heartbeat query interval, a response timeout, a heartbeat query mode, and the ID values of the adjacent data nodes;
The query module is also configured to perform, at every heartbeat query interval, a heartbeat query in the heartbeat query mode on each adjacent data node listed in the configuration file;
The result generation module is also configured to generate an abnormal heartbeat query result when the query module's heartbeat query to an adjacent data node listed in the configuration file fails within the response timeout.
Further, the abnormal heartbeat query result includes at least the ID value of the data node reporting the abnormal heartbeat query result and the ID value of the abnormal data node whose heartbeat query failed.
Further, the judgment module is also configured to:
Count, from the abnormal heartbeat query results, the number of distinct adjacent data nodes that have reported the abnormal data node; when this number reaches a set number but is less than the total number of adjacent data nodes of the abnormal data node, initiate a fault query to the abnormal data node to judge whether the abnormal data node is the faulty node;
Count, from the abnormal heartbeat query results, the number of distinct adjacent data nodes that have reported the abnormal data node; when this number equals the total number of adjacent data nodes of the abnormal data node, judge the abnormal data node to be the faulty node.
In yet another aspect, a heartbeat detection device is provided, comprising a monitoring node, the monitoring node including:
A topology module, configured to pre-generate a polygon topology;
A configuration module, configured to create a configuration file according to the polygon topology and send the configuration file to the data nodes;
A judgment module, configured to determine the faulty node according to the abnormal heartbeat query results that the data nodes report after executing the configuration file and the total number of data nodes reporting the abnormal heartbeat query results, wherein the data nodes reporting the abnormal heartbeat query results all have the same adjacent data node.
Further, the polygon topology is a honeycomb hexagonal topology.
Further, the judgment module is also configured to:
Count, from the abnormal heartbeat query results, the number of distinct adjacent data nodes that have reported the abnormal data node; when this number reaches a set number but is less than the total number of adjacent data nodes of the abnormal data node, initiate a fault query to the abnormal data node to judge whether the abnormal data node is the faulty node;
Count, from the abnormal heartbeat query results, the number of distinct adjacent data nodes that have reported the abnormal data node; when this number equals the total number of adjacent data nodes of the abnormal data node, judge the abnormal data node to be the faulty node.
The technical solution provided by the embodiments of the present invention has the following beneficial effects:
In the present invention, each data node and its adjacent data nodes perform heartbeat detection on one another; the resulting abnormal heartbeat query results are reported to the monitoring node, and the monitoring node determines the faulty node according to the abnormal heartbeat query results and the number of adjacent data nodes reporting the same abnormal data node. This greatly reduces the load on the monitoring node. At the same time, because each data node has very few adjacent data nodes, the load on each data node is very low, which improves the stability and accuracy of heartbeat detection between data nodes.
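As a rough illustration of this load claim (a back-of-the-envelope estimate, not part of the patent text), let N be the number of data nodes, f the number of data nodes that are abnormal at the same time, and d the maximum number of adjacent data nodes (d = 3 for the honeycomb hexagonal topology). M denotes the number of heartbeat messages or reports a node handles per heartbeat query interval, and Q the number of heartbeat queries a data node issues per interval. The symbols and the sample figures below are assumptions introduced here.

```latex
\begin{align*}
\text{star structure:}\quad & M_{\text{monitor}} = N \\
\text{honeycomb hexagonal topology:}\quad & M_{\text{monitor}} \le d \cdot f = 3f,
\qquad Q_{\text{data node}} \le d = 3
\end{align*}
```

For example, with N = 1000 and a single abnormal data node (f = 1), the monitoring node in a star structure handles 1000 heartbeat exchanges per interval, whereas in the honeycomb hexagonal topology it receives at most 3 abnormal heartbeat query results per interval, assuming each adjacent data node reports at most once per interval.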
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of the cluster heartbeat detection method provided by Embodiment 1 of the present invention;
Fig. 2 is a detailed flowchart of the sub-steps of step S101 in Embodiment 1 of the present invention;
Fig. 3 is a schematic diagram of the honeycomb hexagonal topology provided by Embodiment 1 of the present invention;
Fig. 4 is a detailed flowchart of the sub-steps of step S102 in Embodiment 1 of the present invention;
Fig. 5 is a detailed flowchart of the sub-steps of step S103 in Embodiment 1 of the present invention;
Fig. 6 is a schematic structural diagram of the heartbeat detection system provided by Embodiment 2 of the present invention;
Fig. 7 is a schematic structural diagram of the heartbeat detection device provided by Embodiment 3 of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the drawings.
Embodiment 1
Referring to Fig. 1, this embodiment of the present invention provides a cluster heartbeat detection method comprising the following steps:
S101: The monitoring node pre-generates a polygon topology, creates a configuration file according to the polygon topology, and sends the configuration file to the data nodes.
In this embodiment, the monitoring node is a special data node that can communicate with all data nodes, including heartbeat communication.
In this embodiment, the step of creating the configuration file according to the polygon topology specifically comprises two sub-steps, S1011 and S1012, as shown in Fig. 2.
S1011: The monitoring node assigns the ID (identity) values of the vertex nodes of the polygon topology to the data nodes.
In this embodiment, the preferred polygon topology is a honeycomb hexagonal topology, in which every substructure is a hexagon with six vertex nodes. Each vertex node of the honeycomb hexagonal topology has at most three adjacent vertex nodes, and the structure is stable and regular, which makes it easy to create the configuration file from the adjacency relations of the vertex nodes. This guarantees that each data node checks at most three adjacent data nodes, so the detection load on each data node is very low, which improves the stability and accuracy of heartbeat detection between data nodes. The polygon topology may also be a triangular, pentagonal or heptagonal topology; these topologies can also support heartbeat detection between data nodes. In a triangular topology, for example, each vertex node has at most six adjacent vertex nodes, so each data node checks at most six adjacent data nodes. The specific structure of the polygon topology is not limited here.
After the polygon topology has been generated, the monitoring node accepts registrations from data nodes and assigns the ID value of a vertex node of the polygon topology to each data node. Each ID value is unique. An ID value may be a string of digits or a string of letters; its form is not limited. For example, the ID value of one vertex node of the polygon topology might be 100001.
Referring to Fig. 3, the polygon topology is a honeycomb hexagonal topology. The figure shows, by way of example, the vertex node positions A to H to which the monitoring node has assigned eight data nodes; the vertex nodes not labeled with letters are of course also assigned data nodes.
S1012: The monitoring node creates the configuration file according to the ID values assigned to the data nodes and the adjacency relations between the vertex nodes of the polygon topology.
In this embodiment, each vertex node in the polygon topology has several adjacent vertex nodes; that is, the data node corresponding to each vertex node has several adjacent data nodes, which are called the adjacent data nodes of that data node. For example, when the polygon topology is a honeycomb hexagonal topology, each data node has two or three adjacent data nodes. The monitoring node derives the adjacency relations of the data nodes from the adjacency relations of the corresponding vertex nodes, and then creates the configuration file.
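The adjacency relations described in S1011 and S1012 can be sketched in code. The patent does not say how the monitoring node generates the honeycomb; the sketch below assumes a "brick wall" grid, which has the same connectivity as a honeycomb hexagonal lattice. The function name `honeycomb_adjacency`, the grid dimensions and the numbering from 100001 are illustrative assumptions.

```python
from itertools import count


def honeycomb_adjacency(rows, cols, first_id=100001):
    """Return {node_id: sorted adjacent node_ids} for a brick-wall honeycomb.

    Assumption: vertices sit on a rows x cols grid. Every vertex is linked
    to its left/right grid neighbors, and to the vertex below it only when
    (row + col) is even, so no vertex has more than three neighbors.
    """
    ids = count(first_id)
    vertex_id = {(r, c): next(ids) for r in range(rows) for c in range(cols)}
    adjacency = {node_id: set() for node_id in vertex_id.values()}

    def link(a, b):
        # Adjacency is symmetric: each node queries the other.
        adjacency[vertex_id[a]].add(vertex_id[b])
        adjacency[vertex_id[b]].add(vertex_id[a])

    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                link((r, c), (r, c + 1))
            if r + 1 < rows and (r + c) % 2 == 0:
                link((r, c), (r + 1, c))
    return {node_id: sorted(neighbors) for node_id, neighbors in adjacency.items()}


adjacency = honeycomb_adjacency(rows=4, cols=5)
print(adjacency[100001])
```

With an even number of rows and an odd number of columns, as here, every vertex in this construction ends up with two or three neighbors, which matches the description of the honeycomb hexagonal topology above.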
In this embodiment, the configuration file includes at least a heartbeat query interval, a response timeout, a heartbeat query mode, and the ID values of the adjacent data nodes.
Specifically, the configuration file defines how data nodes perform heartbeat queries on one another. Because heartbeat querying is a periodic activity, the heartbeat query interval specifies the interval at which a data node performs heartbeat queries. For example, if the system's performance requirements are high, the heartbeat query interval is set to 6-9 s; if they are moderate, it is set to 30-40 s. The response timeout is the time allowed for the target data node of a heartbeat query to respond: if the target data node returns a heartbeat message within the response timeout, the heartbeat query succeeds; if it does not, the heartbeat query fails. The heartbeat query mode is the way heartbeat messages are sent and received between data nodes. It specifies the type of transport, such as Transmission Control Protocol (TCP) transport, and also the specific port used for sending and receiving heartbeat messages between data nodes. The ID values of the adjacent data nodes are the ID values of the data nodes adjacent to each data node; these adjacent data nodes are also the target data nodes of that data node's heartbeat queries.
Further, when the monitoring node pre-generates the polygon topology and creates the configuration file according to it, one option is to create a separate configuration file for each data node, based on that data node's specific heartbeat query mode and the ID values of its adjacent data nodes, and to send each configuration file to the corresponding data node. In this case each configuration file is very small and therefore easy to update.
Alternatively, the monitoring node may create a single unified configuration file that contains the heartbeat query modes of all data nodes and send it to every data node. As is readily seen, such a configuration file can be very large, so updating it consumes more resources.
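The two ways of creating the configuration file can be sketched as follows. The patent names the four fields but not a file format, so the dictionary layout, the key names, the 3 s timeout and port 7000 are assumptions made for illustration; the 30 s interval is taken from the 30-40 s range mentioned above.

```python
import json


def build_node_configs(adjacency, interval_s=30, timeout_s=3,
                       transport="tcp", port=7000):
    """Create one small configuration per data node (the per-node option)."""
    return {
        node_id: {
            "node_id": node_id,
            "heartbeat_query_interval_s": interval_s,
            "response_timeout_s": timeout_s,
            "heartbeat_query_mode": {"transport": transport, "port": port},
            "adjacent_node_ids": list(neighbors),
        }
        for node_id, neighbors in adjacency.items()
    }


def build_unified_config(adjacency, **settings):
    """Create a single configuration covering every data node (the unified option)."""
    return {"nodes": build_node_configs(adjacency, **settings)}


# Only the edges around data node C (neighbors A, D and H in Fig. 3) come
# from the text; the rest of the topology is omitted in this fragment.
adjacency = {"A": ["C"], "C": ["A", "D", "H"], "D": ["C"], "H": ["C"]}

node_configs = build_node_configs(adjacency)
unified = build_unified_config(adjacency)

# Compare the serialized size of the largest per-node file with the unified file.
per_node_size = max(len(json.dumps(cfg)) for cfg in node_configs.values())
unified_size = len(json.dumps(unified))
print(per_node_size, unified_size)
```

The size comparison at the end illustrates the trade-off described above: the unified file grows with the number of data nodes, while each per-node file stays small.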
S102: Each data node periodically performs heartbeat queries on its adjacent data nodes according to the configuration file, and reports any abnormal heartbeat query result it generates to the monitoring node.
In this embodiment, step S102, in which the data node periodically performs heartbeat queries on its adjacent data nodes according to the configuration file and reports any abnormal heartbeat query result it generates to the monitoring node, specifically comprises two sub-steps, S1021 and S1022, as shown in Fig. 4.
S1021: At every heartbeat query interval, the data node performs a heartbeat query, in the heartbeat query mode, on each adjacent data node listed in the configuration file.
In this embodiment, the data node initiates a heartbeat query to an adjacent data node. After receiving the heartbeat query, the adjacent data node returns a heartbeat response within the response timeout. When the data node receives the heartbeat response, it knows the adjacent node is online and working normally, so it does not need to report anything; it goes dormant until the next heartbeat query interval arrives and then initiates a heartbeat query again. At the same time, the data node is itself the target data node of its adjacent nodes' heartbeat queries, and must return a heartbeat response within the response timeout to show that it is online and working normally.
S1022: If the data node's heartbeat query to an adjacent data node listed in the configuration file fails within the response timeout, the data node generates an abnormal heartbeat query result and reports it to the monitoring node.
In this embodiment, if the data node initiates a heartbeat query to an adjacent data node and does not receive a heartbeat response within the response timeout, the adjacent data node has failed to respond in time; that is, the adjacent node has become an abnormal data node and may have gone offline. The data node's heartbeat query to that adjacent data node has therefore failed, and the data node generates an abnormal heartbeat query result.
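Sub-steps S1021 and S1022 can be sketched as one query round per interval. This is a minimal sketch under stated assumptions: the `PING`/`PONG` wire format, the function names, the result keys `reporter_id` and `abnormal_id`, and the neighbor `B` of data node A are hypothetical; the patent only requires that a query either gets a heartbeat response within the response timeout or counts as failed.

```python
import socket
import time


def tcp_probe(host, port, timeout_s):
    """Return True if the peer answers a PING with PONG (hypothetical format).

    Each socket operation (connect, receive) is bounded by timeout_s.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_s) as conn:
            conn.sendall(b"PING\n")
            return conn.recv(16).startswith(b"PONG")
    except OSError:  # refused, unreachable, or timed out
        return False


def query_round(config, probe):
    """S1021/S1022: query each adjacent node once and collect abnormal results."""
    results = []
    for neighbor_id in config["adjacent_node_ids"]:
        if not probe(neighbor_id, config["response_timeout_s"]):
            results.append({"reporter_id": config["node_id"],
                            "abnormal_id": neighbor_id})
    return results


def run_forever(config, probe, report):
    """Repeat the query round at every heartbeat query interval."""
    while True:
        for result in query_round(config, probe):
            report(result)  # send to the monitoring node
        time.sleep(config["heartbeat_query_interval_s"])


# Usage with a stand-in probe instead of real sockets: C does not answer.
config_a = {"node_id": "A", "heartbeat_query_interval_s": 30,
            "response_timeout_s": 3, "adjacent_node_ids": ["B", "C"]}
unresponsive = {"C"}
results = query_round(config_a, probe=lambda node_id, timeout_s: node_id not in unresponsive)
print(results)
```

Passing the probe in as a parameter keeps the round logic independent of the transport, so the heartbeat query mode from the configuration file can select `tcp_probe` or any other mechanism.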
Further, when the next heartbeat query interval arrives, the data node again initiates a heartbeat query to that adjacent data node. If it still receives no heartbeat response within the response timeout, there are several possible ways to proceed. For example, the data node may continue to generate abnormal heartbeat query results and report them to the monitoring node, so that the abnormal data node is reported multiple times, but at most once per heartbeat query interval. Alternatively, the data node may stop generating abnormal heartbeat query results, or generate them without reporting them to the monitoring node, so that the abnormal data node is reported only once; if the data node detects that the abnormal data node temporarily recovers and then becomes abnormal again, it reports an abnormal heartbeat query result again.
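The two reporting options just described differ only in whether a repeated failure is reported again. A minimal sketch, assuming a hypothetical `ReportPolicy` helper that remembers the last observed state of each adjacent data node:

```python
class ReportPolicy:
    """Decide whether a failed heartbeat query should be reported.

    report_every_round=True  -> report on every failed round
    report_every_round=False -> report only when a node turns abnormal,
                                including after a temporary recovery
    """

    def __init__(self, report_every_round):
        self.report_every_round = report_every_round
        self._was_abnormal = {}

    def should_report(self, neighbor_id, responded):
        previously_abnormal = self._was_abnormal.get(neighbor_id, False)
        self._was_abnormal[neighbor_id] = not responded
        if responded:
            return False  # normal nodes are never reported
        return self.report_every_round or not previously_abnormal


# Four consecutive rounds for neighbor C: fail, fail, recover, fail again.
responses = [False, False, True, False]
once = ReportPolicy(report_every_round=False)
every = ReportPolicy(report_every_round=True)
once_decisions = [once.should_report("C", r) for r in responses]
every_decisions = [every.should_report("C", r) for r in responses]
print(once_decisions, every_decisions)
```

Because `should_report` is called once per round, the every-round option still reports at most once per heartbeat query interval.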
Further, once the abnormal data node is able to return heartbeat responses again, it has recovered. The recovered data node then actively reports a heartbeat message to show that it is online and working normally.
In this embodiment, the abnormal heartbeat query result includes at least the ID value of the data node reporting the abnormal heartbeat query result and the ID value of the abnormal data node whose heartbeat query failed. Specifically, when a data node finds an abnormal data node, it adds the ID value of the abnormal data node to the abnormal heartbeat query result, and it also adds its own ID value, which makes it possible to locate the abnormal data node and to count how many nodes have reported it.
Note that the abnormal data node is also an adjacent node of other data nodes. Those data nodes can likewise detect the abnormal data node and generate abnormal heartbeat query results containing its ID value.
Because each data node and its adjacent data nodes perform heartbeat detection on one another, the load on the monitoring node is greatly reduced. At the same time, because each data node has very few adjacent data nodes, the load on each data node is very low, which improves the stability and accuracy of heartbeat detection between data nodes.
S103: The monitoring node determines the faulty node according to the abnormal heartbeat query results and the total number of data nodes reporting the abnormal heartbeat query results, wherein the data nodes reporting the abnormal heartbeat query results all have the same adjacent data node.
In this embodiment, every data node that detects an abnormal data node reports the abnormal heartbeat query result it generates to the monitoring node. In this step, the data nodes reporting the abnormal heartbeat query results all have the same adjacent data node; that is, the reporting data nodes are the adjacent data nodes of the abnormal data node. Only when all adjacent data nodes of the abnormal data node have reported it does the monitoring node directly determine that the abnormal data node is a faulty node.
In this embodiment, step S103, in which the monitoring node determines the faulty node according to the abnormal heartbeat query results and the total number of data nodes reporting them, wherein the data nodes reporting the abnormal heartbeat query results all have the same adjacent data node, specifically comprises two sub-steps, S1031 and S1032, as shown in Fig. 5.
S1031: The monitoring node counts, from the abnormal heartbeat query results, the number of distinct adjacent data nodes that have reported the abnormal data node; when this number reaches a set number but is less than the total number of adjacent data nodes of the abnormal data node, the monitoring node initiates a fault query to the abnormal data node to judge whether the abnormal data node is the faulty node.
In this embodiment, the rule for judging a faulty node is as follows. When a data node is reported by its adjacent data nodes, it becomes an abnormal data node. When the number of distinct adjacent data nodes reporting the abnormal data node reaches the set number, or equals the total number of adjacent data nodes, fault determination is carried out. For example, suppose the number of adjacent data nodes reporting the abnormal data node reaches half of the total number of adjacent data nodes, which is the set number. Because not all adjacent data nodes have reported the abnormal data node, the monitoring node actively initiates a fault query to the abnormal data node to determine whether it is online and working normally; if it is not online, the abnormal data node is a faulty node.
Referring again to Fig. 3, assume that the abnormal data node in the honeycomb hexagonal topology is data node C, which has only three adjacent data nodes: A, D and H. When data node C becomes abnormal, data nodes A, D and H all detect that data node C is an abnormal data node and report this to the monitoring node. Suppose that, for some reason, the monitoring node receives abnormal heartbeat query results only from data nodes A and D. The monitoring node then counts that data node C has been reported by 2 distinct adjacent data nodes, which is more than half of data node C's 3 adjacent data nodes but fewer than all of them, so the monitoring node initiates a fault query to data node C.
Note that the set number is configurable. The principle is that the monitoring node steps in to judge whether an abnormal data node is a faulty node only when the abnormal data node has been reported by most of its adjacent data nodes, which saves system resources. For example, when the topology is a honeycomb hexagonal topology, the set number can be half of the total number of adjacent data nodes; when the topology is a triangular topology, in which each data node has at most six adjacent data nodes, the set number can be two thirds of the total number of adjacent data nodes.
S1032: The monitoring node counts, from the abnormal heartbeat query results, the number of distinct adjacent data nodes that have reported the abnormal data node; when this number equals the total number of adjacent data nodes of the abnormal data node, the monitoring node judges the abnormal data node to be the faulty node.
In this embodiment, when an abnormal data node has been reported by all of its adjacent data nodes that are online and working normally, the abnormal data node has failed and is a faulty node.
Note that the total number of adjacent data nodes here is the total number of adjacent data nodes that are online and working normally.
This further screening of the abnormal heartbeat query results improves the accuracy of faulty-node determination, and at the same time narrows the set of abnormal nodes that the monitoring node must examine further, which saves system resources.
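The judgment rule of S1031 and S1032 can be sketched as a single function on the monitoring node. The function name `judge_node`, the return labels, the result keys and the `fault_query` callback are assumptions made for illustration; the set number of 2 for data node C is half of its 3 adjacent data nodes rounded up, and the rounding is also an assumption.

```python
def judge_node(abnormal_id, results, online_neighbors, set_number, fault_query):
    """Classify an abnormal data node from the reports about it.

    results          : abnormal heartbeat query results (reporter_id, abnormal_id)
    online_neighbors : adjacent data nodes of abnormal_id that are online
    set_number       : reporter count at which the monitoring node steps in
    fault_query      : callable returning True if abnormal_id answers
    """
    # Count distinct reporters, ignoring duplicates and non-neighbors.
    reporters = {
        r["reporter_id"]
        for r in results
        if r["abnormal_id"] == abnormal_id and r["reporter_id"] in online_neighbors
    }
    total = len(online_neighbors)
    if total > 0 and len(reporters) == total:
        return "faulty"  # S1032: every online neighbor reported it
    if set_number <= len(reporters) < total:
        # S1031: enough reports, so the monitoring node asks the node directly.
        return "normal" if fault_query(abnormal_id) else "faulty"
    return "pending"  # too few reports to act on yet


# Worked example from Fig. 3: C has neighbors A, D, H; only A and D report.
neighbors_of_c = {"A", "D", "H"}
results = [{"reporter_id": "A", "abnormal_id": "C"},
           {"reporter_id": "D", "abnormal_id": "C"}]
queried = []


def no_answer(node_id):
    queried.append(node_id)  # record that a fault query was sent
    return False             # the node does not respond


verdict_partial = judge_node("C", results, neighbors_of_c, 2, no_answer)
verdict_all = judge_node("C", results + [{"reporter_id": "H", "abnormal_id": "C"}],
                         neighbors_of_c, 2, no_answer)
verdict_single = judge_node("C", results[:1], neighbors_of_c, 2, no_answer)
print(verdict_partial, verdict_all, verdict_single, queried)
```

In this example the fault query is sent only in the partial case (reports from A and D), which reflects the resource saving described above: the monitoring node contacts a data node directly only when most, but not all, of its neighbors have reported it.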
Embodiment two
Referring to Fig. 6, an embodiment of the present invention provides a heartbeat detection system that can execute the cluster heartbeat detection method provided in Embodiment One. The system includes a monitoring node 21 and data nodes 22. The monitoring node 21 is a special data node 22 that can communicate with all data nodes 22, including heartbeat communication. In a cluster system, the monitoring node 21 and each data node 22 are hosts in the cluster. In a service monitoring system, the monitoring node 21 and each data node 22 are processes in the system.
In this embodiment, the monitoring node 21 includes a topology module 211, a configuration module 212, and a judgment module 213; the data node 22 includes a query module 221 and a result generation module 222.
The topology module 211 is configured to pre-generate a polygon topology.
The configuration module 212 is configured to create a configuration file according to the polygon topology and send the configuration file to the data nodes 22.
The query module 221 is configured to periodically perform heartbeat queries on adjacent data nodes according to the configuration file.
The result generation module 222 is configured to report the generated abnormal heartbeat query result to the monitoring node 21.
The judgment module 213 is configured to determine the faulty node according to the abnormal heartbeat query results and the total number of data nodes reporting them, where the data nodes reporting the abnormal heartbeat query results all share the same adjacent data node.
In this embodiment, the configuration module 212 is further configured to assign the ID values of the vertex nodes of the polygon topology to the data nodes, to create the configuration file according to the ID values assigned to the data nodes and the adjacency relations between the vertex nodes of the polygon topology, and to send the configuration file to the data nodes 22.
The polygon topology above is a honeycomb hexagonal topology in which each substructure is a hexagon with six vertex nodes. The polygon topology may also be a triangular, pentagonal, or heptagonal topology, which is not limited here. After the polygon topology is generated, the monitoring node 21 accepts the registration of the data nodes 22 and assigns the ID values of the vertex nodes of the polygon topology to the data nodes 22.
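One possible way to pre-generate a honeycomb hexagonal topology is the "brick wall" layout, in which a grid of vertices is linked horizontally and only every other vertical link is kept. The text does not say how the topology is generated, so the function name `honeycomb_adjacency`, the `(row, column)` vertex IDs, and the layout itself are assumptions for illustration.

```python
def honeycomb_adjacency(rows, cols):
    """Build a honeycomb-like adjacency map using a brick-wall layout.

    Each vertex is identified by a hypothetical (row, col) ID. Every vertex
    ends up with at most three neighbors, as in a hexagonal lattice.
    """
    adjacency = {(r, c): set() for r in range(rows) for c in range(cols)}

    def link(a, b):
        # Adjacency is symmetric: neighbors query each other.
        adjacency[a].add(b)
        adjacency[b].add(a)

    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                link((r, c), (r, c + 1))   # horizontal edge
            if (r + c) % 2 == 0 and r + 1 < rows:
                link((r, c), (r + 1, c))   # every other vertical edge
    return adjacency


topology = honeycomb_adjacency(4, 4)
print(sorted(topology[(1, 1)]))  # [(1, 0), (1, 2), (2, 1)]
```

In this layout the vertices (0,0), (0,1), (0,2), (1,2), (1,1), (1,0) form one hexagonal cell, and registered data nodes could then be mapped onto the vertex IDs in registration order.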
In this embodiment, the configuration file includes at least the heartbeat query interval time, the response timeout, the heartbeat query mode, and the ID values of the adjacent data nodes. The configuration file specifies how the data nodes 22 perform heartbeat queries on one another.
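A per-node configuration with these four fields might look like the following sketch. The class name `HeartbeatConfig`, the field names, the default values, and the `"tcp"` query mode are hypothetical; the text lists only which kinds of information the configuration file contains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HeartbeatConfig:
    node_id: str
    query_interval_s: float    # heartbeat query interval time
    response_timeout_s: float  # response timeout
    query_mode: str            # heartbeat query mode (value is an assumption)
    neighbor_ids: tuple        # ID values of the adjacent data nodes


def build_configs(adjacency, query_interval_s=5.0, response_timeout_s=2.0,
                  query_mode="tcp"):
    """Create one configuration per data node from an adjacency map."""
    for node_id, neighbors in adjacency.items():
        for neighbor_id in neighbors:
            # Neighbors must list each other, otherwise reports would be one-sided.
            if node_id not in adjacency.get(neighbor_id, ()):
                raise ValueError(f"adjacency is not symmetric: {node_id} -> {neighbor_id}")
    return {
        node_id: HeartbeatConfig(node_id, query_interval_s, response_timeout_s,
                                 query_mode, tuple(sorted(neighbors)))
        for node_id, neighbors in adjacency.items()
    }


# Hypothetical fragment of a topology: C is adjacent to A, D, and H.
fragment = {"C": {"A", "D", "H"}, "A": {"C"}, "D": {"C"}, "H": {"C"}}
print(build_configs(fragment)["C"].neighbor_ids)  # ('A', 'D', 'H')
```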
In this embodiment, the query module 221 is further configured to perform, at every heartbeat query interval time, a heartbeat query on the adjacent data nodes listed in the configuration file according to the heartbeat query mode.
In this embodiment, the result generation module 222 is further configured to generate an abnormal heartbeat query result when the heartbeat query performed by the query module 221 on an adjacent data node listed in the configuration file fails within the response timeout.
In this embodiment, the abnormal heartbeat query result includes at least the ID value of the data node reporting the abnormal heartbeat query result and the ID value of the abnormal data node whose heartbeat query failed.
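The data node side of one query round can be sketched as follows. The function `poll_neighbors`, the `probe` callback, and the dictionary keys are hypothetical; the actual heartbeat query mode is left open by the text, so the network call is replaced here by a stand-in function.

```python
def poll_neighbors(self_id, neighbor_ids, probe, response_timeout_s):
    """Run one heartbeat round and return the abnormal heartbeat query results.

    probe(neighbor_id, timeout_s) is an assumed callback that returns True
    when the neighbor answers in time and raises OSError otherwise.
    """
    abnormal_results = []
    for neighbor_id in neighbor_ids:
        try:
            alive = probe(neighbor_id, response_timeout_s)
        except OSError:  # TimeoutError and connection errors are subclasses
            alive = False
        if not alive:
            # Minimum content named in the text: reporter ID and abnormal node ID.
            abnormal_results.append({"reporter_id": self_id, "abnormal_id": neighbor_id})
    return abnormal_results


def fake_probe(neighbor_id, timeout_s):
    # Stand-in for a real network probe: node "C" never answers.
    if neighbor_id == "C":
        raise TimeoutError("no heartbeat reply")
    return True


print(poll_neighbors("A", ["B", "C"], fake_probe, 2.0))
# [{'reporter_id': 'A', 'abnormal_id': 'C'}]
```

A real data node would call such a function once per heartbeat query interval time and send any returned results to the monitoring node.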
Because each data node 22 performs heartbeat detection with its adjacent data nodes, the load on the monitoring node 21 is greatly reduced. At the same time, because each data node 22 has very few adjacent data nodes, the load on each data node 22 is very low, which improves the stability and accuracy of heartbeat detection among the data nodes 22.
In this embodiment, the judgment module 213 is further configured to count, in the abnormal heartbeat query results, the number of different adjacent data nodes that have reported the abnormal data node, and, when this number reaches the set number and is less than the total number of adjacent data nodes of the abnormal data node, to initiate a fault inquiry to the abnormal data node to judge whether the abnormal data node is the faulty node.
In this embodiment, the judgment module 213 is further configured to count, in the abnormal heartbeat query results, the number of different adjacent data nodes that have reported the abnormal data node, and, when this number equals the total number of adjacent data nodes of the abnormal data node, to judge the abnormal data node to be the faulty node.
This further screening of the abnormal heartbeat query results improves the accuracy of determining the faulty node and narrows the range of abnormal nodes that the monitoring node must examine further, which saves system resources.
Embodiment three
Referring to Fig. 7, an embodiment of the present invention provides a heartbeat detection device including a monitoring node 31, where the monitoring node 31 includes:
A topology module 311, configured to pre-generate a polygon topology;
A configuration module 312, configured to create a configuration file according to the polygon topology and send the configuration file to the data nodes;
A judgment module 313, configured to determine the faulty node according to the abnormal heartbeat query results that the data nodes report after executing the configuration file and the total number of data nodes reporting them, where the data nodes reporting the abnormal heartbeat query results all share the same adjacent data node.
In the prior art, the monitoring node 31 must perform heartbeat communication with every data node in order to detect each of them. In this embodiment, the monitoring node 31 instead receives the abnormal heartbeat query results provided by the data nodes, focuses its detection on the abnormal data nodes screened out in this way, and then determines the final faulty node among them. This narrows the detection range and greatly reduces the load on the monitoring node 31.
In this embodiment, the polygon topology is a honeycomb hexagonal topology.
In this embodiment, the judgment module 313 is further configured to count, in the abnormal heartbeat query results, the number of different adjacent data nodes that have reported the abnormal data node, and, when this number reaches the set number and is less than the total number of adjacent data nodes of the abnormal data node, to initiate a fault inquiry to the abnormal data node to judge whether the abnormal data node is the faulty node.
In this embodiment, the judgment module 313 is further configured to count, in the abnormal heartbeat query results, the number of different adjacent data nodes that have reported the abnormal data node, and, when this number equals the total number of adjacent data nodes of the abnormal data node, to judge the abnormal data node to be the faulty node.
The serial numbers of the above embodiments of the invention are for description only and do not represent the relative merits of the embodiments.
The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement this without creative effort.
From the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware. Based on this understanding, the essence of the above technical solution, or the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product may be stored in a computer-readable storage medium, such as a ROM/RAM, magnetic disk, or optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments or in certain parts of the embodiments.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its protection scope.

Claims (16)

1. A cluster heartbeat detection method, characterized by comprising the following steps:
A monitoring node pre-generates a polygon topology, creates a configuration file according to the polygon topology, and sends the configuration file to data nodes;
Each data node periodically performs heartbeat queries on its adjacent data nodes according to the configuration file, and reports the generated abnormal heartbeat query result to the monitoring node;
The monitoring node determines a faulty node according to the abnormal heartbeat query results and the total number of data nodes reporting the abnormal heartbeat query results, wherein the data nodes reporting the abnormal heartbeat query results all share the same adjacent data node;
Wherein the monitoring node determining the faulty node according to the abnormal heartbeat query results and the total number of data nodes reporting the abnormal heartbeat query results comprises:
The monitoring node counts, in the abnormal heartbeat query results, the number of different adjacent data nodes that have reported an abnormal data node; when this number reaches a set number and is less than the total number of adjacent data nodes of the abnormal data node, the monitoring node initiates a fault inquiry to the abnormal data node to judge whether the abnormal data node is the faulty node.
2. The cluster heartbeat detection method according to claim 1, characterized in that the step of creating a configuration file according to the polygon topology specifically comprises:
The monitoring node assigns the ID values of the vertex nodes of the polygon topology to the data nodes;
The monitoring node creates the configuration file according to the ID values assigned to the data nodes and the adjacency relations between the vertex nodes of the polygon topology.
3. The cluster heartbeat detection method according to claim 2, characterized in that the polygon topology is a honeycomb hexagonal topology.
4. The cluster heartbeat detection method according to claim 3, characterized in that the configuration file includes at least a heartbeat query interval time, a response timeout, a heartbeat query mode, and the ID values of the adjacent data nodes.
5. The cluster heartbeat detection method according to claim 4, characterized in that the step in which the data node periodically performs heartbeat queries on adjacent data nodes according to the configuration file and reports the generated abnormal heartbeat query result to the monitoring node specifically comprises:
The data node performs, at every heartbeat query interval time, a heartbeat query on the adjacent data nodes listed in the configuration file according to the heartbeat query mode;
If the heartbeat query performed by the data node on an adjacent data node listed in the configuration file fails within the response timeout, the data node generates an abnormal heartbeat query result and reports the abnormal heartbeat query result to the monitoring node.
6. The cluster heartbeat detection method according to claim 5, characterized in that the abnormal heartbeat query result includes at least the ID value of the data node reporting the abnormal heartbeat query result and the ID value of the abnormal data node whose heartbeat query failed.
7. The cluster heartbeat detection method according to claim 6, characterized in that the step in which the monitoring node determines the faulty node according to the abnormal heartbeat query results and the total number of data nodes reporting the abnormal heartbeat query results, wherein the data nodes reporting the abnormal heartbeat query results all share the same adjacent data node, specifically comprises:
The monitoring node counts, in the abnormal heartbeat query results, the number of different adjacent data nodes that have reported the abnormal data node; when this number equals the total number of adjacent data nodes of the abnormal data node, the monitoring node judges the abnormal data node to be the faulty node.
8. A heartbeat detection system, including a monitoring node and a data node, characterized in that the monitoring node includes a topology module, a configuration module, and a judgment module; the data node includes a query module and a result generation module;
The topology module is configured to pre-generate a polygon topology;
The configuration module is configured to create a configuration file according to the polygon topology and send the configuration file to the data node;
The query module is configured to periodically perform heartbeat queries on its adjacent data nodes according to the configuration file;
The result generation module is configured to report the generated abnormal heartbeat query result to the monitoring node;
The judgment module is configured to determine a faulty node according to the abnormal heartbeat query results and the total number of data nodes reporting the abnormal heartbeat query results, wherein the data nodes reporting the abnormal heartbeat query results all share the same adjacent data node;
The judgment module is further configured to count, in the abnormal heartbeat query results, the number of different adjacent data nodes that have reported an abnormal data node; when this number reaches a set number and is less than the total number of adjacent data nodes of the abnormal data node, to initiate a fault inquiry to the abnormal data node to judge whether the abnormal data node is the faulty node.
9. The heartbeat detection system according to claim 8, characterized in that the configuration module is further configured to assign the ID values of the vertex nodes of the polygon topology to the data nodes, and to create the configuration file according to the ID values assigned to the data nodes and the adjacency relations between the vertex nodes of the polygon topology.
10. The heartbeat detection system according to claim 9, characterized in that the polygon topology is a honeycomb hexagonal topology.
11. The heartbeat detection system according to claim 10, characterized in that the configuration file includes at least a heartbeat query interval time, a response timeout, a heartbeat query mode, and the ID values of the adjacent data nodes;
The query module is further configured to perform, at every heartbeat query interval time, a heartbeat query on the adjacent data nodes listed in the configuration file according to the heartbeat query mode;
The result generation module is further configured to generate an abnormal heartbeat query result when the heartbeat query performed by the query module on an adjacent data node listed in the configuration file fails within the response timeout.
12. The heartbeat detection system according to claim 11, characterized in that the abnormal heartbeat query result includes at least the ID value of the data node reporting the abnormal heartbeat query result and the ID value of the abnormal data node whose heartbeat query failed.
13. The heartbeat detection system according to claim 12, characterized in that the judgment module is further configured to:
Count, in the abnormal heartbeat query results, the number of different adjacent data nodes that have reported the abnormal data node, and, when this number equals the total number of adjacent data nodes of the abnormal data node, judge the abnormal data node to be the faulty node.
14. A heartbeat detection device, including a monitoring node, characterized in that the monitoring node includes:
A topology module, configured to pre-generate a polygon topology;
A configuration module, configured to create a configuration file according to the polygon topology and send the configuration file to data nodes;
A judgment module, configured to determine a faulty node according to the abnormal heartbeat query results that the data nodes report after executing the configuration file and the total number of data nodes reporting the abnormal heartbeat query results, wherein the data nodes reporting the abnormal heartbeat query results all share the same adjacent data node;
The judgment module is further configured to count, in the abnormal heartbeat query results, the number of different adjacent data nodes that have reported an abnormal data node; when this number reaches a set number and is less than the total number of adjacent data nodes of the abnormal data node, to initiate a fault inquiry to the abnormal data node to judge whether the abnormal data node is the faulty node.
15. The heartbeat detection device according to claim 14, characterized in that the polygon topology is a honeycomb hexagonal topology.
16. The heartbeat detection device according to claim 15, characterized in that the judgment module is further configured to:
Count, in the abnormal heartbeat query results, the number of different adjacent data nodes that have reported the abnormal data node, and, when this number equals the total number of adjacent data nodes of the abnormal data node, judge the abnormal data node to be the faulty node.
CN201710106792.4A 2017-02-27 2017-02-27 Cluster heartbeat detecting method, system and device Expired - Fee Related CN106656682B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710106792.4A CN106656682B (en) 2017-02-27 2017-02-27 Cluster heartbeat detecting method, system and device

Publications (2)

Publication Number Publication Date
CN106656682A CN106656682A (en) 2017-05-10
CN106656682B true CN106656682B (en) 2019-10-25

Family

ID=58846801

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710106792.4A Expired - Fee Related CN106656682B (en) 2017-02-27 2017-02-27 Cluster heartbeat detecting method, system and device

Country Status (1)

Country Link
CN (1) CN106656682B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11212204B2 (en) 2017-06-30 2021-12-28 Xi'an Zhongxing New Software Co., Ltd. Method, device and system for monitoring node survival state
CN109257195B (en) 2017-07-12 2021-01-15 华为技术有限公司 Fault processing method and equipment for nodes in cluster
CN107566219B (en) * 2017-09-27 2020-09-18 华为技术有限公司 Fault diagnosis method applied to cluster system, node equipment and computer equipment
CN109697193A (en) * 2017-10-24 2019-04-30 中兴通讯股份有限公司 A kind of method, node and the computer readable storage medium of determining abnormal nodes
CN108235800B (en) * 2017-12-19 2021-08-03 达闼机器人有限公司 Network fault detection method, control center equipment and computer storage medium
CN108092857A (en) * 2018-01-15 2018-05-29 郑州云海信息技术有限公司 A kind of distributed system heartbeat detecting method and relevant apparatus
CN110365936B (en) * 2018-04-11 2021-03-16 杭州海康威视系统技术有限公司 Code stream obtaining method, device and system
CN111225224A (en) * 2018-11-27 2020-06-02 玲珑视界科技(北京)有限公司 System and method for monitoring state of grid node
CN109474694A (en) * 2018-12-04 2019-03-15 郑州云海信息技术有限公司 A kind of management-control method and device of the NAS cluster based on SAN storage array
CN110611603B (en) * 2019-09-09 2021-08-31 苏州浪潮智能科技有限公司 Cluster network card monitoring method and device
CN112988463B (en) * 2021-02-23 2022-08-30 新华三大数据技术有限公司 Fault node isolation method and device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1893370A (en) * 2005-06-29 2007-01-10 国际商业机器公司 Server cluster recovery and maintenance method and system
CN103117901A (en) * 2013-02-01 2013-05-22 华为技术有限公司 Distributed heartbeat detection method, device and system
CN106452952A (en) * 2016-09-29 2017-02-22 华为技术有限公司 Method for detecting communication state of cluster system and gateway cluster

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070294600A1 (en) * 2006-05-08 2007-12-20 Inventec Corporation Method of detecting heartbeats and device thereof

Also Published As

Publication number Publication date
CN106656682A (en) 2017-05-10

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee (granted publication date: 20191025)
CF01 Termination of patent right due to non-payment of annual fee