CN106656682B - Cluster heartbeat detecting method, system and device - Google Patents
- Publication number
- CN106656682B (application number CN201710106792.4A)
- Authority
- CN
- China
- Prior art keywords
- node
- data node
- abnormal
- back end
- query result
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/10—Active monitoring, e.g. heartbeat, ping or trace-route
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/04—Network management architectures or arrangements
- H04L41/044—Network management architectures or arrangements comprising hierarchical management structures
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/08—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
- H04L43/0805—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability
- H04L43/0817—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability by checking functioning
Abstract
The invention discloses a cluster heartbeat detection method, system, and device, belonging to the field of computer communication technology. The method comprises the following steps: a monitoring node pre-generates a polygon topology, creates a configuration file according to the polygon topology, and sends the configuration file to the data nodes; each data node periodically performs heartbeat queries on its adjacent data nodes according to the configuration file and reports any abnormal heartbeat query result it generates to the monitoring node; the monitoring node determines the faulty node according to the abnormal heartbeat query results and the total number of data nodes reporting them, wherein the data nodes reporting the abnormal heartbeat query results all share the same adjacent data node. Because each data node performs heartbeat detection with its adjacent data nodes, the load on the monitoring node is greatly reduced.
Description
Technical field
The present invention relates to the field of computer communication technology, and in particular to a cluster heartbeat detection method, system, and device.
Background
In large-scale clusters or service monitoring systems, heartbeat is an important means of detecting whether the nodes in a cluster are normally online.
The typical method of cluster heartbeat detection today uses a star heartbeat structure: there is one monitoring node, which is responsible for heartbeat communication with all other monitored data nodes. Once communication between the monitoring node and a monitored data node is interrupted, the monitoring node marks that data node as a faulty node.
This scheme is very simple but has a problem: the monitoring node must carry out heartbeat communication with a large number of data nodes. When the monitoring node is busy, it may be unable to respond to heartbeat messages from the data nodes, so data nodes are misjudged as timed out, which in turn may cause the cluster to crash.
Summary of the invention
To solve the problems in the prior art, embodiments of the invention provide a cluster heartbeat detection method, system, and device. The technical solution is as follows:
In one aspect, a cluster heartbeat detection method is provided, comprising the following steps:
A monitoring node pre-generates a polygon topology, creates a configuration file according to the polygon topology, and sends the configuration file to the data nodes;
Each data node periodically performs heartbeat queries on its adjacent data nodes according to the configuration file, and reports any abnormal heartbeat query result it generates to the monitoring node;
The monitoring node determines the faulty node according to the abnormal heartbeat query results and the total number of data nodes reporting them, wherein the data nodes reporting the abnormal heartbeat query results all share the same adjacent data node.
Further, the step of creating the configuration file according to the polygon topology specifically comprises:
The monitoring node assigns the ID values of the vertex nodes of the polygon topology to the data nodes;
The monitoring node creates the configuration file according to the ID values assigned to the data nodes and the adjacency relationships between the vertex nodes of the polygon topology.
Further, the polygon topology is a honeycomb hexagonal topology.
Further, the configuration file includes at least a heartbeat query interval, a response timeout, a heartbeat query mode, and the ID values of the adjacent data nodes.
Further, the step in which the data node periodically performs heartbeat queries on its adjacent data nodes according to the configuration file and reports any abnormal heartbeat query result it generates to the monitoring node specifically comprises:
At every heartbeat query interval, the data node performs a heartbeat query, in the heartbeat query mode, on each adjacent data node listed in the configuration file;
If the heartbeat query to an adjacent data node listed in the configuration file fails within the response timeout, the data node generates an abnormal heartbeat query result and reports it to the monitoring node.
Further, the abnormal heartbeat query result includes at least the ID value of the data node reporting the result and the ID value of the abnormal data node whose heartbeat query failed.
Further, the step in which the monitoring node determines the faulty node according to the abnormal heartbeat query results and the total number of data nodes reporting them, wherein the reporting data nodes all share the same adjacent data node, specifically comprises:
The monitoring node counts, from the abnormal heartbeat query results, the number of distinct adjacent data nodes that have reported the abnormal data node; when this number reaches a set number but is less than the total number of adjacent data nodes of the abnormal data node, the monitoring node initiates a fault query to the abnormal data node to judge whether it is the faulty node;
The monitoring node counts, from the abnormal heartbeat query results, the number of distinct adjacent data nodes that have reported the abnormal data node; when this number equals the total number of adjacent data nodes of the abnormal data node, the monitoring node judges the abnormal data node to be the faulty node.
In another aspect, a heartbeat detection system is provided, comprising a monitoring node and data nodes. The monitoring node includes a topology module, a configuration module, and a judgment module; each data node includes a query module and a result generation module; wherein,
The topology module is configured to pre-generate a polygon topology;
The configuration module is configured to create a configuration file according to the polygon topology and send the configuration file to the data nodes;
The query module is configured to periodically perform heartbeat queries on the adjacent data nodes according to the configuration file;
The result generation module is configured to report the generated abnormal heartbeat query result to the monitoring node;
The judgment module is configured to determine the faulty node according to the abnormal heartbeat query results and the total number of data nodes reporting them, wherein the data nodes reporting the abnormal heartbeat query results all share the same adjacent data node.
Further, the configuration module is also configured to assign the ID values of the vertex nodes of the polygon topology to the data nodes, and to create the configuration file according to the ID values assigned to the data nodes and the adjacency relationships between the vertex nodes of the polygon topology.
Further, the polygon topology is a honeycomb hexagonal topology.
Further, the configuration file includes at least a heartbeat query interval, a response timeout, a heartbeat query mode, and the ID values of the adjacent data nodes;
The query module is also configured to perform, at every heartbeat query interval, a heartbeat query in the heartbeat query mode on each adjacent data node listed in the configuration file;
The result generation module is also configured to generate an abnormal heartbeat query result when the query module's heartbeat query to an adjacent data node listed in the configuration file fails within the response timeout.
Further, the abnormal heartbeat query result includes at least the ID value of the data node reporting the result and the ID value of the abnormal data node whose heartbeat query failed.
Further, the judgment module is also configured to:
Count, from the abnormal heartbeat query results, the number of distinct adjacent data nodes that have reported the abnormal data node; when this number reaches a set number but is less than the total number of adjacent data nodes of the abnormal data node, initiate a fault query to the abnormal data node to judge whether it is the faulty node;
Count, from the abnormal heartbeat query results, the number of distinct adjacent data nodes that have reported the abnormal data node; when this number equals the total number of adjacent data nodes of the abnormal data node, judge the abnormal data node to be the faulty node.
In yet another aspect, a heartbeat detection device is provided, comprising a monitoring node, the monitoring node including:
A topology module, configured to pre-generate a polygon topology;
A configuration module, configured to create a configuration file according to the polygon topology and send the configuration file to the data nodes;
A judgment module, configured to determine the faulty node according to the abnormal heartbeat query results that the data nodes report after executing the configuration file and the total number of data nodes reporting them, wherein the data nodes reporting the abnormal heartbeat query results all share the same adjacent data node.
Further, the polygon topology is a honeycomb hexagonal topology.
Further, the judgment module is also configured to:
Count, from the abnormal heartbeat query results, the number of distinct adjacent data nodes that have reported the abnormal data node; when this number reaches a set number but is less than the total number of adjacent data nodes of the abnormal data node, initiate a fault query to the abnormal data node to judge whether it is the faulty node;
Count, from the abnormal heartbeat query results, the number of distinct adjacent data nodes that have reported the abnormal data node; when this number equals the total number of adjacent data nodes of the abnormal data node, judge the abnormal data node to be the faulty node.
The technical solution provided by the embodiments of the invention has the following beneficial effects:
In the invention, each data node performs heartbeat detection with its adjacent data nodes, obtains abnormal heartbeat query results, and reports them to the monitoring node. The monitoring node determines the faulty node according to the abnormal heartbeat query results and the number of adjacent data nodes that report the same abnormal data node, which greatly reduces the load on the monitoring node. In addition, because each data node has only a few adjacent data nodes, the load on each data node is very low, which improves the stability and accuracy of heartbeat detection between data nodes.
Brief description of the drawings
To describe the technical solutions in the embodiments of the invention more clearly, the drawings needed for the description of the embodiments are briefly introduced below. The drawings described below are evidently only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flow chart of the cluster heartbeat detection method provided by Embodiment 1 of the invention;
Fig. 2 is a detailed sub-step flow chart of step S101 in Embodiment 1 of the invention;
Fig. 3 is a schematic diagram of the honeycomb hexagonal topology provided by Embodiment 1 of the invention;
Fig. 4 is a detailed sub-step flow chart of step S102 in Embodiment 1 of the invention;
Fig. 5 is a detailed sub-step flow chart of step S103 in Embodiment 1 of the invention;
Fig. 6 is a structural schematic diagram of the heartbeat detection system provided by Embodiment 2 of the invention;
Fig. 7 is a structural schematic diagram of the heartbeat detection device provided by Embodiment 3 of the invention.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the invention clearer, the embodiments of the invention are described in further detail below with reference to the drawings.
Embodiment 1
This embodiment of the invention provides a cluster heartbeat detection method which, referring to Fig. 1, comprises the following steps:
S101: The monitoring node pre-generates a polygon topology, creates a configuration file according to the polygon topology, and sends the configuration file to the data nodes.
In this embodiment, the monitoring node is a special data node that can communicate with all data nodes, including heartbeat communication.
In this embodiment, the step of creating the configuration file according to the polygon topology specifically comprises two sub-steps, S1011 and S1012, as shown in Fig. 2.
S1011: The monitoring node assigns the ID (Identity) values of the vertex nodes of the polygon topology to the data nodes.
In this embodiment, the preferred polygon topology is a honeycomb hexagonal topology, in which every substructure is a hexagon with six vertex nodes. Each vertex node of a honeycomb hexagonal topology has at most three adjacent vertex nodes, and the structure is stable and regularly arranged, which makes it easy to create the configuration file from the adjacency relationships of the vertex nodes. This guarantees that each data node checks at most three adjacent data nodes, so the detection load on each data node is very low, which improves the stability and accuracy of heartbeat detection between data nodes. The polygon topology may also be a triangular, pentagonal, or heptagonal topology; these topologies can also carry out heartbeat detection between data nodes. In a triangular topology, for example, each vertex node has at most six adjacent vertex nodes, so each data node checks at most six adjacent data nodes. The specific structure of the polygon topology is not limited here.
After the polygon topology has been generated, the monitoring node accepts data node registrations and assigns the ID value of a vertex node of the polygon topology to each data node. The ID value is unique. It may be a string of digits or a string of letters, and its form is not limited; for example, the ID value of some vertex node of the polygon topology is 100001.
Referring to Fig. 3, the polygon topology is a honeycomb hexagonal topology. As an example, the figure labels the vertex positions A-H to which the monitoring node has assigned eight data nodes; the vertex nodes not labeled with a letter are of course also assigned data nodes.
S1012: The monitoring node creates the configuration file according to the ID values assigned to the data nodes and the adjacency relationships between the vertex nodes of the polygon topology.
In this embodiment, each vertex node in the polygon topology has several adjacent vertex nodes. In other words, the data node corresponding to each vertex node has several neighboring data nodes, which are called the adjacent data nodes of that data node. For example, when the polygon topology is a honeycomb hexagonal topology, each data node has two or three adjacent data nodes. The monitoring node derives the adjacency relationships of the corresponding data nodes from the adjacency relationships between the vertex nodes, and then creates the configuration file.
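The derivation of data node adjacency from vertex adjacency can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the honeycomb is laid out as a "brick wall" grid (a standard drawing of a hexagonal lattice), and the function names `honeycomb_adjacency`, `assign_ids`, and `neighbor_ids` are hypothetical. Boundary vertices of a small grid can have fewer neighbors than interior ones.

```python
from typing import Dict, List, Tuple

Vertex = Tuple[int, int]  # (row, column) position in the brick-wall drawing


def honeycomb_adjacency(rows: int, cols: int) -> Dict[Vertex, List[Vertex]]:
    """Return vertex adjacency for a honeycomb drawn as a brick-wall grid.

    Every vertex links to its left and right neighbors in the same row and
    to exactly one vertical neighbor, so no vertex has more than three.
    """
    adjacency: Dict[Vertex, List[Vertex]] = {}
    for r in range(rows):
        for c in range(cols):
            candidates = [(r, c - 1), (r, c + 1)]
            # Alternate the vertical link so that each face has six vertices.
            candidates.append((r + 1, c) if (r + c) % 2 == 0 else (r - 1, c))
            adjacency[(r, c)] = [
                (rr, cc) for rr, cc in candidates
                if 0 <= rr < rows and 0 <= cc < cols
            ]
    return adjacency


def assign_ids(adjacency: Dict[Vertex, List[Vertex]],
               first_id: int = 100001) -> Dict[Vertex, str]:
    """Give each vertex a unique ID value (the numbering scheme is assumed)."""
    return {v: str(first_id + i) for i, v in enumerate(sorted(adjacency))}


def neighbor_ids(adjacency: Dict[Vertex, List[Vertex]],
                 ids: Dict[Vertex, str]) -> Dict[str, List[str]]:
    """Map each data node ID to the IDs of its adjacent data nodes."""
    return {ids[v]: [ids[n] for n in nbrs] for v, nbrs in adjacency.items()}


adjacency = honeycomb_adjacency(4, 4)
ids = assign_ids(adjacency)
table = neighbor_ids(adjacency, ids)
```

The resulting `table` is the piece of information that S1012 writes into each configuration file as the ID values of the adjacent data nodes.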
In this embodiment, the configuration file includes at least a heartbeat query interval, a response timeout, a heartbeat query mode, and the ID values of the adjacent data nodes.
Specifically, the configuration file defines how data nodes perform heartbeat queries on one another. Because heartbeat querying is a periodic activity, the heartbeat query interval specifies the time between a data node's periodic heartbeat queries. For example, if system performance requirements are high, the heartbeat query interval is set to 6-9 s; if they are ordinary, it is set to 30-40 s. The response timeout is the time allowed for the target data node of a heartbeat query to respond: if the target data node returns a heartbeat message within the response timeout, the heartbeat query succeeds; if it does not, the heartbeat query fails. The heartbeat query mode is the way heartbeat messages are sent and received between data nodes. It specifies the type of transport, such as Transmission Control Protocol (TCP), and also the specific port on which heartbeat messages are sent and received between data nodes. The ID values of the adjacent data nodes are the ID values of the data nodes adjacent to each data node; these adjacent data nodes are also the target data nodes of that data node's heartbeat queries.
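One possible shape for such a per-node configuration file is sketched below. The patent does not specify a file format, so the use of JSON, the field names, the port number 9000, and the 3-second timeout are all assumptions made for illustration; only the four kinds of field and the 30 s interval come from the text above.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class HeartbeatConfig:
    """Hypothetical per-node configuration carrying the four required fields."""
    query_interval_s: float    # heartbeat query interval
    response_timeout_s: float  # response timeout
    protocol: str              # heartbeat query mode: transport type
    port: int                  # heartbeat query mode: port for heartbeat messages
    neighbor_ids: List[str]    # ID values of the adjacent data nodes

    def to_json(self) -> str:
        """Serialize the configuration so it can be sent to a data node."""
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, text: str) -> "HeartbeatConfig":
        """Rebuild the configuration on the data node side."""
        return cls(**json.loads(text))


# Example values: 30 s is within the "ordinary" range given in the text;
# the timeout, port, and neighbor IDs are made up.
config = HeartbeatConfig(30.0, 3.0, "tcp", 9000, ["100002", "100005"])
restored = HeartbeatConfig.from_json(config.to_json())
```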
Further, when the monitoring node pre-generates the polygon topology and creates the configuration file from it, one way to create it is to build a separate configuration file for each data node, based on that data node's specific heartbeat query mode and the ID values of its adjacent data nodes, and to send each configuration file to the corresponding data node. In this case every configuration file is very small and therefore easy to update.
Alternatively, the monitoring node may create a single unified configuration file that contains the heartbeat query modes of all data nodes and send it to all data nodes. As is easy to see, this configuration file can be very large, so updating it consumes more resources.
S102: Each data node periodically performs heartbeat queries on its adjacent data nodes according to the configuration file, and reports any abnormal heartbeat query result it generates to the monitoring node.
In this embodiment, step S102 specifically comprises two sub-steps, S1021 and S1022, as shown in Fig. 4.
S1021: At every heartbeat query interval, the data node performs a heartbeat query, in the heartbeat query mode, on each adjacent data node listed in the configuration file.
In this embodiment, the data node initiates a heartbeat query to its adjacent data node. After receiving the heartbeat query, the adjacent data node makes a heartbeat response within the response timeout. When the data node receives the heartbeat response, this shows that the adjacent node is normally online, so nothing needs to be reported; the data node sleeps until the next heartbeat query interval arrives and then initiates a heartbeat request again. Meanwhile, the data node is itself the target data node of heartbeat queries from its adjacent nodes, and it must make a heartbeat response within the response timeout to show that it is normally online.
S1022: If the heartbeat query to an adjacent data node listed in the configuration file fails within the response timeout, the data node generates an abnormal heartbeat query result and reports it to the monitoring node.
In this embodiment, if the data node initiates a heartbeat query to its adjacent data node and does not receive a heartbeat response within the response timeout, the adjacent data node has failed to respond in time. The adjacent node thus becomes an abnormal data node and may have gone offline; the data node's heartbeat query to it has failed, and the data node generates an abnormal heartbeat query result.
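Sub-steps S1021 and S1022 can be sketched as a single polling pass. The sketch below is illustrative only: the `PING`/`PONG` exchange, the function names, and the dictionary layout of the result are assumptions, since the patent leaves the heartbeat message format open. The probe is passed in as a callable so that the polling logic does not depend on a particular transport.

```python
import socket
from typing import Callable, Dict, List


def tcp_probe(host: str, port: int, timeout_s: float) -> bool:
    """Hypothetical TCP heartbeat query: send PING, expect PONG in time.

    The timeout applies to each socket operation (connect, send, receive)
    rather than to the exchange as a whole.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_s) as sock:
            sock.sendall(b"PING\n")
            return sock.recv(16).startswith(b"PONG")
    except OSError:  # covers refused connections and timeouts
        return False


def poll_neighbors(self_id: str,
                   neighbor_ids: List[str],
                   probe: Callable[[str], bool]) -> List[Dict[str, str]]:
    """Query every adjacent data node once and return abnormal results.

    A successful query produces nothing; a failed query produces one
    abnormal heartbeat query result naming the reporter and the abnormal node.
    """
    return [
        {"reporter_id": self_id, "abnormal_id": neighbor}
        for neighbor in neighbor_ids
        if not probe(neighbor)
    ]


# Usage with a stand-in probe in which node 100005 never answers.
results = poll_neighbors("100001", ["100002", "100005"],
                         probe=lambda node_id: node_id != "100005")
```

In a real deployment the probe would wrap something like `tcp_probe` with a lookup from ID value to host and port, and `poll_neighbors` would run once per heartbeat query interval.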
Further, when the next heartbeat query interval arrives, the data node again initiates a heartbeat query to the adjacent data node. If it still receives no heartbeat response within the response timeout, there are several ways to handle this. For example, the data node may continue to generate abnormal heartbeat query results and report them to the monitoring node, so that the abnormal data node is reported repeatedly, but at most once per heartbeat query interval. Alternatively, the data node may stop generating abnormal heartbeat query results, or generate them without reporting them to the monitoring node, so that the abnormal data node is reported only once; if the data node detects that the abnormal data node temporarily recovered and then became abnormal again, it reports an abnormal heartbeat query result again.
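The second handling option, reporting once and reporting again only after a recovery, amounts to reporting on the transition from normal to abnormal. A minimal sketch under that reading follows; the class name and method are hypothetical.

```python
from typing import Set


class EdgeTriggeredReporter:
    """Decide whether a failed heartbeat query should be reported.

    A neighbor is reported when it first fails. Further failures are
    suppressed until the neighbor has responded again at least once.
    """

    def __init__(self) -> None:
        self._failed: Set[str] = set()  # neighbors currently known abnormal

    def should_report(self, neighbor_id: str, responded: bool) -> bool:
        if responded:
            # The neighbor recovered, so a later failure counts as new.
            self._failed.discard(neighbor_id)
            return False
        if neighbor_id in self._failed:
            return False  # already reported during this outage
        self._failed.add(neighbor_id)
        return True


reporter = EdgeTriggeredReporter()
decisions = [
    reporter.should_report("100005", responded=False),  # first failure
    reporter.should_report("100005", responded=False),  # same outage
    reporter.should_report("100005", responded=True),   # recovery
    reporter.should_report("100005", responded=False),  # new outage
]
```

The first option in the text needs no state at all: every failed query is reported, which yields at most one report per heartbeat query interval.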
Further, when the abnormal data node is able to make a heartbeat response, this shows that it has returned to normal. The abnormal data node then actively reports a heartbeat message to show that it is normally online.
In this embodiment, the abnormal heartbeat query result includes at least the ID value of the data node reporting the result and the ID value of the abnormal data node whose heartbeat query failed. Specifically, when a data node finds an abnormal data node, it adds the ID value of that abnormal data node to the abnormal heartbeat query result, and may also add its own ID value, which makes it easy to locate the abnormal data node and to count the reporting nodes.
It should be noted that the abnormal data node also has other data nodes as adjacent nodes. These other data nodes can likewise find the abnormal data node and generate abnormal heartbeat query results that include its ID value.
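Because several neighbors may report the same abnormal data node, and one neighbor may report it more than once, the monitoring node needs the set of distinct reporters per abnormal node. A minimal sketch, assuming each result is reduced to a `(reporter_id, abnormal_id)` pair (a representation chosen here for illustration):

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def tally_reports(reports: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Group abnormal heartbeat query results by abnormal node.

    Each report is a (reporter_id, abnormal_id) pair. Using a set per
    abnormal node means repeated reports from one neighbor count once.
    """
    reporters_by_abnormal: Dict[str, Set[str]] = defaultdict(set)
    for reporter_id, abnormal_id in reports:
        reporters_by_abnormal[abnormal_id].add(reporter_id)
    return dict(reporters_by_abnormal)


# Nodes A and D both report C; A's repeated report is counted once.
tally = tally_reports([("A", "C"), ("D", "C"), ("A", "C")])
```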
Because each data node performs heartbeat detection with its adjacent data nodes, the load on the monitoring node is greatly reduced. In addition, because each data node has only a few adjacent data nodes, the load on each data node is very low, which improves the stability and accuracy of heartbeat detection between data nodes.
S103: The monitoring node determines the faulty node according to the abnormal heartbeat query results and the total number of data nodes reporting them, wherein the data nodes reporting the abnormal heartbeat query results all share the same adjacent data node.
In this embodiment, every data node that finds an abnormal data node reports the abnormal heartbeat query result it generated to the monitoring node. In this step, the data nodes reporting the abnormal heartbeat query results all share the same adjacent data node; that is, they are all adjacent data nodes of the abnormal data node. Only when all adjacent data nodes of the abnormal data node have reported it does the monitoring node directly determine that the abnormal data node is the faulty node.
In this embodiment, step S103 specifically comprises two sub-steps, S1031 and S1032, as shown in Fig. 5.
S1031: The monitoring node counts, from the abnormal heartbeat query results, the number of distinct adjacent data nodes that have reported the abnormal data node. When this number reaches a set number but is less than the total number of adjacent data nodes of the abnormal data node, the monitoring node initiates a fault query to the abnormal data node to judge whether it is the faulty node.
In this embodiment, the rule for judging a faulty node is as follows. When a data node is reported by its distinct adjacent data nodes, it becomes an abnormal data node. When the number of adjacent data nodes reporting the abnormal data node reaches the set number or equals the total number of adjacent data nodes, a fault determination is made. For example, suppose the number of adjacent data nodes reporting the abnormal data node reaches half of the total number of its adjacent data nodes, which is the set number. Because not all adjacent data nodes have reported the abnormal data node, the monitoring node actively initiates a fault query to the abnormal data node to determine whether it is normally online; if it is not online, the abnormal data node is a faulty node.
Referring again to Fig. 3, in the honeycomb hexagonal topology, assume that the abnormal data node is data node C, which has only three adjacent data nodes: A, D, and H. When data node C becomes abnormal, data nodes A, D, and H all detect that C is an abnormal data node and report this to the monitoring node. Suppose that, for some reason, the monitoring node receives the abnormal heartbeat query results of data nodes A and D only. The monitoring node then counts that data node C has been reported by 2 distinct adjacent data nodes, which exceeds half of C's total of 3 adjacent data nodes, so the monitoring node performs a fault query on data node C.
It should be noted that the set number is configurable. The guiding principle is that the monitoring node steps in to judge whether an abnormal data node is a faulty node only when the abnormal data node has been reported by most of its adjacent data nodes, which helps to save system resources. For example, when the topology is a honeycomb hexagonal topology, the set number can be half of the total number of adjacent data nodes; when the topology is a triangular topology, in which each data node has at most six adjacent data nodes, the set number can be two thirds of the total number of adjacent data nodes.
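The set number for a given abnormal data node can be computed from the chosen fraction and that node's neighbor count. The patent does not say how a fractional value is rounded; rounding up is an assumption here, chosen because it agrees with the Fig. 3 example, where 2 reports out of 3 neighbors trigger the fault query.

```python
import math
from fractions import Fraction


def set_number(total_neighbors: int, fraction: Fraction) -> int:
    """Smallest whole number of reporters that reaches the given fraction.

    Rounding up is an assumption; the source only gives the fractions.
    """
    return math.ceil(fraction * total_neighbors)


honeycomb_threshold = set_number(3, Fraction(1, 2))  # half of 3 neighbors
triangle_threshold = set_number(6, Fraction(2, 3))   # two thirds of 6 neighbors
```

Using `Fraction` rather than a float keeps the rounding exact, so two thirds of six is exactly four.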
S1032: The monitoring node counts, from the abnormal heartbeat query results, the number of distinct adjacent data nodes that have reported the abnormal data node. When this number equals the total number of adjacent data nodes of the abnormal data node, the monitoring node judges the abnormal data node to be the faulty node.
In this embodiment, when an abnormal data node has been reported by all of its normally online adjacent data nodes, this shows that the abnormal data node has failed, and it is the faulty node.
It should be noted that the total number of adjacent data nodes here means the total number of normally online adjacent data nodes.
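Sub-steps S1031 and S1032 together form a three-way decision, sketched below. The enum, the function name `judge`, and the parameter names are hypothetical; the logic follows the two rules in the text, with the total taken over normally online neighbors as noted above.

```python
from enum import Enum
from typing import Set


class Verdict(Enum):
    NO_ACTION = "no_action"        # too few reporters; keep waiting
    DIRECT_QUERY = "direct_query"  # S1031: monitoring node queries the node
    FAULTY = "faulty"              # S1032: all online neighbors reported it


def judge(reporters: Set[str], online_neighbors: Set[str],
          threshold: int) -> Verdict:
    """Classify an abnormal data node from the reports received about it.

    Only reports from normally online neighbors are counted, and the
    total is the number of normally online neighbors.
    """
    count = len(reporters & online_neighbors)
    total = len(online_neighbors)
    if total > 0 and count == total:
        return Verdict.FAULTY
    if count >= threshold:
        return Verdict.DIRECT_QUERY
    return Verdict.NO_ACTION


# Fig. 3 example: C's neighbors are A, D, H and the set number is 2.
neighbors_of_c = {"A", "D", "H"}
partial = judge({"A", "D"}, neighbors_of_c, threshold=2)
unanimous = judge({"A", "D", "H"}, neighbors_of_c, threshold=2)
single = judge({"A"}, neighbors_of_c, threshold=2)
```

With reports from A and D only, the verdict is a direct fault query to C; with all three neighbors reporting, C is judged faulty without any query from the monitoring node.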
Further screening of the abnormal heartbeat query results improves the accuracy of faulty node determination and, at the same time, narrows the range of abnormal nodes that the monitoring node must examine further, which saves system resources.
Embodiment 2
Referring to Fig. 6, this embodiment of the invention provides a heartbeat detection system that can execute the cluster heartbeat detection method provided in Embodiment 1. The system comprises a monitoring node 21 and data nodes 22, where the monitoring node 21 is a special data node 22 that can communicate with all data nodes 22, including heartbeat communication. In a cluster system, the monitoring node 21 and each data node 22 are hosts in the cluster. In a service monitoring system, the monitoring node 21 and each data node 22 are processes in the system.
In this embodiment, the monitoring node 21 includes a topology module 211, a configuration module 212, and a judgment module 213; each data node 22 includes a query module 221 and a result generation module 222.
The topology module 211 is configured to pre-generate a polygon topology.
The configuration module 212 is configured to create a configuration file according to the polygon topology and send the configuration file to the data nodes 22.
The query module 221 is configured to periodically perform heartbeat queries on the adjacent data nodes according to the configuration file.
The result generation module 222 is configured to report the generated abnormal heartbeat query result to the monitoring node 21.
The judgment module 213 is configured to determine the faulty node according to the abnormal heartbeat query results and the total number of data nodes reporting them, wherein the data nodes reporting the abnormal heartbeat query results all share the same adjacent data node.
In this embodiment, the configuration module 212 is further configured to assign the ID values of the vertex nodes of the polygonal topology to the data nodes, to create the configuration file according to the ID values assigned to the data nodes and the adjacency relationships between the vertex nodes of the polygonal topology, and to send the configuration file to the data nodes 22.
The polygonal topology described above is a honeycomb hexagonal topology in which each substructure is a hexagon with six vertex nodes. The polygonal topology may also be a triangular, pentagonal, or heptagonal topology; no limitation is imposed here. After the polygonal topology has been generated, the monitoring node 21 accepts the registration of the data nodes 22 and assigns the ID values of the vertex nodes of the polygonal topology to the individual data nodes 22.
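One way to pre-generate a honeycomb topology and hand out vertex IDs on registration is sketched below. The patent does not say how the lattice is built; the "brick wall" grid encoding of a honeycomb, the row-major integer IDs, and the names `honeycomb_adjacency` and `assign_vertex_ids` are assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Vertex = Tuple[int, int]


def honeycomb_adjacency(rows: int, cols: int) -> Dict[int, List[int]]:
    """Build a honeycomb lattice in brick-wall form; map vertex ID -> neighbor IDs."""
    ids: Dict[Vertex, int] = {
        (r, c): r * cols + c for r in range(rows) for c in range(cols)
    }
    adjacency: Dict[int, List[int]] = {i: [] for i in ids.values()}

    def link(a: Vertex, b: Vertex) -> None:
        adjacency[ids[a]].append(ids[b])
        adjacency[ids[b]].append(ids[a])

    for r in range(rows):
        for c in range(cols):
            # Every vertex is linked to its right-hand neighbor in the same row.
            if c + 1 < cols:
                link((r, c), (r, c + 1))
            # Vertical links exist only where (r + c) is even, which leaves
            # six-vertex cells and at most three neighbors per vertex.
            if r + 1 < rows and (r + c) % 2 == 0:
                link((r, c), (r + 1, c))
    return adjacency


def assign_vertex_ids(
    registered: Sequence[str], adjacency: Dict[int, List[int]]
) -> Dict[str, int]:
    """Give each registering data node the next free vertex ID, in order."""
    free = sorted(adjacency)
    if len(registered) > len(free):
        raise ValueError("more data nodes than vertices in the topology")
    return dict(zip(registered, free))
```

A 2 x 3 grid yields a single hexagon whose six vertices each have two neighbors; larger grids tile further hexagons, and interior vertices have three neighbors.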
In this embodiment, the configuration file includes at least the heartbeat query interval, the response timeout, the heartbeat query mode, and the ID values of the adjacent data nodes. The configuration file defines the specific manner in which the data nodes 22 perform heartbeat queries on one another.
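A configuration file carrying the four fields listed above might look like the following sketch. The field names, the JSON serialization, the extra `node_id` field, and the default values (5 s interval, 1 s timeout, `"tcp"` mode) are hypothetical; the patent names the fields but does not fix a format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class HeartbeatConfig:
    node_id: int                 # ID of the data node receiving this file (assumed field)
    query_interval_s: float      # heartbeat query interval
    response_timeout_s: float    # response timeout
    query_mode: str              # heartbeat query mode
    neighbor_ids: List[int] = field(default_factory=list)  # adjacent data node IDs


def build_config(
    node_id: int,
    adjacency: Dict[int, List[int]],
    interval_s: float = 5.0,
    timeout_s: float = 1.0,
    mode: str = "tcp",
) -> HeartbeatConfig:
    """Create the per-node configuration from the topology's adjacency map."""
    return HeartbeatConfig(node_id, interval_s, timeout_s, mode, sorted(adjacency[node_id]))


def to_json(cfg: HeartbeatConfig) -> str:
    """Serialize the configuration so it can be sent to the data node."""
    return json.dumps(asdict(cfg), sort_keys=True)
```

Deriving `neighbor_ids` from the adjacency map keeps the configuration file consistent with the pre-generated topology.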
In this embodiment, the query module 221 is further configured to perform, once every heartbeat query interval and in the heartbeat query mode, a heartbeat query on the adjacent data nodes listed in the configuration file.
In this embodiment, the result generation module 222 is further configured to generate an abnormal heartbeat query result when the heartbeat query performed by the query module 221 on an adjacent data node listed in the configuration file fails within the response timeout.
In this embodiment, the abnormal heartbeat query result includes at least the ID value of the data node reporting the result and the ID value of the abnormal data node whose heartbeat query failed.
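A single polling round on a data node, producing abnormal heartbeat query results with the two IDs described above, could be sketched like this. The `probe` callable stands in for whatever heartbeat query mode the configuration file selects and is an assumption, as are the dictionary keys `reporter_id` and `abnormal_id`.

```python
from typing import Callable, Dict, Iterable, List


def poll_neighbors(
    self_id: int,
    neighbor_ids: Iterable[int],
    probe: Callable[[int, float], bool],
    timeout_s: float,
) -> List[Dict[str, int]]:
    """Query each adjacent data node once; return one report per failed query."""
    reports: List[Dict[str, int]] = []
    for neighbor_id in neighbor_ids:
        try:
            alive = probe(neighbor_id, timeout_s)
        except OSError:
            # TimeoutError and ConnectionError are subclasses of OSError.
            alive = False
        if not alive:
            reports.append({"reporter_id": self_id, "abnormal_id": neighbor_id})
    return reports
```

A caller would invoke this once per heartbeat query interval and send any returned reports to the monitoring node; healthy neighbors produce no traffic toward the monitor.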
Because each data node 22 performs heartbeat detection with its adjacent data nodes, the load on the monitoring node 21 is greatly reduced. At the same time, because each data node 22 has only a few adjacent data nodes, the load on each data node 22 is very low, which improves the stability and accuracy of heartbeat detection among the data nodes 22.
In this embodiment, the judgment module 213 is further configured to count, in the abnormal heartbeat query results, the number of distinct adjacent data nodes that have reported the abnormal data node. When that number reaches a set number but is less than the total number of adjacent data nodes of the abnormal data node, the judgment module 213 initiates a fault inquiry to the abnormal data node to judge whether the abnormal data node is the faulty node.
In this embodiment, the judgment module 213 is further configured to count, in the abnormal heartbeat query results, the number of distinct adjacent data nodes that have reported the abnormal data node. When that number equals the total number of adjacent data nodes of the abnormal data node, the judgment module 213 judges the abnormal data node to be the faulty node.
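The two judgment rules can be combined into one screening pass over the received reports, as in the sketch below. The verdict labels `"faulty"`, `"inquire"`, and `"watch"`, the tuple form of a report, and the `threshold` parameter name (the "set number") are assumptions for illustration; what happens to nodes below the set number is not specified in the text and is labeled `"watch"` here only as a placeholder.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def classify(
    reports: Iterable[Tuple[int, int]],
    online_neighbor_total: Dict[int, int],
    threshold: int,
) -> Dict[int, str]:
    """Map each reported node ID to a verdict from (reporter_id, abnormal_id) pairs."""
    reporters: Dict[int, Set[int]] = defaultdict(set)
    for reporter_id, abnormal_id in reports:
        # A set keeps repeated reports from the same neighbor from being double counted.
        reporters[abnormal_id].add(reporter_id)

    verdicts: Dict[int, str] = {}
    for node, who in reporters.items():
        count = len(who)
        total = online_neighbor_total[node]
        if count == total:
            verdicts[node] = "faulty"    # all online neighbors agree
        elif threshold <= count < total:
            verdicts[node] = "inquire"   # monitor initiates a fault inquiry
        else:
            verdicts[node] = "watch"     # below the set number (placeholder label)
    return verdicts
```

Only nodes marked `"inquire"` require a direct query from the monitoring node, which is how the screening narrows the range the monitor has to check.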
Further screening of the abnormal heartbeat query results improves the accuracy with which faulty nodes are determined. It also narrows the set of abnormal nodes that the monitoring node must examine further to decide whether they are faulty, which saves system resources.
Embodiment three
Referring to Fig. 7, an embodiment of the present invention provides a heartbeat detection device including a monitoring node 31, where the monitoring node 31 includes:
a topology module 311, configured to pre-generate a polygonal topology;
a configuration module 312, configured to create a configuration file according to the polygonal topology and to send the configuration file to the data nodes;
a judgment module 313, configured to determine a faulty node according to the abnormal heartbeat query results reported by the data nodes after they perform queries according to the configuration file and the total number of data nodes that reported the abnormal heartbeat query results, where the data nodes reporting the abnormal heartbeat query results all share the same adjacent data node.
In the prior art, the monitoring node 31 needs to carry out heartbeat communication with every data node in order to detect each of them. In this embodiment, the monitoring node 31 instead performs heartbeat detection on the data nodes as follows. The monitoring node 31 receives the abnormal heartbeat query results provided by the data nodes, concentrates its detection on the abnormal data nodes screened out in this way, and then determines the final faulty node among those abnormal data nodes. This narrows the detection range and greatly reduces the load on the monitoring node 31.
In this embodiment, the polygonal topology is a honeycomb hexagonal topology.
In this embodiment, the judgment module 313 is further configured to count, in the abnormal heartbeat query results, the number of distinct adjacent data nodes that have reported the abnormal data node. When that number reaches a set number but is less than the total number of adjacent data nodes of the abnormal data node, the judgment module 313 initiates a fault inquiry to the abnormal data node to judge whether the abnormal data node is the faulty node.
In this embodiment, the judgment module 313 is further configured to count, in the abnormal heartbeat query results, the number of distinct adjacent data nodes that have reported the abnormal data node. When that number equals the total number of adjacent data nodes of the abnormal data node, the judgment module 313 judges the abnormal data node to be the faulty node.
The serial numbers of the above embodiments of the present invention are for description only and do not represent the relative merits of the embodiments.
The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.
From the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by means of software plus a necessary general-purpose hardware platform, and of course also by hardware. Based on this understanding, the above technical solution in essence, or the part of it that contributes over the prior art, can be embodied in the form of a software product. The computer software product may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments or in certain parts of the embodiments.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.
Claims (16)
1. A cluster heartbeat detection method, characterized by comprising the following steps:
a monitoring node pre-generates a polygonal topology, creates a configuration file according to the polygonal topology, and sends the configuration file to data nodes;
the data node periodically performs heartbeat queries on its adjacent data nodes according to the configuration file, and reports the generated abnormal heartbeat query results to the monitoring node;
the monitoring node determines a faulty node according to the abnormal heartbeat query results and the total number of data nodes that reported the abnormal heartbeat query results, wherein the data nodes that reported the abnormal heartbeat query results all share the same adjacent data node;
wherein the monitoring node determining the faulty node according to the abnormal heartbeat query results and the total number of data nodes that reported the abnormal heartbeat query results comprises:
the monitoring node counts, in the abnormal heartbeat query results, the number of distinct adjacent data nodes that have reported an abnormal data node; when that number reaches a set number and is less than the total number of adjacent data nodes of the abnormal data node, the monitoring node initiates a fault inquiry to the abnormal data node to judge whether the abnormal data node is the faulty node.
2. The cluster heartbeat detection method according to claim 1, characterized in that the step of creating the configuration file according to the polygonal topology specifically comprises:
the monitoring node assigns the ID values of the vertex nodes of the polygonal topology to the data nodes;
the monitoring node creates the configuration file according to the ID values assigned to the data nodes and the adjacency relationships between the vertex nodes of the polygonal topology.
3. The cluster heartbeat detection method according to claim 2, characterized in that the polygonal topology is a honeycomb hexagonal topology.
4. The cluster heartbeat detection method according to claim 3, characterized in that the configuration file includes at least a heartbeat query interval, a response timeout, a heartbeat query mode, and the ID values of the adjacent data nodes.
5. The cluster heartbeat detection method according to claim 4, characterized in that the step in which the data node periodically performs heartbeat queries on adjacent data nodes according to the configuration file and reports the generated abnormal heartbeat query results to the monitoring node specifically comprises:
the data node performs, once every heartbeat query interval and in the heartbeat query mode, a heartbeat query on the adjacent data nodes listed in the configuration file;
if the heartbeat query performed by the data node on an adjacent data node listed in the configuration file fails within the response timeout, the data node generates an abnormal heartbeat query result and reports the abnormal heartbeat query result to the monitoring node.
6. The cluster heartbeat detection method according to claim 5, characterized in that the abnormal heartbeat query result includes at least the ID value of the data node reporting the abnormal heartbeat query result and the ID value of the abnormal data node whose heartbeat query failed.
7. The cluster heartbeat detection method according to claim 6, characterized in that the step in which the monitoring node determines the faulty node according to the abnormal heartbeat query results and the total number of data nodes that reported the abnormal heartbeat query results, wherein the data nodes that reported the abnormal heartbeat query results all share the same adjacent data node, specifically comprises:
the monitoring node counts, in the abnormal heartbeat query results, the number of distinct adjacent data nodes that have reported the abnormal data node; when that number equals the total number of adjacent data nodes of the abnormal data node, the abnormal data node is judged to be the faulty node.
8. A heartbeat detection system, including a monitoring node and data nodes, characterized in that the monitoring node includes a topology module, a configuration module, and a judgment module, and the data node includes a query module and a result generation module;
the topology module is configured to pre-generate a polygonal topology;
the configuration module is configured to create a configuration file according to the polygonal topology and to send the configuration file to the data nodes;
the query module is configured to periodically perform heartbeat queries on its adjacent data nodes according to the configuration file;
the result generation module is configured to report the generated abnormal heartbeat query results to the monitoring node;
the judgment module is configured to determine a faulty node according to the abnormal heartbeat query results and the total number of data nodes that reported the abnormal heartbeat query results, wherein the data nodes that reported the abnormal heartbeat query results all share the same adjacent data node;
the judgment module is further configured to count, in the abnormal heartbeat query results, the number of distinct adjacent data nodes that have reported an abnormal data node; when that number reaches a set number and is less than the total number of adjacent data nodes of the abnormal data node, to initiate a fault inquiry to the abnormal data node to judge whether the abnormal data node is the faulty node.
9. The heartbeat detection system according to claim 8, characterized in that the configuration module is further configured to assign the ID values of the vertex nodes of the polygonal topology to the data nodes, and to create the configuration file according to the ID values assigned to the data nodes and the adjacency relationships between the vertex nodes of the polygonal topology.
10. The heartbeat detection system according to claim 9, characterized in that the polygonal topology is a honeycomb hexagonal topology.
11. The heartbeat detection system according to claim 10, characterized in that the configuration file includes at least a heartbeat query interval, a response timeout, a heartbeat query mode, and the ID values of the adjacent data nodes;
the query module is further configured to perform, once every heartbeat query interval and in the heartbeat query mode, a heartbeat query on the adjacent data nodes listed in the configuration file;
the result generation module is further configured to generate an abnormal heartbeat query result when the heartbeat query performed by the query module on an adjacent data node listed in the configuration file fails within the response timeout.
12. The heartbeat detection system according to claim 11, characterized in that the abnormal heartbeat query result includes at least the ID value of the data node reporting the abnormal heartbeat query result and the ID value of the abnormal data node whose heartbeat query failed.
13. The heartbeat detection system according to claim 12, characterized in that the judgment module is further configured to:
count, in the abnormal heartbeat query results, the number of distinct adjacent data nodes that have reported the abnormal data node; when that number equals the total number of adjacent data nodes of the abnormal data node, judge the abnormal data node to be the faulty node.
14. A heartbeat detection device, including a monitoring node, characterized in that the monitoring node includes:
a topology module, configured to pre-generate a polygonal topology;
a configuration module, configured to create a configuration file according to the polygonal topology and to send the configuration file to data nodes;
a judgment module, configured to determine a faulty node according to the abnormal heartbeat query results reported by the data nodes after executing the configuration file and the total number of data nodes that reported the abnormal heartbeat query results, wherein the data nodes that reported the abnormal heartbeat query results all share the same adjacent data node;
the judgment module is further configured to count, in the abnormal heartbeat query results, the number of distinct adjacent data nodes that have reported an abnormal data node; when that number reaches a set number and is less than the total number of adjacent data nodes of the abnormal data node, to initiate a fault inquiry to the abnormal data node to judge whether the abnormal data node is the faulty node.
15. The heartbeat detection device according to claim 14, characterized in that the polygonal topology is a honeycomb hexagonal topology.
16. The heartbeat detection device according to claim 15, characterized in that the judgment module is further configured to:
count, in the abnormal heartbeat query results, the number of distinct adjacent data nodes that have reported the abnormal data node; when that number equals the total number of adjacent data nodes of the abnormal data node, judge the abnormal data node to be the faulty node.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710106792.4A CN106656682B (en) | 2017-02-27 | 2017-02-27 | Cluster heartbeat detecting method, system and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106656682A (en) | 2017-05-10 |
CN106656682B (en) | 2019-10-25 |
Family
ID=58846801
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710106792.4A Expired - Fee Related CN106656682B (en) | 2017-02-27 | 2017-02-27 | Cluster heartbeat detecting method, system and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106656682B (en) |
Families Citing this family (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11212204B2 (en) | 2017-06-30 | 2021-12-28 | Xi'an Zhongxing New Software Co., Ltd. | Method, device and system for monitoring node survival state |
CN109257195B (en) | 2017-07-12 | 2021-01-15 | 华为技术有限公司 | Fault processing method and equipment for nodes in cluster |
CN107566219B (en) * | 2017-09-27 | 2020-09-18 | 华为技术有限公司 | Fault diagnosis method applied to cluster system, node equipment and computer equipment |
CN109697193A (en) * | 2017-10-24 | 2019-04-30 | 中兴通讯股份有限公司 | A kind of method, node and the computer readable storage medium of determining abnormal nodes |
CN108235800B (en) * | 2017-12-19 | 2021-08-03 | 达闼机器人有限公司 | Network fault detection method, control center equipment and computer storage medium |
CN108092857A (en) * | 2018-01-15 | 2018-05-29 | 郑州云海信息技术有限公司 | A kind of distributed system heartbeat detecting method and relevant apparatus |
CN110365936B (en) * | 2018-04-11 | 2021-03-16 | 杭州海康威视系统技术有限公司 | Code stream obtaining method, device and system |
CN111225224A (en) * | 2018-11-27 | 2020-06-02 | 玲珑视界科技(北京)有限公司 | System and method for monitoring state of grid node |
CN109474694A (en) * | 2018-12-04 | 2019-03-15 | 郑州云海信息技术有限公司 | A kind of management-control method and device of the NAS cluster based on SAN storage array |
CN110611603B (en) * | 2019-09-09 | 2021-08-31 | 苏州浪潮智能科技有限公司 | Cluster network card monitoring method and device |
CN112988463B (en) * | 2021-02-23 | 2022-08-30 | 新华三大数据技术有限公司 | Fault node isolation method and device |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1893370A (en) * | 2005-06-29 | 2007-01-10 | 国际商业机器公司 | Server cluster recovery and maintenance method and system |
CN103117901A (en) * | 2013-02-01 | 2013-05-22 | 华为技术有限公司 | Distributed heartbeat detection method, device and system |
CN106452952A (en) * | 2016-09-29 | 2017-02-22 | 华为技术有限公司 | Method for detecting communication state of cluster system and gateway cluster |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070294600A1 (en) * | 2006-05-08 | 2007-12-20 | Inventec Corporation | Method of detecting heartbeats and device thereof |
Also Published As
Publication number | Publication date |
---|---|
CN106656682A (en) | 2017-05-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106656682B (en) | | Cluster heartbeat detecting method, system and device |
CN109714192B (en) | | Monitoring method and system for monitoring cloud platform |
CN105165054B (en) | | Network service failure processing method, service management system and system management module |
US7685269B1 (en) | | Service-level monitoring for storage applications |
EP2606607B1 (en) | | Determining equivalent subsets of agents to gather information for a fabric |
KR20040093441A (en) | | Method and apparatus for discovering network devices |
JPH09186688A (en) | | Improved node discovery and network control system with monitoring |
CN109039795B (en) | | Cloud server resource monitoring method and system |
CN113742066A (en) | | Load balancing system and method for server cluster |
CN106021070A (en) | | Method and device for server cluster monitoring |
CN108156040A (en) | | A kind of central control node in distribution cloud storage system |
CN107070744A (en) | | Server monitoring method |
CN109657005A (en) | | A kind of data cache method of distributed cluster system, device and equipment |
CN109067600A (en) | | A kind of private clound management platform system and its task processing method |
CN115202958A (en) | | Power abnormity monitoring method and device, electronic equipment and storage medium |
CN112437145A (en) | | Server cluster management method and device and related components |
CN106790610A (en) | | A kind of cloud system message distributing method, device and system |
US8275865B2 (en) | | Methods, systems and computer program products for selecting among alert conditions for resource management systems |
EP3424182B1 (en) | | Neighbor monitoring in a hyperscaled environment |
CN106899659B (en) | | Distributed system and management method and management device thereof |
US7379970B1 (en) | | Method and system for reduced distributed event handling in a network environment |
CN114363150A (en) | | Network card connectivity monitoring method and device for server cluster |
CN115118635A (en) | | Time delay detection method, device, equipment and storage medium |
CN108781215B (en) | | Network service implementation method, service controller and communication system |
CN114327849A (en) | | Resource scheduling method based on intelligent monitoring |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | GR01 | Patent grant | |
 | CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20191025 |