CN109495543A - Method and device for managing monitors in a Ceph cluster - Google Patents

Info

Publication number
CN109495543A
Authority
CN
China
Prior art keywords
monitor
metric value
stability metric
backup
given threshold
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811204207.5A
Other languages
Chinese (zh)
Other versions
CN109495543B (en
Inventor
王彦斌
顾雷雷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
New H3C Technologies Co Ltd
Original Assignee
New H3C Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by New H3C Technologies Co Ltd filed Critical New H3C Technologies Co Ltd
Priority to CN201811204207.5A priority Critical patent/CN109495543B/en
Publication of CN109495543A publication Critical patent/CN109495543A/en
Application granted granted Critical
Publication of CN109495543B publication Critical patent/CN109495543B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/54 Presence management, e.g. monitoring or registration for receipt of user log-on information, or the connection status of the users
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/08 Configuration management of networks or network elements
    • H04L41/0893 Assignment of logical groups to network elements
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/16 Threshold monitoring
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Debugging And Monitoring (AREA)
  • Hardware Redundancy (AREA)

Abstract

This application discloses a method for managing monitors in a Ceph cluster. The monitors of the Ceph cluster include several primary monitors and several backup monitors, and a corresponding stability metric value is maintained for each monitor. The method includes: when the network state of any monitor is detected to change from UP to DOWN, adding one increment to the stability metric value of that monitor; if that monitor is a primary monitor, judging whether its stability metric value is greater than or equal to a first given threshold; and if so, selecting from the several backup monitors one backup monitor whose stability metric value is less than or equal to a second given threshold as a primary monitor, and making the original monitor a backup monitor. With this method, the role of each monitor is adjusted according to its stability metric value, and more stable monitors are selected as primary monitors, which improves the stability of the Ceph cluster.

Description

Method and device for managing monitors in a Ceph cluster
Technical field
This application relates to the technical field of data storage, and in particular to a method and device for managing monitors in a Ceph cluster.
Background
Ceph is a unified distributed file system designed for excellent performance, reliability, and scalability. In a Ceph cluster, several monitors are jointly responsible for managing, maintaining, and publishing the cluster's state information. One leader is elected from among these monitors, and the other monitors that take part in the election (peons) generate, under the direction of the leader, the latest version of the cluster map, which is then sent to all object-based storage devices (OSDs) and clients in the Ceph cluster. OSDs use the cluster map to maintain data, and clients use it to address data. In general, a monitor can be deployed on a physical host by itself, or a monitor and a storage node can be deployed on the same physical host.
When a leader election is held, the monitors first form a quorum, and the members of the quorum then elect a leader from among themselves. As a member of the quorum, each monitor maintains the health of the entire Ceph cluster and holds important information about it. Monitors therefore play a key role in the Ceph cluster, and their health directly affects the stability of the entire cluster.
During a leader election, Ceph cannot provide services externally until a leader has been elected and the master version of the cluster map has been formed under that leader. If unstable factors are present, such as a monitor in the quorum restarting or network flapping or delay, leader elections are initiated in the quorum repeatedly. The entire monitor cluster then remains in the election state, which wastes resources, harms the stability of the Ceph cluster, and prevents it from providing services externally.
Summary of the invention
This application provides a method and device for managing monitors in a Ceph cluster, to solve the problem in the related art that frequent monitor abnormalities cause leader elections to be initiated repeatedly in the quorum, which makes the Ceph cluster unstable and unable to provide services externally.
To achieve the above objective, the embodiments of this application adopt the following technical solutions:
In a first aspect, an embodiment of this application provides a method for managing monitors in a Ceph cluster. The monitors of the Ceph cluster include several primary monitors and several backup monitors, and a corresponding stability metric value is maintained for each monitor. The method includes:
when the network state of any monitor is detected to change from UP to DOWN, adding one increment to the stability metric value of that monitor;
if that monitor is a primary monitor, judging whether the stability metric value of that monitor is greater than or equal to a first given threshold;
if so, selecting from the several backup monitors one backup monitor whose stability metric value is less than or equal to a second given threshold as a primary monitor, and making that monitor a backup monitor, where the first given threshold is greater than the second given threshold.
Optionally, after one increment is added to the stability metric value of the monitor, the method further includes:
starting the attenuation timer corresponding to the monitor, and attenuating the stability metric value of the monitor according to a preset attenuation function within the current attenuation period.
Optionally, the step of attenuating the stability metric value of the monitor according to the preset attenuation function within the current attenuation period includes:
within the current attenuation period, attenuating the stability metric value of the monitor with a specified attenuation coefficient;
where the specified attenuation coefficient satisfies the following: the stability metric value of the monitor at the end of the current attenuation period is half of its stability metric value at the start of the current attenuation period.
Optionally, the method further includes:
while the monitor is in the current attenuation period, if its network state is detected to change from UP to DOWN, adding one increment to the stability metric value of the monitor and restarting the attenuation timer, so that the monitor enters the next attenuation period;
when the current attenuation period of the monitor ends, restarting the attenuation timer, so that the monitor enters the next attenuation period.
Optionally, the method further includes:
if the attenuated stability metric value of the monitor is less than or equal to a third given threshold, setting the stability metric value of the monitor to the initial value, where the third given threshold is less than the increment.
Optionally, the step of selecting, from the several backup monitors, one backup monitor whose stability metric value is less than or equal to the second given threshold as a primary monitor includes:
determining the stability metric value of each backup monitor, and judging whether there is a backup monitor whose stability metric value is less than or equal to the second given threshold;
if it is determined that the stability metric values of M backup monitors are less than or equal to the second given threshold, selecting the backup monitor with the smallest stability metric value among the M backup monitors as a primary monitor;
if it is determined that N of the M backup monitors share the minimum stability metric value, randomly selecting one of those N backup monitors as a primary monitor.
In a second aspect, an embodiment of this application provides a device for managing monitors in a Ceph cluster. The monitors of the Ceph cluster include several primary monitors and several backup monitors, and a corresponding stability metric value is maintained for each monitor. The device includes:
a monitoring unit, configured to add one increment to the stability metric value of any monitor when the network state of that monitor is detected to change from UP to DOWN;
a judging unit, configured to judge, when that monitor is determined to be a primary monitor, whether the stability metric value of that monitor is greater than or equal to a first given threshold;
a selecting unit, configured to select, when the judgment result is yes, one backup monitor whose stability metric value is less than or equal to a second given threshold from the several backup monitors as a primary monitor, and to make that monitor a backup monitor, where the first given threshold is greater than the second given threshold.
Optionally, the device further includes a metric adjustment unit. After one increment is added to the stability metric value of the monitor, the metric adjustment unit is configured to:
start the attenuation timer corresponding to the monitor, and attenuate the stability metric value of the monitor according to a preset attenuation function within the current attenuation period.
Optionally, when attenuating the stability metric value of the monitor according to the preset attenuation function within the current attenuation period, the metric adjustment unit is configured to:
within the current attenuation period, attenuate the stability metric value of the monitor with a specified attenuation coefficient;
where the specified attenuation coefficient satisfies the following: the stability metric value of the monitor at the end of the current attenuation period is half of its stability metric value at the start of the current attenuation period.
Optionally, while the monitor is in the current attenuation period, if the monitoring unit detects that the network state of the monitor changes from UP to DOWN, the metric adjustment unit adds one increment to the stability metric value of the monitor and restarts the attenuation timer, so that the monitor enters the next attenuation period;
when the attenuation period of the monitor ends, the metric adjustment unit restarts the attenuation timer, so that the monitor enters the next attenuation period.
Optionally, the metric adjustment unit is further configured to:
if the attenuated stability metric value of the monitor is less than or equal to a third given threshold, set the stability metric value of the monitor to the initial value, where the third given threshold is less than the increment.
Optionally, when selecting, from the several backup monitors, one backup monitor whose stability metric value is less than or equal to the second given threshold as a primary monitor, the selecting unit is configured to:
determine the stability metric value of each backup monitor, and judge whether there is a backup monitor whose stability metric value is less than or equal to the second given threshold;
if it is determined that the stability metric values of M backup monitors are less than or equal to the second given threshold, select the backup monitor with the smallest stability metric value among the M backup monitors as a primary monitor;
if it is determined that N of the M backup monitors share the minimum stability metric value, randomly select one of those N backup monitors as a primary monitor.
In a third aspect, an embodiment of this application further provides a computing device, which includes:
a memory, configured to store program instructions;
a processor, configured to call the program instructions stored in the memory and, according to the obtained program instructions, execute the steps of the method of any one of the implementations of the first aspect.
In a fourth aspect, an embodiment of this application further provides a computer-readable storage medium that stores computer-executable instructions, and the computer-executable instructions are used to cause a computer to execute the steps of the method of any one of the implementations of the first aspect.
The beneficial effects achieved by this application are as follows:
In summary, in the embodiments of this application, when the network state of any monitor is detected to change from UP to DOWN, one increment is added to the stability metric value of that monitor; it is judged whether there is a primary monitor whose stability metric value is greater than or equal to the first given threshold; if so, one backup monitor whose stability metric value is less than or equal to the second given threshold is selected from the several backup monitors as a primary monitor, and that primary monitor becomes a backup monitor, where the first given threshold is greater than the second given threshold.
With the monitor management method provided by the embodiments of this application, the stability metric value of each monitor in the Ceph cluster is adjusted dynamically according to the network state of each monitor, so that the health of each monitor is tracked, and the role of each monitor is adjusted dynamically and automatically according to its health. More stable monitors are selected as primary monitors, which reduces the probability that a primary monitor fails, improves the stability of the quorum, and thereby improves the stability of the Ceph cluster.
Brief description of the drawings
Fig. 1 is a schematic diagram of the role configuration of each host in a Ceph cluster provided in an embodiment of this application;
Fig. 2 is a detailed flowchart of a method for managing monitors in a Ceph cluster provided in an embodiment of this application;
Fig. 3 is a schematic diagram of how the stability metric value of a monitor changes, provided in an embodiment of this application;
Fig. 4 is a schematic structural diagram of a device for managing monitors in a Ceph cluster provided in an embodiment of this application;
Fig. 5 is a schematic structural diagram of a computing device provided in an embodiment of this application.
The realization of the objectives, functional characteristics, and advantages of this application will be further described with reference to the embodiments and the accompanying drawings.
Detailed description
First, the term "and/or" in the embodiments of this application merely describes an association relationship between associated objects and indicates that three relationships may exist. For example, "A and/or B" can indicate three cases: A exists alone, A and B exist at the same time, and B exists alone. In addition, the character "/" herein generally indicates an "or" relationship between the associated objects before and after it.
When this application mentions ordinal numbers such as "first", "second", "third", or "fourth", they should be understood as being used only for distinction, unless the context clearly indicates that they express an order.
The technical solutions in the embodiments of this application are described clearly and completely below with reference to the accompanying drawings in the embodiments of this application. Obviously, the described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in this application without creative effort fall within the protection scope of this application.
In the embodiments of this application, several primary monitors and backup monitors are predefined in the Ceph cluster, and a corresponding stability metric value is maintained for each primary monitor and each backup monitor.
A primary monitor is a member of the quorum; it maintains the health of the entire Ceph cluster and holds important information about the cluster. A backup monitor is a candidate monitor. The stability metric value of a monitor characterizes how frequently that monitor fails within a period of time and can therefore be used to characterize the stability of the monitor.
By way of example, Fig. 1 shows a schematic diagram of the role configuration of each host in a Ceph cluster in an embodiment of this application. The Ceph cluster includes at least several primary monitors (e.g., host 10, host 11, ..., host n) and several backup monitors (e.g., host 20, host 21, ..., host m), and may further include other hosts (e.g., host 30, host 31, ..., host p). A primary monitor or backup monitor can be used both as a monitoring node and as a storage node, whereas the other hosts can be used only as storage nodes.
The embodiments of this application propose a method for managing monitors in a Ceph cluster, which dynamically adjusts which monitors serve as primary monitors, that is, as members of the quorum. More stable monitors are selected as primary monitors, which reduces the probability that a primary monitor fails, improves the stability of the quorum, and thereby improves the stability of the Ceph cluster.
By way of example, referring to Fig. 2, the detailed flow of the method for managing monitors in a Ceph cluster provided in an embodiment of this application is as follows:
Step 200: When the network state of any monitor is detected to change from UP to DOWN, add one increment to the stability metric value of that monitor.
In the embodiments of this application, a corresponding stability metric value is maintained separately for each monitor in the Ceph cluster (including primary monitors and backup monitors), and the initial stability metric value of each monitor is 0. When the monitor management method provided by this application is executed, the network state of each monitor is monitored. If the network state of a monitor is detected to change from UP to DOWN, it is determined that the monitor has gone offline, and the stability metric value of that monitor needs to be adjusted. Specifically, one increment k (e.g., k = 100) is added to the stability metric value of the monitor. In other words, each time the network of a monitor goes DOWN, the stability metric value of that monitor increases by k.
In practical applications, when the network state of a monitor changes from DOWN to UP, the stability metric value of that monitor is not adjusted.
For example, assume that the increment k is 100 and that the stability metric value of primary monitor 1 is 130 when its network state is detected to change from UP to DOWN. Then 100 is added to the original stability metric value of primary monitor 1, and the stability metric value of primary monitor 1 after the addition is 230.
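Step 200 can be illustrated with a minimal Python sketch. The names `MonitorState`, `on_network_state`, and `INCREMENT_K` are hypothetical and introduced here only for illustration; the source specifies only the rule itself (add k on a UP-to-DOWN transition, do nothing on DOWN-to-UP) and the example value k = 100.

```python
from dataclasses import dataclass

INCREMENT_K = 100.0  # example increment k from the text; the value is illustrative


@dataclass
class MonitorState:
    """Hypothetical per-monitor record; not a Ceph data structure."""
    name: str
    network_up: bool = True
    stability_metric: float = 0.0  # the initial value is 0 in the description


def on_network_state(monitor: MonitorState, now_up: bool) -> bool:
    """Record an observed network state.

    Returns True only when an UP -> DOWN transition was observed, in which
    case the stability metric has been increased by one increment.
    """
    went_down = monitor.network_up and not now_up
    if went_down:
        monitor.stability_metric += INCREMENT_K
    # A DOWN -> UP transition (or no change at all) leaves the metric untouched.
    monitor.network_up = now_up
    return went_down
```

With a monitor whose metric is 130, `on_network_state(monitor, False)` raises the metric to 230, matching the example above; a later `on_network_state(monitor, True)` leaves it at 230.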
In the embodiments of this application, in an optional implementation, after one increment is added to the stability metric value of the monitor, the attenuation timer corresponding to the monitor is started, and the stability metric value of the monitor is attenuated according to a preset attenuation function within the current attenuation period.
That is, in the embodiments of this application, a corresponding attenuation timer is configured for each monitor, and the attenuation period is t. When the stability metric value of a monitor is not the initial value, the attenuation timer corresponding to that monitor is started and the monitor is in an attenuation period (the current attenuation period). Within that period, the stability metric value of the monitor declines linearly, and the decline must satisfy the following requirement: at the end of the period, the stability metric value of the monitor has dropped to a designated value.
Specifically, in an optional implementation, the step of attenuating the stability metric value of the monitor according to the preset attenuation function within the current attenuation period includes: within the current attenuation period, attenuating the stability metric value of the monitor with a specified attenuation coefficient, where the specified attenuation coefficient satisfies the following: the stability metric value of the monitor at the end of the current attenuation period is half of its stability metric value at the start of the current attenuation period.
For example, assume that the attenuation period is t and that the stability metric value of primary monitor 1 is 180 when the current attenuation period starts. To ensure that the stability metric value has dropped to 90 by the end of the current attenuation period, 180 must be attenuated with a specified attenuation coefficient (e.g., -90/t) within the current attenuation period. Clearly, when the attenuation period is fixed, the size of the attenuation coefficient depends on the current stability metric value of the monitor.
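The linear attenuation described above can be written out as a short derivation. The symbols V_0 (the value at the start of the period), \tau (the time elapsed within the period), and c (the attenuation coefficient) are notation introduced here for illustration; the source names only the period t and gives the example coefficient -90/t.

```latex
% Linear attenuation over one period of length t, starting from V_0:
V(\tau) = V_0 + c\,\tau, \qquad 0 \le \tau \le t

% Requirement from the text: the value is halved at the end of the period.
V(t) = \tfrac{1}{2} V_0 \;\Longrightarrow\; c = -\frac{V_0}{2t}

% Worked example from the text, with V_0 = 180:
c = -\frac{180}{2t} = -\frac{90}{t}, \qquad V(t) = 180 - 90 = 90
```

Because c is proportional to V_0, a larger starting value decays faster in absolute terms, which is why the coefficient depends on the monitor's current stability metric value when t is fixed.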
Further, while the monitor is in the current attenuation period, if its network state changes from UP to DOWN, one increment is added to the stability metric value of the monitor and the attenuation timer is restarted, so that the monitor enters the next attenuation period.
That is, while a monitor is in an attenuation period, if its network state is again detected to change from UP to DOWN (the monitor has gone offline again), one increment is added to the stability metric value of that monitor and, at the same time, the attenuation timer is restarted, so that the monitor enters a new attenuation period.
For example, assume that the increment k is 100 and that, in the current attenuation period, the stability metric value of primary monitor 1 needs to be reduced from 230 to 115 with attenuation coefficient K1. If the network state of primary monitor 1 is again detected to change from UP to DOWN when its stability metric value has fallen to 150 (and has not yet fallen to 115), then 100 is added to the stability metric value of primary monitor 1, raising it to 250, and the attenuation timer corresponding to primary monitor 1 is restarted. The current attenuation period ends and the next attenuation period begins, in which the stability metric value of primary monitor 1 needs to be reduced from 250 to 125 with attenuation coefficient K2.
When the current attenuation period of the monitor ends, the attenuation timer is restarted, so that the monitor enters the next attenuation period.
For example, assume that attenuation timer 1, corresponding to primary monitor 1, times out. After the stability metric value of primary monitor 1 has fallen from 400 to 200, attenuation timer 1 is restarted immediately, so that primary monitor 1 enters the next attenuation period.
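The two timer-restart rules above (restart on a new UP-to-DOWN event, and restart when a period ends) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the class `DecayingMetric`, its field names, and the choice to evaluate the value lazily from timestamps instead of running a real timer are all assumptions made here.

```python
from dataclasses import dataclass


@dataclass
class DecayingMetric:
    """Hypothetical model of one monitor's stability metric with linear decay."""
    period: float              # attenuation period length t
    increment: float = 100.0   # increment k (example value from the text)
    start_value: float = 0.0   # metric value when the current period started
    start_time: float = 0.0    # time at which the current period started

    def _roll(self, now: float) -> None:
        # Every fully elapsed period halves the value and restarts the timer.
        while self.start_value > 0 and now - self.start_time >= self.period:
            self.start_value /= 2.0
            self.start_time += self.period

    def value_at(self, now: float) -> float:
        """Metric value at time `now` under linear decay to half per period."""
        if self.start_value == 0:
            return 0.0
        self._roll(now)
        elapsed = now - self.start_time
        return self.start_value * (1.0 - elapsed / (2.0 * self.period))

    def on_down(self, now: float) -> float:
        """UP -> DOWN event: add one increment and restart the period."""
        self.start_value = self.value_at(now) + self.increment
        self.start_time = now
        return self.start_value
```

For instance, with `period=10` and a period that started at value 200, the metric reads 150 halfway through the period; a DOWN event at that moment raises it to 250 and starts a new period, at the end of which the value is 125. This mirrors the 150 to 250 to 125 sequence in the example above, with the starting value chosen here to keep the arithmetic simple.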
Step 210: If that monitor is a primary monitor, judge whether the stability metric value of that monitor is greater than or equal to the first given threshold.
Specifically, in the embodiments of this application, the stability metric value of each monitor can be maintained dynamically by monitoring the network state of each monitor. After one increment is added to the stability metric value of a monitor, if that monitor is a primary monitor, it is necessary to judge whether its stability metric value is greater than or equal to the first given threshold. If so, step 220 is executed; otherwise, step 200 is executed.
In practical applications, if the stability metric value of a monitor is greater than or equal to the first given threshold, it is determined that the health of that monitor is unstable and that it is no longer suitable to serve as a primary monitor, that is, as a member of the quorum. If the monitor is a primary monitor, a backup monitor that is stable and suitable to serve as a primary monitor needs to be selected from the backup monitors to replace it and join the quorum; if the monitor is a backup monitor, its role remains unchanged.
Of course, the first given threshold can be set according to different application scenarios and/or different user requirements, and is not specifically limited in the embodiments of this application.
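The decision in step 210 can be sketched as a small function. The names `Action` and `check_after_increment` are hypothetical, and the threshold of 300 used in the usage note is an arbitrary illustrative value; the source does not fix a value for the first given threshold.

```python
from enum import Enum


class Action(Enum):
    KEEP_ROLE = "keep_role"              # return to monitoring (step 200)
    REPLACE_PRIMARY = "replace_primary"  # continue with selection (step 220)


def check_after_increment(is_primary: bool,
                          stability_metric: float,
                          first_threshold: float) -> Action:
    """Decide what to do after a monitor's metric has been incremented.

    Only a primary monitor whose metric has reached the first given
    threshold is replaced; a backup monitor keeps its role regardless.
    """
    if is_primary and stability_metric >= first_threshold:
        return Action.REPLACE_PRIMARY
    return Action.KEEP_ROLE
```

With an assumed first threshold of 300, a primary monitor at 300 yields `Action.REPLACE_PRIMARY`, while a primary monitor at 299 or a backup monitor at any value yields `Action.KEEP_ROLE`.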
Step 220: If the judgment result is yes, select from the several backup monitors one backup monitor whose stability metric value is less than or equal to the second given threshold as a primary monitor, and make the original monitor a backup monitor, where the first given threshold is greater than the second given threshold.
Specifically, if it is determined that the stability metric value of a primary monitor is greater than or equal to the first given threshold, the stability metric value of each backup monitor is determined, and it is judged whether there is a backup monitor whose stability metric value is less than or equal to the second given threshold. If it is determined that the stability metric values of M backup monitors are less than or equal to the second given threshold, the backup monitor with the smallest stability metric value among the M backup monitors is selected as a primary monitor.
For example, assume that the second given threshold is 150, the stability metric value of backup monitor 1 is 130, the stability metric value of backup monitor 2 is 120, and the stability metric value of backup monitor 3 is 0. The stability metric values of backup monitor 1, backup monitor 2, and backup monitor 3 are all less than the second given threshold. Because the stability metric value of backup monitor 3 is the smallest, backup monitor 3 can be selected as a primary monitor.
Further, if it is determined that N of the M backup monitors share the minimum stability metric value, one backup monitor is randomly selected from those N backup monitors as a primary monitor.
For example, assume that the second given threshold is 150, the stability metric value of backup monitor 1 is 130, the stability metric value of backup monitor 2 is 120, and the stability metric values of backup monitor 3, backup monitor 4, and backup monitor 5 are all 0. Then any one of backup monitor 3, backup monitor 4, and backup monitor 5 can be selected as a primary monitor.
Of course, the second given threshold can be set according to different application scenarios and/or different user requirements, and is not specifically limited in the embodiments of this application.
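The selection rule in step 220 (filter by the second given threshold, take the minimum, break ties at random) can be sketched as below. The function name `select_backup` and the dictionary representation are hypothetical. Returning `None` when no backup qualifies is an assumption of this sketch; the text does not say what happens in that case.

```python
import random
from typing import Dict, Optional


def select_backup(backup_metrics: Dict[str, float],
                  second_threshold: float,
                  rng: Optional[random.Random] = None) -> Optional[str]:
    """Pick the backup monitor to promote to primary.

    backup_metrics maps a backup monitor's name to its stability metric.
    """
    rng = rng or random.Random()
    # The M candidates: backups at or below the second given threshold.
    eligible = {name: value for name, value in backup_metrics.items()
                if value <= second_threshold}
    if not eligible:
        return None  # assumption: no promotion when nobody qualifies
    lowest = min(eligible.values())
    # The N candidates that share the minimum value; choose one at random.
    tied = sorted(name for name, value in eligible.items() if value == lowest)
    return rng.choice(tied)
```

With the values from the first example above (130, 120, and 0 against a threshold of 150), the function returns the backup whose metric is 0. With three backups tied at 0, it returns one of those three.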
Further, if the stability metric value of any of the above monitors after attenuation is less than or equal to the third given threshold, the stability metric value of that monitor is set to the initial value (e.g., 0), where the third given threshold is less than one increment.
For example, assume that the initial stability metric value of primary monitor 1 is 0, one increment is 100, and the third given threshold is 40. When the stability metric value of primary monitor 1 decays to 40, it is set to 0.
Of course, the third given threshold can be set according to different application scenarios and/or different user requirements; the embodiments of the present application do not specifically limit it.
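As a hedged illustration of the reset rule, the following Python sketch returns the initial value once an attenuated stability metric value has reached the third given threshold. The function name `reset_if_settled` and its parameter names are hypothetical; only the comparison and the constraint that the third given threshold is less than one increment come from the text.

```python
def reset_if_settled(
    value: float,
    third_threshold: float,
    increment: float,
    initial: float = 0.0,
) -> float:
    """Return the initial value once the attenuated metric has settled."""
    # The text requires the third given threshold to be below one increment.
    if third_threshold >= increment:
        raise ValueError("third threshold must be less than one increment")
    return initial if value <= third_threshold else value


# Numbers from the example in the text: increment 100, third threshold 40.
print(reset_if_settled(40, third_threshold=40, increment=100))
print(reset_if_settled(41, third_threshold=40, increment=100))
```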
It should be appreciated that specific embodiment described herein is only used to explain the application, it is not used to limit the application.
The change process of the stability metric value of one monitor (e.g., host 10) in the embodiments of the present application is described in detail below with reference to Fig. 3.
Specifically, assume that host 10 is a primary monitor in the initial stage and that the preset initial stability metric value of host 10 is 0. At time T1, the network state of host 10 is observed to change from UP to DOWN; the stability metric value of host 10 is increased by k, the drop timer corresponding to host 10 is started, and the first attenuation period begins, where the length of one attenuation period is t. During attenuation, at time T2, the network state of host 10 is again observed to change from UP to DOWN; the stability metric value of host 10 is increased by k, the drop timer is restarted, and the second attenuation period begins. During attenuation, at time T3, the same transition is observed again; the stability metric value of host 10 is increased by k, the drop timer is restarted, and the third attenuation period begins. During attenuation, at time T4, the transition is observed once more; the stability metric value of host 10 is increased by k, the drop timer is restarted, and the fourth attenuation period begins. At this point the stability metric value (X) of host 10 exceeds the first given threshold, so a backup monitor whose stability metric value is less than or equal to the second given threshold (e.g., host 20) must be selected from the backup monitors to join the committee in place of host 10, and host 10 is no longer a member of the committee.
At time T5, when the fourth attenuation period ends, the stability metric value of host 10 has changed from X to X/2; the drop timer is restarted and the fifth attenuation period begins. At time T6, when the fifth attenuation period ends, the stability metric value of host 10 has changed from X/2 to X/4; the drop timer is restarted and the sixth attenuation period begins. At time T7, when the sixth attenuation period ends, the stability metric value of host 10 has changed from X/4 to X/8. If the third given threshold equals X/8, the stability metric value of host 10 is set to 0 directly. If the third given threshold is greater than X/8 and less than X/4, the stability metric value of host 10 is set to 0 during the sixth attenuation period, at the moment it decays from X/4 to the third given threshold. If the third given threshold is less than X/8, the drop timer is restarted and the seventh attenuation period begins, and so on until the stability metric value of host 10 decays to the third given threshold, at which point it is set to 0.
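The evolution described for Fig. 3 can be traced with a small simulation. The sketch below is illustrative and rests on stated assumptions: the text only requires that the value at the end of an attenuation period be half of the value at its start, so the continuous exponential shape used here, the class name `StabilityMetric`, the flap times, and the numeric parameters are all hypothetical. Because exponential decay is memoryless, recording the time of the last update plays the role of restarting the drop timer.

```python
from dataclasses import dataclass


@dataclass
class StabilityMetric:
    increment: float          # k in the text
    period: float             # t, the length of one attenuation period
    third_threshold: float    # at or below this, the value is reset
    initial: float = 0.0
    value: float = 0.0
    last_update: float = 0.0  # time of the last flap or decay evaluation

    def decay_to(self, now: float) -> float:
        """Attenuate the value up to `now`; it halves once per period."""
        elapsed = now - self.last_update
        if self.value > self.initial and elapsed > 0:
            self.value *= 0.5 ** (elapsed / self.period)
            if self.value <= self.third_threshold:
                self.value = self.initial
        self.last_update = now
        return self.value

    def on_down(self, now: float) -> float:
        """Handle an UP -> DOWN transition: attenuate first, then add k."""
        self.decay_to(now)
        self.value += self.increment
        return self.value


# Hypothetical timeline: four flaps (T1..T4), then undisturbed attenuation.
host10 = StabilityMetric(increment=100, period=10, third_threshold=40)
for flap_time in (0, 2, 4, 6):
    host10.on_down(flap_time)
x = host10.value  # X, the value right after the flap at T4
for end_of_period in (16, 26, 36, 46):
    print(end_of_period, host10.decay_to(end_of_period))
```

With these numbers the value passes through X/2, X/4, and X/8 at the ends of successive periods, which corresponds to the case in the text where the third given threshold is less than X/8 and a further attenuation period is needed before the value is reset to 0.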
In practice, the cluster map of a ceph cluster includes the monitor map, OSD map, PG map, CRUSH map, and MDS map. In an optional implementation of the embodiments of the present application, the cluster map of the ceph cluster further includes a health map. Data such as the information of each primary monitor, the information of each backup monitor, the stability metric value of each monitor, the increment, the attenuation period, the first given threshold, the second given threshold, and the third given threshold are stored in the health map, which is maintained by the primary monitors in the ceph cluster in the same way that the cluster map is currently maintained.
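One way to picture the contents of such a health map is the following Python data structure. It is only a sketch of the fields listed above: the class name `HealthMap`, the field names, the choice of seconds as the unit of the attenuation period, and the sample values (in particular the first given threshold of 300) are assumptions, and the structure is not part of Ceph's actual cluster map format.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HealthMap:
    primary_monitors: List[str]
    backup_monitors: List[str]
    increment: float
    attenuation_period: float  # seconds (the unit is an assumption)
    first_threshold: float
    second_threshold: float
    third_threshold: float
    stability_metrics: Dict[str, float] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Constraints stated in the text.
        if self.first_threshold <= self.second_threshold:
            raise ValueError("first threshold must exceed second threshold")
        if self.third_threshold >= self.increment:
            raise ValueError("third threshold must be less than one increment")
        # Every monitor starts at the initial stability metric value 0.
        for name in self.primary_monitors + self.backup_monitors:
            self.stability_metrics.setdefault(name, 0.0)


health_map = HealthMap(
    primary_monitors=["host10"],
    backup_monitors=["host20", "host30"],
    increment=100,
    attenuation_period=60,
    first_threshold=300,
    second_threshold=150,
    third_threshold=40,
)
print(health_map.stability_metrics)
```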
Based on the same inventive concept as the above method embodiments, the embodiments of the present application further provide a management device for monitors in a ceph cluster. The monitors of the ceph cluster include several primary monitors and several backup monitors, and a corresponding stability metric value is maintained for each monitor. Illustratively, Fig. 4 is a schematic structural diagram of a management device for monitors in a ceph cluster provided in the embodiments of the present application. The device includes:
A monitoring unit 40, configured to, upon observing that the network state of any monitor changes from UP to DOWN, add one increment to the stability metric value corresponding to that monitor;
A judging unit 41, configured to judge whether there is a primary monitor whose stability metric value is greater than or equal to the first given threshold;
A selecting unit 42, configured to, when the judgment result is yes, select from the several backup monitors a backup monitor whose stability metric value is less than or equal to the second given threshold as a primary monitor, and take the primary monitor in question as a backup monitor, where the first given threshold is greater than the second given threshold.
Optionally, the device further includes a measurement adjustment unit 43. After the stability metric value corresponding to the monitor has been increased by one increment, the measurement adjustment unit 43 is configured to:
Start the drop timer corresponding to the monitor, and attenuate the stability metric value corresponding to the monitor according to a preset attenuation function within the current attenuation period.
Optionally, when attenuating the stability metric value corresponding to the monitor according to the preset attenuation function within the current attenuation period, the measurement adjustment unit 43 is configured to:
Within the current attenuation period, attenuate the stability metric value corresponding to the monitor with a specified attenuation coefficient;
Where the specified attenuation coefficient satisfies: the stability metric value at the end of the current attenuation period is half of the stability metric value at the start of the current attenuation period.
Optionally, if, while the monitor is in the current attenuation period, the monitoring unit 40 observes that the network state of the monitor changes from UP to DOWN, the measurement adjustment unit 43 adds one increment to the stability metric value corresponding to the monitor and restarts the drop timer, so that the monitor enters the next attenuation period;
If the attenuation period of the monitor ends, the measurement adjustment unit 43 restarts the drop timer, so that the monitor enters the next attenuation period.
Optionally, the measurement adjustment unit 43 is further configured to:
If the stability metric value of the monitor after attenuation is less than or equal to the third given threshold, set the stability metric value of the monitor to the initial value, where the third given threshold is less than one increment.
Optionally, when selecting from the several backup monitors a backup monitor whose stability metric value is less than or equal to the second given threshold as a primary monitor, the selecting unit 42 is configured to:
Determine the stability metric value of each backup monitor, and judge whether there is a backup monitor whose stability metric value is less than or equal to the second given threshold;
If the stability metric values of M backup monitors are determined to be less than or equal to the second given threshold, select the backup monitor with the smallest stability metric value among the M backup monitors as the primary monitor;
If N of the M backup monitors are determined to share the minimum stability metric value, randomly select one backup monitor from the N backup monitors as the primary monitor.
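To show how the monitoring, judging, and selecting units might cooperate, the following Python sketch wires the three steps together. It is a simplified illustration under stated assumptions: the class name `MonitorManager`, its method names, and the sample thresholds are hypothetical, attenuation is left out for brevity, and keeping the unstable primary in place when no backup qualifies is a choice made for the sketch rather than something the text specifies.

```python
import random
from typing import Dict, Iterable, Optional, Set


class MonitorManager:
    def __init__(
        self,
        primaries: Iterable[str],
        backups: Iterable[str],
        increment: float,
        first_threshold: float,
        second_threshold: float,
        rng: Optional[random.Random] = None,
    ) -> None:
        if first_threshold <= second_threshold:
            raise ValueError("first threshold must exceed second threshold")
        self.primaries: Set[str] = set(primaries)
        self.backups: Set[str] = set(backups)
        self.metric: Dict[str, float] = {
            name: 0.0 for name in self.primaries | self.backups
        }
        self.increment = increment
        self.first_threshold = first_threshold
        self.second_threshold = second_threshold
        self.rng = rng or random.Random()

    def on_network_down(self, monitor: str) -> Optional[str]:
        """Monitoring unit: handle an UP -> DOWN transition.

        Returns the name of the promoted backup, or None if no swap occurs.
        """
        self.metric[monitor] += self.increment
        if self._needs_replacement(monitor):
            return self._swap(monitor)
        return None

    def _needs_replacement(self, monitor: str) -> bool:
        """Judging unit: is this a primary at or above the first threshold?"""
        return (
            monitor in self.primaries
            and self.metric[monitor] >= self.first_threshold
        )

    def _swap(self, unstable: str) -> Optional[str]:
        """Selecting unit: promote the most stable eligible backup."""
        eligible = [
            b for b in self.backups if self.metric[b] <= self.second_threshold
        ]
        if not eligible:
            return None  # assumption: keep the current primary
        lowest = min(self.metric[b] for b in eligible)
        tied = sorted(b for b in eligible if self.metric[b] == lowest)
        chosen = self.rng.choice(tied)
        self.backups.remove(chosen)
        self.primaries.add(chosen)
        self.primaries.remove(unstable)
        self.backups.add(unstable)
        return chosen


manager = MonitorManager(
    ["host10"], ["host20"],
    increment=100, first_threshold=300, second_threshold=150,
)
for _ in range(3):
    print(manager.on_network_down("host10"))
```

In this usage example the third DOWN event brings host 10 to the first given threshold, so host 20 is promoted and host 10 becomes a backup monitor.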
The above units may be one or more integrated circuits configured to implement the above method, for example: one or more application-specific integrated circuits (Application Specific Integrated Circuit, ASIC), one or more microprocessors (digital signal processor, DSP), or one or more field-programmable gate arrays (Field Programmable Gate Array, FPGA). As another example, when one of the above units is implemented by a processing element scheduling program code, the processing element may be a general-purpose processor, such as a central processing unit (Central Processing Unit, CPU) or another processor that can call program code. As another example, these units may be integrated together and implemented in the form of a system-on-a-chip (SoC).
The embodiments of the present application further provide a management device for managing monitors in a ceph cluster. Illustratively, Fig. 5 is a schematic structural diagram of the management device provided by the embodiments of the present application. The device includes at least a processor 50 and a memory 51, where:
The memory 51 is configured to store program instructions; the processor 50 calls the program instructions stored in the memory 51 and executes the above method embodiments according to the obtained program instructions. The specific implementation and technical effects are similar and are not repeated here.
Optionally, the present application further provides a computing device including at least one processing element (or chip) for executing the above method embodiments.
Optionally, the present application further provides a program product, such as a computer-readable storage medium, which stores computer-executable instructions for causing a computer to execute the above method embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed device and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
An integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute some of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, or an optical disc.
The above are only preferred embodiments of the present application and do not limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present application, whether applied directly or indirectly in other related technical fields, is likewise included in the scope of patent protection of the present application.

Claims (12)

1. A management method for monitors in a ceph cluster, characterized in that the monitors of the ceph cluster include several primary monitors and several backup monitors, a corresponding stability metric value is maintained for each monitor, and the method comprises:
When the network state of any monitor is observed to change from UP to DOWN, adding one increment to the stability metric value of the monitor;
If the monitor is a primary monitor, judging whether the stability metric value of the monitor is greater than or equal to a first given threshold;
If the judgment result is yes, selecting from the several backup monitors a backup monitor whose stability metric value is less than or equal to a second given threshold as a primary monitor, and taking the monitor as a backup monitor, wherein the first given threshold is greater than the second given threshold.
2. The method according to claim 1, characterized in that, after adding one increment to the stability metric value of the monitor, the method further comprises:
Starting the drop timer corresponding to the monitor, and attenuating the stability metric value of the monitor according to a preset attenuation function within the current attenuation period.
3. The method according to claim 2, characterized in that the step of attenuating the stability metric value of the monitor according to the preset attenuation function within the current attenuation period comprises:
Within the current attenuation period, attenuating the stability metric value of the monitor with a specified attenuation coefficient;
Wherein the specified attenuation coefficient satisfies: at the end of the current attenuation period, the stability metric value of the monitor is half of the stability metric value of the monitor at the start of the current attenuation period.
4. The method according to claim 2, characterized in that the method further comprises:
While the monitor is in the current attenuation period, if the network state of the monitor is observed to change from UP to DOWN, adding one increment to the stability metric value of the monitor and restarting the drop timer, so that the monitor enters the next attenuation period;
When the current attenuation period of the monitor ends, restarting the drop timer, so that the monitor enters the next attenuation period.
5. The method according to any one of claims 2-4, characterized in that the method further comprises:
If the stability metric value of the monitor after attenuation is less than or equal to a third given threshold, setting the stability metric value of the monitor to the initial value, wherein the third given threshold is less than the one increment.
6. The method according to any one of claims 1-4, characterized in that the step of selecting from the several backup monitors a backup monitor whose stability metric value is less than or equal to the second given threshold as a primary monitor comprises:
Determining the stability metric value of each backup monitor, and judging whether there is a backup monitor whose stability metric value is less than or equal to the second given threshold;
If the stability metric values of M backup monitors are determined to be less than or equal to the second given threshold, selecting the backup monitor with the smallest stability metric value from the M backup monitors as the primary monitor;
If N of the M backup monitors are determined to share the minimum stability metric value, randomly selecting one backup monitor from the N backup monitors as the primary monitor.
7. A management device for monitors in a ceph cluster, characterized in that the monitors of the ceph cluster include several primary monitors and several backup monitors, a corresponding stability metric value is maintained for each monitor, and the device comprises:
A monitoring unit, configured to, upon observing that the network state of any monitor changes from UP to DOWN, add one increment to the stability metric value of the monitor;
A judging unit, configured to, when the monitor is determined to be a primary monitor, judge whether the stability metric value of the monitor is greater than or equal to a first given threshold;
A selecting unit, configured to, when the judgment result is yes, select from the several backup monitors a backup monitor whose stability metric value is less than or equal to a second given threshold as a primary monitor, and take the monitor as a backup monitor, wherein the first given threshold is greater than the second given threshold.
8. The device according to claim 7, characterized in that the device further comprises a measurement adjustment unit, and after one increment is added to the stability metric value of the monitor, the measurement adjustment unit is configured to:
Start the drop timer corresponding to the monitor, and attenuate the stability metric value of the monitor according to a preset attenuation function within the current attenuation period.
9. The device according to claim 8, characterized in that, when attenuating the stability metric value of the monitor according to the preset attenuation function within the current attenuation period, the measurement adjustment unit is configured to:
Within the current attenuation period, attenuate the stability metric value of the monitor with a specified attenuation coefficient;
Wherein the specified attenuation coefficient satisfies: at the end of the current attenuation period, the stability metric value of the monitor is half of the stability metric value of the monitor at the start of the current attenuation period.
10. The device according to claim 8, characterized in that:
While the monitor is in the current attenuation period, if the monitoring unit observes that the network state of the monitor changes from UP to DOWN, the measurement adjustment unit adds one increment to the stability metric value of the monitor and restarts the drop timer, so that the monitor enters the next attenuation period;
When the attenuation period of the monitor ends, the measurement adjustment unit restarts the drop timer, so that the monitor enters the next attenuation period.
11. A computing device, characterized by comprising:
A memory, configured to store program instructions;
A processor, configured to call the program instructions stored in the memory and to execute, according to the obtained program instructions, the steps of the method according to any one of claims 1 to 6.
12. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer-executable instructions for causing a computer to execute the steps of the method according to any one of claims 1 to 6.
CN201811204207.5A 2018-10-16 2018-10-16 Management method and device for monitors in ceph cluster Active CN109495543B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811204207.5A CN109495543B (en) 2018-10-16 2018-10-16 Management method and device for monitors in ceph cluster

Publications (2)

Publication Number Publication Date
CN109495543A (en) 2019-03-19
CN109495543B (en) 2021-08-24

Family

ID=65690856

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811204207.5A Active CN109495543B (en) 2018-10-16 2018-10-16 Management method and device for monitors in ceph cluster

Country Status (1)

Country Link
CN (1) CN109495543B (en)

Also Published As

Publication number Publication date
CN109495543B (en) 2021-08-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant