CN110109776A - Node processing method, apparatus and electronic device - Google Patents


Info

Publication number
CN110109776A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910426366.8A
Other languages
Chinese (zh)
Inventor
许广彬
吴业亮
谭瑞忠
刘馗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuxi Huayun Data Technology Service Co Ltd
Original Assignee
Wuxi Huayun Data Technology Service Co Ltd
Application filed by Wuxi Huayun Data Technology Service Co Ltd
Priority to CN201910426366.8A
Publication of CN110109776A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1415 Saving, restoring, recovering or retrying at system level
    • G06F11/1441 Resetting or repowering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3055 Monitoring arrangements for monitoring the status of the computing system or of the computing system component, e.g. monitoring if the computing system is on, off, available, not available

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Quality & Reliability (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computer And Data Communications (AREA)

Abstract

The embodiments of the present application provide a node processing method, apparatus and electronic device. A first node in a high-availability cluster determines whether a second node has failed by checking whether the second node's current node status information can be viewed locally and, once a failure of the second node is confirmed, remotely controls the second node to restart in an attempt to recover it. Through this process, when any node of the high-availability cluster fails, the remaining nodes can quickly locate the faulty node from how its node status information is (or is not) being received in real time, react promptly, and remotely restart it. The high-availability cluster thereby gains the ability to handle faulty nodes quickly, which can resolve node faults to a certain extent and helps ensure the reliability of the cluster and the effectiveness of its business processing.

Description

Node processing method, apparatus and electronic device
Technical field
The present application relates to the field of cluster technology, and in particular to a node processing method, apparatus and electronic device.
Background
With the rapid development of the Internet, users' business volume keeps growing, and the requirements on the reliability and performance of business services are increasingly high. To meet users' needs, HA (High Availability) clusters are commonly used in practical application environments to process business. In a high-availability cluster, the nodes must cooperate to ensure that the cluster processes business effectively. If a node in the cluster fails, the performance of the entire cluster is affected; the cluster therefore needs the ability to handle faulty nodes quickly, so as to ensure its reliability and the effectiveness of its business processing.
Summary of the invention
The purpose of the embodiments of the present application is to provide a node processing method, apparatus and electronic device that give a high-availability cluster the ability to handle faulty nodes quickly, thereby ensuring, to a certain extent, the reliability of the cluster and the effectiveness of its business processing.
An embodiment of the present application provides a node processing method applied in a high-availability cluster. The high-availability cluster includes a first node and the remaining nodes other than the first node. The method includes: the first node obtains the current node status information of the remaining nodes in real time; when the first node cannot view the current node status information of a second node locally, it determines whether the second node meets a preset restart condition, the second node being any one of the remaining nodes; and when the second node meets the preset restart condition, the first node remotely controls the second node to restart.
In the above implementation, the first node obtains the current node status information of the remaining nodes in real time; when it cannot view the current node status information of the second node locally, it determines whether the second node meets the preset restart condition and, if so, remotely restarts the second node. In this way, the first node can tell whether the second node has failed simply by whether the second node's current node status information is visible locally, and, once a failure is confirmed, remotely restarts the second node in an attempt to recover it. When any node of the high-availability cluster fails, the remaining nodes can quickly locate it from the real-time reception of its node status information, react promptly and remotely restart it. The cluster thus gains the ability to handle faulty nodes quickly, which can resolve node faults to a certain extent and helps ensure the reliability of the cluster and the effectiveness of its business processing.
Further, determining, when the first node cannot view the current node status information of the second node locally, whether the second node meets the preset restart condition includes: when the first node cannot view the current node status information of the second node locally, obtaining how a third node in the high-availability cluster views the second node, the third node being a node among the remaining nodes other than the second node; if the third node also cannot view the second node, determining that the second node meets the preset restart condition; otherwise, determining that the second node does not meet the preset restart condition.
In the above implementation, when the first node cannot view the second node's current node status information locally, it further obtains how the third nodes in the high-availability cluster view the second node. Only when the third nodes also cannot view it, and no node in the cluster has yet handled the second node, does the first node remotely restart the second node. Whether the second node is faulty is therefore confirmed jointly by nodes other than the second node, which ensures accurate identification of faulty nodes and avoids misjudgment.
Further, before the first node remotely controls the second node to restart, the method further includes: determining that the first node has not received a processed notice for the second node. The processed notice notifies the first node that the second node has already been remotely controlled to restart.
In the above implementation, the first node remotely restarts the second node only when no node in the high-availability cluster has handled it yet. This prevents different nodes in the cluster from handling the second node repeatedly, saves cluster resources, and improves the effectiveness of handling the second node.
Further, the first node obtaining the current node status information of the remaining nodes in real time further includes: the first node obtaining that information in real time and saving it in a memory database or distributed memory cluster preset in the first node.
In the above implementation, the current node status information of the remaining nodes is stored in a memory database or distributed memory cluster. This increases read/write speed and keeps read/write performance good even under high load, so the first node can still reliably check the status information it has obtained, reducing the probability of misjudging faulty nodes under high load.
Further, the node processing method further includes: the first node synchronizing its own current node status information to the remaining nodes.
In the above implementation, each node also stores its own current node status information in the memory database or distributed memory cluster and synchronizes it to the remaining nodes, ensuring that the node status information of every node in the high-availability cluster is synchronized in real time.
Further, the node processing method further includes: when the first node receives a shutdown or restart instruction, the first node migrates the data in the preset memory database or distributed memory cluster to a preset disk and starts the shutdown procedure once the migration is complete; after the first node boots up again following the shutdown or restart, it migrates the data that was moved out of the preset memory database or distributed memory cluster from the preset disk back into the preset memory database or distributed memory cluster.
It should be appreciated that, in practice, data held in memory is at risk of being lost during shutdown. In the above implementation, data migration is performed automatically upon receiving a shutdown or restart instruction: the data in memory is migrated to disk, and the data migrated to disk is automatically moved back into memory after power-on. This effectively reduces the risk of data loss and ensures data safety.
Further, before the first node remotely controls the second node to restart, the method further includes: the first node obtaining the number of times the second node has been restarted within a preset duration and, when that number is greater than a preset count threshold, remotely controlling the second node to shut down.
In practice, a node may fail because of a hardware fault or similar cause, in which case restarting cannot resolve the problem. In the above implementation, if a node has been restarted more times than the preset count threshold within the preset duration, its problem is considered unresolvable by restarting, and the node is shut down to prevent it from further affecting the high-availability cluster.
Further, after the first node obtains the current node status information of the remaining nodes in real time, the method further includes: when the first node cannot view the current node status information of the remaining nodes locally, the first node restarts itself.
In the above implementation, if the first node cannot view the current node status information of the remaining nodes locally, it restarts itself in an attempt to recover, thereby resolving its own fault to a certain extent and helping ensure the reliability of the high-availability cluster and the effectiveness of its business processing.
Further, the first node restarting itself when it cannot view the current node status information of the remaining nodes locally includes: the first node restarting when it has been unable to view the current node status information of the remaining nodes locally for n consecutive seconds, where n is a preset constant greater than 0.
It should be understood that, in practice, network problems and the like may cause brief data interruptions or delays between nodes, so that a node fails to obtain the status information of the remaining nodes at a particular moment. In the above implementation, the first node confirms the condition continuously over n consecutive seconds, which avoids, to a certain extent, misjudging itself as a faulty node because of brief network-induced interruptions or delays, and improves the accuracy of its judgment of whether it is itself faulty.
An embodiment of the present application also provides a node processing apparatus, applied to the first node in a high-availability cluster. The high-availability cluster includes the first node and the remaining nodes other than the first node. The node processing apparatus includes an information obtaining module, an information viewing module, an information processing module and a remote control module. The information obtaining module is configured to obtain the current node status information of the remaining nodes in real time; the information viewing module is configured to view, in real time, the obtained current node status information of the remaining nodes; the information processing module is configured to determine, when the information viewing module cannot view the current node status information of the second node locally, whether the second node meets the preset restart condition, the second node being any one of the remaining nodes; and the remote control module is configured to remotely control the second node to restart when the second node meets the preset restart condition.
With the above structure, the first node can tell whether the second node has failed by whether the second node's current node status information is visible locally, and, once a failure is confirmed, remotely restarts the second node in an attempt to recover it. When any node of the high-availability cluster fails, the remaining nodes can quickly locate it from the real-time reception of its node status information, react promptly and remotely restart it. The cluster thus gains the ability to handle faulty nodes quickly, which can resolve node faults to a certain extent and helps ensure the reliability of the cluster and the effectiveness of its business processing.
An embodiment of the present application further provides an electronic device including a processor, a memory and a communication bus. The communication bus implements connection and communication between the processor and the memory; the processor executes one or more programs stored in the memory to implement the steps of any of the node processing methods above.
An embodiment of the present application further provides a computer storage medium storing one or more programs, which can be executed by one or more processors to implement the steps of any of the node processing methods above.
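The four-module apparatus can be pictured as a small pipeline in which each module hands its result to the next. The Python sketch below is an illustrative assumption, not the patent's implementation: the class names are hypothetical, `fetch_all` stands in for whatever transport delivers peer status, the restart condition is passed in as a callable, and a node counts as unviewable when nothing was obtained for it in the current round.

```python
from typing import Callable, Dict, List


class InformationObtainingModule:
    """Obtains current status of the remaining nodes (transport is hypothetical)."""

    def __init__(self, fetch_all: Callable[[], Dict[str, dict]]):
        self.fetch_all = fetch_all

    def obtain(self) -> Dict[str, dict]:
        return self.fetch_all()


class InformationViewingModule:
    """Reports which peers have no viewable status in the obtained data."""

    def unviewable(self, peers: List[str], obtained: Dict[str, dict]) -> List[str]:
        return [p for p in peers if p not in obtained]


class InformationProcessingModule:
    """Applies the preset restart condition to each unviewable peer."""

    def __init__(self, restart_condition: Callable[[str], bool]):
        self.restart_condition = restart_condition

    def to_restart(self, candidates: List[str]) -> List[str]:
        return [c for c in candidates if self.restart_condition(c)]


class RemoteControlModule:
    """Issues the remote restart for each selected node."""

    def __init__(self, restart: Callable[[str], None]):
        self.restart = restart

    def apply(self, nodes: List[str]) -> None:
        for node in nodes:
            self.restart(node)


class NodeProcessingApparatus:
    def __init__(self, peers, obtaining, viewing, processing, control):
        self.peers = peers
        self.obtaining = obtaining
        self.viewing = viewing
        self.processing = processing
        self.control = control

    def run_once(self) -> List[str]:
        """One monitoring round; returns the nodes that were restarted."""
        obtained = self.obtaining.obtain()
        missing = self.viewing.unviewable(self.peers, obtained)
        targets = self.processing.to_restart(missing)
        self.control.apply(targets)
        return targets


# Usage: only "b" reported this round, so "c" is restarted.
restarted: List[str] = []
app = NodeProcessingApparatus(
    ["b", "c"],
    InformationObtainingModule(lambda: {"b": {"cpu_usage": 0.1}}),
    InformationViewingModule(),
    InformationProcessingModule(lambda node: True),
    RemoteControlModule(restarted.append),
)
```

Keeping the restart condition and the restart action as injected callables lets either variant of the condition described in the embodiments be plugged in without touching the other modules.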
Brief description of the drawings
To explain the technical solutions of the embodiments of the present application more clearly, the drawings required in the embodiments are briefly described below. It should be understood that the following drawings show only some embodiments of the present application and should not be regarded as limiting its scope; those of ordinary skill in the art can derive other related drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a node processing method provided by an embodiment of the present application;
Fig. 2 is a schematic diagram of a high-availability cluster system in a normal state, provided by an embodiment of the present application;
Fig. 3 is a schematic diagram of a high-availability cluster system in a split-brain state, provided by an embodiment of the present application;
Fig. 4 is a schematic structural diagram of a node processing apparatus provided by an embodiment of the present application;
Fig. 5 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
Detailed description of embodiments
The technical solutions in the embodiments of the present application are described below with reference to the drawings.
Embodiment 1
Referring to Fig. 1, which is a schematic flowchart of a node processing method provided by an embodiment of the present application, the method includes:
S101: The first node obtains the current node status information of the remaining nodes in real time.
It should be noted that the node processing method provided in the embodiments of the present application is applied on any node in the high-availability cluster. For ease of description, the nodes in the high-availability cluster are divided into a first node and the remaining nodes other than the first node. The implementation of the method is described below from the perspective of the first node.
It should also be noted that, in the embodiments of the present application, node status information refers to information that characterizes the current working state of a node, such as hardware usage information (e.g. CPU (Central Processing Unit) usage and memory usage) and network information (e.g. bandwidth usage).
In the embodiments of the present application, each node in the high-availability cluster can independently announce its own current node status information to the cluster in real time. In this way, the current node status information of every node is synchronized across the cluster, and the first node can obtain the current node status information of the remaining nodes in real time. Note in particular that, since the first node is itself a node of the high-availability cluster, in this mode the first node also independently announces its own current node status information to the cluster in real time.
It should be understood that, in practice, the first node may instead access the remaining nodes in real time and fetch their current node status information; or the first node may send node status information acquisition requests to the remaining nodes in real time and obtain the current node status information they return in response.
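The push model just described, in which every node announces its own status and every node (the announcer included) records it, can be sketched as an in-process simulation. This is an illustration under stated assumptions rather than the patent's implementation: `NodeStatus`, `Node` and `Cluster` are hypothetical names, the fields simply mirror the examples above (CPU, memory and bandwidth usage), and a real cluster would replace `Cluster.broadcast` with its own network transport. Each record keeps the local receive time so that "not viewable" can later be judged by staleness.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class NodeStatus:
    """Snapshot of a node's current working state (fields are illustrative)."""
    node_id: str
    cpu_usage: float        # fraction in [0, 1]
    mem_usage: float        # fraction in [0, 1]
    bandwidth_usage: float  # fraction of link capacity in [0, 1]


@dataclass
class Node:
    node_id: str
    # Latest status received from each node, paired with the local receive time.
    seen: Dict[str, Tuple[NodeStatus, float]] = field(default_factory=dict)

    def receive(self, status: NodeStatus, received_at: float) -> None:
        self.seen[status.node_id] = (status, received_at)


class Cluster:
    """In-process stand-in for the cluster's broadcast channel (hypothetical)."""

    def __init__(self, nodes):
        self.nodes = {n.node_id: n for n in nodes}

    def broadcast(self, status: NodeStatus, now: Optional[float] = None) -> None:
        # Every node, the sender included, records the announcement.
        received_at = time.monotonic() if now is None else now
        for node in self.nodes.values():
            node.receive(status, received_at)


# Usage: node "b" announces its status; every node, "a" included, now holds it.
a, b, c = Node("a"), Node("b"), Node("c")
cluster = Cluster([a, b, c])
cluster.broadcast(NodeStatus("b", cpu_usage=0.4, mem_usage=0.6, bandwidth_usage=0.1), now=100.0)
```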
S102: When the first node cannot view the current node status information of the second node locally, it determines whether the second node meets the preset restart condition.
It should be noted that, in the embodiments of the present application, the second node may be any one of the remaining nodes.
It is worth noting that, in the embodiments of the present application, the first node can view, locally and in real time, the current node status information it has obtained for each node. Illustratively, the first node may use a watchdog program to view each node's current node status information in real time.
In one possible implementation, if, while viewing locally in real time the current node status information obtained for each node, the first node finds that the current node status information of a certain node cannot be viewed, that node may be considered to meet the preset restart condition. In this implementation, the node whose current node status information cannot be viewed locally by the first node is the second node, which may be any one of the remaining nodes; the preset restart condition is that the first node cannot view the second node's current node status information locally.
In another possible implementation, if, while viewing locally in real time the current node status information obtained for each node, the first node finds that the current node status information of a certain node cannot be viewed, it may further obtain how the third nodes in the high-availability cluster view that node. If the third nodes also cannot view it, the node is considered to meet the preset restart condition; conversely, if the first node can view the node's current node status information locally, or some third node can view it, the node is considered not to meet the preset restart condition. In this implementation, the node whose current node status information cannot be viewed locally by the first node is the second node, which may be any one of the remaining nodes, and a third node is any node among the remaining nodes other than the second node. The preset restart condition is that the first node cannot view the second node's current node status information locally and the third nodes cannot view it locally either.
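The patent does not define precisely when status information "cannot be viewed". One common reading, assumed here, is a freshness window: a peer whose latest status is older than `max_age` seconds, or that has never reported, is treated as unviewable. The sketch below shows that check as a watchdog might run it each round; `max_age` is a hypothetical tuning parameter and `last_seen` maps each peer to the local time its status was last received.

```python
from typing import Dict, List


def unviewable_nodes(
    last_seen: Dict[str, float],
    peers: List[str],
    now: float,
    max_age: float,
) -> List[str]:
    """Return peers whose status has not been seen within `max_age` seconds.

    A peer that never reported at all is treated the same as a stale one.
    """
    return [
        p for p in peers
        if p not in last_seen or now - last_seen[p] > max_age
    ]


# Usage: "b" reported 2 s ago, "c" 10 s ago, "d" never.
stale = unviewable_nodes({"b": 98.0, "c": 90.0}, ["b", "c", "d"], now=100.0, max_age=5.0)
```

Under the first implementation above, every node in `stale` would directly meet the preset restart condition.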
In the above possible implementation, the ways in which the first node obtains how the third nodes in the high-availability cluster view the node include, but are not limited to, the following:
Mode 1: When the first node cannot view the current node status information of the second node locally, it announces in the high-availability cluster that it cannot view that information; according to the announcement, each third node checks whether it can view the second node's current node status information locally and returns the result. Illustratively, the announcement includes identification information of the first node and identification information of the second node; the third node uses the second node's identification information to check whether it can view the second node's current node status information locally, and returns the result to the first node according to the first node's identification information.
Mode 2: When the first node cannot view the current node status information of the second node locally, it announces in the high-availability cluster that it cannot view that information, and at the same time obtains the announcements published by third nodes stating that they cannot view the second node's current node status information. That is, in this mode, every node announces to the cluster in real time whenever it cannot view the current node status information of some node. If the first node has not obtained such an announcement from a given third node, this indicates that that third node can view the second node's current node status information.
It should be noted that there may be multiple third nodes. In one possible implementation, the second node may be determined to meet the preset restart condition only when none of the third nodes can view its current node status information. In another possible implementation, the second node may be determined to meet the preset restart condition when at least m third nodes (m being a preset integer greater than or equal to 1) cannot view its current node status information.
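The confirmation step can be sketched as two small functions, one for Mode 2's "silence means visible" rule and one for the two threshold variants (all third nodes, or at least m). Function names are hypothetical. One assumption worth flagging: with the "all third nodes" variant and no third nodes at all (a two-node cluster), the sketch lets the first node's own observation decide, which the patent does not address.

```python
from typing import Dict, Iterable, Optional, Set


def views_from_announcements(third_nodes: Iterable[str], announcers: Set[str]) -> Dict[str, bool]:
    """Mode 2: a third node that has not announced "cannot see" is taken to see it."""
    return {n: n not in announcers for n in third_nodes}


def meets_restart_condition(
    first_can_see: bool,
    third_node_views: Dict[str, bool],
    m: Optional[int] = None,
) -> bool:
    """Decide whether the second node meets the preset restart condition.

    third_node_views maps each third node to whether it can still see the
    second node's current status.
    m=None -> every third node must report "cannot see".
    m=k    -> at least k third nodes must report "cannot see".
    """
    if first_can_see:
        return False
    blind = sum(1 for can_see in third_node_views.values() if not can_see)
    if m is None:
        return blind == len(third_node_views)
    return blind >= m


# Usage: "c" and "d" announced they cannot see the second node; "e" stayed silent.
views = views_from_announcements(["c", "d", "e"], {"c", "d"})
```

With these inputs the "all third nodes" variant declines to restart (because "e" is presumed to see the node), while the m = 2 variant would restart it.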
Optionally, in the embodiments of the present application, whether the second node meets the preset restart condition may also be determined directly from the contents recorded in each node's current node status information. Illustratively, when the second node's current node status information does not meet a preset usage condition, the second node is determined to meet the preset restart condition. For example, when the node status information includes CPU usage and memory usage, the second node is determined to meet the preset restart condition if its current CPU usage exceeds a preset CPU usage threshold and/or its current memory usage exceeds a preset memory usage threshold.
In the embodiments of the present application, after obtaining the current node status information of the remaining nodes in real time, the first node may save it in a memory database (such as Redis) or distributed memory cluster preset in the first node. The high read/write speed of memory databases and distributed memory clusters ensures that the first node can still reliably check the obtained status information under high load, reducing the probability of misjudging, under high load, whether the second node needs to be restarted.
It can be understood that, in the embodiments of the present application, the first node may save its own current node status information in real time to the memory database or distributed memory cluster preset in the first node and synchronize it to the remaining nodes. The remaining nodes may perform the same synchronization, so that node status information is synchronized in real time among all nodes in the high-availability cluster.
It can also be understood that, in the embodiments of the present application, node status information may be synchronized to the memory database or distributed memory cluster while business data is synchronized to the disk of each node, saving resources of the memory database or distributed memory cluster. Alternatively, both node status information and business data may be synchronized to the memory database or distributed memory cluster to improve the overall performance of the high-availability cluster.
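The usage-based variant reduces to a threshold test. In the sketch below the 0.9 thresholds are placeholders (the patent gives no values) and `require_both` selects between the "and" and "or" readings of "and/or".

```python
def exceeds_usage_limits(
    cpu_usage: float,
    mem_usage: float,
    cpu_limit: float = 0.9,   # hypothetical preset CPU usage threshold
    mem_limit: float = 0.9,   # hypothetical preset memory usage threshold
    require_both: bool = False,
) -> bool:
    """True when the node's usage violates the preset usage condition."""
    cpu_over = cpu_usage > cpu_limit
    mem_over = mem_usage > mem_limit
    return (cpu_over and mem_over) if require_both else (cpu_over or mem_over)


# Usage: CPU over its limit, memory within it.
flagged_or = exceeds_usage_limits(0.95, 0.5)
flagged_and = exceeds_usage_limits(0.95, 0.5, require_both=True)
```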
It should be noted that, for most memory, data held in memory may be lost after the device it resides in shuts down. To solve this problem, in the embodiments of the present application, when a node receives a shutdown or restart instruction, it automatically migrates the data in its preset memory database or distributed memory cluster to a preset disk and starts the shutdown procedure once the migration is complete; after the node boots, it automatically migrates the data that came from the preset memory database or distributed memory cluster from the preset disk back into it. This effectively reduces the risk of data loss and ensures data safety. Note that the migrated data may be marked during migration, for example with a location mark (migrating it into a designated region of the disk) or a data mark (adding a marker value to a specific field of the migrated data), so that after booting the node can identify which data came from the preset memory database or distributed memory cluster and migrate it back.
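The migrate-out / migrate-back cycle, using both kinds of marking mentioned above, can be sketched with a JSON snapshot file. Everything here is an assumption for illustration: the snapshot file name acts as the location mark, `MARKER` is a hypothetical data-mark field, and the in-memory database is modelled as a plain dict. Writing to a temporary file, calling `fsync`, then using `os.replace` means a crash mid-write never leaves a half-written snapshot in place. The caller would start the actual shutdown only after `dump_before_shutdown` returns.

```python
import json
import os
import tempfile
from typing import Dict

SNAPSHOT_NAME = "memdb_snapshot.json"  # hypothetical location mark
MARKER = "__migrated_from_memory__"    # hypothetical data mark


def dump_before_shutdown(memory: Dict[str, dict], disk_dir: str) -> str:
    """Migrate in-memory records to disk; shut down only after this returns."""
    path = os.path.join(disk_dir, SNAPSHOT_NAME)
    marked = {k: {**v, MARKER: True} for k, v in memory.items()}
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(marked, f)
        f.flush()
        os.fsync(f.fileno())  # make sure the bytes reach the disk
    os.replace(tmp, path)     # atomic rename over any previous snapshot
    return path


def restore_after_boot(disk_dir: str) -> Dict[str, dict]:
    """Move only the marked records back into memory, stripping the mark."""
    path = os.path.join(disk_dir, SNAPSHOT_NAME)
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as f:
        marked = json.load(f)
    return {
        k: {fk: fv for fk, fv in v.items() if fk != MARKER}
        for k, v in marked.items()
        if v.get(MARKER)
    }


# Usage: dump, "reboot", restore.
with tempfile.TemporaryDirectory() as d:
    dump_before_shutdown({"status:b": {"cpu_usage": 0.4}}, d)
    restored = restore_after_boot(d)
```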
S103: When the second node meets the preset restart condition, the first node remotely controls the second node to restart.
It is worth noting that, in the embodiments of the present application, before remotely controlling the second node to restart, the first node may first determine whether it has received a processed notice for the second node. A processed notice is used to notify the first node that the second node has already been remotely controlled to restart. If the first node has received a processed notice for the second node, some node has already restarted the second node, and the first node no longer performs the remote restart. Conversely, if it has not received such a notice, no node in the high-availability cluster has yet remotely restarted the second node, and the first node remotely restarts it. In this way, even if several nodes find that the second node meets the preset restart condition, the second node is restarted only once, which prevents multiple nodes in the cluster from repeatedly restarting it and saves cluster resources.
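The duplicate-suppression rule can be sketched as "restart unless a recent processed notice for this node already exists". Two details are assumptions, not from the patent: a `window` after which old notices stop counting (otherwise a single historical notice would block every future restart of that node), and appending to a shared list as a stand-in for publishing to the cluster. Note that two nodes checking at the same instant could both proceed; a production system would need an atomic claim (for example a lock) to close that race.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class ProcessedNotice:
    handler_id: str    # node that performed the remote restart
    target_id: str     # node that was restarted
    handled_at: float  # processing time


def restart_once(first_id: str, second_id: str, notices: List[ProcessedNotice],
                 restart: Callable[[str], None], now: float,
                 window: float = 60.0) -> Optional[ProcessedNotice]:
    """Restart second_id unless a recent processed notice for it exists.

    Returns the notice this node published, or None if it skipped the restart.
    """
    recent = [n for n in notices
              if n.target_id == second_id and now - n.handled_at <= window]
    if recent:
        return None  # another node already handled it
    restart(second_id)
    notice = ProcessedNotice(first_id, second_id, now)
    notices.append(notice)  # stands in for announcing to the cluster
    return notice


# Usage: "a" restarts "b"; "c" sees the notice one second later and skips.
calls: List[str] = []
notices: List[ProcessedNotice] = []
n1 = restart_once("a", "b", notices, calls.append, now=100.0)
n2 = restart_once("c", "b", notices, calls.append, now=101.0)
```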
It should be noted that in actual application, it can for the problems such as node system failure, deadlock, cluster fissure It is solved in a manner of by restarting trouble node, but the case where for causing node to go wrong due to hardware fault etc., The node can not be solved the problems, such as by restarting at this time.Therefore in the embodiment of the present application, first node is remotely controlling Before second node is restarted, it can also first obtain second node and restart number in preset duration, it is big restarting number When preset times threshold value, the long-range second node that controls shuts down.It can be issued after being shut down, in cluster for The warning message of two nodes gives node administration personnel, and in order to node administration, personnel overhaul second node.
In this embodiment, after remotely controlling the second node to restart, the first node may issue a processed notice in the cluster; the processed notice may include the processing time, the identifier of the second node, and so on. The number of restarts of the second node within the preset duration can then be counted from the processing times and second-node identifiers in the historical notices.
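Counting restarts from the history of processed notices, and switching to a shutdown once the threshold is exceeded, might look like the following sketch. The default window length and threshold, and the function names, are hypothetical placeholders for the "preset duration" and "preset count threshold" in the text.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ProcessedNotice:
    target: str          # identifier of the restarted (second) node
    issuer: str          # node that issued the restart
    processed_at: float  # processing time, seconds since the epoch


def restarts_in_window(history: Iterable[ProcessedNotice], target: str,
                       now: float, window_s: float) -> int:
    """Count notices for `target` whose processing time lies in [now - window_s, now]."""
    return sum(1 for n in history
               if n.target == target and now - window_s <= n.processed_at <= now)


def choose_action(history: Iterable[ProcessedNotice], target: str, now: float,
                  window_s: float = 3600.0, max_restarts: int = 3) -> str:
    """Return "shutdown" once restarts in the window exceed the threshold, else "restart"."""
    if restarts_in_window(history, target, now, window_s) > max_restarts:
        # Restarting has not helped (e.g. a hardware fault): power the node off
        # so that an alarm can be raised for the administrators instead.
        return "shutdown"
    return "restart"
```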
It should be noted that each node of the high-availability cluster runs two independent systems: a business processing system, which handles business processing and data synchronization within the cluster, and a physical power management system, which switches the node on and off. The first node can restart or shut down the second node by having its watchdog program remotely call the physical power management system of the second node.
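The text does not say how the watchdog reaches the peer's physical power management system. One common out-of-band mechanism is IPMI over LAN to the peer's baseboard management controller (BMC); the sketch below assumes that mechanism and the standard `ipmitool` command-line tool, and the BMC address, user and password are hypothetical. It only builds the command unless `dry_run=False`.

```python
import os
import subprocess
from typing import List

# The two actions in the text mapped onto ipmitool "chassis power" sub-commands.
_ACTIONS = {"restart": "reset", "shutdown": "off"}


def power_command(bmc_host: str, user: str, action: str) -> List[str]:
    """Build an ipmitool command that drives a peer node's BMC out of band.

    -E makes ipmitool read the password from the IPMI_PASSWORD environment
    variable, so it never appears on the command line.
    """
    if action not in _ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    return ["ipmitool", "-I", "lanplus", "-H", bmc_host, "-U", user, "-E",
            "chassis", "power", _ACTIONS[action]]


def remote_power(bmc_host: str, user: str, password: str, action: str,
                 dry_run: bool = True) -> List[str]:
    """Restart or power off a peer; only executes when dry_run is False."""
    cmd = power_command(bmc_host, user, action)
    if not dry_run:
        env = {**os.environ, "IPMI_PASSWORD": password}
        subprocess.run(cmd, env=env, check=True, timeout=30)
    return cmd
```

Because such a call goes to the BMC rather than to the peer's operating system, it can still take effect when the peer's business processing system is hung, which is presumably why the two systems are kept independent.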
In one possible implementation of this embodiment, if the first node cannot view the current node status information of the remaining nodes locally, that is, it has not obtained the current node status information of any remaining node, the first node can conclude that it is itself faulty and restart itself.
It should be understood that in practical applications, of short duration data outage may be led to due to the problems such as network between node Or delay, so as to cause node in the node status information for not getting remaining node sometime.Therefore in order to guarantee the One node to the judgement accuracy itself whether to break down, can continuous n (n be it is preset be greater than 0 constant) in the second not The disconnected node status information for attempting to obtain remaining node, if interior do not worked as being viewed locally remaining node always at continuous n second Preceding node status information, then the problems such as restart to first node, can avoid to a certain extent due to network in this way Lead to of short duration data outage or delay, first node is caused to judge the case where itself is trouble node by accident, improves first node pair Itself whether be trouble node judgement accuracy.
In conclusion Node Processing Method provided by the embodiments of the present application, first node by obtaining remaining node in real time Current node status information, and then when local does not view the current node status information of second node, determine the second section Whether point, which meets to preset, restarts condition, and then remotely controls second node when second node meets and presets and restart condition and carry out weight It opens.In this way, first node is by the way that locally whether viewing the current node status information of second node can determine second node Whether go wrong, and then remotely control second node is restarted when something goes wrong in confirmation second node, to attempt the Two nodes restore.By above-mentioned realization process, a certain node of high-availability cluster when something goes wrong, remaining node can be by right The real-time acquisition situation of the node status information of the node, which quickly orients the node, there is a problem, and then can make rapidly Reaction, remotely controls the node and is restarted, this allows for high-availability cluster and be provided with quickly to handle trouble node Function, guarantees the reliability of high-availability cluster and to business at the problem that can solving trouble node to a certain extent The validity of processing.In addition, in the embodiment of the present application, due to it is synchronous be node status information, data volume is compared with business number According to etc. types data for it is much smaller, this allows for the synchronizing speed of node status information quickly, so that high-availability cluster Interior each node, which is obtained and checked in real time for node status information, can become rapidly, this is just further such that High Availabitity Cluster is determining for trouble node and processing is all able to become more efficient and rapid.
Embodiment two:
Building on Embodiment one, this embodiment further illustrates the present application using the handling of a cluster split-brain as an example.
Referring to Fig. 2, in the normal state of the high-availability cluster, each node stores its current node status information in memory 12 in real time, and nodes 1-3 synchronize their node status information with one another through a memory synchronization technique. The watchdog 11 on each node can therefore see the current node status information of all of nodes 1-3 in the memory of the node on which it runs.
Referring to Fig. 3, in this state the watchdog 11 of node 2 can see only the current node status information of nodes 2 and 3 in the memory 12 of node 2, and cannot see that of node 1. Similarly, the watchdog 11 on node 3 can see only the current node status information of nodes 2 and 3 in the memory 12 of node 3, and cannot see that of node 1. Because node 2 and node 3 obtain each other's node status information normally and only the node status information of node 1 cannot be obtained, it can be determined that the cluster has split-brain and that node 1 is the faulty node. The watchdog 11 on node 2 or node 3 can then directly and remotely call the physical power management system of node 1 to restart node 1, thereby resolving the split-brain.
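The Fig. 3 scenario can be replayed against the restart condition of claim 2 with a short sketch. Here `views` is a hypothetical map from each node to the set of nodes whose current status it can see, and the sketch assumes the condition requires every obtainable third-node view to agree that the second node is not visible.

```python
from typing import Dict, Set


def meets_restart_condition(views: Dict[str, Set[str]], first: str, second: str) -> bool:
    """Claim-2 style check: `first` cannot see `second`, and neither can any third node.

    views[x] is the set of nodes whose current status node x can see locally.
    """
    if second in views[first]:
        return False  # the second node is visible, so nothing to do
    thirds = [n for n in views if n not in (first, second)]
    return all(second not in views[t] for t in thirds)


# Fig. 3: node 1 is cut off; nodes 2 and 3 still see each other.
fig3_views = {
    "node1": {"node1"},
    "node2": {"node2", "node3"},
    "node3": {"node2", "node3"},
}
```

From node 2's or node 3's point of view, node 1 meets the restart condition, while node 3 (or node 2) does not. Even if node 1 could obtain node 3's view, it would not decide to restart node 2; node 1 sees no peer at all, which is the self-restart case of Embodiment one. In a two-node cluster there is no third node, so `all()` over an empty list reduces the condition to the first node's own view.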
Through the above process, the split-brain problem in the high-availability cluster can be solved effectively. Moreover, because the node status information of every node in the high-availability cluster is stored in memory in real time, faulty nodes can still be identified reliably under high load, which reduces the probability of misjudging a node as faulty when the load is high.
Embodiment three:
Referring to Fig. 4, Fig. 4 shows a node processing device corresponding one-to-one to the node processing method shown in Fig. 1. It should be understood that the device 400 corresponds to the method embodiment of Fig. 1 above and can perform each step involved in that method embodiment; for the specific functions of the device 400, refer to the description above, and detailed description is omitted here as appropriate to avoid repetition. The device 400 includes at least one software functional module that can be stored in a memory in the form of software or firmware, or built into the operating system (OS) of the device 400. Specifically, the device is applied to a first node in a high-availability cluster (the high-availability cluster includes the first node and the remaining nodes other than the first node, and the first node is any node in the high-availability cluster), and includes an information obtaining module 401, an information processing module 402, a remote control module 403 and an information viewing module 404, where:
The information obtaining module 401 is configured to obtain the current node status information of the remaining nodes in real time;
The information viewing module 404 is configured to view, in real time, the obtained current node status information of the remaining nodes;
The information processing module 402 is configured to determine, when the current node status information of a second node cannot be viewed locally on the first node, whether the second node satisfies the preset restart condition;
In this embodiment, the second node is any node among the remaining nodes.
The remote control module 403 is configured to remotely control the second node to restart when the second node satisfies the preset restart condition.
In one possible implementation of this embodiment, the information obtaining module 401 is further configured to obtain, when the information viewing module 404 cannot view the current node status information of the second node locally, the observation status of a third node in the high-availability cluster with respect to the node status information of the second node. The information processing module 402 is specifically configured to determine that the second node satisfies the preset restart condition when the information viewing module 404 cannot view the current node status information of the second node locally and the third node's observation status for the second node is also "not viewed"; otherwise, it determines that the second node does not satisfy the preset restart condition.
In this embodiment, the third node is a node among the remaining nodes other than the second node.
In this embodiment, the information processing module 402 is further configured to determine, before the remote control module 403 remotely controls the second node to restart, that the first node has not received a processed notice for the second node. In this embodiment, the processed notice is used to notify the first node that a node has already remotely controlled the second node to restart.
In this embodiment, when the information obtaining module 401 obtains the current node status information of the remaining nodes in real time, it may save that information to a memory database or distributed memory cluster preset in the first node.
In this embodiment, the information processing module 402 may synchronize the current node status information of the first node to the remaining nodes. The current node status information of the first node may also be saved in real time to the memory database or distributed memory cluster preset in the first node.
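A node's status "cannot be viewed locally" once it stops being refreshed. The sketch below models the preset memory database as an in-process dictionary with a freshness bound; it is a stand-in under assumptions, not a client for any real memory database, and `ttl_s`, `NodeStatusStore` and `sync_own_status` are hypothetical names. A deployment might instead rely on a memory database's own key-expiry feature.

```python
import time
from typing import Dict, Iterable, Optional, Set, Tuple


class NodeStatusStore:
    """In-process stand-in for the memory database preset on one node.

    A node's status counts as current only while its last update is younger
    than ttl_s; an expired or missing entry is what "not viewed locally" means here.
    """

    def __init__(self, ttl_s: float = 3.0):
        self.ttl_s = ttl_s
        self._data: Dict[str, Tuple[dict, float]] = {}

    def put(self, node_id: str, status: dict, now: Optional[float] = None) -> None:
        self._data[node_id] = (status, time.time() if now is None else now)

    def current(self, node_id: str, now: Optional[float] = None) -> Optional[dict]:
        entry = self._data.get(node_id)
        if entry is None:
            return None
        status, updated_at = entry
        now = time.time() if now is None else now
        return status if now - updated_at < self.ttl_s else None

    def visible_nodes(self, now: Optional[float] = None) -> Set[str]:
        return {n for n in self._data if self.current(n, now) is not None}


def sync_own_status(self_id: str, status: dict,
                    stores: Iterable[NodeStatusStore], now: Optional[float] = None) -> None:
    """Write this node's current status into its own store and every peer's store."""
    for store in stores:
        store.put(self_id, status, now)
```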
In this embodiment, the information obtaining module 401 is further configured to receive a shutdown or restart instruction. The information processing module 402 is further configured to, when the information obtaining module 401 receives a shutdown or restart instruction, migrate the data in the memory database or distributed memory cluster to a preset disk and start the shutdown program once the data migration has finished; and, when the first node powers on again after the shutdown or restart, migrate the data that was moved from the preset memory database or distributed memory cluster to the preset disk back into the preset memory database or distributed memory cluster.
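The shutdown-time migration and boot-time restore can be sketched with a JSON file standing in for the preset disk location. The file format, path and function names are assumptions. Writing to a temporary file in the same directory and then calling `os.replace` means that, on POSIX file systems, the target path only ever holds a complete file, even if the node loses power mid-write.

```python
import json
import os
import tempfile
from typing import Callable


def migrate_to_disk(memory_db: dict, path: str) -> None:
    """Persist the in-memory data to the preset disk location atomically."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(memory_db, f)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes reach the disk
        os.replace(tmp_path, path)  # swap in the finished file in one step
    except BaseException:
        os.unlink(tmp_path)
        raise


def restore_from_disk(memory_db: dict, path: str) -> bool:
    """After boot, load the migrated data back into the memory database."""
    if not os.path.exists(path):
        return False
    with open(path) as f:
        memory_db.update(json.load(f))
    return True


def on_power_instruction(memory_db: dict, path: str,
                         start_shutdown: Callable[[], None]) -> None:
    """Shutdown/restart handler: migrate first, then start the shutdown program."""
    migrate_to_disk(memory_db, path)
    start_shutdown()
```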
In this embodiment, before the remote control module 403 remotely controls the second node to restart, the information obtaining module 401 may also obtain the number of times the second node has been restarted within the preset duration; when that number is greater than the preset count threshold, the remote control module 403 may instead remotely control the second node to shut down.
In this embodiment, after the information obtaining module 401 obtains the current node status information of the remaining nodes in real time, if the information viewing module 404 cannot view the current node status information of the remaining nodes locally, the information processing module 402 controls the first node to restart.
In one possible implementation of this embodiment, it may be arranged that the information processing module 402 controls the first node to restart only when the information viewing module 404 cannot view the current node status information of the remaining nodes locally for n consecutive seconds (n being a preset constant greater than 0).
It should be noted that, in this embodiment, the information viewing module 404 and the remote control module 403 may be implemented by a watchdog program.
With the node processing device provided by this embodiment, the first node can determine whether the second node has a problem from whether it can view the second node's current node status information and from the preset restart condition, and, once a problem is confirmed, remotely control the second node to restart in an attempt to recover it. Through this process, the high-availability cluster gains the ability to handle faulty nodes quickly, which resolves faulty-node problems to a certain extent and safeguards the reliability of the cluster and the effectiveness of business processing.
Embodiment four:
This embodiment provides an electronic device applied in a high-availability cluster. Referring to Fig. 5, it includes a processor 501, a memory 502 and a communication bus 503, where:
The communication bus 503 is used to implement connection and communication between the processor 501 and the memory 502.
The processor 501 is used to execute one or more programs stored in the memory 502 to implement the steps of the node processing method in Embodiment one and/or Embodiment two above.
It should be noted that, in this embodiment, the electronic device may be a server, a computer or a similar device.
It can be understood that the structure shown in Fig. 5 is only illustrative; the electronic device may include more or fewer components than shown in Fig. 5, or have a configuration different from that shown in Fig. 5.
This embodiment also provides a computer-readable storage medium, such as a floppy disk, an optical disc, a hard disk, flash memory, a USB flash drive, a CF card, an SD card or an MMC card. The computer-readable storage medium stores one or more programs implementing the above steps, and the one or more programs can be executed by one or more processors to implement the steps of the node processing method in Embodiment one above. Details are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed device and method may also be implemented in other ways. The device embodiments described above are merely illustrative; for example, the flowcharts and block diagrams in the accompanying drawings show the possible architectures, functions and operations of devices, methods and computer program products according to multiple embodiments of this application. In this regard, each box in a flowchart or block diagram may represent a module, a program segment or a part of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the boxes may occur in an order different from that marked in the drawings. For example, two consecutive boxes may in fact be executed substantially in parallel, or sometimes in reverse order, depending on the functions involved. It should further be noted that each box in the block diagrams and/or flowcharts, and any combination of boxes in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
In addition, the functional modules in the embodiments of this application may be integrated together to form an independent part, each module may exist separately, or two or more modules may be integrated to form an independent part.
If the functions are implemented in the form of software functional modules and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or some of the steps of the methods described in the embodiments of this application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk or an optical disc.
The above are merely embodiments of this application and are not intended to limit its scope of protection; for those skilled in the art, various modifications and variations of this application are possible. Any modification, equivalent substitution, improvement, etc. made within the spirit and principles of this application shall fall within its scope of protection. It should also be noted that similar reference numerals and letters denote similar items in the drawings; therefore, once an item is defined in one drawing, it need not be further defined or explained in subsequent drawings.
It should be noted that, herein, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article or device that includes the element.

Claims (11)

1. A node processing method, characterized in that it is applied in a high-availability cluster; the high-availability cluster includes a first node and remaining nodes other than the first node, and the node processing method includes:
The first node obtains the current node status information of the remaining nodes in real time;
When the current node status information of a second node cannot be viewed locally, the first node determines whether the second node satisfies a preset restart condition, the second node being any node among the remaining nodes;
When the second node satisfies the preset restart condition, the first node remotely controls the second node to restart.
2. The node processing method according to claim 1, characterized in that, when the first node cannot view the current node status information of the second node locally, determining whether the second node satisfies the preset restart condition includes:
When the first node cannot view the current node status information of the second node locally, the first node obtains the observation status of a third node in the high-availability cluster with respect to the node status information of the second node; when the third node's observation status for the second node is "not viewed", it is determined that the second node satisfies the preset restart condition, the third node being a node among the remaining nodes other than the second node;
Otherwise, it is determined that the second node does not satisfy the preset restart condition.
3. The node processing method according to claim 1, characterized in that, before the first node remotely controls the second node to restart, the method further includes:
Determining that the first node has not received a processed notice for the second node, the processed notice being used to notify the first node that the second node has already been remotely controlled to restart.
4. The node processing method according to claim 1, characterized in that the first node obtaining the current node status information of the remaining nodes in real time includes:
The first node obtains the current node status information of the remaining nodes in real time and saves it to a memory database or distributed memory cluster preset in the first node.
5. The node processing method according to claim 4, characterized in that the method further includes:
The first node synchronizes its own current node status information to the remaining nodes.
6. The node processing method according to claim 4 or 5, characterized in that the method further includes:
When the first node receives a shutdown or restart instruction, the first node migrates the data in the preset memory database or distributed memory cluster to a preset disk and, after the data migration has finished, starts the shutdown program to shut down;
When the first node powers on after the shutdown or restart, the first node migrates the data that was moved from the preset memory database or distributed memory cluster to the preset disk back into the preset memory database or distributed memory cluster.
7. The node processing method according to any one of claims 1-5, characterized in that, before the first node remotely controls the second node to restart, the method further includes:
The first node obtains the number of times the second node has been restarted within a preset duration, and when the number of restarts is greater than a preset count threshold, remotely controls the second node to shut down.
8. The node processing method according to any one of claims 1-5, characterized in that, after the first node obtains the current node status information of the remaining nodes in real time, the method further includes:
When the first node cannot view the current node status information of the remaining nodes locally, the first node restarts.
9. The node processing method according to claim 8, characterized in that the first node restarting when it cannot view the current node status information of the remaining nodes locally includes:
When the first node cannot view the current node status information of the remaining nodes locally for n consecutive seconds, the first node restarts, n being a preset constant greater than 0.
10. A node processing device, characterized in that it is applied to a first node in a high-availability cluster; the high-availability cluster includes the first node and remaining nodes other than the first node; the node processing device includes an information obtaining module, an information viewing module, an information processing module and a remote control module;
The information obtaining module is used to obtain the current node status information of the remaining nodes in real time;
The information viewing module is used to view, in real time, the obtained current node status information of the remaining nodes;
The information processing module is used to determine, when the information viewing module cannot view the current node status information of a second node locally, whether the second node satisfies a preset restart condition, the second node being any node among the remaining nodes;
The remote control module is used to remotely control the second node to restart when the second node satisfies the preset restart condition.
11. An electronic device, characterized in that it includes a processor, a memory and a communication bus;
The communication bus is used to implement connection and communication between the processor and the memory;
The processor is used to execute one or more programs stored in the memory to implement the steps of the node processing method according to any one of claims 1 to 9.
CN201910426366.8A 2019-05-21 2019-05-21 A kind of Node Processing Method, device and electronic equipment Pending CN110109776A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910426366.8A CN110109776A (en) 2019-05-21 2019-05-21 A kind of Node Processing Method, device and electronic equipment

Publications (1)

Publication Number Publication Date
CN110109776A true CN110109776A (en) 2019-08-09

Family

ID=67491537

Country Status (1)

Country Link
CN (1) CN110109776A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120254652A1 (en) * 2011-03-31 2012-10-04 Microsoft Corporation Fault detection and recovery as a service
CN103152434A (en) * 2013-03-27 2013-06-12 江苏辰云信息科技有限公司 Leader node replacing method of distributed cloud system
CN103533567A (en) * 2012-09-29 2014-01-22 深圳市友讯达科技发展有限公司 Determination method of standby node, and nodes
CN108600040A (en) * 2018-03-16 2018-09-28 国电南瑞科技股份有限公司 A kind of distributed system node failure detection method based on High Availabitity detection node
CN109144789A (en) * 2018-09-10 2019-01-04 网宿科技股份有限公司 A kind of method, apparatus and system for restarting OSD
CN109471770A (en) * 2018-09-11 2019-03-15 华为技术有限公司 A kind of method for managing system and device

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112416641A (en) * 2020-11-24 2021-02-26 中国工商银行股份有限公司 Controlled end node restart detection method in master-slave architecture and master control end node
CN112416641B (en) * 2020-11-24 2023-09-22 中国工商银行股份有限公司 Method for detecting restarting of controlled end node in master-slave architecture and master control end node
CN112905352A (en) * 2021-01-29 2021-06-04 北京深演智能科技股份有限公司 Method and device for processing node deadlock

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190809