CN108964977B - Node exception handling method and system, storage medium and electronic device - Google Patents

Info

Publication number
CN108964977B
Authority
CN
China
Prior art keywords
node
service
gateway
exception handling
heartbeat file
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810577770.0A
Other languages
Chinese (zh)
Other versions
CN108964977A (en
Inventor
梁海安
李耀宗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN201810577770.0A (CN108964977B)
Priority to PCT/CN2018/101021 (WO2019232931A1)
Publication of CN108964977A
Application granted
Publication of CN108964977B

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/10 Active monitoring, e.g. heartbeat, ping or trace-route

Abstract

The invention discloses a node exception handling method and system, a storage medium and an electronic device, which are applied to an InfluxDB cluster and relate to the technical field of data storage. The node exception handling method comprises the following steps: a first node determines the connection state between a second node and a gateway, wherein the first node and the second node share a virtual IP; if the connection between the second node and the gateway fails, the first node writes second node abnormal information into a heartbeat file and ensures that the second node stops service; after ensuring that the second node is out of service, the first node processes service requests for the second node based on the virtual IP. The method and the device enable service requests to be processed in a timely manner when a node is abnormal.

Description

Node exception handling method and system, storage medium and electronic device
Technical Field
The present disclosure relates to the field of data storage technologies, and in particular, to a node exception handling method, a node exception handling system, a storage medium, and an electronic device.
Background
Time series databases are increasingly favored by developers because their data is ordered by time, intuitive and easy to distinguish. As one of the time series databases currently attracting the most attention, InfluxDB has been widely applied in application scenarios such as log monitoring.
Because of factors such as the network and node hardware, an InfluxDB node may become abnormal (e.g., go down). At present, when an abnormality occurs, the node needs to be restarted manually. On the one hand, human participation makes the handling process cumbersome and increases the workload of operators; on the other hand, a manual restart takes a certain amount of time, so service requests that arrive in the meantime cannot be processed in time. In addition, if hardware is damaged, the equipment may need to be replaced, which takes even longer and greatly reduces the timeliness of service request processing.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
The present disclosure aims to provide a node exception handling method, a node exception handling system, a storage medium and an electronic device, so as to overcome, at least to a certain extent, the problems that an exception handling process is complicated and cannot be handled in time due to the need of manually restarting a node.
According to an aspect of the present disclosure, a node exception handling method is provided, which is applied to an InfluxDB cluster and includes: the first node determines the connection state of the second node and the gateway, wherein the first node and the second node share a virtual IP; if the connection between the second node and the gateway fails, the first node writes second node abnormal information into a heartbeat file and ensures that the second node stops service; after ensuring that the second node is out of service, the first node processes a service request for the second node based on the virtual IP.
In an exemplary embodiment of the present disclosure, writing, by the first node, second node exception information into the heartbeat file and ensuring that the second node is out of service includes: writing second node abnormal information into the heartbeat file by the first node; the second node inquires that the heartbeat file contains the second node abnormal information, responds to the second node abnormal information to stop service and marks the service state of the second node as stop in the heartbeat file; the first node inquires from the heartbeat file that the service state of the second node is marked as stopped so as to ensure that the second node stops service.
In an exemplary embodiment of the present disclosure, writing, by the first node, second node exception information into the heartbeat file and ensuring that the second node is out of service includes: writing second node abnormal information into the heartbeat file by the first node; and if the first node inquires that the service state of the second node in the heartbeat file is not marked to be stopped, the first node sends a service stopping instruction to the second node to ensure that the second node stops service.
In an exemplary embodiment of the present disclosure, before the first node determines the connection status of the second node and the gateway, the node exception handling method further includes: the first node judges whether an event of updating the heartbeat file by the second node is normal or not; and if not, the first node determines the connection state of the second node and the gateway.
In an exemplary embodiment of the present disclosure, the abnormality in the event that the second node updates the heartbeat file includes: in a first preset time, the second node does not update the heartbeat file; or within second preset time, the time intervals of the second node updating the heartbeat file are all larger than a time threshold.
In an exemplary embodiment of the present disclosure, the node exception handling method further includes: monitoring the connection state of the first node and the gateway through a ping operation of the gateway when the second node fails to connect with the gateway.
In an exemplary embodiment of the present disclosure, the node exception handling method further includes: when the connection between the second node and the gateway fails, the first node acquires a lock file to prevent other nodes under the gateway from processing the service request aiming at the second node in parallel.
According to one aspect of the present disclosure, a node exception handling system is provided, which is applied to an InfluxDB cluster and includes a gateway, and a first node and a second node sharing a virtual IP, where the first node is used for: determining the connection state of the second node and the gateway; if the connection between the second node and the gateway fails, writing second node abnormal information into the heartbeat file and ensuring that the second node stops service; and processing a service request for the second node based on the virtual IP after ensuring that the second node is out of service.
In an exemplary embodiment of the present disclosure, writing, by the first node, second node exception information into the heartbeat file and ensuring that the second node is out of service includes: writing second node abnormal information into the heartbeat file by the first node; the second node inquires that the heartbeat file contains the second node abnormal information, responds to the second node abnormal information to stop service and marks the service state of the second node as stop in the heartbeat file; the first node inquires from the heartbeat file that the service state of the second node is marked as stopped so as to ensure that the second node stops service.
In an exemplary embodiment of the present disclosure, writing, by the first node, second node exception information into the heartbeat file and ensuring that the second node is out of service includes: writing second node abnormal information into the heartbeat file by the first node; and if the first node inquires that the service state of the second node in the heartbeat file is not marked to be stopped, the first node sends a service stopping instruction to the second node to ensure that the second node stops service.
In an exemplary embodiment of the disclosure, the first node is further configured to determine whether the event of the second node updating the heartbeat file is normal, and if not, the first node determines the connection status between the second node and the gateway.
In an exemplary embodiment of the present disclosure, the abnormality in the event that the second node updates the heartbeat file includes: in a first preset time, the second node does not update the heartbeat file; or within second preset time, the time intervals of the second node updating the heartbeat file are all larger than a time threshold.
In an exemplary embodiment of the present disclosure, the gateway performs a ping operation to monitor a connection status of the first node with the gateway.
In one exemplary embodiment of the disclosure, when the second node fails to connect with the gateway, the first node acquires the lock file to prevent the rest of nodes under the gateway from processing the service request aiming at the second node in parallel.
According to an aspect of the present disclosure, there is provided a storage medium having stored thereon a computer program which, when executed by a processor, implements the node exception handling method of any one of the above.
According to an aspect of the present disclosure, there is provided an electronic device including: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform any of the above-described node exception handling methods via execution of the executable instructions.
In the technical solutions provided by some embodiments of the present disclosure, when the second node fails to connect with the gateway, the first node under the gateway, which shares the virtual IP with the second node, may process the service request for the second node instead of the second node. On one hand, when the node is abnormal, manual processing is not needed, and the workload of operators is reduced; on the other hand, the first node sharing the virtual IP replaces an abnormal second node to process the service request, the switching speed is high, and the requirement of service timeliness is met.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty. In the drawings:
FIG. 1 schematically illustrates a flow chart of a method of node exception handling according to an exemplary embodiment of the present disclosure;
FIG. 2 schematically shows a flow chart of a process before a first node processes a service request for a second node, according to an example embodiment of the present disclosure;
FIG. 3 schematically illustrates an architecture diagram of a node exception handling system according to an exemplary embodiment of the present disclosure;
FIG. 4 shows a schematic diagram of a storage medium according to an example embodiment of the present disclosure; and
FIG. 5 schematically shows a block diagram of an electronic device according to an exemplary embodiment of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and the like. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present disclosure.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the steps. For example, some steps may be decomposed, and some steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
FIG. 1 schematically shows a flowchart of a node exception handling method of an exemplary embodiment of the present disclosure, which is applied to an InfluxDB cluster. Referring to FIG. 1, the node exception handling method may include the following steps:
S12, the first node determines the connection state of the second node and the gateway; wherein the first node and the second node share a virtual IP.
The first node and the second node may be two nodes in an InfluxDB cluster that provide data read and write services. The first node and the second node may be configured under a gateway, which is used for communication with the rest of the nodes in the cluster or with external systems (e.g., a data generation system, a data consumption system, etc.). It should be understood that the terms "first" and "second" in this disclosure are only used for distinguishing different nodes, and do not otherwise limit the nodes.
In one embodiment of the present disclosure, the first node and the second node may be understood as two hosts, wherein the second node has an InfluxDB database configured thereon.
The first node and the second node share a virtual IP. Specifically, the shared virtual IP may be configured through the ifconfig command; however, the first node and the second node may also share the virtual IP by means of another existing configuration method, which is not limited by the present disclosure.
Under normal conditions, the second node may continuously provide the read-write service, and the first node may be in a standby state or the first node may execute other services in the cluster.
The first node may determine the connection status of the second node with the gateway, that is, whether the second node is communicating properly with the gateway or is disconnected from it. Specifically, the first node may initiate a connection request to the second node over TCP/IP using the second node's port information. If the connection fails, the first node may retry, for example up to 5 times at intervals of 10 s. If the connection still fails, it may be determined that the second node has failed to connect with the gateway, i.e., the second node is abnormal. Conversely, if the first node receives feedback information sent by the second node in response to the connection request, the second node may be considered normal.
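The retry logic described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name `peer_reachable`, its parameters and the 3 s connect timeout are assumptions, and only the example values of 5 retries at 10 s intervals come from the text.

```python
import socket
import time


def peer_reachable(host, port, retries=5, interval_s=10.0,
                   connect=socket.create_connection, sleep=time.sleep):
    """Return True if a TCP connection to (host, port) succeeds.

    One initial attempt is made, followed by up to `retries` further
    attempts spaced `interval_s` seconds apart (hypothetical defaults
    taken from the example values in the text).
    """
    for attempt in range(retries + 1):
        try:
            conn = connect((host, port), timeout=3.0)
        except OSError:
            # Refused, timed out or unreachable: wait, then retry if allowed.
            if attempt < retries:
                sleep(interval_s)
            continue
        conn.close()
        return True
    return False
```

The `connect` and `sleep` parameters exist only so that the check can be exercised without a real network or real delays; in ordinary use they would be left at their defaults.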
S14, if the connection between the second node and the gateway fails, the first node writes second node abnormal information into the heartbeat file and ensures that the second node stops service.
In the example where there is only a first node and a second node under the gateway, it is necessary to keep at least one of the first node and the second node normal. When it is determined in step S12 that the connection between the second node and the gateway has failed, that is, the second node is abnormal, the connection status between the first node and the gateway can be monitored through a ping operation of the gateway. Specifically, when the second node is abnormal, the cluster system may add a ping operation to the gateway to monitor whether the first node is normal. If the first node is found to be abnormal, the cluster system can send out alarm information to operators so as to enter a manual processing stage.
When the gateway is configured with other nodes besides the first node and the second node, if the second node fails to connect with the gateway, the first node may acquire the lock file to prevent the other nodes from processing the service request for the second node in parallel. The lock file may be located in a shared storage system, which may be GPFS (General Parallel File System). However, the shared storage system of the present disclosure may also be another distributed file system such as PVFS, Lustre, PanFS or GoogleFS, which is not particularly limited in the present exemplary embodiment.
It should be appreciated that, in this example, the first node is the node that first obtains the lock file among the nodes under the gateway other than the second node. In addition, an election process may be added to determine the first node. For example, four nodes A, B, C and D are configured under the gateway. When node A is abnormal, that is, node A is the abnormal second node described above, node B may be better placed than node C and node D because of its network conditions and processing capacity. In this case, node B acquires the lock file from the shared storage system, after which node C and node D may determine whether node B is qualified to process the service requests of node A according to node B's node attributes (for example, which data it is used to store), processing capacity, and the like. When nodes C and D determine that node B qualifies, the system determines node B as the first node.
An abnormality of the second node may fall into one of two cases. In the first case, the second node fails to connect with the gateway but can still read and write the heartbeat file; that is, the second node is not down, but its connection with the gateway has failed because of a network failure or a similar cause. In the second case, the second node is down, so its connection with the gateway fails and it cannot modify the heartbeat file. The heartbeat file may be configured in the shared storage system, and the running state of a node can be obtained from it. These two cases are explained separately below with reference to FIG. 2.
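One common way to realize a lock file is exclusive file creation, sketched below. The disclosure does not specify how the lock file is acquired, so this mechanism, the function names and the file contents are all assumptions; whether exclusive creation is atomic also depends on the shared file system actually used.

```python
import os


def try_acquire_lock(lock_path, node_id):
    """Try to create the lock file exclusively; return True on success."""
    try:
        # O_CREAT | O_EXCL makes the call fail if the file already exists,
        # so at most one caller can create it.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(node_id)  # record the holder for diagnostics (assumption)
    return True


def release_lock(lock_path):
    """Remove the lock file so that another node may acquire it later."""
    try:
        os.remove(lock_path)
    except FileNotFoundError:
        pass
```

A node that receives False simply does not take over; only the node for which `try_acquire_lock` returns True proceeds to handle requests for the failed node.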
For the first case
In step S202, the first node may write second node abnormal information to the heartbeat file. The format and data content of this information are not particularly limited, provided that the second node can be identified from it. In step S204, the second node may find the second node abnormal information in the heartbeat file, stop its service in response, and mark its service state as stopped in the heartbeat file, that is, rewrite the state to "down". In step S206, the first node may find from the heartbeat file that the service state of the second node is marked as stopped, which ensures that the second node has stopped service.
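Steps S202 to S206 can be sketched as three small functions operating on a shared heartbeat file. The text leaves the file format open, so the JSON layout, the field names `abnormal` and `state`, and all function names here are assumptions made for illustration; the sketch also ignores concurrent writers, which a real deployment would need to handle.

```python
import json
import os


def _load(path):
    """Read the heartbeat file; a missing file is treated as empty."""
    if not os.path.exists(path):
        return {}
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def _store(path, data):
    """Write through a temporary file and rename it over the original."""
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(data, f)
    os.replace(tmp, path)


def mark_abnormal(path, node_id):
    """S202: the first node records that `node_id` is abnormal."""
    data = _load(path)
    data.setdefault(node_id, {})["abnormal"] = True
    _store(path, data)


def stop_if_flagged(path, node_id, stop_service):
    """S204: a flagged node stops its service and marks itself "down"."""
    data = _load(path)
    entry = data.get(node_id, {})
    if not entry.get("abnormal"):
        return False
    stop_service()
    entry["state"] = "down"
    data[node_id] = entry
    _store(path, data)
    return True


def is_stopped(path, node_id):
    """S206: the first node checks that the service state is "down"."""
    return _load(path).get(node_id, {}).get("state") == "down"
```

Here `mark_abnormal` and `is_stopped` would run on the first node, while `stop_if_flagged` would run periodically on the second node with a callback that stops its local service.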
For the second case
Step S202 is followed by step S208. In step S208, if the first node finds that the service state of the second node in the heartbeat file is not marked as stopped, the first node may send a service stop instruction to the second node to ensure that the second node stops service. Specifically, once the connection between the second node and the gateway has been determined to have failed for 10 seconds, the first node may send the service stop instruction to the second node, or may call a remote program to stop the service of the second node. In practice, the second node is down at this point and will not receive the service stop instruction; the instruction is sent to further ensure that the service of the second node has stopped, so that the first node can be activated safely.
Specifically, in the second case, the first node may write the second node abnormal information to the heartbeat file; because the second node is down, the second node does not stop the service itself, and the first node obtains no indication that the second node has stopped service. Once the connection between the second node and the gateway has been determined to have failed for 10 seconds, the first node may send a service stop instruction to the second node.
S16, after ensuring that the second node has stopped service, the first node processes service requests for the second node based on the virtual IP.
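The fallback for the second case can be sketched as a wait followed by a best-effort stop instruction. This is one possible reading of the 10-second rule, namely that the first node polls the heartbeat file for that long before sending the instruction; the function name, parameters, polling interval and return values are assumptions.

```python
import time


def ensure_stopped(is_stopped, send_stop, grace_s=10.0, poll_s=1.0,
                   clock=time.monotonic, sleep=time.sleep):
    """Wait up to `grace_s` seconds for the peer to mark itself stopped.

    Returns "acknowledged" if the peer marked itself stopped, otherwise
    sends a stop instruction and returns "forced".
    """
    deadline = clock() + grace_s
    while clock() < deadline:
        if is_stopped():
            return "acknowledged"
        sleep(poll_s)
    if is_stopped():
        return "acknowledged"
    # No acknowledgement: the peer may be down. Send the stop instruction
    # anyway, accepting that it may never be received.
    try:
        send_stop()
    except OSError:
        pass
    return "forced"
```

The callables `is_stopped` and `send_stop` stand in for the heartbeat-file query and for whatever remote stop mechanism is available; neither is specified in the text.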
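The ordering of steps S12, S14 and S16 can be summarized in a short orchestration sketch. All four callables are hypothetical placeholders; in particular, how the first node starts serving the virtual IP is not detailed in the text and is left to `take_over`.

```python
def handle_peer_failure(peer_reachable, mark_abnormal, ensure_stopped, take_over):
    """Run steps S12, S14 and S16 once; return True if a takeover happened."""
    # S12: determine the connection state of the second node.
    if peer_reachable():
        return False
    # S14: write the abnormal information, then make sure the peer stopped.
    mark_abnormal()
    if not ensure_stopped():
        # The stop could not be confirmed, so do not take over.
        return False
    # S16: only now serve requests addressed to the shared virtual IP.
    take_over()
    return True
```

The point of the ordering is that the takeover never runs before the stop of the second node has been confirmed, which avoids two nodes answering on the same virtual IP.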
In addition, the node exception handling method of the present disclosure further includes, before step S12, a step of determining whether the heartbeat file is updated normally. It should be appreciated that, in the exemplary embodiments of the present disclosure, whether a node updates the heartbeat file normally is distinct from whether the node can communicate with the heartbeat file; that is, even if a node updates the heartbeat file abnormally, this does not directly mean that the node is down. In one example, the node is not processing business services, possibly because of a connection failure between the node and the gateway; in that case, although the node can communicate with the heartbeat file, it does not modify or write to it.
When the second node is normal, it updates the heartbeat file. The first node may therefore indirectly determine whether the second node may be abnormal by determining whether the heartbeat file is being updated normally. When it is determined that the second node's updating of the heartbeat file is abnormal, steps S12 to S16 are performed.
According to some embodiments of the present disclosure, if the first node determines that the second node has not updated the heartbeat file within a first preset time, which indicates that the second node may be abnormal, steps S12 to S16 may be performed. The first preset time may be set according to historical heartbeat file update times and actual service conditions; for example, the first preset time is 10 s, which is not limited in this disclosure.
According to some other embodiments, if the first node determines that, within a second preset time, the time intervals between the second node's updates of the heartbeat file are all greater than a time threshold, which indicates that the second node may be abnormal, steps S12 to S16 may be performed. Similarly, the second preset time and the time threshold may be set according to historical heartbeat file update times and actual service conditions; for example, the second preset time is 100 s and the time threshold is 7 s, which is not limited in this disclosure.
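The two criteria for an abnormal heartbeat update can be sketched as a single check over the recorded update timestamps. The function name and the representation of updates as a list of timestamps in seconds are assumptions; the defaults of 10 s, 100 s and 7 s are the example values given in the text.

```python
def heartbeat_abnormal(update_times, now, first_window_s=10.0,
                       second_window_s=100.0, gap_threshold_s=7.0):
    """Return True if the heartbeat updates look abnormal.

    `update_times` holds ascending timestamps (in seconds) at which the
    second node updated the heartbeat file; `now` is the current time.
    """
    # Criterion 1: no update within the first preset time.
    if not any(now - first_window_s <= t <= now for t in update_times):
        return True
    # Criterion 2: within the second preset time, every interval between
    # consecutive updates is greater than the time threshold.
    recent = [t for t in update_times if now - second_window_s <= t <= now]
    gaps = [b - a for a, b in zip(recent, recent[1:])]
    return bool(gaps) and all(g > gap_threshold_s for g in gaps)
```

For example, updates every 5 s up to the current time pass both checks, whereas updates every 8 s pass the first check but trip the second, because every interval exceeds the 7 s threshold.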
In the node exception handling method according to the exemplary embodiment of the present disclosure, on one hand, when a node is abnormal, manual handling is not required, and the workload of operators is reduced; on the other hand, the first node sharing the virtual IP replaces an abnormal second node to process the service request, the switching speed is high, and the requirement of service timeliness is met.
It should be noted that although the various steps of the methods of the present disclosure are depicted in the drawings in a particular order, this does not require or imply that these steps must be performed in this particular order, or that all of the depicted steps must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions, etc.
Further, the present exemplary embodiment also provides a node exception handling system, which is applied to an InfluxDB cluster.
FIG. 3 schematically illustrates an architecture diagram of a node exception handling system of an exemplary embodiment of the present disclosure.
Referring to fig. 3, a node exception handling system according to an exemplary embodiment of the present disclosure may include a gateway 31, a first node 32 and a second node 33 sharing a virtual IP. In addition, the node exception handling system may further include a shared storage system 34, and the shared storage system 34 is configured with a heartbeat file and a lock file.
Specifically, the first node 32 may be configured to determine a connection status of the second node 33 and the gateway 31; if the connection between the second node 33 and the gateway 31 fails, writing second node abnormal information into the heartbeat file and ensuring that the second node 33 stops service; after ensuring that the second node 33 stops servicing, the service request for the second node 33 is processed based on the virtual IP.
According to an exemplary embodiment of the present disclosure, writing the second node abnormal information into the heartbeat file by the first node 32 and ensuring that the second node 33 is out of service includes: the first node 32 writes second node abnormal information into the heartbeat file; the second node 33 finds the second node abnormal information in the heartbeat file, stops service in response to it, and marks the service state of the second node 33 as stopped in the heartbeat file; the first node 32 finds from the heartbeat file that the service status of the second node 33 is marked as stopped, which ensures that the second node 33 is out of service.
According to an exemplary embodiment of the present disclosure, writing the second node exception information into the heartbeat file by the first node 32 and ensuring that the second node 33 is out of service includes: the first node 32 writes second node abnormal information into the heartbeat file; if the first node 32 queries that the service status of the second node 33 in the heartbeat file is not marked to stop, the first node 32 sends a service stop instruction to the second node 33 to ensure that the second node 33 stops service.
According to the exemplary embodiment of the present disclosure, the first node 32 is further configured to determine whether the event of the second node 33 updating the heartbeat file is normal, and if not, the first node 32 determines the connection status of the second node 33 and the gateway 31.
According to an exemplary embodiment of the present disclosure, the event of the second node 33 updating the heartbeat file being abnormal includes: within a first preset time, the second node 33 does not update the heartbeat file; or within a second preset time, the time intervals between updates of the heartbeat file by the second node 33 are all larger than a time threshold.
According to an exemplary embodiment of the present disclosure, the gateway 31 performs a ping operation to monitor the connection status of the first node 32 and the gateway 31.
According to the exemplary embodiment of the present disclosure, when the second node 33 fails to connect with the gateway 31, the first node 32 acquires the lock file to prevent the rest of the nodes under the gateway from processing the service request for the second node 33 in parallel.
In the node exception handling system of the exemplary embodiment of the present disclosure, on one hand, when a node is abnormal, manual handling is not required, and the workload of operators is reduced; on the other hand, the first node sharing the virtual IP replaces an abnormal second node to process the service request, the switching speed is high, and the requirement of service timeliness is met.
Since the descriptions of the parts of the node exception handling system according to the embodiment of the present invention have been described in the above node exception handling method, they are not described herein again.
In an exemplary embodiment of the present disclosure, there is also provided a computer-readable storage medium having stored thereon a program product capable of implementing the above-described method of the present specification. In some possible embodiments, aspects of the invention may also be implemented in the form of a program product comprising program code means for causing a terminal device to carry out the steps according to various exemplary embodiments of the invention described in the above section "exemplary methods" of the present description, when said program product is run on the terminal device.
Referring to fig. 4, a program product 400 for implementing the above method according to an embodiment of the present invention is described, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present invention is not limited in this regard and, in the present document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
In an exemplary embodiment of the present disclosure, an electronic device capable of implementing the above method is also provided.
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or program product. Thus, various aspects of the invention may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.) or an embodiment combining hardware and software aspects that may all generally be referred to herein as a "circuit," "module," or "system."
An electronic device 500 according to this embodiment of the invention is described below with reference to fig. 5. The electronic device 500 shown in fig. 5 is only an example and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 5, the electronic device 500 is embodied in the form of a general purpose computing device. The components of the electronic device 500 may include, but are not limited to: at least one processing unit 510, at least one storage unit 520, a bus 530 connecting various system components (including the storage unit 520 and the processing unit 510), and a display unit 540.
The storage unit 520 stores program code that is executable by the processing unit 510 to cause the processing unit 510 to perform steps according to various exemplary embodiments of the present invention as described in the above section "exemplary methods" of the present specification. For example, the processing unit 510 may perform steps S12 and S16 as shown in fig. 1.
The storage unit 520 may include a readable medium in the form of a volatile storage unit, such as a random access memory unit (RAM) 5201 and/or a cache memory unit 5202, and may further include a read only memory unit (ROM) 5203.
Storage unit 520 may also include a program/utility 5204 having a set (at least one) of program modules 5205, such program modules 5205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 530 may be one or more of any of several types of bus structures including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 500 may also communicate with one or more external devices 600 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 500, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 500 to communicate with one or more other computing devices. Such communication may occur via input/output (I/O) interfaces 550. Also, the electronic device 500 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) via the network adapter 560. As shown, the network adapter 560 communicates with the other modules of the electronic device 500 over the bus 530. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 500, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB disk, a removable hard disk, etc.) or on a network, and which includes several instructions to enable a computing device (which may be a personal computer, a server, a terminal device, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.
Furthermore, the above-described figures are merely schematic illustrations of processes involved in methods according to exemplary embodiments of the invention, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
It should be noted that although several modules or units of the device for action execution are mentioned in the above detailed description, such a division is not mandatory. Indeed, according to embodiments of the present disclosure, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided so as to be embodied by a plurality of modules or units.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is to be limited only by the terms of the appended claims.

Claims (8)

1. A node exception handling method, applied to an InfluxDB cluster, characterized by comprising the following steps:
the first node determines the connection state of the second node and the gateway; wherein the first node and the second node share a virtual IP;
if the second node fails to connect with the gateway, the first node writes second node abnormal information into a heartbeat file; the second node, on finding by query that the heartbeat file contains the second node abnormal information, stops service in response to the second node abnormal information and marks the service state of the second node as stopped in the heartbeat file; and the first node finds, by querying the heartbeat file, that the service state of the second node is marked as stopped, thereby ensuring that the second node has stopped service;
after ensuring that the second node is out of service, the first node processes a service request for the second node based on the virtual IP.
2. The node exception handling method according to claim 1, wherein before the first node determines the connection state of the second node to the gateway, the node exception handling method further comprises:
the first node judges whether an event of updating the heartbeat file by the second node is normal or not;
and if not, the first node determines the connection state of the second node and the gateway.
3. The node exception handling method of claim 2, wherein the event of the second node updating the heartbeat file being abnormal comprises:
within a first preset time, the second node does not update the heartbeat file; or
within a second preset time, the time interval at which the second node updates the heartbeat file is larger than a time threshold.
4. The node exception handling method according to claim 1, further comprising:
monitoring the connection state of the first node and the gateway through a ping operation of the gateway when the second node fails to connect with the gateway.
5. The node exception handling method according to claim 1, further comprising:
when the connection between the second node and the gateway fails, the first node acquires a lock file to prevent other nodes under the gateway from processing the service request aiming at the second node in parallel.
6. A node exception handling system, applied to an InfluxDB cluster, characterized by comprising a gateway, and a first node and a second node sharing a virtual IP, wherein:
the first node is used for determining the connection state of the second node and the gateway; if the connection between the second node and the gateway fails, writing second node abnormal information into a heartbeat file and ensuring that the second node stops service; and processing a service request for the second node based on the virtual IP after ensuring that the second node is out of service;
wherein, to ensure that the second node is out of service, the first node is configured to perform: in the case where the second node, on finding by query that the heartbeat file contains the second node abnormal information, stops service in response to the second node abnormal information and marks the service state of the second node as stopped in the heartbeat file, querying the heartbeat file to find that the service state of the second node is marked as stopped, thereby ensuring that the second node has stopped service.
7. A storage medium on which a computer program is stored, the computer program, when executed by a processor, implementing the node exception handling method of any one of claims 1 to 5.
8. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the node exception handling method of any of claims 1 to 5 via execution of the executable instructions.

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201810577770.0A CN108964977B (en) 2018-06-05 2018-06-05 Node exception handling method and system, storage medium and electronic device
PCT/CN2018/101021 WO2019232931A1 (en) 2018-06-05 2018-08-17 Node exception processing method and system, device and computer-readable storage medium

Publications (2)

Publication Number Publication Date
CN108964977A CN108964977A (en) 2018-12-07
CN108964977B CN108964977B (en) 2021-06-01

Family

ID=64493457

Country Status (2)

Country Link
CN (1) CN108964977B (en)
WO (1) WO2019232931A1 (en)

Also Published As

Publication number Publication date
CN108964977A (en) 2018-12-07
WO2019232931A1 (en) 2019-12-12

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant