CN103995901A - Method for determining data node failure - Google Patents

Publication number
CN103995901A
Authority
CN
Grant status
Application
Patent type
Prior art keywords
node
data
application
data node
failure
Prior art date
Application number
CN 201410254980
Other languages
Chinese (zh)
Other versions
CN103995901B (en )
Inventor
赵晓平
唐超
马丽伟
秦波
王�锋
Original Assignee
北京京东尚科信息技术有限公司
北京京东世纪贸易有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING; COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 - Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30286 - Information retrieval; Database structures therefor; File system structures therefor in structured data stores
    • G06F17/30575 - Replication, distribution or synchronisation of data between databases or within a distributed database; Distributed database system architectures therefor

Abstract

The invention discloses a method for determining data node failure in a distributed database. Among all application nodes that access the distributed database, when any application node cannot connect to a data node of the database, it broadcasts to the other application nodes that the data node cannot be connected. After receiving the broadcast, the other application nodes send connection requests to the data node to determine whether it can be connected. When the number of application nodes that cannot connect to the data node reaches a set threshold, the data node is determined to have failed. Because the application nodes belong to different IPs, the method avoids the effect that network fluctuation on a single IP has when synchronization requests are sent to the data node from that one IP, so the cause of a data node failure can be judged more accurately.

Description

A method for determining data node failure

Technical Field

[0001] The present invention relates to the field of distributed databases, and in particular to a method for determining data node failure.

Background

[0002] As network technology has developed, the demands on data storage and access have kept rising, and distributed databases emerged in response. Their high scalability and high availability solve a hard problem for the many websites that must run without interruption.

[0003] A distributed database is composed of sub-databases distributed over multiple computer nodes. Each sub-database on a computer node is called a data node; the data nodes are logically related and equal in status. To keep the whole distributed database running normally, the running state of every data node must be known in real time in order to determine whether it can provide service normally, that is, whether the data node is valid. Network fluctuation, hardware faults, and other causes can all make a data node fail. For example, network fluctuation causes a temporary failure of a data node, whereas a hardware fault causes a permanent failure. An effective means of determining whether a data node has currently failed is therefore needed.

[0004] Cassandra is an open-source distributed NoSQL database system. Owing to its good scalability, it has been adopted by many well-known websites and has become a popular distributed structured-data storage solution. Cassandra determines node failure with suspicion-based detection (Accrual Failure Detection). The basic idea is to judge, in a distributed environment, whether a data node has failed by means of a value that represents the suspicion of failure. Within a certain time window, synchronization requests are sent to the data node continuously. Each time the data node fails to respond to a synchronization message, its failure suspicion value is increased by 1, and once the value reaches a set threshold the data node is determined to have failed permanently.
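The counter described in this paragraph can be sketched as follows. This is only an illustration of the scheme as the background section states it, not Cassandra's actual code; the function name, the boolean response list, and the threshold of 5 are assumptions made for the example.

```python
from typing import Iterable


def single_observer_verdict(responses: Iterable[bool], threshold: int) -> bool:
    """Return True if the data node is declared permanently failed.

    `responses` holds one entry per synchronization request sent inside the
    time window: True if the data node answered, False if it did not.
    (Hypothetical helper; all requests come from one observer on one IP.)
    """
    suspicion = 0
    for answered in responses:
        if not answered:
            suspicion += 1  # one missed response raises the suspicion value by 1
        if suspicion >= threshold:
            return True
    return False


# A short fluctuation on the observer's own link: five consecutive losses,
# after which the data node answers again.
window = [True, True, False, False, False, False, False, True, True]
print(single_observer_verdict(window, threshold=5))  # True
```

The example window ends with successful responses, yet the verdict is already "failed" by then, which is the kind of misjudgment the following paragraph attributes to probing from a single IP.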

[0005] Because this suspicion-based method sends its synchronization requests to the data node from a single IP, it cannot properly avoid the effect of network fluctuation on those requests. During a period of fluctuation, synchronization requests and/or the data node's responses to them may be lost. The failure suspicion value can then rise sharply while requests are being sent, and may even reach the set threshold so that the data node is judged to have failed permanently, although after that period the data node is still available and has not really failed permanently. The existing suspicion-based detection method can therefore misjudge data node failure in use.

Summary

[0006] In view of this, the present invention provides a method for determining data node failure, so as to judge accurately whether a data node has failed temporarily because of the network or permanently because of hardware.

[0007] The technical solution of the present application is implemented as follows:

[0008] A method for determining data node failure, used for a distributed database, the method comprising:

[0009] among all application nodes that access the distributed database, when any application node cannot connect to a data node in the distributed database, broadcasting to the other application nodes that the data node cannot be connected;

[0010] after the other application nodes receive the broadcast, sending connection requests from them to the data node to determine whether the data node can be connected;

[0011] when the number of application nodes that cannot connect to the data node reaches a set threshold, determining that the data node has failed.
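A minimal sketch of these three steps, under stated assumptions: `can_connect` is a hypothetical stand-in for a real connection attempt, the broadcast is modelled as a plain loop over the peers, and the reporting node's own failed connection is counted toward the total (the text does not say explicitly whether it is).

```python
from typing import Callable, Iterable


def data_node_failed(
    app_nodes: Iterable[str],
    reporter: str,
    data_node: str,
    can_connect: Callable[[str, str], bool],
    threshold: float,
) -> bool:
    """Decide whether `data_node` has failed after `reporter` lost its connection."""
    unreachable = 1  # assumption: the reporter itself counts as unable to connect
    for node in app_nodes:
        if node == reporter:
            continue
        # Each peer reacts to the broadcast with its own connection request.
        if not can_connect(node, data_node):
            unreachable += 1
    return unreachable >= threshold


nodes = ["10.0.0.1", "10.0.1.1", "10.0.2.1", "10.0.3.1"]
# Only the reporter's link is down: a fluctuation local to one IP.
link_up = {"10.0.0.1": False, "10.0.1.1": True, "10.0.2.1": True, "10.0.3.1": True}
print(data_node_failed(nodes, "10.0.0.1", "db-7", lambda n, d: link_up[n], len(nodes) / 2))  # False
```

With one unreachable observer out of four and a threshold of half the application nodes, the data node is not declared failed, because the other three IPs can still reach it.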

[0012] Further, among all application nodes that access the distributed database, any one application node is selected as an arbitration node to count the application nodes that cannot connect to the data node.

[0013] Further:

[0014] a decision value is set in the arbitration node and initialized to 0;

[0015] after the other application nodes send connection requests to the data node, each of them sends the arbitration node information on whether it can connect to the data node;

[0016] the arbitration node receives from all application nodes the information on whether they can connect to the data node, and each time it receives a message from an application node that cannot connect to the data node, it adds 1 to the decision value;

[0017] after the arbitration node has received this information from all application nodes:

[0018] if the decision value has reached the set threshold, the data node is determined to have failed;

[0019] if the decision value has not reached the set threshold, the data node is determined to be valid.
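The arbitration node's bookkeeping might look like the sketch below. The class name, the `expected_reports` parameter, and the string verdicts are assumptions for illustration; the text only specifies a decision value that starts at 0, is incremented per "cannot connect" message, and is compared with the threshold once all reports are in.

```python
from typing import Optional


class Arbiter:
    """Counts 'cannot connect' reports for one data node (hypothetical helper)."""

    def __init__(self, expected_reports: int, threshold: float) -> None:
        self.expected_reports = expected_reports
        self.threshold = threshold
        self.decision_value = 0  # initialized to 0
        self.received = 0

    def report(self, can_connect: bool) -> None:
        """Record one application node's report."""
        self.received += 1
        if not can_connect:
            self.decision_value += 1  # add 1 per 'cannot connect' message

    def verdict(self) -> Optional[str]:
        """Return None until every expected report has arrived."""
        if self.received < self.expected_reports:
            return None
        return "failed" if self.decision_value >= self.threshold else "valid"


arbiter = Arbiter(expected_reports=4, threshold=2)
for ok in (False, True, False, True):
    arbiter.report(ok)
print(arbiter.verdict())  # failed
```

Deferring the verdict until all reports have arrived mirrors the wording "after the arbitration node has received this information from all application nodes"; a real system would presumably also need a timeout for reports that never arrive, which the text does not address.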

[0020] Further, the threshold is half the number of all application nodes that access the distributed database.

[0021] Further, after the data node is determined to have failed, the method further comprises:

[0022] deleting the data node from the distributed database;

[0023] enabling the backup node of the data node.
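As a rough illustration of the replacement step, the sketch below models cluster membership as a set and the backup assignment as a dictionary. Both data structures and all node names are assumptions; the text does not say how membership or backups are stored.

```python
from typing import Dict, Set


def replace_failed_node(members: Set[str], backups: Dict[str, str], failed: str) -> str:
    """Remove `failed` from the membership set and enable its backup node."""
    members.discard(failed)    # delete the failed data node from the database
    backup = backups[failed]   # raises KeyError if no backup was configured
    members.add(backup)        # enable the backup node in its place
    return backup


members = {"data-1", "data-2", "data-3"}
backups = {"data-2": "data-2-backup"}
print(replace_failed_node(members, backups, "data-2"))  # data-2-backup
print(sorted(members))  # ['data-1', 'data-2-backup', 'data-3']
```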

[0024] Further, after the data node is determined to be valid, the method further comprises:

[0025] restoring the decision value to its initial value 0;

[0026] the application node that cannot connect to the data node periodically sends connection requests to the data node while waiting for the connection to the data node to be restored.
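The periodic reconnection could be sketched as a simple polling loop. The `try_connect` callable, the fixed interval, and the `max_attempts` cap are assumptions added so the example terminates; the text itself only says that requests are sent at regular intervals until the connection is restored.

```python
import time
from typing import Callable


def wait_for_reconnect(
    try_connect: Callable[[], bool],
    interval_s: float,
    max_attempts: int,
) -> bool:
    """Retry the connection at a fixed interval; return True once it succeeds."""
    for attempt in range(max_attempts):
        if try_connect():
            return True
        if attempt < max_attempts - 1:
            time.sleep(interval_s)  # wait before the next connection request
    return False


# Simulated link that comes back on the third attempt.
outcomes = iter([False, False, True])
print(wait_for_reconnect(lambda: next(outcomes), interval_s=0.0, max_attempts=5))  # True
```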

[0027] Further, when any application node cannot connect to a data node in the distributed database, the connection from that application node to that data node is masked.
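One way to read "masking" is that the application node keeps a local list of data nodes it should not try to use. The sketch below assumes that reading; the class and method names are hypothetical and not taken from the source.

```python
from typing import Iterable, List, Set


class AppNodeRouter:
    """Tracks which data nodes this application node has masked (hypothetical helper)."""

    def __init__(self) -> None:
        self._masked: Set[str] = set()

    def mask(self, data_node: str) -> None:
        """Stop routing this application node's requests to `data_node`."""
        self._masked.add(data_node)

    def unmask(self, data_node: str) -> None:
        """Allow requests to `data_node` again, e.g. once it reconnects."""
        self._masked.discard(data_node)

    def usable(self, data_nodes: Iterable[str]) -> List[str]:
        """Return the data nodes that are not currently masked, in order."""
        return [n for n in data_nodes if n not in self._masked]


router = AppNodeRouter()
router.mask("data-j")
print(router.usable(["data-j", "data-k"]))  # ['data-k']
```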

[0028] Further, the application nodes belong to different IPs.

[0029] As the above solution shows, in the method of the present invention, once an application node cannot connect to a data node, multiple application nodes send connection requests to that data node to determine whether it can be connected, and thus whether it has failed. Because the application nodes belong to different IPs, this avoids the effect that network fluctuation on a single IP has in the prior art, where synchronization requests are sent to the data node from one IP. The present invention judges more accurately than the prior art whether a data node has failed temporarily because of the network or permanently because of hardware.

Brief Description of the Drawings

[0030] FIG. 1 is a flowchart of the method for determining data node failure according to the present invention;

[0031] FIG. 2 is a flowchart of an embodiment of the present invention.

Detailed Description

[0032] To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the drawings and an embodiment.

[0033] The method of the present invention for determining data node failure is used for a distributed database. As shown in FIG. 1, the method comprises:

[0034] among all application nodes that access the distributed database, when any application node cannot connect to a data node in the distributed database, broadcasting to the other application nodes that the data node cannot be connected;

[0035] after the other application nodes receive the broadcast, sending connection requests from them to the data node to determine whether the data node can be connected;

[0036] when the number of application nodes that cannot connect to the data node reaches a set threshold, determining that the data node has failed.

[0037] The application nodes that cannot connect to the data node are counted in an arbitration node. The arbitration node is selected as follows: among all application nodes that access the distributed database, any one application node is chosen as the arbitration node.

[0038] The arbitration node counts the application nodes that cannot connect to the data node as follows:

[0039] a decision value is set in the arbitration node and initialized to 0;

[0040] after the other application nodes send connection requests to the data node, each of them sends the arbitration node information on whether it can connect to the data node;

[0041] the arbitration node receives from all application nodes the information on whether they can connect to the data node, and each time it receives a message from an application node that cannot connect to the data node, it adds 1 to the decision value;

[0042] after the arbitration node has received this information from all application nodes:

[0043] if the decision value has reached the set threshold, the data node is determined to have failed;

[0044] if the decision value has not reached the set threshold, the data node is determined to be valid.

[0045] Unlike the prior art, in the method of the present invention, once an application node cannot connect to a data node, multiple application nodes send connection requests to that data node to determine whether it can be connected, and thus whether it has failed. The application nodes belong to different IPs, which avoids the effect that network fluctuation on a single IP has in the prior art, where synchronization requests are sent to the data node from one IP. The method therefore judges more accurately than the prior art whether a data node has failed temporarily because of the network or permanently because of hardware.

[0046] In the above method of the present invention, after the data node is determined to have failed, the method further comprises:

[0047] deleting the data node from the distributed database;

[0048] enabling the backup node of the data node.

[0049] The failed data node is thereby replaced.

[0050] After the data node is determined to be valid, the method of the present invention further comprises:

[0051] restoring the decision value to its initial value 0;

[0052] the application node that cannot connect to the data node periodically sends connection requests to the data node while waiting for the connection to the data node to be restored.

[0053] In a real network deployment, the number of application nodes accessing the distributed database is very large, each application node has a different IP address, and the distributed database contains a large number of data nodes. The method of the present invention is described below with a specific embodiment. In this embodiment, N application nodes (N > 1) access the distributed database, which has M data nodes (M > 1), and application node i (1 ≤ i ≤ N) among the N application nodes cannot connect to data node j in the distributed database (data node j is any one of the M data nodes). As shown in FIG. 2, the embodiment comprises the following steps:

[0054] Step 1: Arbitrarily select one of the N application nodes as the arbitration node. In the arbitration node, set a decision value and initialize it to "0", and set a threshold with the value N/2. Then go to step 2.

[0055] Step 2: When application node i cannot connect to data node j in the distributed database, it broadcasts to the other application nodes that data node j cannot be connected. Then go to step 3.

[0056] When any application node cannot connect to a data node in the distributed database, the method may further include masking the connection from that application node to that data node. In step 2, for example, when application node i cannot connect to data node j, it masks its connection to data node j. This avoids the network resource overhead that would result if application node i kept initiating connections to data node j that cannot succeed.

[0057] Step 3: After receiving the broadcast that data node j cannot be connected, the other application nodes send connection requests to data node j. Then go to step 4.

[0058] Step 4: Each of the other application nodes sends the arbitration node information on whether it can connect to data node j. Then go to step 5.

[0059] Step 5: The arbitration node receives from all application nodes the information on whether they can connect to data node j, and adds 1 to the decision value each time it receives a message from an application node that cannot connect to data node j. Then go to step 6.

[0060] Step 6: The arbitration node checks whether the accumulated decision value has reached the set threshold N/2. If it has, data node j is determined to have failed; go to step 7. If it has not, data node j is determined to be valid; go to step 9.

[0061] Step 7: Delete data node j from the distributed database. Then go to step 8.

[0062] Step 8: Enable backup node j' of data node j to replace data node j.

[0063] Step 9: The arbitration node restores the decision value to its initial value 0 and notifies application node i that data node j is valid. Then go to step 10.

[0064] Step 10: After receiving the message from the arbitration node that data node j is valid, application node i periodically sends connection requests to data node j while waiting for the connection to data node j to be restored.
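Steps 1 to 10 can be traced end to end with a small simulation. This is a sketch under assumptions: the `reachable` mapping replaces real connection attempts, application node i's own failed connection is included in the count, and the function only logs the actions of steps 7 to 10 rather than performing them.

```python
from typing import Dict, List


def run_embodiment(reachable: Dict[str, bool], node_i: str, arbiter: str) -> List[str]:
    """Trace steps 1-10 for one data node j.

    `reachable` maps each application node to whether its connection request
    to data node j succeeds; `node_i` is the node that first lost the connection.
    """
    n = len(reachable)
    log: List[str] = []
    decision_value, threshold = 0, n / 2                       # step 1
    log.append(f"step 1: arbiter={arbiter}, threshold={threshold}")
    log.append(f"step 2: {node_i} broadcasts that j is unreachable")
    for ok in reachable.values():                              # steps 3-5
        if not ok:
            decision_value += 1
    log.append(f"step 5: decision value={decision_value}")
    if decision_value >= threshold:                            # step 6
        log.append("step 7: remove j from the database")
        log.append("step 8: enable backup node j'")
    else:
        decision_value = 0                                     # step 9
        log.append(f"step 9: reset decision value, notify {node_i} that j is valid")
        log.append(f"step 10: {node_i} retries j periodically")
    return log


# Four application nodes; only node "a" has lost its connection to j.
for line in run_embodiment({"a": False, "b": True, "c": True, "d": True}, "a", "c"):
    print(line)
```

With N = 4 the threshold is 2.0, so a single unreachable observer leads to the "valid" branch (steps 9 and 10), while two or more unreachable observers lead to removal and backup activation (steps 7 and 8).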

[0065] With the method of the present invention for determining data node failure, once an application node cannot connect to a data node, multiple application nodes send connection requests to that data node to determine whether it can be connected, and thus whether it has failed. Because the application nodes belong to different IPs, this avoids the effect that network fluctuation on a single IP has in the prior art, where synchronization requests are sent to the data node from one IP. The present invention judges more accurately than the prior art whether a data node has failed temporarily because of the network or permanently because of hardware.

[0066] The foregoing is merely a preferred embodiment of the present invention and is not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (8)

  1. A method for determining data node failure, used for a distributed database, the method comprising: among all application nodes that access the distributed database, when any application node cannot connect to a data node in the distributed database, broadcasting to the other application nodes that the data node cannot be connected; after the other application nodes receive the broadcast, each of them sending a connection request to the data node to determine whether the data node can be connected; and when the number of application nodes that cannot connect to the data node reaches a set threshold, determining that the data node has failed.
  2. The method for determining data node failure according to claim 1, characterized in that: among all application nodes that access the distributed database, any one application node is selected as an arbitration node to count the application nodes that cannot connect to the data node.
  3. The method for determining data node failure according to claim 2, characterized in that: a decision value is set in the arbitration node and initialized to 0; after the other application nodes send connection requests to the data node, each of them sends the arbitration node information on whether it can connect to the data node; the arbitration node receives from all application nodes the information on whether they can connect to the data node, and each time it receives a message from an application node that cannot connect to the data node, it adds 1 to the decision value; after the arbitration node has received this information from all application nodes: if the decision value has reached the set threshold, the data node is determined to have failed; if the decision value has not reached the set threshold, the data node is determined to be valid.
  4. The method for determining data node failure according to claim 1, characterized in that: the threshold is half the number of all application nodes that access the distributed database.
  5. The method for determining data node failure according to claim 1, characterized in that, after the data node is determined to have failed, the method further comprises: deleting the data node from the distributed database; and enabling the backup node of the data node.
  6. The method for determining data node failure according to claim 3, characterized in that, after the data node is determined to be valid, the method further comprises: restoring the decision value to its initial value 0; and the application node that cannot connect to the data node periodically sending connection requests to the data node while waiting for the connection to the data node to be restored.
  7. The method for determining data node failure according to claim 1, characterized in that, when any application node cannot connect to a data node in the distributed database, the connection from that application node to that data node is masked.
  8. The method for determining data node failure according to claim 1, characterized in that the application nodes belong to different IPs.
CN 201410254980 2014-06-10 2014-06-10 Method for determining data node failure CN103995901B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201410254980 CN103995901B (en) 2014-06-10 2014-06-10 Method for determining data node failure

Publications (2)

Publication Number Publication Date
CN103995901A 2014-08-20
CN103995901B 2018-01-12

Family

ID=51310066

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201410254980 CN103995901B (en) 2014-06-10 2014-06-10 Method for determining data node failure

Country Status (1)

Country Link
CN (1) CN103995901B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102231681A (en) * 2011-06-27 2011-11-02 中国建设银行股份有限公司 High availability cluster computer system and fault treatment method thereof
US20120101987A1 (en) * 2010-10-25 2012-04-26 Paul Allen Bottorff Distributed database synchronization
CN102882792A (en) * 2012-06-20 2013-01-16 杜小勇 Method for simplifying internet propagation path diagram
US20130246608A1 (en) * 2012-03-15 2013-09-19 Microsoft Corporation Count tracking in distributed environments
US20130297976A1 (en) * 2012-05-04 2013-11-07 Paraccel, Inc. Network Fault Detection and Reconfiguration

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105306545A (en) * 2015-09-28 2016-02-03 浪潮(北京)电子信息产业有限公司 Failover method and system for external service node of cluster
CN105306545B (en) * 2015-09-28 2018-09-07 浪潮(北京)电子信息产业有限公司 Failover method and system for external service node of cluster
CN105975212A (en) * 2016-04-29 2016-09-28 深圳市永兴元科技有限公司 Failure detection processing method and device for distributed data system

Also Published As

Publication number Publication date Type
CN103995901B (en) 2018-01-12 grant

Legal Events

Date Code Title Description
C06 Publication
C10 Entry into substantive examination
GR01