CN103995901B - Method for determining data node failure - Google Patents
Method for determining data node failure
- Publication number
- CN103995901B CN103995901B CN201410254980.8A CN201410254980A CN103995901B CN 103995901 B CN103995901 B CN 103995901B CN 201410254980 A CN201410254980 A CN 201410254980A CN 103995901 B CN103995901 B CN 103995901B
- Authority
- CN
- China
- Prior art keywords
- data node
- node
- application node
- application
- failure
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Computing Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computer And Data Communications (AREA)
Abstract
The invention discloses a method for determining data node failure in a distributed database. The method includes: among all application nodes that access the distributed database, when any application node cannot connect to a data node in the distributed database, it broadcasts to the other application nodes that the data node is unreachable; after receiving the broadcast, each of the other application nodes sends a connection request to the data node to determine whether it can be connected; and when the number of application nodes unable to connect to the data node reaches a set threshold, the data node is determined to have failed. Because each application node has a distinct IP address, the method avoids the distortion that network fluctuation causes when all probes to a data node are sent from a single IP, and can therefore judge the cause of a data node failure more accurately.
Description
Technical field
The present invention relates to the field of distributed databases, and in particular to a method for determining data node failure.
Background Art
With the continuous development of network technology, the demands placed on the storage of and access to data keep growing, and the distributed database has emerged in response. The high scalability and high availability of distributed databases solve problems for the many websites that must run without interruption.
A distributed database is made up of sub-databases distributed over multiple computer nodes. Each sub-database on a computer node is called a data node; the data nodes are logically related and equal in status. To ensure the normal operation of the whole distributed database, the running state of each data node must be known promptly, so as to determine whether it can still provide service normally, that is, whether the data node is still valid. Causes such as network fluctuation and hardware faults can all lead to data node failure; for example, network fluctuation causes the temporary failure of a data node, while a hardware fault causes its permanent failure. An effective means is therefore needed to determine whether a given data node has failed.
Cassandra is an open-source distributed NoSQL database system. Owing to its good scalability, Cassandra has been adopted by many well-known websites and has become a popular distributed structured-data storage scheme. In Cassandra, node failure is judged by accrual failure detection: in a distributed environment, the suspicion that a node has failed is represented by a numeric value. Within a given time window, synchronization requests are sent continuously to a data node; each time the data node fails to respond to a synchronization message, its failure-suspicion value is incremented by 1, and once that value reaches a set threshold the data node is determined to have failed permanently.
Because this suspicion-based detection sends the synchronization requests to the data node over a single IP, it cannot shield them from the effects of network fluctuation. During a period of fluctuation, the synchronization requests, or the data node's responses to them, may be lost; the failure-suspicion value can then rise sharply while the requests are being sent, and may even reach the set threshold so that the node is judged permanently failed, even though the data node returns to an available state after the fluctuation passes and has not truly failed permanently. The existing suspicion-based detection method can therefore misjudge data node failure in use.
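The counter-style detection described above can be sketched as follows. This is a simplified illustration, not Cassandra's actual phi-accrual implementation (which derives a suspicion level from heartbeat inter-arrival statistics); the class and method names are invented for the example:

```python
class SuspicionDetector:
    """Single-probe failure detector: one suspicion counter per data node.

    Each missed synchronization response increments the counter; reaching
    the threshold marks the node as permanently failed.  Because all probes
    originate from one IP, one burst of network fluctuation on that single
    path can push the counter past the threshold by itself.
    """

    def __init__(self, threshold):
        self.threshold = threshold
        self.suspicion = {}  # data node id -> current suspicion value

    def record_probe(self, node_id, responded):
        if responded:
            self.suspicion[node_id] = 0  # a reply resets suspicion
        else:
            self.suspicion[node_id] = self.suspicion.get(node_id, 0) + 1
        return self.suspicion[node_id] >= self.threshold  # True = failed


detector = SuspicionDetector(threshold=3)
# Three consecutive lost responses (e.g. one burst of fluctuation) suffice
# to declare permanent failure -- the misjudgement the invention targets.
results = [detector.record_probe("node-j", responded=False) for _ in range(3)]
```

A single flaky link thus produces the same verdict as a dead node, which is exactly the ambiguity the invention's multi-IP probing resolves.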
Summary of the Invention
In view of this, the present invention provides a method for determining data node failure, so as to judge accurately whether a data node has failed temporarily because of the network or permanently because of hardware.
The technical scheme of the application is realized as follows:
A method for determining data node failure, for a distributed database, the method including:
among all application nodes that access the distributed database, when any application node cannot connect to a data node in the distributed database, sending to the other application nodes a broadcast that the data node is unreachable;
after the other application nodes receive the broadcast, sending connection requests to the data node to determine whether it can be connected;
when the number of application nodes unable to connect to the data node reaches a set threshold, determining that the data node has failed.
Further, among all application nodes that access the distributed database, any one application node is selected as an arbitration node to count the number of application nodes that cannot connect to the data node.
Further:
a decision value is set in the arbitration node and initialized to 0;
after the other application nodes send connection requests to the data node, each sends the arbitration node a report on whether the data node could be connected;
the arbitration node receives the reports from all application nodes, and each time it receives a report that an application node could not connect to the data node, it increments the decision value by 1;
once the arbitration node has received the reports from all application nodes:
if the decision value has reached the set threshold, the data node is determined to have failed;
if the decision value has not reached the set threshold, the data node is determined to be valid.
Further, the threshold is half of the total number of application nodes that access the distributed database.
Further, after the data node is determined to have failed, the method also includes:
deleting the data node from the distributed database;
enabling the backup node of the data node.
Further, after the data node is determined to be valid, the method also includes:
restoring the decision value to its initial value 0;
the application node that could not connect to the data node periodically sending connection requests to the data node, waiting for it to recover.
Further, when any application node cannot connect to a data node in the distributed database, that application node masks its connection to the data node.
Further, each application node has a distinct IP address.
It can be seen from the above scheme that, in the present method for determining data node failure, after one application node fails to connect to a data node, multiple application nodes send connection requests to that data node to determine whether it can be reached, and failure is decided from their combined result. Because each application node has a distinct IP address, the method avoids the defect of the prior art, in which all synchronization requests travel over the same IP and are jointly affected by network fluctuation on that single path. The present invention therefore distinguishes more accurately than the prior art between temporary failure caused by the network and permanent failure caused by hardware.
Brief Description of the Drawings
Fig. 1 is the flow chart of the method of the present invention for determining data node failure;
Fig. 2 is the flow chart of an embodiment of the present invention.
Detailed Description
To make the purpose, technical scheme and advantages of the present invention clearer, the present invention is described in further detail below with reference to the drawings and an embodiment.
The method of the present invention for determining data node failure is used for a distributed database. As shown in Fig. 1, the method includes:
among all application nodes that access the distributed database, when any application node cannot connect to a data node in the distributed database, sending to the other application nodes a broadcast that the data node is unreachable;
after the other application nodes receive the broadcast, sending connection requests to the data node to determine whether it can be connected;
when the number of application nodes unable to connect to the data node reaches a set threshold, determining that the data node has failed.
The number of application nodes that cannot connect to the data node is counted in an arbitration node, which is selected as follows: among all application nodes that access the distributed database, any one application node is chosen as the arbitration node.
The arbitration node performs the count by the following method:
a decision value is set in the arbitration node and initialized to 0;
after the other application nodes send connection requests to the data node, each sends the arbitration node a report on whether the data node could be connected;
the arbitration node receives the reports from all application nodes, and each time it receives a report that an application node could not connect to the data node, it increments the decision value by 1;
once the arbitration node has received the reports from all application nodes:
if the decision value has reached the set threshold, the data node is determined to have failed;
if the decision value has not reached the set threshold, the data node is determined to be valid.
Unlike the prior art, the method of the invention has multiple application nodes probe a data node after one application node fails to connect to it, and decides failure from their combined result. Each application node has a distinct IP address, so the method avoids the prior-art situation in which all synchronization requests sent over one IP are jointly affected by network fluctuation on that path, and it therefore distinguishes more accurately between temporary failure caused by the network and permanent failure caused by hardware.
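A minimal sketch of the arbitration node's tallying logic described above (the names `Arbiter`, `report` and `verdict` are illustrative, not from the patent):

```python
class Arbiter:
    """Counts unreachable-reports for one suspect data node.

    The decision value starts at 0; each "cannot connect" report adds 1.
    After all n_apps application nodes have reported, the data node is
    declared failed iff the decision value has reached the threshold.
    """

    def __init__(self, n_apps, threshold):
        self.n_apps = n_apps
        self.threshold = threshold
        self.decision = 0   # the patent's "decision value", initialized to 0
        self.reports = 0

    def report(self, can_connect):
        self.reports += 1
        if not can_connect:
            self.decision += 1

    def verdict(self):
        assert self.reports == self.n_apps, "wait until every node reports"
        failed = self.decision >= self.threshold
        if not failed:
            self.decision = 0  # node valid: restore the initial value 0
        return failed


# 5 application nodes, threshold N/2 = 2.5: three "cannot connect" reports
# out of five reach the threshold, so the data node is declared failed.
arb = Arbiter(n_apps=5, threshold=5 / 2)
for ok in [False, True, False, False, True]:
    arb.report(ok)
failed = arb.verdict()
```

In a real deployment each `report` call would arrive as a message from a different application node (each with its own IP), which is what makes the tally robust to fluctuation on any single path.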
In the above method of the present invention, after the data node is determined to have failed, the method also includes:
deleting the data node from the distributed database;
enabling the backup node of the data node;
thereby replacing the failed data node.
After the data node is determined to be valid, the method of the invention also includes:
restoring the decision value to its initial value 0;
the application node that could not connect to the data node periodically sending connection requests to the data node, waiting for it to recover.
In a real network, a large number of application nodes, each with a different IP address, access the distributed database, and the distributed database contains a large number of data nodes. The method of the invention is illustrated below with a specific embodiment. In the embodiment, assume that N application nodes (N > 1) access the distributed database, that the database contains M data nodes (M > 1), and that among the N application nodes an application node i (1 ≤ i ≤ N) cannot connect to a data node j (any one of the M data nodes). As shown in Fig. 2, the embodiment includes the following steps:
Step 1: arbitrarily select one application node from the N application nodes as the arbitration node, set a decision value in the arbitration node and initialize it to 0, and set the threshold to N/2; then go to step 2.
Step 2: when application node i cannot connect to data node j in the distributed database, it broadcasts to the other application nodes that data node j is unreachable; then go to step 3.
When any application node cannot connect to a data node in the distributed database, the method may further mask that application node's connection to the data node. For example, in this step, when application node i cannot connect to data node j, application node i masks its connection to data node j, which prevents it from repeatedly initiating connections to data node j that will not succeed and thereby wasting network resources.
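One plausible way to realize the masking in step 2 is a per-node flag that suppresses outgoing connection attempts until the node is reported valid again. The patent does not prescribe an implementation; the names below (`DataNodeClient`, `unmask`) are invented for the sketch:

```python
class DataNodeClient:
    """Connection wrapper that masks a data node after a failed connect."""

    def __init__(self, connect_fn):
        self.connect_fn = connect_fn  # real transport, injected for testing
        self.masked = set()           # ids of data nodes currently masked

    def connect(self, node_id):
        if node_id in self.masked:
            return False              # masked: skip the network round trip
        if self.connect_fn(node_id):
            return True
        self.masked.add(node_id)      # first failure masks the node
        return False

    def unmask(self, node_id):
        """Called when the arbitration node reports the node valid again."""
        self.masked.discard(node_id)


attempts = []
# Transport stub that records each real attempt and always fails.
client = DataNodeClient(lambda nid: (attempts.append(nid), False)[1])
client.connect("node-j")    # real attempt fails -> node becomes masked
client.connect("node-j")    # masked: no second network attempt is made
n_attempts = len(attempts)  # only the first call hit the network
```

After the arbitration node's "data node j is valid" notification (step 9), `unmask` would re-enable direct attempts, matching the periodic reconnection of step 10.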
Step 3: after the other application nodes receive the broadcast that data node j is unreachable, they send connection requests to data node j; then go to step 4.
Step 4: the other application nodes send the arbitration node reports on whether data node j could be connected; then go to step 5.
Step 5: the arbitration node receives the reports from all application nodes, and each time it receives a report that an application node could not connect to data node j, it increments the decision value by 1; then go to step 6.
Step 6: the arbitration node checks whether the accumulated decision value has reached the set threshold N/2. If it has, data node j is determined to have failed; go to step 7. If it has not, data node j is determined to be valid; go to step 9.
Step 7: delete data node j from the distributed database; then go to step 8.
Step 8: enable data node j's backup node j' in place of data node j.
Step 9: the arbitration node restores the decision value to its initial value 0 and notifies application node i that data node j is valid; then go to step 10.
Step 10: after receiving the message that data node j is valid, application node i periodically sends connection requests to data node j, waiting for data node j to recover.
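Steps 1 through 10 can be simulated end to end in a few lines. This is a sketch under assumed names (`run_round`, `reachable_from`); a real system would probe over the network from application nodes with distinct IPs:

```python
def run_round(n_apps, reachable_from):
    """Simulate one failure-detection round for a suspect data node j.

    reachable_from[k] says whether application node k can connect to j.
    Returns "failed" or "valid" per the threshold rule of step 6.
    """
    threshold = n_apps / 2                 # step 1: threshold set to N/2
    decision = 0                           # step 1: decision value = 0
    for k in range(n_apps):                # steps 3-5: every node probes j
        if not reachable_from[k]:          # step 5: +1 per failed probe
            decision += 1
    return "failed" if decision >= threshold else "valid"  # step 6


# Network fluctuation on one path: only node 0 cannot reach j.
# 1 < N/2 = 2, so j is judged valid (a temporary, network-caused problem).
verdict_fluctuation = run_round(4, [False, True, True, True])

# Hardware fault: no node can reach j. 4 >= 2, so j is judged failed and
# would be deleted and replaced by its backup node j' (steps 7-8).
verdict_hardware = run_round(4, [False, False, False, False])
```

The two calls show the distinction the patent claims over single-IP detection: a fault visible from one vantage point is tolerated, while a fault visible from a majority triggers replacement.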
With the method of the invention for determining data node failure, after one application node fails to connect to a data node, multiple application nodes send connection requests to that data node to determine whether it can be connected, and failure is decided from their combined result. Because each application node has a distinct IP address, the prior-art defect of sending all synchronization requests over one IP, where network fluctuation affects them all, is avoided. The present invention therefore distinguishes more accurately than the prior art between temporary failure caused by the network and permanent failure caused by hardware.
The foregoing describes only preferred embodiments of the invention and is not intended to limit it; any modification, equivalent substitution or improvement made within the spirit and principle of the invention shall fall within its scope of protection.
Claims (8)
1. A method for determining data node failure, for a distributed database, the method including: among all application nodes that access the distributed database, when any application node cannot connect to a data node in the distributed database, sending to the other application nodes a broadcast that the data node is unreachable; after the other application nodes receive the broadcast, sending connection requests to the data node to determine whether it can be connected; and when the number of application nodes unable to connect to the data node reaches a set threshold, determining that the data node has failed.
2. The method for determining data node failure according to claim 1, characterized in that: among all application nodes that access the distributed database, any one application node is selected as an arbitration node to count the number of application nodes that cannot connect to the data node.
3. The method for determining data node failure according to claim 2, characterized in that: a decision value is set in the arbitration node and initialized to 0; after the other application nodes send connection requests to the data node, each sends the arbitration node a report on whether the data node could be connected; the arbitration node receives the reports from all application nodes, and each time it receives a report that an application node could not connect to the data node, it increments the decision value by 1; once the arbitration node has received the reports from all application nodes: if the decision value has reached the set threshold, the data node is determined to have failed; if the decision value has not reached the set threshold, the data node is determined to be valid.
4. The method for determining data node failure according to claim 1, characterized in that the threshold is half of the total number of application nodes that access the distributed database.
5. The method for determining data node failure according to claim 1, characterized in that after the data node is determined to have failed, the method also includes: deleting the data node from the distributed database; and enabling the backup node of the data node.
6. The method for determining data node failure according to claim 3, characterized in that after the data node is determined to be valid, the method also includes: restoring the decision value to its initial value 0; and the application node that could not connect to the data node periodically sending connection requests to the data node, waiting for it to recover.
7. The method for determining data node failure according to claim 1, characterized in that when any application node cannot connect to a data node in the distributed database, that application node masks its connection to the data node.
8. The method for determining data node failure according to claim 1, characterized in that each application node has a distinct IP address.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410254980.8A CN103995901B (en) | 2014-06-10 | 2014-06-10 | Method for determining data node failure |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103995901A CN103995901A (en) | 2014-08-20 |
CN103995901B true CN103995901B (en) | 2018-01-12 |
Family
ID=51310066
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410254980.8A Active CN103995901B (en) | 2014-06-10 | 2014-06-10 | Method for determining data node failure |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103995901B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105306545B (en) * | 2015-09-28 | 2018-09-07 | 浪潮(北京)电子信息产业有限公司 | A kind of method and system of the external service node Takeover of cluster |
CN105975212A (en) * | 2016-04-29 | 2016-09-28 | 深圳市永兴元科技有限公司 | Failure detection processing method and device for distributed data system |
CN108616566B (en) * | 2018-03-14 | 2021-02-23 | 华为技术有限公司 | Main selection method of raft distributed system, related equipment and system |
CN112860799A (en) * | 2021-02-22 | 2021-05-28 | 浪潮云信息技术股份公司 | Management method for data synchronization of distributed database |
CN113783735A (en) * | 2021-09-24 | 2021-12-10 | 小红书科技有限公司 | Method, device, equipment and medium for identifying fault node in Redis cluster |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102231681A (en) * | 2011-06-27 | 2011-11-02 | 中国建设银行股份有限公司 | High availability cluster computer system and fault treatment method thereof |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120101987A1 (en) * | 2010-10-25 | 2012-04-26 | Paul Allen Bottorff | Distributed database synchronization |
US10103949B2 (en) * | 2012-03-15 | 2018-10-16 | Microsoft Technology Licensing, Llc | Count tracking in distributed environments |
US9239749B2 (en) * | 2012-05-04 | 2016-01-19 | Paraccel Llc | Network fault detection and reconfiguration |
CN102882792B (en) * | 2012-06-20 | 2015-05-13 | 杜小勇 | Method for simplifying internet propagation path diagram |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |