CN106209447A

CN106209447A - Fault handling method and device for distributed cache

Info

Publication number: CN106209447A
Application number: CN201610529541.2A
Authority: CN
Inventors: 雷亚武
Original assignee: SHENZHEN CHUANGMENG TIANDI TECHNOLOGY CO LTD
Current assignee: SHENZHEN CHUANGMENG TIANDI TECHNOLOGY CO LTD
Priority date: 2016-07-07
Filing date: 2016-07-07
Publication date: 2016-12-07
Anticipated expiration: 2036-07-07
Also published as: CN106209447B

Abstract

The present disclosure provides a fault handling method and device for a distributed cache. The method includes: monitoring the master cache instances running in the distributed cache to obtain a failed master cache instance; replacing the failed master cache instance with a slave cache instance of that master cache instance, and correspondingly updating the master-slave state between the slave cache instance and the failed master cache instance; and listening for the update of the master-slave state and, according to that update, modifying the service address used for cache data access in the proxy configuration. The method and device can automatically switch over a failed master cache instance, which achieves high availability of cache instances and improves the risk resistance of redis.

Description

Fault handling method and device for distributed cache

Technical field

The present disclosure relates to the technical field of distributed caching, and in particular to a fault handling method and device for a distributed cache.

Background art

With the continual growth of Internet traffic, a single cache server facing large-scale data access often becomes overloaded, which results in excessive response latency. Most existing solutions use distributed caching to implement large-scale data caching and access. Distributed caching uses a consistent hashing algorithm to distribute data in a relatively balanced way across multiple cache servers. redis (a key-value storage system) is one such distributed cache storage system, and it is widely used because of its efficient data synchronization and simple operation commands.

At present, redis is typically deployed with a master cache server and a slave cache server. The redis cache instance on the master cache server is mainly responsible for read and write operations, while the redis cache instance on the slave cache server only backs up the data read and written by the redis cache instance on the master cache server. This places considerable pressure on the master cache server.

When the master cache server goes down because of a fault, the master and slave cache servers cannot be switched automatically, and manual intervention is the only option. In special circumstances, such as when maintenance personnel are unavailable, this brings the business to a halt. High availability of redis cache instances therefore cannot be achieved, which greatly reduces the risk resistance of redis.

Summary of the invention

To solve the problems in the related art that high availability of redis cache instances cannot be achieved and that the risk resistance of the redis cache is low, the present disclosure provides a fault handling method and device for a distributed cache.

A fault handling method for a distributed cache, characterized in that the method includes:

monitoring the master cache instances running in the distributed cache to obtain a failed master cache instance;

replacing the failed master cache instance with a slave cache instance of that master cache instance, and correspondingly updating the master-slave state between the slave cache instance and the failed master cache instance;

listening for the update of the master-slave state, and modifying, according to the update of the master-slave state, the service address used for cache data access in the proxy configuration.

A fault handling device for a distributed cache, characterized in that the device includes:

a fault monitoring module, configured to monitor the master cache instances running in the distributed cache to obtain a failed master cache instance;

a fault processing module, configured to replace the failed master cache instance with a slave cache instance of that master cache instance, and to correspondingly update the master-slave state between the slave cache instance and the failed master cache instance;

a proxy configuration modification module, configured to listen for the update of the master-slave state and to modify, according to the update of the master-slave state, the service address used for cache data access in the proxy configuration.

The technical solutions provided by the embodiments of the present disclosure may have the following beneficial effects:

While the distributed cache is running, the running master cache instances are monitored to obtain a failed master cache instance. A slave cache instance replaces the failed master cache instance, and the master-slave state between the slave cache instance and the failed master cache instance is updated accordingly. The update of the master-slave state is listened for, and the service address used for cache data access in the proxy configuration is modified according to that update, so that subsequent data reads go through the modified service address. As a result, reads and writes of cached data are not affected even if a master cache instance fails. When a master cache instance fails, a slave cache instance automatically replaces it, which achieves high availability of cache instances and greatly improves the risk resistance of redis.

It should be understood that the above general description and the following detailed description are merely exemplary and do not limit the present disclosure.

Brief description of the drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, show embodiments consistent with the present invention and, together with the description, serve to explain the principles of the present invention.

Fig. 1 is a flowchart of a fault handling method for a distributed cache according to an exemplary embodiment;

Fig. 2 is a flowchart of the step, in the embodiment corresponding to Fig. 1, of monitoring the master cache instances running in the distributed cache to obtain a failed master cache instance;

Fig. 3 is a flowchart of the step, in the embodiment corresponding to Fig. 1, of triggering the fault handling operation so that a slave cache instance of the master cache instance replaces the failed master cache instance and the master-slave state between the slave cache instance and the failed master cache instance is updated accordingly;

Fig. 4 is a flowchart of the step, in the embodiment corresponding to Fig. 3, of selecting a slave cache instance according to the slave cache instance information;

Fig. 5 is a flowchart of the step, in the embodiment corresponding to Fig. 4, of selecting a slave cache instance from the set of normal slave cache instances;

Fig. 6 is a diagram of a specific application scenario of fault handling for a distributed cache;

Fig. 7 is a block diagram of a fault handling device for a distributed cache according to an exemplary embodiment;

Fig. 8 is a block diagram of the fault monitoring module in the embodiment corresponding to Fig. 7;

Fig. 9 is a block diagram of the fault processing module in the embodiment corresponding to Fig. 7;

Fig. 10 is a block diagram of the slave cache selection submodule in the embodiment corresponding to Fig. 9;

Fig. 11 is a block diagram of the selection unit in the embodiment corresponding to Fig. 10.

Detailed description

Exemplary embodiments are described in detail here, and examples of them are shown in the accompanying drawings. When the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present invention. Rather, they are merely examples of devices and methods consistent with some aspects of the present invention as detailed in the appended claims.

Fig. 1 is a flowchart of a fault handling method for a distributed cache according to an exemplary embodiment. As shown in Fig. 1, the fault handling method may include the following steps.

In step S110, the master cache instances running in the distributed cache are monitored to obtain a failed master cache instance.

The distributed cache is deployed on machines, and the master cache instances of the distributed cache are implemented by instances running on those machines. In an exemplary embodiment, the machines referred to are cache servers, and one or more cache servers form the server architecture used to implement data caching.

A master cache instance is a redis process running on a cache server. Read and write operations on cached data are performed through the master cache instance.

While a master cache instance is running, a fault may cause it to stop, which in turn affects the corresponding read and write operations on cached data. Master-slave monitoring sentinels deployed on the machines therefore monitor all master cache instances on the cache servers and perform maintenance automatically when a master cache instance fails, so that reads and writes of cached data proceed smoothly.

A master-slave monitoring sentinel monitors each master cache instance according to its own state data, which contains the information of each master cache instance and the information of the corresponding slave cache instances.

In step S120, the failed master cache instance is replaced by a slave cache instance of that master cache instance, and the master-slave state between the slave cache instance and the failed master cache instance is updated accordingly.

Slave cache instances corresponding to the master cache instances also run on the cache servers. A slave cache instance backs up the data processed by its master cache instance.

When a master cache instance fails, the master-slave monitoring sentinel makes a slave cache instance of the failed master cache instance replace it: the slave cache instance is changed into the new master cache instance, the failed master cache instance is changed into a slave cache instance, and the state data of the sentinel is updated accordingly.

For example, on a cache server the master cache instance is cache instance A, and the slave cache instance of cache instance A is cache instance a. When cache instance A is detected to have failed, the sentinel makes cache instance a replace cache instance A: cache instance a is changed from a slave cache instance into the master cache instance, and cache instance A is changed from the master cache instance into a slave cache instance.
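
The role change in this example can be illustrated with a minimal Python sketch. The `SentinelState` class and the `swap_roles` function are hypothetical names introduced only for illustration, and the choice to re-attach any remaining slave cache instances to the new master is an assumption; the description only specifies the swap between the failed master and the replacing slave.

```python
from dataclasses import dataclass, field


@dataclass
class SentinelState:
    """Hypothetical state data: each master name maps to its slave names."""
    slaves_of: dict[str, list[str]] = field(default_factory=dict)


def swap_roles(state: SentinelState, failed_master: str, chosen_slave: str) -> None:
    """Promote chosen_slave and demote failed_master, updating state in place."""
    slaves = state.slaves_of.pop(failed_master)
    if chosen_slave not in slaves:
        # Restore the entry before reporting the error.
        state.slaves_of[failed_master] = slaves
        raise ValueError(f"{chosen_slave} is not a slave of {failed_master}")
    # Assumption: remaining slaves follow the new master; the failed master joins them.
    state.slaves_of[chosen_slave] = [s for s in slaves if s != chosen_slave] + [failed_master]


state = SentinelState({"A": ["a"]})
swap_roles(state, failed_master="A", chosen_slave="a")
print(state.slaves_of)  # {'a': ['A']}
```

After the call, cache instance a is recorded as the master and cache instance A as its slave, which mirrors the state-data update described above.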

In step S130, the update of the master-slave state is listened for, and the service address used for cache data access in the proxy configuration is modified according to the update of the master-slave state.

It should first be noted that the cache egress proxy, which stores the proxy configuration, is deployed on a machine that serves as the interface for data interaction with the cache servers.

The proxy configuration is a configuration file in the cache egress proxy. The service address in the proxy configuration is the address that points to the cached data on the cache servers, and read and write operations on cached data are performed according to this service address.

Further, a proxy monitoring client is also deployed on this machine. The proxy monitoring client listens to the master-slave monitoring sentinel and updates the proxy configuration.

Specifically, after the sentinel changes the master-slave state of a master cache instance and a slave cache instance, the proxy monitoring client detects the sentinel's change operation and correspondingly modifies the service address used for cache data access in the proxy configuration. Read and write operations on cached data are then performed through the master cache instance that the new service address points to.
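
A minimal Python sketch of the address rewrite performed by the proxy monitoring client follows. The description does not specify the format of the proxy configuration, so the dictionary layout, the entry names, the addresses and the function name `update_service_address` are all assumptions made for illustration.

```python
def update_service_address(proxy_config: dict[str, str],
                           old_master: str, new_master: str) -> list[str]:
    """Point every entry that used old_master at new_master; return the changed keys."""
    changed = [name for name, addr in proxy_config.items() if addr == old_master]
    for name in changed:
        proxy_config[name] = new_master
    return changed


# Hypothetical proxy configuration: logical cache name -> "host:port" service address.
proxy_config = {"session-cache": "10.0.0.1:6379", "item-cache": "10.0.0.2:6379"}
changed = update_service_address(proxy_config, "10.0.0.1:6379", "10.0.0.11:6379")
print(changed)  # ['session-cache']
```

Only the entries that pointed at the failed master are rewritten, so traffic for other master cache instances is left untouched.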

With the method described above, when a master cache instance fails, the corresponding slave cache instance is automatically used to replace it, which achieves high availability of cache instances and greatly improves the risk resistance of redis.

Fig. 2 describes step S110 in detail according to an exemplary embodiment. Step S110 may include the following steps.

In step S111, information requests are sent to the master cache instance at a preset time interval.

The information request is a request signal sent by the master-slave monitoring sentinel to a master cache instance, and it is used to obtain the configuration information of that master cache instance. After receiving the information request, the master cache instance responds by replying to the sentinel. The reply contains the configuration information of the master cache instance itself and the information of its corresponding slave cache instances.

The time interval is set in advance. For example, if the time interval is preset to 10 seconds, the sentinel sends an information request to the master cache instance every 10 seconds.
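
The fixed-interval sending can be sketched in Python as follows. The `InfoRequestTimer` class is a hypothetical helper, and the clock is passed in as a plain number so the sketch stays deterministic; a real sentinel would use a monotonic system clock and actually send the request when `due` returns true.

```python
from typing import Optional


class InfoRequestTimer:
    """Decides when the next information request is due (hypothetical helper)."""

    def __init__(self, interval_s: float = 10.0) -> None:
        self.interval_s = interval_s
        self.last_sent: Optional[float] = None

    def due(self, now: float) -> bool:
        """Return True, and record the send time, when a request should go out."""
        if self.last_sent is None or now - self.last_sent >= self.interval_s:
            self.last_sent = now
            return True
        return False


timer = InfoRequestTimer(interval_s=10.0)
sent_at = [t for t in range(0, 31) if timer.due(float(t))]
print(sent_at)  # [0, 10, 20, 30]
```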

In step S112, the replies of the master cache instance to the information requests are received, and a failed master cache instance is identified according to the replies.

After a normally running master cache instance receives an information request, it replies to it. The reply contains the configuration information of the master cache instance itself and the information of its corresponding slave cache instances. The master-slave monitoring sentinel updates its own state data according to the slave cache instance information.

The sentinel identifies whether a master cache instance has failed according to how that instance replies to the information requests.

Preferably, each master cache instance is monitored by multiple master-slave monitoring sentinels, each of which sends information requests to the master cache instance.

When one of the sentinels does not receive a reply from a master cache instance, it queries the other sentinels. The master cache instance is considered to have failed only when the number of sentinels that have not received its reply reaches a preset number. Monitoring of multiple master cache instances is achieved in the same way.

For example, three master-slave monitoring sentinels monitor a master cache instance: sentinel 1, sentinel 2 and sentinel 3. The master cache instance is considered to have failed when two or more sentinels have not received its reply. Sentinels 1, 2 and 3 each send an information request to master cache instance A. When sentinel 2 does not receive a reply from master cache instance A, it queries sentinel 1 and sentinel 3. It finds that only sentinel 3 has received a reply from master cache instance A, while sentinels 1 and 2 have not, so master cache instance A is considered to have failed.
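
The three-sentinel example can be expressed as a short Python sketch. The function name `master_failed` and the dictionary of per-sentinel results are assumptions for illustration; the threshold of two comes from the example above.

```python
def master_failed(reply_received: dict[str, bool], quorum: int) -> bool:
    """Return True when at least `quorum` sentinels got no reply from the master."""
    missing = sum(1 for got_reply in reply_received.values() if not got_reply)
    return missing >= quorum


# Sentinels 1 and 2 received no reply from master cache instance A; sentinel 3 did.
replies = {"sentinel-1": False, "sentinel-2": False, "sentinel-3": True}
print(master_failed(replies, quorum=2))  # True
```

Requiring agreement from several sentinels keeps a single sentinel's network problem from triggering an unnecessary switch.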

Alternatively, a master cache instance may be considered to have failed if it does not reply to the master-slave monitoring sentinel within a preset period of time. Other failure criteria may also be used.

For example, a master-slave monitoring sentinel monitors master cache instance A and sends it an information request every 10 seconds. If no reply from master cache instance A is received for more than 30 seconds, master cache instance A is considered to have failed.
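
This time-based criterion reduces to a single comparison, sketched below in Python. The function name `reply_timed_out` is hypothetical, and timestamps are plain seconds supplied by the caller.

```python
def reply_timed_out(last_reply_at: float, now: float, timeout_s: float = 30.0) -> bool:
    """Return True when the last reply is more than timeout_s seconds old."""
    return now - last_reply_at > timeout_s


# With a 10 second request interval, three consecutive missed replies exceed 30 seconds.
print(reply_timed_out(last_reply_at=100.0, now=130.0))  # False
print(reply_timed_out(last_reply_at=100.0, now=131.0))  # True
```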

With the method described above, information requests are sent to each master cache instance, and failed master cache instances are identified automatically from the replies, which facilitates automatic switching between master and slave instances.

Fig. 3 describes step S120 in detail according to an exemplary embodiment. Step S120 may include the following steps.

In step S121, the slave cache instance information preset for the failed master cache instance is obtained.

The slave cache instance information of a master cache instance is obtained from the master cache instance's reply to the information request.

That reply contains the configuration information of the master cache instance itself and the information of its corresponding slave cache instances.

In step S122, a slave cache instance is selected according to the slave cache instance information.

A master cache instance may have one corresponding slave cache instance or several.

When a master cache instance has only one corresponding slave cache instance, that slave cache instance is used directly for the master-slave change. When it has several, one slave cache instance must be selected for the master-slave change.

In step S123, the selected slave cache instance replaces the failed master cache instance, and the master-slave state between the selected slave cache instance and the failed master cache instance is updated accordingly.

The selected slave cache instance becomes the new master cache instance, and the failed master cache instance becomes a slave cache instance. After its fault is resolved and it resumes running, the failed master cache instance backs up data as a slave cache instance of the new master cache instance.

For example, the master cache instance is cache instance A and the slave cache instance is cache instance a. After cache instance A fails and the master-slave change is carried out, cache instance a becomes the new master cache instance and cache instance A becomes the slave cache instance of cache instance a. Once the fault of cache instance A is resolved and it resumes running, it backs up the data processed by cache instance a in its role as slave cache instance.

With the method described above, the slave cache instance information corresponding to a master cache instance is obtained automatically from that master cache instance's reply. When the master cache instance fails, a selected slave cache instance swaps roles with it, which achieves high availability of cache instances and greatly improves the risk resistance of redis.

Fig. 4 describes step S122 in detail according to an exemplary embodiment. Step S122 may include the following steps.

In step S1221, the slave cache instances corresponding to the failed master cache instance are determined according to the slave cache instance information.

The slave cache instances corresponding to the failed master cache instance are obtained from that master cache instance's slave cache instance information.

When the slave cache instance information shows only one corresponding slave cache instance, that instance is used to replace the failed master cache instance. When it shows several, one of them is selected to replace the failed master cache instance.

In step S1222, information requests are sent to the slave cache instances.

The master-slave monitoring sentinel sends information requests to the slave cache instances of the failed master cache instance. The information request is used to obtain each slave cache instance's reply to the sentinel, from which the working status of the slave cache instance is learned.

In step S1223, the replies of the slave cache instances to the information request are received, abnormal slave cache instances are excluded from the slave cache instances according to the replies, and the set of normal slave cache instances is formed.

An abnormal slave cache instance may be a slave cache instance that has failed. The sentinels send information requests to the multiple slave cache instances of the failed master cache instance, and the slave cache instances reply to the sentinels. Each reply contains the configuration information of the slave cache instance itself and information about the backed-up data. When the number of sentinels that receive a reply from a given slave cache instance does not reach a preset number, that slave cache instance is considered to have failed.

An abnormal slave cache instance may also be one whose data backup is abnormal. When the sentinel receives the reply of a slave cache instance and finds that the time since that instance last updated its backup data exceeds a preset time range, the data backup of that slave cache instance is considered abnormal.

For example, if the preset time range is 50 seconds and slave cache instance 1 last updated its backup data 51 seconds before the current time, the backup data of slave cache instance 1 is considered abnormal.

After the abnormal slave cache instances are excluded, the remaining slave cache instances of the failed master cache instance, which can be used for the master-slave change, form the set of normal slave cache instances.
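
Both exclusion criteria can be combined in one filter, sketched below in Python. The `SlaveReply` record, its field names and the `normal_slaves` function are hypothetical; the 50 second limit is taken from the example above, and the quorum of two is an assumed value for the preset number of sentinels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlaveReply:
    """Hypothetical summary of what the sentinels learned about one slave."""
    name: str
    sentinels_with_reply: int   # how many sentinels received this slave's reply
    last_backup_update: float   # timestamp (seconds) of the last backup update


def normal_slaves(replies: list[SlaveReply], now: float, quorum: int,
                  max_backup_age_s: float = 50.0) -> list[str]:
    """Keep slaves that enough sentinels can reach and whose backup is recent."""
    return [r.name for r in replies
            if r.sentinels_with_reply >= quorum
            and now - r.last_backup_update <= max_backup_age_s]


replies = [
    SlaveReply("slave-1", sentinels_with_reply=3, last_backup_update=949.0),  # 51 s old
    SlaveReply("slave-2", sentinels_with_reply=3, last_backup_update=990.0),
    SlaveReply("slave-3", sentinels_with_reply=1, last_backup_update=995.0),  # unreachable
]
print(normal_slaves(replies, now=1000.0, quorum=2))  # ['slave-2']
```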

In step S1224, a slave cache instance is selected from the set of normal slave cache instances.

All slave cache instances in the normal set are available. A slave cache instance may be selected to replace the failed master cache instance according to the priority of each slave cache instance, according to the numeric size of the slave cache instance ID, or in some other way.

With the method described above, when a master cache instance fails, abnormal slave cache instances are automatically excluded from among its slave cache instances. This prevents an abnormal slave cache instance from being changed into the master cache instance and improves the efficiency of fault handling when a master cache instance fails.

Fig. 5 describes step S1224 in detail according to an exemplary embodiment. Step S1224 may include the following steps.

In step S12241, the priority of each slave cache instance in the set of normal slave cache instances is obtained.

The priority of each slave cache instance in the normal set is user-defined. It may be a user-defined ordering of the slave cache instances, an ordering by the time at which each slave cache instance last updated its backup data, or some other priority setting. The different kinds of priority may themselves also be ranked. No limitation is imposed here.

In step S12242, a slave cache instance is selected from the set of normal slave cache instances according to the priority.

According to the priority setting, the highest-ranked slave cache instance is selected as the slave cache instance for the master-slave change.
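
Selecting the highest-ranked instance can be sketched in Python as follows. The function name `select_slave`, the convention that a smaller number means a higher rank, and the tie-break on instance name are all assumptions; the description leaves the priority scheme user-defined.

```python
def select_slave(candidates: list[tuple[str, int]]) -> str:
    """Pick the top-ranked slave from (name, priority) pairs.

    Assumption: a smaller priority number ranks higher; ties fall back to the name.
    """
    if not candidates:
        raise ValueError("no normal slave cache instance is available")
    return min(candidates, key=lambda c: (c[1], c[0]))[0]


normal_set = [("slave-2", 20), ("slave-3", 10), ("slave-4", 10)]
print(select_slave(normal_set))  # slave-3
```

Using a deterministic tie-break means that every sentinel evaluating the same set arrives at the same choice.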

With the method described above, when a master cache instance fails, one slave cache instance is automatically selected from the slave cache instances of the failed master cache instance according to the priority setting. This avoids the situation in which no selection can be made because the failed master cache instance has multiple slave cache instances, and improves the efficiency of master-slave switching when a master cache instance fails.

The fault handling method described above is elaborated below with reference to a specific application scenario.

Specifically, as shown in Fig. 6, the master-slave monitoring sentinel 200 monitors the master cache instances. When it detects that a master cache instance has failed, it replaces the failed master cache instance with the corresponding slave cache instance. When the proxy monitoring client 300 detects the master-slave replacement operation of the sentinel 200, it updates the proxy configuration file in the cache egress proxy 100 and modifies the service address used for cache data access in the proxy configuration, so that reads and writes of cached data are not affected even if a master cache instance fails. Thus, when a master cache instance fails, the selected slave cache instance automatically replaces it, which achieves high availability of cache instances and improves the risk resistance of redis.

The following are device embodiments of the present disclosure, which may be used to perform the above embodiments of the fault handling method for a distributed cache. For details not disclosed in the device embodiments, refer to the method embodiments of the present disclosure.

Fig. 7 is a block diagram of a fault handling device for a distributed cache according to an exemplary embodiment. As shown in Fig. 7, the fault handling device includes but is not limited to: a fault monitoring module 110, a fault processing module 120 and a proxy configuration modification module 130.

The fault monitoring module 110 is configured to monitor the master cache instances running in the distributed cache to obtain a failed master cache instance.

The fault processing module 120 is configured to replace the failed master cache instance with a slave cache instance of that master cache instance, and to correspondingly update the master-slave state between the slave cache instance and the failed master cache instance.

The proxy configuration modification module 130 is configured to listen for the update of the master-slave state and to modify, according to the update of the master-slave state, the service address used for cache data access in the proxy configuration.

For the implementation of the functions and effects of each module in the above device, refer to the implementation of the corresponding steps in the above fault handling method for a distributed cache; details are not repeated here.

Optionally, as shown in Fig. 8, the fault monitoring module 110 includes but is not limited to: an information request sending submodule 111 and a fault identification submodule 112.

The information request sending submodule 111 is configured to send information requests to the master cache instance at a preset time interval.

The fault identification submodule 112 is configured to receive the replies of the master cache instance to the information requests and to identify a failed master cache instance according to the replies.

Optionally, as shown in Fig. 9, the fault processing module 120 includes but is not limited to: a slave cache acquisition submodule 121, a slave cache selection submodule 122 and a master-slave update submodule 123.

The slave cache acquisition submodule 121 is configured to obtain the slave cache instance information preset for the failed master cache instance.

The slave cache selection submodule 122 is configured to select a slave cache instance according to the slave cache instance information.

The master-slave update submodule 123 is configured to replace the failed master cache instance with the selected slave cache instance, and to correspondingly update the master-slave state between the selected slave cache instance and the failed master cache instance.

Optionally, as shown in Fig. 10, the slave cache selection submodule 122 includes but is not limited to: a slave cache acquisition unit 1221, a request sending unit 1222, an abnormality exclusion unit 1223 and a selection unit 1224.

The slave cache acquisition unit 1221 is configured to determine the slave cache instances of the failed master cache instance according to the slave cache instance information.

The request sending unit 1222 is configured to send information requests to the slave cache instances.

The abnormality exclusion unit 1223 is configured to receive the replies of the slave cache instances to the information requests, to exclude abnormal slave cache instances from the slave cache instances according to the replies, and to form the set of normal slave cache instances.

The selection unit 1224 is configured to select a slave cache instance from the set of normal slave cache instances.

Optionally, as shown in Fig. 11, the selection unit 1224 includes but is not limited to: a priority acquisition subunit 12241 and a slave cache selection subunit 12242.

The priority acquisition subunit 12241 is configured to obtain the priority of each slave cache instance in the set of normal slave cache instances.

The slave cache selection subunit 12242 is configured to select a slave cache instance from the set of normal slave cache instances according to the priority.

It should be understood that the present invention is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present invention is limited only by the appended claims.

Claims

1. the fault handling method of a distributed caching, it is characterised in that described method includes:

Method the most according to claim 1, it is characterised in that the master cache run in described monitoring distributed caching is real Example, the step of the master cache example obtaining fault includes:

Information request is sent to described master cache example according to preset time interval；

Receive the reply to described information request of the described master cache example, according to the described master cache example replying identification fault.

3. The method according to claim 1, characterized in that the step of replacing the faulty master cache instance with a slave cache instance of the master cache instance, and correspondingly updating the master-slave state between the slave cache instance and the faulty master cache instance, comprises:

acquiring the slave cache instance information preset for the faulty master cache instance;

selecting a slave cache instance according to the slave cache instance information;

replacing the faulty master cache instance with the selected slave cache instance, and correspondingly updating the master-slave state between the selected slave cache instance and the faulty master cache instance.

4. The method according to claim 3, characterized in that the step of selecting a slave cache instance according to the slave cache instance information comprises:

determining, according to the slave cache instance information, the slave cache instances corresponding to the faulty master cache instance;

sending an information request to the slave cache instances;

receiving the replies of the slave cache instances to the information request, and excluding abnormal slave cache instances from the slave cache instances according to the replies, to form a set of normal slave cache instances;

selecting a slave cache instance from the set of normal slave cache instances.

5. The method according to claim 4, characterized in that the step of selecting a slave cache instance from the set of normal slave cache instances comprises:

acquiring the priority corresponding to each slave cache instance in the set of normal slave cache instances;

selecting a slave cache instance from the set of normal slave cache instances according to the priorities.

6. A fault handling device for a distributed cache, characterized in that the device comprises:

7. The device according to claim 6, characterized in that the fault monitoring module comprises:

a request sending submodule, configured to send an information request to the master cache instance at a preset time interval;

a fault identification submodule, configured to receive the reply of the master cache instance to the information request and to identify the faulty master cache instance according to the reply.

8. The device according to claim 6, characterized in that the fault processing module comprises:

a slave cache acquiring submodule, configured to acquire the slave cache instance information preset for the faulty master cache instance;

a slave cache selection submodule, configured to select a slave cache instance according to the slave cache instance information;

a master-slave update submodule, configured to replace the faulty master cache instance with the selected slave cache instance and to correspondingly update the master-slave state between the selected slave cache instance and the faulty master cache instance.

9. The device according to claim 8, characterized in that the slave cache selection submodule comprises:

a slave cache acquiring unit, configured to determine, according to the slave cache instance information, the slave cache instances corresponding to the faulty master cache instance;

a request sending unit, configured to send an information request to the slave cache instances;

an abnormality exclusion unit, configured to receive the replies of the slave cache instances to the information request and to exclude abnormal slave cache instances from the slave cache instances according to the replies, forming a set of normal slave cache instances;

a selection unit, configured to select a slave cache instance from the set of normal slave cache instances.

10. The device according to claim 9, characterized in that the selection unit comprises:

a priority acquiring subunit, configured to acquire the priority corresponding to each slave cache instance in the set of normal slave cache instances;

a slave cache selection subunit, configured to select a slave cache instance from the set of normal slave cache instances according to the priorities.