CN115248751A - Redis cache disaster recovery switching method, device, platform, equipment and storage medium - Google Patents

Info

Publication number
CN115248751A
CN115248751A CN202110466245.3A CN202110466245A CN115248751A CN 115248751 A CN115248751 A CN 115248751A CN 202110466245 A CN202110466245 A CN 202110466245A CN 115248751 A CN115248751 A CN 115248751A
Authority
CN
China
Prior art keywords
redis
instance
disaster recovery
disaster
switching
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110466245.3A
Other languages
Chinese (zh)
Inventor
张丹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SF Technology Co Ltd
Original Assignee
SF Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SF Technology Co Ltd
Priority to CN202110466245.3A
Publication of CN115248751A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2053 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant
    • G06F11/2056 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant by mirroring
    • G06F11/2082 Data synchronisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Hardware Redundancy (AREA)

Abstract

The application relates to a Redis cache disaster recovery switching method and device, computer equipment and a storage medium. When a fault notification message is obtained, the method determines the failed Redis instance corresponding to the message; it then searches for the Redis disaster recovery instance corresponding to the failed Redis instance, the disaster recovery instance holding data synchronized from the failed Redis instance; and it performs disaster recovery switching on the failed Redis instance based on the Redis disaster recovery instance. When a Redis instance fails and disaster recovery switching is required, the application does not need to modify its configuration and can use Redis normally after a restart. The impact and changes caused by disaster recovery switching are reduced to a minimum and the switch is imperceptible to the service, which effectively supports the continuity of the services that depend on the Redis instance.

Description

Redis cache disaster recovery switching method, device, platform, equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method, an apparatus, a platform, a device, and a storage medium for disaster recovery switching of a Redis cache.
Background
With the development of computer technology and database technology, Remote Dictionary Server (Redis) has emerged as a high-performance key-value database. Redis largely makes up for the shortcomings of key-value stores such as memcached, and in some scenarios it complements relational databases well.
At present, Redis instances are generally deployed in a single machine room. When the production machine room fails, the Redis instances cannot be recovered in a short time and the data of the Redis instances in production is lost. This has a great impact on services: normal service to users cannot be guaranteed, and neither can the continuity of the Redis instance service.
Disclosure of Invention
Therefore, in order to solve the above technical problems, it is necessary to provide a method, an apparatus, a platform, a device, and a storage medium for Redis cache disaster recovery switching, which can ensure the continuity of the Redis instance service.
A Redis cache disaster recovery switching method comprises the following steps:
acquiring a Redis fault notification message;
parsing the Redis fault notification message, and determining the failed Redis instance corresponding to the Redis fault notification message;
searching for a Redis disaster recovery instance corresponding to the failed Redis instance, wherein the Redis disaster recovery instance holds data synchronized from the failed Redis instance;
and performing disaster recovery switching on the failed Redis instance based on the Redis disaster recovery instance.
In one embodiment, before acquiring the Redis fault notification message, the method further includes:
acquiring a Redis instance resource delivery request;
feeding back the Redis production instance corresponding to the Redis instance resource delivery request, and creating the Redis disaster tolerance instance corresponding to the Redis production instance.
In one embodiment, after the creating of the Redis disaster recovery instance corresponding to the Redis production instance, the method further includes:
acquiring a consistency check request corresponding to the Redis production instance;
according to the consistency check request, consistency check is carried out on the Redis production instance and the Redis disaster recovery instance corresponding to the Redis production instance, and a consistency check result is obtained;
and updating the Redis disaster recovery instance corresponding to the Redis production instance according to the consistency check result.
In one embodiment, after the creating of the Redis disaster recovery instance corresponding to the Redis production instance, the method further includes:
acquiring a data synchronization request corresponding to the Redis production instance, and searching the Redis disaster recovery instance corresponding to the Redis production instance according to the data synchronization request;
acquiring production node information corresponding to the Redis production instance and disaster tolerance node information of the Redis disaster tolerance instance;
establishing a map mapping relation between the production node and the disaster recovery node;
acquiring database files of the production node and the disaster recovery node;
and performing data full synchronization and data increment synchronization on the Redis disaster recovery instance based on the database file and the map mapping relation.
In one embodiment, before performing disaster recovery switching on the failed Redis instance based on the Redis disaster recovery instance, the method further includes:
performing data verification on the Redis disaster recovery instance;
the performing disaster recovery switching on the failed Redis instance based on the Redis disaster recovery instance comprises:
and performing disaster recovery switching on the failed Redis instance based on the Redis disaster recovery instance that has passed the data verification.
In one embodiment, before performing disaster recovery switching on the failed Redis instance based on the Redis disaster recovery instance, the method further includes:
performing static configuration check and dynamic check on the Redis disaster recovery instance and the failed Redis instance;
the performing disaster recovery switching on the failed Redis instance based on the Redis disaster recovery instance comprises:
and when the static configuration check and the dynamic check pass, carrying out disaster tolerance switching on the failed Redis instance based on the Redis disaster tolerance instance.
In one embodiment, before acquiring the Redis fault notification message, the method further includes:
obtaining disaster recovery switching efficiency data corresponding to the Redis disaster recovery instance;
and optimizing the switching scenario corresponding to the Redis disaster recovery instance according to the disaster recovery switching efficiency data.
A Redis cache disaster recovery switching apparatus, the apparatus comprising:
the message acquisition module is used for acquiring a Redis fault notification message;
the failed instance searching module is used for parsing the Redis fault notification message and determining the failed Redis instance corresponding to the Redis fault notification message;
the disaster recovery instance searching module is used for searching for a Redis disaster recovery instance corresponding to the failed Redis instance, wherein the Redis disaster recovery instance holds data synchronized from the failed Redis instance;
and the disaster recovery switching module is used for performing disaster recovery switching on the failed Redis instance based on the Redis disaster recovery instance.
A Redis cache disaster recovery switching platform comprises a disaster recovery management server, a Redis production server and a Redis disaster recovery server, wherein the disaster recovery management server is used for: acquiring a Redis fault notification message sent by the Redis production server; parsing the Redis fault notification message, and determining the failed Redis instance corresponding to the Redis fault notification message; searching the Redis disaster recovery server for the Redis disaster recovery instance corresponding to the failed Redis instance, wherein the Redis disaster recovery instance holds data synchronized from the failed Redis instance; and performing disaster recovery switching on the failed Redis instance based on the Redis disaster recovery instance.
A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
acquiring a Redis fault notification message;
parsing the Redis fault notification message, and determining the failed Redis instance corresponding to the Redis fault notification message;
searching for a Redis disaster recovery instance corresponding to the failed Redis instance, wherein the Redis disaster recovery instance holds data synchronized from the failed Redis instance;
and performing disaster recovery switching on the failed Redis instance based on the Redis disaster recovery instance.
A computer storage medium on which a computer program is stored which, when executed by a processor, performs the steps of:
acquiring a Redis fault notification message;
parsing the Redis fault notification message, and determining the failed Redis instance corresponding to the Redis fault notification message;
searching for a Redis disaster recovery instance corresponding to the failed Redis instance, wherein the Redis disaster recovery instance holds data synchronized from the failed Redis instance;
and performing disaster recovery switching on the failed Redis instance based on the Redis disaster recovery instance.
According to the above Redis cache disaster recovery switching method, device, computer equipment and storage medium, a Redis fault notification message is acquired; the Redis fault notification message is parsed and the corresponding failed Redis instance is determined; the Redis disaster recovery instance corresponding to the failed Redis instance is searched for, the Redis disaster recovery instance holding data synchronized from the failed Redis instance; and disaster recovery switching is performed on the failed Redis instance based on the Redis disaster recovery instance. When a Redis instance fails, the application does not need to modify its configuration: disaster recovery switching is performed directly based on the Redis disaster recovery instance, and Redis can be used normally after the application restarts. The impact and changes caused by disaster recovery switching are reduced to a minimum and the switch is imperceptible to the service, which effectively supports the continuity of the services that depend on the Redis instance.
Drawings
FIG. 1 is an application scenario diagram of the Redis cache disaster recovery switching method in one embodiment;
FIG. 2 is a schematic flowchart of the Redis cache disaster recovery switching method in one embodiment;
FIG. 3 is a schematic flowchart of the Redis instance creation step in one embodiment;
FIG. 4 is a schematic flowchart of the consistency check step in one embodiment;
FIG. 5 is a schematic flowchart of the data synchronization step in one embodiment;
FIG. 6 is a block diagram of the Redis cache disaster recovery switching device in one embodiment;
FIG. 7 is a diagram of the internal structure of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The Redis cache disaster recovery switching method provided by the application can be applied to the application environment shown in FIG. 1. The application environment is a Redis cache disaster recovery switching platform, which comprises a disaster recovery management server, a Redis production server and a Redis disaster recovery server. The disaster recovery management server 102 communicates over a network with the Redis production server 104 in the production machine room and the Redis disaster recovery server 106 in the disaster recovery machine room. When a Redis production instance on the Redis production server 104 in the production machine room fails, a fault notification message may be sent to the disaster recovery management server 102 over the network. The disaster recovery management server 102 acquires the Redis fault notification message; parses it and determines the corresponding failed Redis instance; searches for the Redis disaster recovery instance corresponding to the failed Redis instance, the Redis disaster recovery instance holding data synchronized from the failed Redis instance; and performs disaster recovery switching on the failed Redis instance based on the Redis disaster recovery instance. The disaster recovery management server 102 can be implemented as an independent server or as a server cluster composed of a plurality of servers.
In one embodiment, as shown in FIG. 2, a Redis cache disaster recovery switching method is provided. The method is described by taking its application to the disaster recovery management server 102 in FIG. 1 as an example, and includes the following steps:
Step 201, acquiring a Redis fault notification message.
Step 203, parsing the Redis fault notification message, and determining the failed Redis instance corresponding to the Redis fault notification message.
The fault notification message is a message sent to the disaster recovery management server 102 when a Redis production instance in the production machine room fails. The message may be sent by maintenance personnel in the production machine room through the Redis production server 104, or through another channel using a device such as a smart mobile device. The failed Redis instance is a Redis production instance in the production machine room; when it fails and disaster recovery is required, the disaster recovery management server 102 can perform the corresponding disaster recovery switching management, so that the continuity of the Redis service is guaranteed.
Specifically, when a Redis production instance in the production machine room fails and becomes a failed Redis instance, maintenance personnel in the production machine room may trigger disaster recovery switching by sending a corresponding fault notification message to the disaster recovery management server 102, in order to ensure the continuity of the service provided by the Redis production instance.
Step 205, searching for a Redis disaster recovery instance corresponding to the failed Redis instance, wherein the Redis disaster recovery instance holds data synchronized from the failed Redis instance.
The Redis disaster recovery instances correspond to the Redis production instances one to one, and the basic configuration of each Redis production instance is consistent with that of its Redis disaster recovery instance. The configuration of a Redis production instance changes as daily business develops, and these configuration changes can also be synchronized to the Redis disaster recovery instance to ensure its availability.
Specifically, the disaster recovery management server 102 is also connected to the Redis disaster recovery server 106 in the disaster recovery machine room. When a Redis production instance fails, the disaster recovery management server 102 may find the corresponding Redis disaster recovery instance according to the specific information of the failed Redis instance, and perform the corresponding disaster recovery switching management based on that Redis disaster recovery instance.
Step 207, performing disaster recovery switching on the failed Redis instance based on the Redis disaster recovery instance.
When the Redis disaster recovery instance corresponding to the failed Redis instance is found, disaster recovery switching can be performed on the failed Redis instance. Disaster recovery switching management specifically includes an active switching scenario and a switch-back scenario. These mainly cover cases where the production Redis is still alive but instance performance drops suddenly because of a network or host problem, or where the business needs to direct traffic from the production machine room to the disaster recovery machine room. In these cases the production data is not lost and the synchronization process can continue normally, so no data is lost, the impact of the switch on the service is minimal, and the risk is controllable. The switching process can be divided into: generating the switching operation, checking the disaster recovery instance, switching the connection string information, reversing the synchronization process, updating the production/disaster recovery mapping relation, restarting the application so that it connects to the new Redis instance, and verifying the service. After the active switch is completed, a switch-back operation can be executed to switch from the disaster recovery machine room instance back to the production machine room. The active switching scenario is simple: part of the switching work is moved ahead of the switch, with data synchronization, data verification and the synchronization state check completed in advance, so the switching process is simpler, has fewer steps, carries less risk and takes much less time.
Disaster recovery switching management also includes a passive switching scenario, which mainly covers disaster-level faults. The biggest difference from the active switching and switch-back scenarios is that the production environment instance is unavailable, part of the data is lost, and services are noticeably affected. The switching process is simplified to: generating the switching operation, checking the disaster recovery instance, switching the connection string information, updating the production/disaster recovery mapping relation, restarting the application so that it connects to the new Redis instance, and verifying the service; the Redis synchronization reversal step is omitted. A reverse synchronization function is provided instead, which supports synchronizing the data of the disaster recovery machine room instance back to the production machine room instance after the production instance is recovered. This prevents an anomaly in the disaster recovery machine room instance from causing the data to be lost completely.
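The two step sequences described above differ only in the synchronization reversal step. The following is a minimal sketch of how a switching plan might be assembled for each scenario; the step labels and the `build_switch_plan` function are illustrative names, not part of the application.

```python
from typing import List

# Step labels paraphrase the switching process described in the text.
ACTIVE_SWITCH_STEPS: List[str] = [
    "generate switching operation",
    "check disaster recovery instance",
    "switch connection string information",
    "reverse synchronization process",
    "update production/disaster recovery mapping",
    "restart application and connect to new Redis instance",
    "verify service",
]


def build_switch_plan(production_available: bool) -> List[str]:
    """Return the ordered steps for an active or a passive switch.

    Active switching (production still alive) keeps every step. Passive
    switching (production unavailable) omits the synchronization reversal,
    because there is no live production instance to synchronize towards.
    """
    if production_available:
        return list(ACTIVE_SWITCH_STEPS)
    return [s for s in ACTIVE_SWITCH_STEPS if s != "reverse synchronization process"]
```

In this sketch a passive plan has six steps; the reverse synchronization back to the production machine room would be a separate operation run after the production instance recovers.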
According to the above Redis cache disaster recovery switching method, a Redis fault notification message is acquired; the Redis fault notification message is parsed and the corresponding failed Redis instance is determined; the Redis disaster recovery instance corresponding to the failed Redis instance is searched for, the Redis disaster recovery instance holding data synchronized from the failed Redis instance; and disaster recovery switching is performed on the failed Redis instance based on the Redis disaster recovery instance. When a Redis instance fails, the application does not need to modify its configuration: disaster recovery switching is performed directly based on the Redis disaster recovery instance, and Redis can be used normally after the application restarts. The impact and changes caused by disaster recovery switching are reduced to a minimum and the switch is imperceptible to the service, which effectively supports the continuity of the services that depend on the Redis instance.
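Steps 201 to 207 can be sketched as follows. This is an illustration under stated assumptions, not the implementation of the application: the JSON message format, the `DR_REGISTRY` and `CONNECTION_STRINGS` tables, and all instance names and addresses are hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass
class RedisInstance:
    name: str
    address: str  # host:port of the instance


# Hypothetical registry: production instance name -> disaster recovery instance.
DR_REGISTRY = {
    "orders-cache": RedisInstance("orders-cache-dr", "10.1.0.5:6379"),
}

# Hypothetical table that applications resolve when they (re)start.
# The key the application uses never changes; only the target does.
CONNECTION_STRINGS = {"orders-cache": "10.0.0.5:6379"}


def parse_fault_notification(raw: str) -> str:
    """Step 203: extract the failed instance name (assumed JSON payload)."""
    return json.loads(raw)["instance"]


def find_dr_instance(failed_name: str) -> RedisInstance:
    """Step 205: look up the disaster recovery instance of the failed one."""
    try:
        return DR_REGISTRY[failed_name]
    except KeyError:
        raise LookupError(f"no disaster recovery instance for {failed_name!r}") from None


def switch_over(failed_name: str, dr: RedisInstance) -> None:
    """Step 207: repoint the unchanged connection string at the DR instance."""
    CONNECTION_STRINGS[failed_name] = dr.address


def handle_fault(raw: str) -> RedisInstance:
    failed = parse_fault_notification(raw)  # steps 201 and 203
    dr = find_dr_instance(failed)           # step 205
    switch_over(failed, dr)                 # step 207
    return dr
```

Because the application only ever looks up `"orders-cache"`, it needs no configuration change in this sketch; a restart is enough for it to connect to the disaster recovery instance.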
In one embodiment, as shown in FIG. 3, before step 201, the method further includes:
Step 302, acquiring a Redis instance resource delivery request.
Step 304, feeding back the Redis production instance corresponding to the Redis instance resource delivery request, and creating the Redis disaster recovery instance corresponding to the Redis production instance.
Specifically, when the disaster recovery management server 102 delivers a new Redis instance resource, a Redis production instance and a Redis disaster recovery instance need to be built at the same time, in the production machine room and the disaster recovery machine room respectively, and the Redis production instance corresponding to the Redis instance resource delivery request is fed back. Part of the configuration of the Redis disaster recovery instance is consistent with that of the Redis production instance: the process mainly needs to keep information such as memory, connection count, persistence (disk flush) policy, password and connection string consistent. Unified connection string information is provided to users, and the production and disaster recovery applications keep the Redis part of their configuration consistent. In another embodiment, for the scenario where a production instance already exists but no Redis disaster recovery instance has been created, a disaster recovery build operation can be provided: the configuration information of the production instance is obtained in real time, the Redis disaster recovery instance is created directly in the disaster recovery machine room, and the production/disaster recovery association is established. In this embodiment, by creating the Redis disaster recovery instance corresponding to the Redis production instance, disaster recovery management can be performed through the Redis disaster recovery instance when the Redis production instance fails, which ensures the continuity of the Redis service.
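As an illustration of keeping the listed items consistent at delivery time, the sketch below derives a disaster recovery configuration by copying the production configuration and changing only the machine room. The `RedisConfig` fields and the room names are assumptions chosen to mirror the items named above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RedisConfig:
    room: str               # machine room the instance is built in
    maxmemory_mb: int       # memory
    maxclients: int         # connection count
    persistence: str        # persistence (disk flush) policy
    password: str
    connection_string: str  # unified connection string given to users


def build_dr_config(prod: RedisConfig, dr_room: str = "dr-room") -> RedisConfig:
    """Copy every setting from production; only the machine room differs."""
    return replace(prod, room=dr_room)


prod_config = RedisConfig(
    room="prod-room",
    maxmemory_mb=4096,
    maxclients=10000,
    persistence="aof-everysec",
    password="example-password",
    connection_string="redis://orders-cache",
)
dr_config = build_dr_config(prod_config)
```

The same function would also serve the disaster recovery build operation for an existing production instance, given its current configuration as input.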
In one embodiment, as shown in FIG. 4, after step 304, the method further includes:
Step 401, acquiring a consistency check request corresponding to the Redis production instance.
Step 403, according to the consistency check request, performing a consistency check on the Redis production instance and the Redis disaster recovery instance corresponding to the Redis production instance to obtain a consistency check result.
Step 405, updating the Redis disaster recovery instance corresponding to the Redis production instance according to the consistency check result.
The consistency check request is used for requesting a check of the configuration consistency between the Redis production instance and the Redis disaster recovery instance corresponding to it.
Specifically, after the Redis production instance is delivered, its configuration changes as daily business develops. If these changes are not applied to the Redis disaster recovery instance in time, the production and disaster recovery configurations become inconsistent, the disaster recovery instance no longer meets the original design, and switching is not supported. A consistency check therefore needs to be performed through the consistency check request to obtain a consistency check result and guarantee the availability of the Redis disaster recovery instance. Besides being triggered by a request, the consistency check may also be performed periodically by setting a check period. The content of the consistency check mainly comprises:
1) Configuration consistency: instances delivered through production/disaster recovery instance delivery or through the disaster recovery build function start with consistent production and disaster recovery configurations. Cache disaster recovery switching can be managed through the Redis cache disaster recovery switching platform. When the configuration of a Redis production instance is changed through the platform, the platform automatically synchronizes the production configuration change, so that the production and disaster recovery configurations remain consistent. For manual operations, where the production configuration change cannot be synchronized to the disaster recovery environment in time, the configuration polling function of the platform automatically finds the inconsistent configuration items and synchronizes the production configuration to the disaster recovery environment.
2) Dynamic configuration consistency: the connectivity of the primary and standby nodes of the production and disaster recovery instances is checked, and the number of production nodes must equal the number of disaster recovery nodes.
In this embodiment, operations such as the consistency check and configuration update before disaster recovery processing effectively ensure the validity of the Redis disaster recovery instance.
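The configuration polling idea can be sketched as a dictionary comparison followed by a one-way update from production to disaster recovery. The function names and the configuration keys are illustrative; treating production as the source of truth follows the text, while the representation of a configuration as a flat dictionary is an assumption.

```python
from typing import Dict, Optional, Tuple

Diff = Dict[str, Tuple[Optional[str], Optional[str]]]


def find_inconsistent_items(prod: Dict[str, str], dr: Dict[str, str]) -> Diff:
    """Return {item: (production value, disaster recovery value)} for every mismatch."""
    keys = sorted(set(prod) | set(dr))
    return {k: (prod.get(k), dr.get(k)) for k in keys if prod.get(k) != dr.get(k)}


def sync_config_to_dr(prod: Dict[str, str], dr: Dict[str, str]) -> Diff:
    """Make the disaster recovery configuration match production, in place."""
    diffs = find_inconsistent_items(prod, dr)
    for key in diffs:
        if key in prod:
            dr[key] = prod[key]   # production is the source of truth
        else:
            dr.pop(key, None)     # item no longer configured in production
    return diffs


def node_counts_match(prod_nodes: list, dr_nodes: list) -> bool:
    """Dynamic check: both sides must have the same number of nodes."""
    return len(prod_nodes) == len(dr_nodes)
```

A periodic check would call `sync_config_to_dr` on a timer and log the returned differences; a request-triggered check would call it once per request.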
In one embodiment, as shown in FIG. 5, after step 304, the method further includes:
Step 502, acquiring a data synchronization request corresponding to the Redis production instance, and searching for the Redis disaster recovery instance corresponding to the Redis production instance according to the data synchronization request.
Step 504, acquiring production node information corresponding to the Redis production instance and disaster recovery node information of the Redis disaster recovery instance.
Step 506, establishing a map mapping relation between the production nodes and the disaster recovery nodes.
Step 508, acquiring database files of the production nodes and the disaster recovery nodes.
Step 510, performing full data synchronization and incremental data synchronization on the Redis disaster recovery instance based on the database files and the map mapping relation.
The data synchronization request is used for requesting the disaster recovery management server 102 to synchronize data between the Redis production instance and the Redis disaster recovery instance. In a production environment, the configuration of the Redis production instance changes as daily business develops; if the Redis disaster recovery instance is not changed in time, the production and disaster recovery configurations become inconsistent, and the Redis disaster recovery instance no longer meets the original design, which affects disaster recovery switching. A node is a constituent unit of a Redis cluster, and a Redis cluster generally consists of a plurality of nodes. As for the map mapping, a map container maps keys to values. Its internal implementation is a red-black tree (a kind of balanced tree) ordered by key. The keys and values of a map can be of any data type (including int, double, long, string, struct, vector, queue, etc.). Establishing a map mapping relation between the production nodes and the disaster recovery nodes makes the data synchronization operation more efficient. The database file is specifically the rdb file.
Specifically, after the Redis disaster recovery instance is created, it is not enough for it to be consistent with the production environment configuration; data must also be synchronized between the Redis production instance and the Redis disaster recovery instance. The problems to be solved during data synchronization include compatibility between different instance types, support for the different Redis data types, synchronization timeliness, and synchronization performance. The data synchronization process specifically includes: acquiring production node information corresponding to the Redis production instance and disaster recovery node information of the Redis disaster recovery instance; establishing a map mapping relation between the production nodes and the disaster recovery nodes; acquiring database files of the production nodes and the disaster recovery nodes; and performing full data synchronization and incremental data synchronization on the Redis disaster recovery instance based on the database files and the map mapping relation.
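The node mapping of step 506 can be illustrated with a plain dictionary. The application does not state how production and disaster recovery nodes are paired, so pairing them in sorted address order is purely an assumption of this sketch, as are the addresses.

```python
from typing import Dict, List


def build_node_map(prod_nodes: List[str], dr_nodes: List[str]) -> Dict[str, str]:
    """Map each production node address to one disaster recovery node address.

    Assumption: nodes are paired by sorted address order. A real pairing rule
    (for example by shard or slot range) is not given in the text.
    """
    if len(prod_nodes) != len(dr_nodes):
        raise ValueError("production and disaster recovery node counts differ")
    return dict(zip(sorted(prod_nodes), sorted(dr_nodes)))


node_map = build_node_map(
    ["10.0.0.2:6379", "10.0.0.1:6379"],
    ["10.1.0.1:6379", "10.1.0.2:6379"],
)
```

With such a map, the synchronization process knows for each production node which disaster recovery node should receive its rdb snapshot and its incremental changes.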
In a specific application scenario, the types of the Redis instance are Redis2.8.19 sentinel mode and Redis5.0.7 cluster mode. For the sentinel mode and the cluster mode, the types of the synchronous examples are distinguished in the synchronous configuration files, for the sentinel mode, the production node information and the disaster recovery node information are obtained through the sentinel, the map mapping relation between the production node and the disaster recovery node is established, and the production node and the disaster recovery node are connected respectively. And acquiring the rdb file, analyzing the rdb database file, carrying out one-time full synchronization, and after the full synchronization is finished, entering an incremental synchronization state to synchronize data in real time. For the cluster mode, the information of the node is obtained by instantiating a certain node, and the synchronization process is similar, but it should be noted that: in the process of synchronization, slot migration cannot be performed, otherwise, this synchronization fails. In the update version of Redis, data types include Geo, stream and the like in addition to basic String, set, hash, signed Set and the like, a large number of PINGs, PONG (other commands need to be filtered) and the like are also in the process of Redis operation, and for a synchronization tool, the problem that the synchronization tool needs to solve compatibility is compatible, currently, redis synchronization firstly analyzes data information of a production node and analyzes the data information, so that an important node is needed for the problem of data compatibility, and the problem that an analysis example causes synchronization program abnormity is avoided. For scenarios with more Redis production instance data, the time required for data synchronization increases. 
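The full-then-incremental flow with command filtering can be sketched on plain dictionaries. This is not the synchronization tool of the application: it does not parse real rdb files or the Redis replication stream, and it handles only `SET` and `DEL` as stand-ins. Commands it cannot handle are collected rather than raising, which mirrors the requirement that a parsing problem must not crash the synchronization program.

```python
from typing import Dict, Iterable, List, Tuple

# Heartbeat traffic that carries no data and must be filtered out.
IGNORED_COMMANDS = {"PING", "PONG"}

Command = Tuple[str, ...]


def full_sync(snapshot: Dict[str, str], target: Dict[str, str]) -> None:
    """Replace the target contents with a parsed snapshot (stand-in for an rdb file)."""
    target.clear()
    target.update(snapshot)


def incremental_sync(commands: Iterable[Command], target: Dict[str, str]) -> List[Command]:
    """Apply write commands in order; return the commands that were not understood."""
    unsupported: List[Command] = []
    for command in commands:
        name, args = command[0].upper(), command[1:]
        if name in IGNORED_COMMANDS:
            continue
        if name == "SET" and len(args) == 2:
            target[args[0]] = args[1]
        elif name == "DEL" and args:
            for key in args:
                target.pop(key, None)
        else:
            unsupported.append(command)  # record instead of crashing
    return unsupported
```

A caller would run `full_sync` once and then feed `incremental_sync` continuously, alerting on any non-empty return value so that unsupported data types are noticed.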
For scenarios with high requirements on synchronization timeliness, this problem needs to be solved urgently. Coroutine multiplexing improves the efficiency of data parsing and shortens the full synchronization process. According to the number of CPU cores and the network card type of the host on which the synchronization process is deployed, the number of rdb files acquired and parsed simultaneously and the number of coroutines are set so that, while the stability of the host is guaranteed, synchronization timeliness is maximized and data loss caused by slow synchronization in extreme scenarios is reduced as far as possible. In this embodiment, performing data synchronization before disaster recovery processing effectively ensures the validity of the Redis disaster recovery instance.
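As a rough illustration of bounding the number of files parsed at once, the Python sketch below uses `asyncio` coroutines with a semaphore. The sizing rule in `pick_concurrency` (two workers per core, capped at 32) is a hypothetical heuristic, not a value from the description, and `parse_file` is a placeholder for fetching and parsing a real rdb file.

```python
import asyncio
import os


def pick_concurrency(cpu_cores, per_core=2, hard_cap=32):
    """Hypothetical heuristic: scale with CPU cores, capped to keep the host stable."""
    return max(1, min(cpu_cores * per_core, hard_cap))


async def parse_file(name, limit):
    """Placeholder for fetching and parsing one rdb file."""
    async with limit:  # at most `concurrency` files are processed at a time
        await asyncio.sleep(0)  # stands in for real I/O and parsing work
        return f"parsed:{name}"


async def parse_all(names, concurrency):
    limit = asyncio.Semaphore(concurrency)
    return await asyncio.gather(*(parse_file(n, limit) for n in names))


workers = pick_concurrency(os.cpu_count() or 1)
results = asyncio.run(parse_all(["node-0.rdb", "node-1.rdb", "node-2.rdb"], workers))
```

The cap matters because the synchronization host also has to stay responsive; in practice the limit would be tuned against measured CPU and network load.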
In one embodiment, before step 201, the method further includes: performing data verification on the Redis disaster recovery instance, where the data verification includes data compatibility verification, mode compatibility verification, and timeliness performance verification. Step 205 includes: performing disaster recovery switching on the failed Redis instance based on the Redis disaster recovery instance that has passed the data verification.
Specifically, Redis is a key-value database, which makes data verification difficult. Redis keys can carry an expiration time, and Redis adopts a lazy deletion policy for expired keys. As a result, judging whether production and disaster recovery data are consistent simply by comparing the number of keys in the two instances is inaccurate: because the keys of the disaster recovery instance are not accessed by any application, some expired keys may not yet have been deleted there, so the key counts of the two instances differ even though the unexpired keys are already consistent. Data verification therefore has to solve several problems, namely data compatibility, mode compatibility, and timeliness. Data compatibility is addressed by supporting the multiple Redis data types. The comparison process obtains the key-value pairs and then compares the values, and exceptions must be caught at multiple points while the values are fetched so that the comparison program remains stable. Mode compatibility mainly concerns the fact that keys are obtained differently in cluster mode and in sentinel mode, so a degree of compatibility handling is required. The most important problem is timeliness. Besides routine data comparison, verification is also performed before a disaster recovery switching operation in order to reduce data loss during the switch. In that case the timeliness requirement is high, because the time spent on verification directly affects the total time consumed by the Redis switch.
Therefore, multiple shards can be compared simultaneously using multiple threads, which improves the timeliness of data verification. In this embodiment, data verification effectively reduces the occurrence of data loss and ensures the availability of the Redis disaster recovery instance.
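The verification idea described above, comparing only unexpired keys and checking several shards in parallel, can be sketched in Python as follows. The shard layout (a dictionary mapping each key to a `(value, expire_at)` pair) and the function names are assumptions made for this illustration; a real checker would read keys, values, and TTLs from the Redis nodes.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def live_items(shard, now):
    """Return key -> value for keys that have not expired.

    Each shard maps key -> (value, expire_at); expire_at is a Unix timestamp,
    or None for a key without an expiration time.
    """
    return {k: v for k, (v, exp) in shard.items() if exp is None or exp > now}


def compare_shard(prod_shard, dr_shard, now):
    """Return the sorted keys whose live values differ between the two shards."""
    prod_live = live_items(prod_shard, now)
    dr_live = live_items(dr_shard, now)
    keys = prod_live.keys() | dr_live.keys()
    return sorted(k for k in keys if prod_live.get(k) != dr_live.get(k))


def verify(prod_shards, dr_shards, now=None, max_workers=4):
    """Compare shard pairs in parallel; return one mismatch list per shard."""
    now = time.time() if now is None else now
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(compare_shard, prod_shards, dr_shards,
                             [now] * len(prod_shards)))


prod_shards = [{"a": ("1", None)}, {"b": ("2", None)}]
# The disaster recovery side still holds an expired key that was never deleted.
dr_shards = [{"a": ("1", None), "old": ("x", 50)}, {"b": ("9", None)}]
mismatches = verify(prod_shards, dr_shards, now=100)
```

In this example the first shard is reported as consistent even though the key counts differ, because the leftover key has already expired, while the second shard reports a genuine value mismatch.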
In one embodiment, before performing disaster recovery switching on the failed Redis instance based on the Redis disaster recovery instance, the method further includes: performing a static configuration check and a dynamic check on the Redis disaster recovery instance and the failed Redis instance. Step 205 includes: when both the static configuration check and the dynamic check pass, performing disaster recovery switching on the failed Redis instance based on the Redis disaster recovery instance.
Specifically, when the Redis production instance needs to be switched over to the Redis disaster recovery instance, the instances involved need to be checked in advance to ensure that no problem occurs during the switch and that the application runs normally after it completes. The checks fall into two dimensions: static configuration checks and dynamic checks. The static configuration check examines in advance the configuration files, configuration items, and node counts of the production and disaster recovery instances. The dynamic check mainly examines the status of the Redis production instance and the Redis disaster recovery instance, including memory usage, the actual number of connections, connectivity between the master and standby nodes, the status of the synchronization instance, whether DNS resolution is normal, and whether synchronization is correctly deployed. Checking the instances in multiple dimensions uncovers potential problems in advance and avoids impact on services caused by a failure during switching. In this embodiment, these checks effectively reduce potential problems in the switching process and ensure the availability of the Redis disaster recovery instance.
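A minimal Python sketch of such a pre-switch gate is given below. All field names (`used_memory`, `replica_link_up`, `dns_ok`, and so on), the compared configuration keys, and the 90% memory threshold are hypothetical choices for illustration. For brevity the sketch runs the dynamic check on one status snapshot only, whereas the description checks both the production and the disaster recovery instance.

```python
def static_check(prod_cfg, dr_cfg, keys=("maxmemory", "node_count")):
    """Return the configuration items whose values differ between the two instances."""
    return [k for k in keys if prod_cfg.get(k) != dr_cfg.get(k)]


def dynamic_check(status, max_memory_ratio=0.9):
    """Return a list of runtime problems found in one instance's status snapshot."""
    problems = []
    if status["used_memory"] > max_memory_ratio * status["maxmemory"]:
        problems.append("memory usage too high")
    if status["connected_clients"] >= status["maxclients"]:
        problems.append("connection limit reached")
    for flag in ("replica_link_up", "sync_running", "dns_ok"):
        if not status[flag]:
            problems.append(f"{flag} is false")
    return problems


def ready_to_switch(prod_cfg, dr_cfg, dr_status):
    """Allow switching only when both the static and the dynamic checks pass."""
    return not static_check(prod_cfg, dr_cfg) and not dynamic_check(dr_status)


cfg = {"maxmemory": 1024, "node_count": 3}
healthy = {"used_memory": 100, "maxmemory": 1024, "connected_clients": 5,
           "maxclients": 10000, "replica_link_up": True, "sync_running": True,
           "dns_ok": True}
ok = ready_to_switch(cfg, dict(cfg), healthy)
blocked = ready_to_switch(cfg, {"maxmemory": 512, "node_count": 3}, healthy)
```

Returning the list of problems, rather than a bare boolean, lets an operator see which check blocked the switch.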
In one embodiment, before step 201, the method further includes: obtaining disaster recovery switching efficiency data corresponding to the Redis disaster recovery instance; and optimizing the switching scenario corresponding to the Redis disaster recovery instance according to the disaster recovery switching efficiency data.
Specifically, in a passive switching scenario in disaster recovery switching management, many instances are switched at once, which places high concurrency demands on the back end of the disaster recovery management server 102. Simulation tests therefore need to be performed on the different switching scenarios to obtain disaster recovery switching performance data. This process mainly covers the pre-switch check interface, active switching, passive switching, and switch-back scenarios, and the problems found during stress testing are optimized. Specifically, in one embodiment, after optimization based on the disaster recovery switching performance data, 1000 pre-switch checks in a batch are supported with an average response time of 1 s, and when 200 operations such as active switching and passive switching are performed concurrently, the average response time is 1 min. Concurrency can be increased further by adding nodes while timeliness is still guaranteed. In this embodiment, obtaining the disaster recovery switching efficiency data corresponding to the Redis disaster recovery instance and optimizing accordingly effectively ensures the concurrent processing efficiency of the disaster recovery switching management process.
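One way to collect average response times under concurrency is sketched below in Python. The harness and the `fake_pre_switch_check` operation are hypothetical stand-ins; the description does not say how its stress tests were implemented, and the figures quoted above are not reproduced by this sketch.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def timed(operation, arg):
    """Run one operation and return its duration in seconds."""
    start = time.perf_counter()
    operation(arg)
    return time.perf_counter() - start


def average_response_time(operation, args, concurrency):
    """Run the operation once per argument on a thread pool; return the mean duration."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        durations = list(pool.map(lambda a: timed(operation, a), args))
    return mean(durations)


def fake_pre_switch_check(instance_id):
    """Hypothetical stand-in for a call to the pre-switch check interface."""
    time.sleep(0.001)


avg = average_response_time(fake_pre_switch_check, range(50), concurrency=10)
```

Repeating the measurement at different concurrency levels shows where response time starts to degrade, which is the kind of data the optimization step relies on.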
It should be understood that although the various steps in the flow charts of fig. 2-5 are shown in order as indicated by the arrows, the steps are not necessarily performed in order as indicated by the arrows. The steps are not limited to being performed in the exact order illustrated and, unless explicitly stated herein, may be performed in other orders. Moreover, at least some of the steps in fig. 2-5 may include multiple sub-steps or multiple stages that are not necessarily performed at the same time, but may be performed at different times, and the order of performance of the sub-steps or stages is not necessarily sequential, but may be performed in turn or alternating with other steps or at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 6, there is provided a Redis cache disaster recovery switching apparatus, including:
a message obtaining module 601, configured to obtain a Redis fault notification message;
a fault instance searching module 603, configured to parse the Redis fault notification message and determine the failed Redis instance corresponding to the Redis fault notification message;
a disaster recovery instance searching module 605, configured to search for the Redis disaster recovery instance corresponding to the failed Redis instance, where the Redis disaster recovery instance is synchronous data of the failed Redis instance; and
a disaster recovery switching module 607, configured to perform disaster recovery switching on the failed Redis instance based on the Redis disaster recovery instance.
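The cooperation of the four modules can be pictured with the following Python sketch. The JSON notification format, the `DR_INSTANCES` lookup table, and the `ACTIVE` routing table are assumptions introduced for illustration; the description does not define the message format or how traffic is redirected.

```python
import json

# Hypothetical lookup table: production instance name -> disaster recovery instance.
DR_INSTANCES = {"orders-cache": "orders-cache-dr"}
# Hypothetical routing table: instance name -> instance currently serving traffic.
ACTIVE = {"orders-cache": "orders-cache"}


def parse_notification(raw):
    """Extract the failed instance name from a JSON notification (assumed format)."""
    return json.loads(raw)["instance"]


def find_dr_instance(failed):
    """Look up the disaster recovery instance paired with the failed instance."""
    try:
        return DR_INSTANCES[failed]
    except KeyError:
        raise LookupError(f"no disaster recovery instance for {failed!r}") from None


def switch(failed, dr_instance):
    """Redirect traffic for the failed instance to its disaster recovery instance."""
    ACTIVE[failed] = dr_instance


def handle_fault(raw):
    """Obtain, parse, look up, and switch: one step per module."""
    failed = parse_notification(raw)
    dr_instance = find_dr_instance(failed)
    switch(failed, dr_instance)
    return dr_instance


result = handle_fault('{"instance": "orders-cache", "reason": "master down"}')
```

Raising an error when no paired instance exists keeps an unknown instance from being switched silently.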
In one embodiment, the apparatus further includes an instance creation module, configured to: acquire a Redis instance resource delivery request; feed back a Redis production instance corresponding to the Redis instance resource delivery request; and create a Redis disaster recovery instance corresponding to the Redis production instance.
In one embodiment, the apparatus further includes a consistency check module, configured to: acquire a consistency check request corresponding to a Redis production instance; perform, according to the consistency check request, a consistency check on the Redis production instance and the Redis disaster recovery instance corresponding to the Redis production instance to obtain a consistency check result; and update the Redis disaster recovery instance corresponding to the Redis production instance according to the consistency check result.
In one embodiment, the apparatus further includes a data synchronization module, configured to: acquire a data synchronization request corresponding to a Redis production instance, and search for the Redis disaster recovery instance corresponding to the Redis production instance according to the data synchronization request; acquire production node information corresponding to the Redis production instance and disaster recovery node information of the Redis disaster recovery instance; establish a map (mapping relation) between the production nodes and the disaster recovery nodes; acquire the database files of the production nodes and the disaster recovery nodes; and perform full data synchronization and incremental data synchronization on the Redis disaster recovery instance based on the database files and the mapping relation.
In one embodiment, the apparatus further includes a data checking module, configured to: perform data verification on the Redis disaster recovery instance, where the data verification includes data compatibility verification, mode compatibility verification, and timeliness performance verification;
the disaster recovery switching module 607 is specifically configured to: perform disaster recovery switching on the failed Redis instance based on the Redis disaster recovery instance that has passed the data verification.
In one embodiment, the apparatus further includes an instance checking module, configured to: perform a static configuration check and a dynamic check on the Redis disaster recovery instance and the failed Redis instance; the disaster recovery switching module 607 is specifically configured to: when both the static configuration check and the dynamic check pass, perform disaster recovery switching on the failed Redis instance based on the Redis disaster recovery instance.
In one embodiment, the apparatus further includes a scenario optimization module, specifically configured to: acquire disaster recovery switching efficiency data corresponding to the Redis disaster recovery instance; and optimize the switching scenario corresponding to the Redis disaster recovery instance according to the disaster recovery switching efficiency data.
For specific limitations of the Redis cache disaster recovery switching apparatus, reference may be made to the above limitations of the Redis cache disaster recovery switching method, and details are not repeated here. Each module in the Redis cache disaster recovery switching apparatus may be implemented wholly or partly in software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor in the computer device in hardware form, or may be stored in a memory in the computer device in software form, so that the processor can invoke and execute the operations corresponding to the modules.
In one embodiment, a Redis cache disaster recovery switching platform is provided. The platform includes a disaster recovery management server, a Redis production server, and a Redis disaster recovery server, and the disaster recovery management server is configured to: acquire a Redis fault notification message sent by the Redis production server; parse the Redis fault notification message and determine the failed Redis instance corresponding to the Redis fault notification message; search the Redis disaster recovery server for the Redis disaster recovery instance corresponding to the failed Redis instance, where the Redis disaster recovery instance is synchronous data of the failed Redis instance; and perform disaster recovery switching on the failed Redis instance based on the Redis disaster recovery instance.
In one embodiment, a computer device is provided, which may be a disaster recovery switching management server used for disaster recovery switching management. The disaster recovery switching management server may be connected to a Redis production server in a production environment and to a Redis disaster recovery server in a disaster recovery environment. It is deployed in a disaster recovery machine room and serves as a disaster recovery master node and a disaster recovery backup node. When a Redis production server in the production environment fails, Redis cache disaster recovery switching can be performed based on the Redis disaster recovery server, which guarantees the continuity of services related to the Redis instances. The internal structure of the computer device may be as shown in fig. 7. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device is used for storing disaster recovery management data. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements a Redis cache disaster recovery switching method.
It will be appreciated by those skilled in the art that the configuration shown in fig. 7 is a block diagram of only a portion of the configuration associated with the present application, and is not intended to limit the computing device to which the present application may be applied, and that a particular computing device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, there is provided a computer device comprising a memory storing a computer program and a processor implementing the following steps when the processor executes the computer program:
acquiring a Redis fault notification message;
analyzing the Redis fault notification message, and determining a failed Redis instance corresponding to the Redis fault notification message;
searching a Redis disaster tolerance instance corresponding to the failed Redis instance, wherein the Redis disaster tolerance instance is synchronous data of the failed Redis instance;
and carrying out disaster tolerance switching on the failed Redis instance based on the Redis disaster tolerance instance.
In one embodiment, the processor, when executing the computer program, further performs the steps of: acquiring a Redis instance resource delivery request; feeding back a Redis production instance corresponding to the Redis instance resource delivery request, and creating a Redis disaster tolerance instance corresponding to the Redis production instance.
In one embodiment, the processor, when executing the computer program, further performs the steps of: acquiring a consistency check request corresponding to a Redis production instance; according to the consistency check request, consistency check is carried out on the Redis production instance and the Redis disaster recovery instance corresponding to the Redis production instance, and a consistency check result is obtained; and updating the Redis disaster recovery instance corresponding to the Redis production instance according to the consistency check result.
In one embodiment, the processor when executing the computer program further performs the steps of: acquiring a data synchronization request corresponding to a Redis production instance, and searching a Redis disaster tolerance instance corresponding to the Redis production instance according to the data synchronization request; acquiring production node information corresponding to a Redis production instance and disaster tolerance node information of the Redis disaster tolerance instance; establishing a map mapping relation between a production node and a disaster recovery node; acquiring database files of a production node and a disaster recovery node; and performing data full synchronization and data increment synchronization on the Redis disaster recovery instance based on the database file and the map mapping relation.
In one embodiment, the processor, when executing the computer program, further performs the step of: performing data verification on the Redis disaster recovery instance.
In one embodiment, the processor, when executing the computer program, further performs the step of: performing a static configuration check and a dynamic check on the Redis disaster recovery instance and the failed Redis instance.
In one embodiment, the processor, when executing the computer program, further performs the steps of: acquiring disaster recovery switching efficiency data corresponding to the Redis disaster recovery instance; and optimizing the switching scenario corresponding to the Redis disaster recovery instance according to the disaster recovery switching efficiency data.
In one embodiment, a computer storage medium is provided, having a computer program stored thereon, the computer program, when executed by a processor, implementing the steps of:
acquiring a Redis fault notification message;
analyzing the Redis fault notification message, and determining a failed Redis instance corresponding to the Redis fault notification message;
searching a Redis disaster recovery instance corresponding to the failed Redis instance, wherein the Redis disaster recovery instance is synchronous data of the failed Redis instance;
and carrying out disaster recovery switching on the failed Redis instance based on the Redis disaster recovery instance.
In one embodiment, the computer program when executed by the processor further performs the steps of: acquiring a Redis instance resource delivery request; feeding back the Redis production instance corresponding to the Redis instance resource delivery request, and creating the Redis disaster tolerance instance corresponding to the Redis production instance.
In one embodiment, the computer program when executed by the processor further performs the steps of: acquiring a consistency check request corresponding to a Redis production instance; according to the consistency check request, consistency check is carried out on the Redis production instance and the Redis disaster recovery instance corresponding to the Redis production instance, and a consistency check result is obtained; and updating the Redis disaster recovery instance corresponding to the Redis production instance according to the consistency check result.
In one embodiment, the computer program when executed by the processor further performs the steps of: acquiring a data synchronization request corresponding to a Redis production instance, and searching the Redis disaster recovery instance corresponding to the Redis production instance according to the data synchronization request; acquiring production node information corresponding to a Redis production instance and disaster tolerance node information of the Redis disaster tolerance instance; establishing a map mapping relation between a production node and a disaster recovery node; acquiring database files of a production node and a disaster recovery node; and performing data full synchronization and data increment synchronization on the Redis disaster recovery instance based on the database file and the map mapping relation.
In one embodiment, the computer program, when executed by the processor, further performs the step of: performing data verification on the Redis disaster recovery instance.
In one embodiment, the computer program, when executed by the processor, further performs the step of: performing a static configuration check and a dynamic check on the Redis disaster recovery instance and the failed Redis instance.
In one embodiment, the computer program, when executed by the processor, further performs the steps of: acquiring disaster recovery switching efficiency data corresponding to the Redis disaster recovery instance; and optimizing the switching scenario corresponding to the Redis disaster recovery instance according to the disaster recovery switching efficiency data.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.
All possible combinations of the technical features in the above embodiments may not be described for the sake of brevity, but should be considered as being within the scope of the present disclosure as long as there is no contradiction between the combinations of the technical features.
The above examples only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (11)

1. A Redis cache disaster recovery switching method comprises the following steps:
acquiring a Redis fault notification message;
analyzing the Redis fault notification message, and determining a faulted Redis instance corresponding to the Redis fault notification message;
searching a Redis disaster tolerance instance corresponding to the failed Redis instance, wherein the Redis disaster tolerance instance is synchronous data of the failed Redis instance;
and carrying out disaster tolerance switching on the failed Redis instance based on the Redis disaster tolerance instance.
2. The method according to claim 1, wherein before said obtaining the Redis fault notification message, further comprising:
acquiring a Redis instance resource delivery request;
feeding back a Redis production instance corresponding to the Redis instance resource delivery request;
and creating a Redis disaster recovery instance corresponding to the Redis production instance.
3. The method according to claim 2, wherein after creating the Redis disaster recovery instance corresponding to the Redis production instance, further comprising:
acquiring a consistency check request corresponding to the Redis production instance;
according to the consistency check request, consistency check is carried out on the Redis production instance and the Redis disaster recovery instance corresponding to the Redis production instance, and a consistency check result is obtained;
and updating the Redis disaster recovery instance corresponding to the Redis production instance according to the consistency check result.
4. The method according to claim 2, wherein after creating the Redis disaster recovery instance corresponding to the Redis production instance, further comprising:
acquiring a data synchronization request corresponding to the Redis production instance, and searching the Redis disaster recovery instance corresponding to the Redis production instance according to the data synchronization request;
acquiring production node information corresponding to the Redis production instance and disaster tolerance node information of the Redis disaster tolerance instance;
establishing a map mapping relation between the production node and the disaster recovery node;
acquiring database files of the production node and the disaster recovery node;
and performing data full synchronization and data increment synchronization on the Redis disaster recovery instance based on the database file and the map mapping relation.
5. The method according to claim 1, wherein before the performing disaster recovery switching on the failed Redis instance based on the Redis disaster recovery instance, further comprising:
performing data verification on the Redis disaster recovery instance;
the performing disaster recovery switching on the failed Redis instance based on the Redis disaster recovery instance comprises:
and carrying out disaster tolerance switching on the failed Redis instance based on the Redis disaster tolerance instance passing the data verification.
6. The method according to claim 1, wherein before the performing disaster recovery switching on the failed Redis instance based on the Redis disaster recovery instance, further comprising:
performing static configuration check and dynamic check on the Redis disaster recovery instance and the failed Redis instance;
the performing disaster recovery switching on the failed Redis instance based on the Redis disaster recovery instance comprises:
and when the static configuration check and the dynamic check pass, carrying out disaster tolerance switching on the failed Redis instance based on the Redis disaster tolerance instance.
7. The method according to claim 1, wherein before the acquiring the Redis fault notification message, further comprising:
acquiring disaster tolerance switching efficiency data corresponding to the Redis disaster tolerance instance;
and optimizing the switching scene corresponding to the Redis disaster tolerance instance according to the disaster tolerance switching efficiency data.
8. A Redis cache disaster recovery switching device, the device comprising:
the message acquisition module is used for acquiring a Redis fault notification message;
the fault instance searching module is used for analyzing the Redis fault notification message and determining a faulted Redis instance corresponding to the Redis fault notification message;
the disaster tolerance instance searching module is used for searching a Redis disaster tolerance instance corresponding to the failed Redis instance, wherein the Redis disaster tolerance instance is synchronous data of the failed Redis instance;
and the disaster recovery switching module is used for carrying out disaster recovery switching on the failed Redis instance based on the Redis disaster recovery instance.
9. The Redis cache disaster recovery switching platform is characterized by comprising a disaster recovery management server, a Redis production server and a Redis disaster recovery server, wherein the disaster recovery management server is used for: acquiring a Redis fault notification message sent by the Redis production server; analyzing the Redis fault notification message, and determining a faulted Redis instance corresponding to the Redis fault notification message; searching a Redis disaster tolerance instance corresponding to the failed Redis instance in the Redis disaster tolerance server, wherein the Redis disaster tolerance instance is synchronous data of the failed Redis instance; and carrying out disaster tolerance switching on the failed Redis instance based on the Redis disaster tolerance instance.
10. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 7 when executing the computer program.
11. A computer storage medium on which a computer program is stored, characterized in that the computer program, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
CN202110466245.3A 2021-04-28 2021-04-28 Redis cache disaster recovery switching method, device, platform, equipment and storage medium Pending CN115248751A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110466245.3A CN115248751A (en) 2021-04-28 2021-04-28 Redis cache disaster recovery switching method, device, platform, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110466245.3A CN115248751A (en) 2021-04-28 2021-04-28 Redis cache disaster recovery switching method, device, platform, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115248751A true CN115248751A (en) 2022-10-28

Family

ID=83696774

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110466245.3A Pending CN115248751A (en) 2021-04-28 2021-04-28 Redis cache disaster recovery switching method, device, platform, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115248751A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118101428A (en) * 2024-04-24 2024-05-28 浪潮云信息技术股份公司 Redis chain type replication fault detection and repair method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination