CN114253774A

CN114253774A - Disaster recovery method, device and storage medium for service management platform

Info

Publication number: CN114253774A
Application number: CN202111391905.2A
Authority: CN
Inventors: 张伟强; 冯毅; 蔡超
Original assignee: China United Network Communications Group Co Ltd
Current assignee: China United Network Communications Group Co Ltd
Priority date: 2021-11-23
Filing date: 2021-11-23
Publication date: 2022-03-29

Abstract

The method comprises sending a test message to each of a plurality of sub-platforms, determining a detection result according to the response result returned by the sub-platform, generating a first switching instruction according to the detection results of the plurality of sub-platforms, and sending the first switching instruction to the global load balancing device (GSLB) of the faulty sub-platform among the plurality of sub-platforms, so that the GSLB, according to the first switching instruction, switches the IP address of the faulty sub-platform from the production IP address of the faulty sub-platform to the production IP address of a fault-free sub-platform, wherein a production IP address is an IP address assigned to a sub-platform and used for accessing that sub-platform. The method saves the development cost of network management software, realizes automatic switching, and improves efficiency; because the GSLB is used for IP address switching, the sub-platforms do not need to be deployed in the same region.

Description

Disaster recovery method, device and storage medium for service management platform

Technical Field

The present application relates to the field of communications technologies, and in particular, to a disaster recovery method, device, and storage medium for a service management platform.

Background

With the development of communication technology, more and more services are carried by the service management platform. The user's demand for security of the service management platform is increasing.

In the prior art, a service management platform can adopt a 1+1 disaster tolerance mode to ensure service security, and there are two main implementations. The first is load sharing: two sets of platforms undertake services simultaneously; when the network management system detects that one set of platforms has failed, it notifies operation and maintenance personnel to modify the resolution address on the Domain Name Server (DNS), so that the DNS resolution address points to the fault-free platform. The second is main/standby disaster tolerance: one set of platforms is the main platform and the other is the standby platform, the two sets share the same IP address, and after the main platform fails, the standby platform undertakes services through the same IP address.

However, in the process of implementing the present application, the inventors found that the prior art has at least the following problems. In the first scheme, load sharing, network management software has to be specially developed and DNS resolution addresses have to be modified manually, which incurs high software development and labor costs; moreover, as traffic volume increases, manual switching causes a large amount of service congestion and loss. In the second scheme, main/standby disaster recovery, the main platform and the standby platform use the same IP address, and because of IP address segment allocation, the devices generally have to be placed under the same network device, which limits the deployment area of the standby platform.

Disclosure of Invention

The application provides a disaster recovery method, equipment and a storage medium of a service management platform, which are used for reducing software development cost and labor cost, realizing automatic switching among platforms and not limiting the deployment area of the platforms.

In a first aspect, the present application provides a disaster recovery method for a service management platform, where the service management platform includes a plurality of sub-platforms; data of a plurality of the sub-platforms is synchronized; the method comprises the following steps:

sending a test message to each of the plurality of sub-platforms, and determining a detection result according to a response result returned by the sub-platform based on the test message;

generating a first switching instruction according to the detection results of the plurality of sub-platforms, and sending the first switching instruction to the global load balancing device (GSLB) of the faulty sub-platform among the plurality of sub-platforms, so that the GSLB, according to the first switching instruction, switches the IP address of the faulty sub-platform from the production IP address of the faulty sub-platform to the production IP address of a fault-free sub-platform; the production IP address is an IP address assigned to a sub-platform and used for accessing that sub-platform.

In one possible design, each of the plurality of sub-platforms is assigned a plurality of production IP addresses; the plurality of production IP addresses comprises a first production IP address and a second production IP address; the first production IP address is used for access while the sub-platform to which it belongs is fault-free, and the second production IP address is the address to which the IP addresses of other sub-platforms are switched when those other sub-platforms fail.

In one possible design, the test message is any one of: issuing a notification message, uploading a multimedia file and downloading the multimedia file.

In one possible design, the sending the test message to the sub-platform includes:

and periodically sending a test message to the sub-platform.

In one possible design, the determining, according to the response result returned by the sub-platform based on the test message, a detection result includes:

if a correct response message is not received within a first preset time, it is determined that sending the message has failed, and a detection result of failing the detection is obtained;

if a correct response message is received within the first preset time, waiting to receive a delivery result report of the test message;

and if the delivery result report is not received within a second preset time, it is determined that receiving the message has failed, and a detection result of failing the detection is obtained.
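By way of illustration only, the two-stage timeout check described above might be sketched as follows. The function name `classify_probe`, the default timeout values, and the rule that a report received within the second preset time counts as passing are assumptions made for this sketch and are not stated in the application.

```python
from typing import Optional


def classify_probe(response_delay: Optional[float],
                   report_delay: Optional[float],
                   first_timeout: float = 5.0,
                   second_timeout: float = 30.0) -> str:
    """Classify one test message as 'pass', 'send_failed' or 'receive_failed'.

    response_delay: seconds until a correct response arrived, or None if
        no correct response arrived at all.
    report_delay: seconds, counted from the response, until the delivery
        result report arrived, or None if no report arrived.
    The timeout defaults are placeholders (hypothetical values).
    """
    # Stage 1: no correct response within the first preset time.
    if response_delay is None or response_delay > first_timeout:
        return "send_failed"
    # Stage 2: no delivery result report within the second preset time.
    if report_delay is None or report_delay > second_timeout:
        return "receive_failed"
    # Assumption: both stages succeeding means the detection passes.
    return "pass"
```

Both failure outcomes correspond to a detection result of failing the detection; they are kept distinct here only so that a log can show which stage timed out.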

In one possible design, the sending the test message to the sub-platform includes:

sending the test message to the sub-platform through a plurality of accounts, the plurality of accounts belonging to networks in different regions respectively;

the determining a detection result according to a response result returned by the sub-platform based on the test message includes:

if, among the plurality of accounts, the number of accounts whose detection result is failing the detection is greater than a preset threshold, determining that the detection result of the sub-platform is failing the detection.

In one possible design, the plurality of sub-platforms includes a first sub-platform and a second sub-platform; the generating a first switching instruction according to the detection result includes:

if the number of consecutive times the detection result of the first sub-platform is failing the detection is greater than a first preset number, and the number of consecutive times the detection result of the second sub-platform is passing the detection is greater than a second preset number, it is determined that the first sub-platform has failed and the second sub-platform has not failed, and a first switching instruction is generated, so that the GSLB switches the IP address of the first sub-platform from the production IP address of the first sub-platform to the production IP address of the second sub-platform.

In one possible design, after the generating the first switching instruction according to the detection results of the plurality of sub-platforms, the method further includes:

if the duration for which the detection result of the first sub-platform is continuously passing the detection is longer than a third preset time, it is determined that the first sub-platform has recovered from the fault, and a second switching instruction is generated, so that the GSLB, according to the second switching instruction, switches the IP address of the first sub-platform from the production IP address of the second sub-platform back to the production IP address of the first sub-platform.

In a second aspect, the present application provides a disaster recovery device for a service management platform, including:

the detection module is used for sending a test message to each of the plurality of sub-platforms and determining a detection result according to a response result returned by the sub-platform based on the test message;

the switching module is used for generating a first switching instruction according to the detection results of the plurality of sub-platforms, and sending the first switching instruction to the global load balancing device (GSLB) of the faulty sub-platform among the plurality of sub-platforms, so that the GSLB, according to the first switching instruction, switches the IP address of the faulty sub-platform from the production IP address of the faulty sub-platform to the production IP address of a fault-free sub-platform; the production IP address is an IP address assigned to a sub-platform and used for accessing that sub-platform.

In a third aspect, the present application provides a disaster recovery device for a service management platform, including: at least one processor and memory;

the memory stores computer-executable instructions;

the at least one processor executes computer-executable instructions stored by the memory to cause the at least one processor to perform the method as set forth in the first aspect above and in various possible designs of the first aspect.

In a fourth aspect, the present application provides a computer-readable storage medium having stored thereon computer-executable instructions that, when executed by a processor, perform a method as set forth in the first aspect above and in various possible designs of the first aspect.

In a fifth aspect, the present application provides a computer program product comprising a computer program which, when executed by a processor, performs the method as set forth in the first aspect above and in various possible designs of the first aspect.

The method comprises sending a test message to each of a plurality of sub-platforms, determining a detection result according to the response result returned by the sub-platform based on the test message, generating a first switching instruction according to the detection results of the plurality of sub-platforms, and sending the first switching instruction to the global load balancing device (GSLB) of the faulty sub-platform among the plurality of sub-platforms, so that the GSLB, according to the first switching instruction, switches the IP address of the faulty sub-platform from the production IP address of the faulty sub-platform to the production IP address of a fault-free sub-platform, wherein a production IP address is an IP address assigned to a sub-platform and used for accessing that sub-platform. In the disaster recovery method of the service management platform provided by the present application, a test message is sent to each sub-platform to obtain the detection result of each sub-platform, and a switching instruction is generated according to these detection results and sent to the GSLB, so that the IP address of the faulty sub-platform is automatically switched to the IP address of a fault-free sub-platform through the GSLB. This saves the development cost of network management software, realizes automatic switching, saves labor cost, improves efficiency, and avoids service congestion during switching; and because the GSLB is used for IP address switching, the sub-platforms do not need to be deployed in the same region, which removes the regional limitation.

Drawings

In order to illustrate the technical solutions of the present application or of the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings in the following description show some embodiments of the present application, and those skilled in the art can obtain other drawings from these drawings without inventive effort.

Fig. 1 is a schematic view of an application scenario of a disaster recovery method for a service management platform according to an embodiment of the present application;

Fig. 2 is a schematic flowchart of a disaster recovery method for a service management platform according to an embodiment of the present application;

Fig. 3 is a schematic flowchart of a GSLB operating process according to an embodiment of the present application;

Fig. 4 is a schematic flowchart of step 201 in Fig. 2 according to an embodiment of the present application;

Fig. 5 is a schematic structural diagram of a disaster recovery device of a service management platform according to an embodiment of the present application;

Fig. 6 is a block diagram of a disaster recovery device of a service management platform according to an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

In the prior art, a service management platform may use a 1+1 disaster tolerance mode to ensure service security. Taking a Messaging as a Platform (MaaP) platform as an example, the 1+1 disaster tolerance mode of the MaaP includes the following two types. The first type is load sharing: two sets of MaaP platforms undertake services simultaneously; when the first set fails, the second set learns of the failure through heartbeat and notifies operation and maintenance personnel, who modify the resolution address on the Domain Name Server (DNS) so that it points to the healthy MaaP platform, that is, the second set, and point it back to the first set after the failure is recovered. The second type is main/standby disaster recovery: of two sets of MaaP platforms, one serves as the main MaaP and the other as the standby MaaP; only the main MaaP undertakes services during normal operation, and when the main MaaP fails, the standby MaaP starts up and takes over the services using the same IP address. However, in the load sharing scheme, the DNS resolution address must be modified manually, which increases the workload of operation and maintenance personnel and, as 5G message traffic grows, causes a large amount of service congestion and loss during manual switching. In the main/standby scheme, the main and standby MaaP platforms use the same IP address, which allows seamless service switching, but because of IP address segment allocation, the devices generally have to be placed under the same network device, which limits the deployment area of the standby devices.

To solve this technical problem, the inventors of the present application found that at least two sets of platforms can be deployed in different regions and serve as main and standby for each other, with their data (such as office data and Chatbot data) kept completely consistent; faults are discovered by sending a test message to each sub-platform and obtaining a detection result, and IP address switching for domain name access is realized through GSLB domain name resolution. On this basis, this embodiment provides a disaster recovery method for a service management platform, which obtains the detection result of each sub-platform by sending a test message to each sub-platform, generates a switching instruction according to these detection results, and sends the switching instruction to the GSLB of the faulty sub-platform, so that the IP address of the faulty sub-platform is automatically switched to the IP address of a fault-free sub-platform through the GSLB. This saves the development cost of network management software, realizes automatic switching, saves labor cost, improves efficiency, and avoids service congestion during switching; and because the GSLB is used for IP address switching, the sub-platforms do not need to be deployed in the same region, which removes the regional limitation.

Fig. 1 is a schematic application scenario diagram of a disaster recovery method for a service management platform according to an embodiment of the present application. As shown in Fig. 1, the scenario includes a probe switching device and a service management platform. The service management platform, such as the MaaP platform, may include a plurality of sub-platforms, for example, a first sub-platform deployed in machine room A and a second sub-platform deployed in machine room B. Each sub-platform comprises a database server (DB), a core server, and the like, and the first sub-platform and the second sub-platform are each provided with network devices such as a Server Load Balancing device (SLB), a Global Load Balancing device (GSLB), a firewall (not shown), and a switch (not shown). During normal operation, the sub-platforms in the two machine rooms and their network devices such as the firewall, SLB, and GSLB are all in an active state, and each processes the services it receives so as to implement load balancing. The two switches (not shown) are connected through a dedicated line for balanced distribution of traffic and for data synchronization. The probe switching device is used for sending test messages to each sub-platform, performing fault detection, and generating a switching instruction according to the detection results so that the GSLB performs IP switching. It should be noted that the probe switching device may include a plurality of devices, some of which send test messages to each sub-platform for fault detection, and some of which generate a switching instruction according to the detection results so that the GSLB performs IP switching. This embodiment is not limited in this respect.

In a specific implementation, when an external network element such as a chat robot (Chatbot), a 5G Message Center (5GMC), or User Equipment (UE) accesses the service management platform, for example the MaaP platform, through a domain name, Domain Name Server (DNS) resolution is performed first; that is, the IP address corresponding to the domain name of the platform is looked up, so that the external network element can establish a connection with that IP address and then send a service request such as portal access or an interface call. Specifically, the GSLB of the platform can act as the authoritative domain name resolution server, and access switching is implemented through its global load balancing function. The process is as follows: the external network element initiates a domain name resolution query to the local DNS; through a series of queries among DNS servers, the local DNS reads the NS record preset on the upper-level DNS of the GSLB, and the domain name resolution is then handed to the GSLB, where the NS record points to the IP address of the GSLB. After receiving the query request sent by the local DNS, the GSLB selects a sub-platform through calculation and returns its IP address to the local DNS; this IP address may be the production IP address of the sub-platform itself, or the production IP address of another sub-platform to which the sub-platform has been switched after a failure. The external network element then performs subsequent operations such as portal access or interface calls based on the IP address returned by the local DNS.

During normal operation, the probe switching device sends test messages to each sub-platform to detect whether it has failed, generates a switching instruction according to the detection results, and sends the switching instruction to the GSLB, which switches the production IP address of the failed sub-platform to the production IP address of another sub-platform that has not failed. For example, if the first sub-platform fails, the GSLB may switch the production IP address of the first sub-platform to the production IP address of the fault-free second sub-platform, so that an external network element that accesses the first sub-platform actually reaches the second sub-platform, and the failure of the first sub-platform does not affect the normal service processing of the external network element.
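As a minimal sketch of the address switching described above, the GSLB can be modelled as a table that records, for each sub-platform, which sub-platform's production IP address is currently returned. The class `ToyGslb`, its method names, and the sample addresses (taken from the IP ranges reserved for documentation) are hypothetical and only illustrate the idea; they are not the interface of any actual GSLB product.

```python
from typing import Dict


class ToyGslb:
    """Toy model of the GSLB answer table (illustrative only)."""

    def __init__(self, production_ips: Dict[str, str]) -> None:
        # sub-platform name -> its own production IP address
        self._production = dict(production_ips)
        # sub-platform name -> sub-platform currently serving its traffic
        self._serving = {name: name for name in production_ips}

    def apply_first_switch(self, faulty: str, healthy: str) -> None:
        """Answer queries for the faulty sub-platform with the healthy one's IP."""
        self._serving[faulty] = healthy

    def apply_second_switch(self, recovered: str) -> None:
        """Switch a recovered sub-platform back to its own production IP."""
        self._serving[recovered] = recovered

    def resolve(self, sub_platform: str) -> str:
        """Return the IP address currently handed out for a sub-platform."""
        return self._production[self._serving[sub_platform]]


gslb = ToyGslb({"first": "192.0.2.10", "second": "198.51.100.10"})
gslb.apply_first_switch(faulty="first", healthy="second")
print(gslb.resolve("first"))  # traffic for "first" now reaches "second"
```

In this model an external network element never needs to know that a switch has happened; it simply receives a different IP address for the same name.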

In the disaster recovery method for the service management platform provided in this embodiment, the IP address of the faulty sub-platform is automatically switched to the IP address of a fault-free sub-platform through the GSLB. This saves the development cost of network management software, realizes automatic switching, saves labor cost, improves efficiency, and avoids service congestion during switching; and because the GSLB is used for IP address switching, the sub-platforms do not need to be deployed in the same region, which removes the regional limitation.

The technical solution of the present application will be described in detail below with specific examples. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments.

Fig. 2 is a schematic flow chart of a disaster recovery method for a service management platform according to an embodiment of the present application. As shown in fig. 2, the service management platform includes a plurality of sub-platforms; data of a plurality of the sub-platforms is synchronized. The method comprises the following steps:

201. For each of the plurality of sub-platforms, send a test message to the sub-platform, and determine a detection result according to a response result returned by the sub-platform based on the test message.

In this embodiment, the test message may be any one of the following: issuing a notification message, uploading a multimedia file and downloading the multimedia file.

In this embodiment, sending the test message to the sub-platform may include: sending the test message to the sub-platform through a plurality of accounts, the plurality of accounts belonging to networks in different regions respectively. Determining a detection result according to the response result returned by the sub-platform based on the test message then includes: if, among the plurality of accounts, the number of accounts whose detection result is failing the detection is greater than a preset threshold, determining that the detection result of the sub-platform is failing the detection.
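The multi-account rule above amounts to counting failed accounts and comparing the count with a threshold. A minimal sketch follows; the function name `sub_platform_failed` and the account labels are hypothetical, and the comparison is strict ("greater than") as in the text.

```python
from typing import Mapping


def sub_platform_failed(passed_by_account: Mapping[str, bool],
                        threshold: int) -> bool:
    """Aggregate per-account probe results for one sub-platform.

    passed_by_account maps an account identifier to True if the probe
    sent through that account passed the detection.
    Returns True if the sub-platform as a whole fails the detection.
    """
    failed_accounts = sum(1 for passed in passed_by_account.values() if not passed)
    return failed_accounts > threshold


results = {"account-region-1": False, "account-region-2": False, "account-region-3": True}
print(sub_platform_failed(results, threshold=1))
```

Because the accounts belong to networks in different regions, a fault on a single regional network affects only one account and, with a suitable threshold, does not by itself mark the sub-platform as failed.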

Specifically, this step may be executed by a data processing device on which probe server software is installed. Taking the MaaP platform shown in Fig. 1 as an example, the probe server may simulate 2 Chatbots and register them on the 2 sets of MaaP platforms, namely the first sub-platform in machine room A and the second sub-platform in machine room B, respectively. Usually, one Chatbot polls the first sub-platform (system A) and the other polls the second sub-platform (system B). To prevent the GSLB from returning the IP address of the nearest sub-platform under its proximity principle, the probe server may be configured to perform fault detection using the real IP addresses of the first sub-platform and the second sub-platform, instead of accessing them through the domain name. After a detection result is obtained, it is sent to a switching server, so that the switching server generates a switching instruction according to the detection result. The test message may include issuing a notification message, uploading a multimedia file, and downloading a multimedia file; the aim is for the probe server to simulate an external network element and complete a full communication flow.

One or more probe servers may be deployed; for example, one probe server may be deployed in each of two different regions. Deploying multiple sets of probe servers avoids false alarms caused by network problems of a probe server itself. When multiple sets of probe servers are deployed, if the switching server receives a failing detection result from only one probe server, the first switching instruction is not generated directly; the first switching instruction is generated only when 2 sets of probe servers both report a failing detection result.
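The cross-check between probe servers can be sketched as a simple quorum test. The function name `enough_probe_servers_failed` and the generalisation to a configurable `required` count are assumptions of this sketch; the text itself only describes the case of 2 sets of probe servers.

```python
from typing import Mapping


def enough_probe_servers_failed(failed_by_server: Mapping[str, bool],
                                required: int = 2) -> bool:
    """Return True only if at least `required` probe servers report failure.

    failed_by_server maps a probe server ID to True if that server
    reported a failing detection result for the sub-platform.
    """
    reporting_failure = sum(1 for failed in failed_by_server.values() if failed)
    return reporting_failure >= required


# A failure seen by a single probe server is not enough to trigger switching.
print(enough_probe_servers_failed({"probe-a": True, "probe-b": False}))
print(enough_probe_servers_failed({"probe-a": True, "probe-b": True}))
```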

In addition, the Chatbot account used for sending the test message and the test mobile phone number used for receiving it can be preset in the MaaP platform as test numbers, so that only message logs are produced for them and no charging records are generated.

202. Generate a first switching instruction according to the detection results of the plurality of sub-platforms, and send the first switching instruction to the global load balancing device (GSLB) of the faulty sub-platform among the plurality of sub-platforms, so that the GSLB, according to the first switching instruction, switches the IP address of the faulty sub-platform from the production IP address of the faulty sub-platform to the production IP address of a fault-free sub-platform; the production IP address is an IP address assigned to a sub-platform and used for accessing that sub-platform.

Optionally, each of the plurality of sub-platforms is assigned a plurality of production IP addresses; the plurality of production IP addresses comprises a first production IP address and a second production IP address; the first production IP address is used for access while the sub-platform to which it belongs is fault-free, and the second production IP address is the address to which the IP addresses of other sub-platforms are switched when those other sub-platforms fail.

Optionally, the plurality of sub-platforms comprises a first sub-platform and a second sub-platform; the generating a first switching instruction according to the detection result may include: if the number of consecutive times the detection result of the first sub-platform is failing the detection is greater than a first preset number, and the number of consecutive times the detection result of the second sub-platform is passing the detection is greater than a second preset number, determining that the first sub-platform has failed and the second sub-platform has not failed, and generating a first switching instruction, so that the GSLB switches the IP address of the first sub-platform from the production IP address of the first sub-platform to the production IP address of the second sub-platform.

Optionally, after generating the first switching instruction according to the detection results of the plurality of sub-platforms, the method may further include: if the duration for which the detection result of the first sub-platform is continuously passing the detection is longer than a third preset time, determining that the first sub-platform has recovered from the fault, and generating a second switching instruction, so that the GSLB, according to the second switching instruction, switches the IP address of the first sub-platform from the production IP address of the second sub-platform back to the production IP address of the first sub-platform.

Specifically, the generation of the first switching instruction according to the detection results of the plurality of sub-platforms may be completed by switching server software, the switching server and the probe server may be respectively run on different terminal devices, and the first switching instruction or the second switching instruction is generated according to the detection result sent by the probe server, so that the GSLB performs the switching operation of the IP address according to the switching instruction.

The switching server may generate a corresponding switching instruction according to preset policy rules. To avoid false alarms in the detection results caused by network problems, the switching server can obtain the detection results of multiple sets of probe servers at the same time and judge comprehensively whether a switching operation should be triggered.

For example, the basic parameters that can be used for the set policy may include at least one of the following: a test script ID, a maximum number of consecutive failures allowed, a probe server ID, a MaaP platform ID under test, etc.

For example, the preset policy rules may include the following possible designs:

In a first possible design, assume that 2 sets of probe servers respectively probe the first sub-platform and the second sub-platform. If the first sub-platform fails the detection in 3 consecutive probes and the second sub-platform passes the detection in 3 consecutive probes, the switching server generates a first switching instruction, so that the GSLB, according to the first switching instruction, switches the IP address of the first sub-platform from the production IP address of the first sub-platform to the production IP address of the second sub-platform.

In a second possible design, assume that 2 sets of probe servers respectively probe the first sub-platform and the second sub-platform. If one of the two sub-platforms fails the detection in 3 consecutive probes while the other passes the detection in 3 consecutive probes, but it cannot be determined specifically which sub-platform has failed, an alarm is triggered, and whether to generate the first switching instruction is decided after manual review.

In a third possible design, the first sub-platform or the second sub-platform of the service management platform has already had its IP address switched once; for example, the service of the first sub-platform has been switched to the second sub-platform. If the probe server then detects that the service of the first sub-platform has recovered on its own, that is, the detection result is passing the detection, and the results remain passing over 30 minutes of continuous probing, a second switching instruction can be generated, so that the GSLB, according to the second switching instruction, automatically switches the IP address of the first sub-platform from the production IP address of the second sub-platform back to the production IP address of the first sub-platform.
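The first and third designs above can be combined into a small decision routine on the switching server. The following is a sketch under stated assumptions rather than the implementation of the application: the class and method names, the instruction labels `FIRST_SWITCH` and `SECOND_SWITCH`, and the inclusive comparisons (3 consecutive results, 30 minutes) are chosen to match the worked examples, and only the direction in which the first sub-platform fails is handled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProbeStreak:
    consecutive_failures: int = 0
    consecutive_passes: int = 0
    pass_streak_started_at: Optional[float] = None  # time of first pass in streak


class SwitchDecider:
    """Decides when to emit the first/second switching instruction."""

    def __init__(self, fail_count: int = 3, pass_count: int = 3,
                 recovery_seconds: float = 30 * 60) -> None:
        self.fail_count = fail_count
        self.pass_count = pass_count
        self.recovery_seconds = recovery_seconds
        self.switched = False  # True while "first" is served by "second"
        self.first = ProbeStreak()
        self.second = ProbeStreak()

    @staticmethod
    def _update(streak: ProbeStreak, passed: bool, now: float) -> None:
        if passed:
            if streak.consecutive_passes == 0:
                streak.pass_streak_started_at = now
            streak.consecutive_passes += 1
            streak.consecutive_failures = 0
        else:
            streak.consecutive_failures += 1
            streak.consecutive_passes = 0
            streak.pass_streak_started_at = None

    def observe(self, first_passed: bool, second_passed: bool,
                now: float) -> Optional[str]:
        """Feed one probe round; return an instruction label or None."""
        self._update(self.first, first_passed, now)
        self._update(self.second, second_passed, now)
        if not self.switched:
            if (self.first.consecutive_failures >= self.fail_count
                    and self.second.consecutive_passes >= self.pass_count):
                self.switched = True
                return "FIRST_SWITCH"
        else:
            started = self.first.pass_streak_started_at
            if started is not None and now - started >= self.recovery_seconds:
                self.switched = False
                return "SECOND_SWITCH"
        return None
```

Requiring a sustained run of passing results before switching back helps avoid flapping when a sub-platform recovers only intermittently. The second design (alarm and manual review) is deliberately left out of this sketch.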

The operation of the GSLB is illustrated below in connection with fig. 3.

As shown in fig. 3, when an external network element, e.g. a Chatbot, performs service processing, the procedure for accessing the service management platform may include the following steps:

301. The Chatbot sends a domain name query request to the local DNS.

302. The local DNS sends a domain name query request to an upper level DNS of the GSLB.

303. The upper level DNS of the GSLB returns an NS record to the local DNS. The NS record is obtained by registering the IP address of the GSLB in the upper level DNS of the GSLB.

304. The local DNS sends a domain name query request to the GSLB based on the NS record.

305. GSLB returns the IP address of the sub-platform corresponding to the domain name to the local DNS.

306. The local DNS returns the IP address returned by GSLB to Chatbot.

In this embodiment, the external network element accesses the local DNS to query the IP address of the domain name of the service management platform. If the local DNS does not have the IP address of that domain name in its cache, it queries the root DNS; if the root DNS is not the upper level DNS of the GSLB, the query continues until the upper level DNS of the GSLB is reached and the NS record is obtained. The local DNS then queries the GSLB according to the NS record; the GSLB determines a target IP according to a matching rule and the address of the local DNS, and returns the target IP to the local DNS. The local DNS returns the target IP to the external network element.
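Steps 301 to 306 can be traced with a small in-memory Python sketch. The class names, the domain platform.example.com, and the documentation-range IP addresses are all hypothetical; the NS record is simplified to a direct reference to the GSLB object, and caching and the walk from the root DNS are omitted.

```python
from typing import Dict


class Gslb:
    """Authoritative server for the platform domain (simplified)."""

    def __init__(self, a_records: Dict[str, str]) -> None:
        self.a_records = a_records

    def resolve(self, domain: str, local_dns_ip: str) -> str:
        # Step 305. A real GSLB would apply its matching rules to
        # local_dns_ip; this sketch ignores it and returns a fixed record.
        return self.a_records[domain]


class UpperDns:
    """Upper level DNS holding the NS record that points at the GSLB."""

    def __init__(self, ns_records: Dict[str, Gslb]) -> None:
        self.ns_records = ns_records

    def query(self, domain: str) -> Gslb:
        return self.ns_records[domain]  # Step 303: return the NS record


class LocalDns:
    def __init__(self, ip: str, upper: UpperDns) -> None:
        self.ip = ip
        self.upper = upper

    def resolve(self, domain: str) -> str:
        gslb = self.upper.query(domain)        # Steps 302-303
        return gslb.resolve(domain, self.ip)   # Steps 304-305, result is step 306


DOMAIN = "platform.example.com"                # hypothetical domain name
gslb = Gslb({DOMAIN: "192.0.2.10"})            # hypothetical production IP
local_dns = LocalDns("198.51.100.53", UpperDns({DOMAIN: gslb}))

# Step 301: the Chatbot asks the local DNS.
target_ip = local_dns.resolve(DOMAIN)
```

Because the external network element only ever sees the address that the GSLB returns, changing the record on the GSLB is enough to redirect later queries to another sub-platform.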

Specifically, there are two common DNS types: an authoritative resolution server and a recursive resolution server. The recursive resolution server is also the local DNS.

The authoritative resolution server stores the data for part of the domain name space (a zone). If a DNS is responsible for managing one or more zones, the DNS is called the authoritative server of those zones. A DNS is designated as the authoritative server of a zone by a name server (NS) resource record in the root authoritative DNS or in a second-level authoritative server. Because the server is listed in the NS record, other servers consider it the authoritative server for the zone. This means that any server specified in an NS record is treated by other servers as authoritative and can answer queries for names contained in the zone.

The recursive server, i.e. the local DNS, normally holds no domain name resolution data initially; all of its resolution data comes from the results of its queries to authoritative resolution servers. Once a query completes, the recursive server caches the record locally for the record's Time To Live (TTL) and uses it to answer users' DNS resolution queries, which is the function of the recursive server.
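The TTL-based caching behaviour of a recursive server can be sketched as follows. This is an illustrative Python example, not the behaviour of any specific DNS implementation; the class name, the injectable clock, and the example name and address are assumptions.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TtlCache:
    """Caches name -> IP answers until their TTL expires."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[str, float]] = {}

    def put(self, name: str, ip: str, ttl_seconds: float) -> None:
        # Store the answer together with its absolute expiry time.
        self._entries[name] = (ip, self._clock() + ttl_seconds)

    def get(self, name: str) -> Optional[str]:
        entry = self._entries.get(name)
        if entry is None:
            return None                   # never cached: must query upstream
        ip, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]       # expired: must query upstream again
            return None
        return ip
```

With a short TTL, a cached address is discarded quickly, so a change made on the authoritative server becomes visible to clients after at most one TTL period.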

The GSLB is an authoritative resolution server that stores the domain name corresponding to the service management platform. If the domain name is botplatform.rcs.china.com, a corresponding NS record must be configured in the upper level DNS of the GSLB to point to the IP address where the GSLB is located (to ensure security, the GSLB may be deployed with multiple IP addresses, that is, multiple GSLBs). After the local DNS obtains the NS record, it can query the GSLB for the IP address corresponding to botplatform.rcs.china.com. The GSLB is an intelligent DNS that can decide what to return to a client according to the source IP address; here, the GSLB determines whether to return the IP address of the first sub-platform or the second sub-platform according to a pre-configured IP matching relationship.
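A pre-configured IP matching relationship could take the form of a list of source networks, each mapped to a production IP address. The Python sketch below uses the standard ipaddress module; the networks, the addresses, and the default choice are hypothetical values from the documentation address ranges, not values from the source.

```python
import ipaddress
from typing import List, Tuple, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]

# Hypothetical production IP addresses of the two sub-platforms.
IP_FIRST = "192.0.2.10"
IP_SECOND = "192.0.2.20"

# Hypothetical matching rules: source network -> production IP to return.
RULES: List[Tuple[Network, str]] = [
    (ipaddress.ip_network("198.51.100.0/24"), IP_FIRST),
    (ipaddress.ip_network("203.0.113.0/24"), IP_SECOND),
]
DEFAULT_IP = IP_FIRST  # assumed fallback when no rule matches


def select_ip(source_ip: str) -> str:
    """Return the production IP for the first rule that contains source_ip."""
    address = ipaddress.ip_address(source_ip)
    for network, target in RULES:
        if address in network:
            return target
    return DEFAULT_IP
```

In this arrangement a disaster recovery switch amounts to changing the target address in the rules that currently point at the faulty sub-platform.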

The GSLB may also expose an interface through which the IP address can be modified dynamically, so that the IP address is switched when a switching instruction sent by the switching server is received. Specifically, the switching server sends a switching instruction to the GSLB; after receiving it, the GSLB modifies the IP address corresponding to the botplatform domain name and persists the change, so that the data is not lost after the GSLB device restarts.
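One way to make a switch survive a restart is to write the modified records to durable storage before acknowledging the instruction. The following Python sketch assumes a JSON file as the store; the class name, method names, and file layout are hypothetical and do not describe the interface of any real GSLB product.

```python
import json
import os
from typing import Dict


class GslbRecordStore:
    """Holds domain -> IP records and persists every change to a JSON file."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.records: Dict[str, str] = {}
        if os.path.exists(path):
            # Reload the records saved before the last restart.
            with open(path, encoding="utf-8") as f:
                self.records = json.load(f)

    def apply_switch(self, domain: str, target_ip: str) -> None:
        """Handle a switching instruction: update the record, then persist."""
        self.records[domain] = target_ip
        self._save()

    def _save(self) -> None:
        # Write to a temporary file and rename it, so that a crash during
        # the write does not leave a truncated record file behind.
        tmp_path = self.path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            json.dump(self.records, f)
        os.replace(tmp_path, self.path)
```

A new GslbRecordStore created with the same path after a restart loads the switched address rather than the original one.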

In some embodiments, the GSLB may further be configured with primary and standby devices for load balancing, which guarantees the reliability of the service. Modifications are executed on the primary device, and the modified records are automatically synchronized to the standby device.

In some embodiments, assume that 2 GSLB servers are installed, where IP1 and IP2 are the 2 external network IP addresses that provide service. First, the IP addresses of the GSLB (IP1 and IP2) are registered in the upper level DNS of the GSLB. The IP1 and IP2 addresses are then configured on the GSLB, and access rules are set according to the client IP address (generally on a nearest-first principle) to return IPA (the production IP address of the first sub-platform) or IPB (the production IP address of the second sub-platform). Since the GSLB also needs to support disaster recovery, the DNS cache time cannot be too long, and a timeout of 1 minute can be configured. This prevents the external network element from using a cached address for a long time, which would make the disaster recovery switching time too long.

In some embodiments, the deployment of the probe servers, the switch servers, and the GSLB may be performed with reference to the following tables.

In the disaster recovery method for the service management platform provided in this embodiment, a test message is sent to each sub-platform to obtain the detection result of each sub-platform, and a switching instruction is generated according to the detection results and sent to the GSLB, so that the GSLB automatically switches the IP address of the faulty sub-platform to the IP address of a non-faulty sub-platform. This saves the development cost of network management software, realizes automatic switching, saves labor cost, improves efficiency, and avoids service blocking during switching. Because the GSLB is used for IP address switching, the sub-platforms do not need to be located in the same region, which removes the regional limitation.

Fig. 4 is a schematic flowchart of step 201 in fig. 2 according to an embodiment of the present disclosure. As shown in fig. 4, step 201 may specifically include:

401. Periodically send a test message to the sub-platform.

402. Determine whether a response message is received within a first preset time; if so, execute step 403; if not, execute step 406.

403. Determine whether the received response message is correct; if so, execute step 404; if not, execute step 406.

404. Determine whether a delivery result report is received within a second preset time; if so, execute step 405; if not, execute step 407.

405. Determine that the message was sent and received successfully, obtain a detection result of passing the detection, and send the detection result to the switching server.

406. Determine that sending the message failed, obtain a detection result of failing the detection, and send the detection result to the switching server.

407. Determine that receiving the message failed, obtain a detection result of failing the detection, and send the detection result to the switching server.

In a specific detection process, the detection server starts periodic polling, simulates a Chatbot sending a message to the service management platform, and sets the message to require a delivery result report. It checks whether the MaaP platform responds normally and whether the returned result is correct. If so, it waits for the delivery result report; if the delivery result report is received within the preset time, it checks the report to see whether the delivery was successful. The check then ends, and the detection result is sent to the switching server. The detection server is responsible for simulating message sending and checking the returned message results; whether the detection succeeds or fails, it must report the detection result to the switching server in real time, so that the switching server can generate the corresponding switching instruction according to a preset strategy.
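A single detection round following steps 401 to 407 might be sketched in Python as below. The two callables stand in for the real message interfaces, which the source does not specify; their names, the dictionary keys "ok" and "delivered", and the result enum are assumptions. Each callable is expected to return None on timeout.

```python
from enum import Enum
from typing import Callable, Optional


class ProbeResult(Enum):
    PASSED = "passed"                  # step 405
    SEND_FAILED = "send_failed"        # step 406
    RECEIVE_FAILED = "receive_failed"  # step 407


def probe_once(
    send_test_message: Callable[[float], Optional[dict]],
    wait_delivery_report: Callable[[float], Optional[dict]],
    response_timeout: float,
    report_timeout: float,
) -> ProbeResult:
    """Run one detection round against a sub-platform."""
    # Steps 401-402: send the test message and wait for the response.
    response = send_test_message(response_timeout)
    if response is None:
        return ProbeResult.SEND_FAILED
    # Step 403: check that the response is correct.
    if not response.get("ok"):
        return ProbeResult.SEND_FAILED
    # Step 404: wait for the delivery result report.
    report = wait_delivery_report(report_timeout)
    if report is None:
        return ProbeResult.RECEIVE_FAILED
    # The prose description also checks the content of the report; treating
    # an unsuccessful delivery as a receive failure is an assumption.
    if not report.get("delivered"):
        return ProbeResult.RECEIVE_FAILED
    return ProbeResult.PASSED
```

Whatever value is returned, the detection server would forward it to the switching server, which keeps the decision logic separate from the probing logic.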

According to the disaster recovery method for the service management platform provided by this embodiment, sending the test message and receiving the delivery result report allow both the uplink and downlink functions of the platform to be detected, which makes fault detection comprehensive. The switching instruction can therefore be generated in time, and normal user communication is not affected.

Fig. 5 is a schematic structural diagram of a disaster recovery device of a service management platform according to an embodiment of the present application. As shown in fig. 5, the disaster recovery device 50 of the service management platform includes: a detection module 501 and a switching module 502.

A detection module 501, configured to send a test message to each of the multiple sub-platforms, and determine a detection result according to a response result returned by the sub-platform based on the test message;

a switching module 502, configured to generate a first switching instruction according to the detection results of the multiple sub-platforms, and send the first switching instruction to the global load balancing device (GSLB) of a faulty sub-platform among the multiple sub-platforms, so that the GSLB switches, according to the first switching instruction, the IP address of the faulty sub-platform from the production IP address of the faulty sub-platform to the production IP address of a non-faulty sub-platform; the production IP address is an IP address that is allocated to the sub-platform and used for accessing the sub-platform.

According to the disaster recovery device of the service management platform provided by the embodiment of the application, a test message is sent to each sub-platform to obtain the detection result of each sub-platform, and a switching instruction is generated according to the detection results and sent to the GSLB, so that the GSLB automatically switches the IP address of the faulty sub-platform to the IP address of a non-faulty sub-platform. This saves the development cost of network management software, realizes automatic switching, saves labor cost, improves efficiency, and avoids service blocking during switching. Because the GSLB is used for IP address switching, the sub-platforms do not need to be located in the same region, which removes the regional limitation.

The disaster recovery device of the service management platform provided in the embodiment of the present application may be used to implement the method embodiment described above, and the implementation principle and the technical effect are similar, which are not described herein again.

Fig. 6 is a block diagram of a disaster recovery device of a service management platform according to an embodiment of the present disclosure, where the disaster recovery device may be a data processing device such as a computer, a messaging device, a tablet device, and the like.

The apparatus 60 may include one or more of the following components: a processing component 601, a memory 602, a power component 603, a multimedia component 604, an audio component 605, an input/output (I/O) interface 606, a sensor component 607, and a communication component 608.

The processing component 601 generally controls overall operation of the device 60, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 601 may include one or more processors 609 to execute instructions to perform all or part of the steps of the methods described above. Further, the processing component 601 may include one or more modules that facilitate interaction between the processing component 601 and other components. For example, the processing component 601 may include a multimedia module to facilitate interaction between the multimedia component 604 and the processing component 601.

The memory 602 is configured to store various types of data to support operations at the apparatus 60. Examples of such data include instructions for any application or method operating on the device 60, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 602 may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.

The power supply component 603 provides power to the various components of the device 60. The power components 603 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device 60.

The multimedia component 604 includes a screen providing an output interface between the device 60 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 604 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the device 60 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.

Audio component 605 is configured to output and/or input audio signals. For example, audio component 605 includes a Microphone (MIC) configured to receive external audio signals when apparatus 60 is in an operating mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 602 or transmitted via the communication component 608. In some embodiments, audio component 605 also includes a speaker for outputting audio signals.

The I/O interface 606 provides an interface between the processing component 601 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.

The sensor component 607 includes one or more sensors for providing various aspects of status assessment for the device 60. For example, the sensor component 607 may detect the open/closed state of the device 60 and the relative positioning of components, such as a display and keypad of the device 60. The sensor component 607 may also detect a change in the position of the device 60 or a component of the device 60, the presence or absence of user contact with the device 60, the orientation or acceleration/deceleration of the device 60, and a change in the temperature of the device 60. The sensor component 607 may include a proximity sensor configured to detect the presence of a nearby object in the absence of any physical contact. The sensor component 607 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 607 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.

The communication component 608 is configured to facilitate wired or wireless communication between the apparatus 60 and other devices. The device 60 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 608 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 608 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

In an exemplary embodiment, the apparatus 60 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.

In an exemplary embodiment, a non-transitory computer readable storage medium is also provided, such as the memory 602 including instructions executable by the processor 609 of the apparatus 60 to perform the above-described method. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.

The computer-readable storage medium may be implemented by any type of volatile or non-volatile memory device or combination thereof, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk. Readable storage media can be any available media that can be accessed by a general purpose or special purpose computer.

An exemplary readable storage medium is coupled to the processor such that the processor can read information from, and write information to, the readable storage medium. Of course, the readable storage medium may also be an integral part of the processor. The processor and the readable storage medium may reside in an Application Specific Integrated Circuit (ASIC). Of course, the processor and the readable storage medium may also reside as discrete components in the apparatus.

Those of ordinary skill in the art will understand that: all or a portion of the steps of implementing the above-described method embodiments may be performed by hardware associated with program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.

An embodiment of the present application further provides a computer program product, which includes a computer program, and when the computer program is executed by a processor, the method for disaster recovery of a service management platform, which is executed by the disaster recovery device of the service management platform, is implemented.

Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.

Claims

1. A disaster recovery switching method of a service management platform is characterized in that the service management platform comprises a plurality of sub-platforms; data of a plurality of the sub-platforms is synchronized; the method comprises the following steps:

2. The method of claim 1, wherein a plurality of said sub-platforms are respectively assigned a plurality of production IP addresses; the plurality of production IP addresses comprises a first production IP address and a second production IP address; the first production IP address is used for accessing when the sub-platform of the first production IP address is not in fault, and the second production IP address is used for switching the IP addresses of other sub-platforms to the second production IP address when other sub-platforms are in fault.

3. The method of claim 1, wherein the test message is any one of: issuing a notification message, uploading a multimedia file and downloading the multimedia file.

4. The method of claim 1, wherein sending the test message to the sub-platform comprises:

and periodically sending a test message to the sub-platform.

5. The method of claim 1, wherein determining the detection result according to the response result returned by the sub-platform based on the test message comprises:

6. The method of claim 5, wherein sending the test message to the sub-platform comprises:

7. The method of any of claims 1-6, wherein the plurality of sub-platforms comprises a first sub-platform and a second sub-platform; the generating a first switching instruction according to the detection result includes:

8. The method of claim 7, wherein after generating the first switching instruction according to the detection results of the plurality of sub-platforms, further comprising:

9. A disaster recovery device of a service management platform is characterized in that the service management platform comprises a plurality of sub-platforms; data of a plurality of the sub-platforms is synchronized; the method comprises the following steps:

10. A disaster recovery device of a service management platform is characterized by comprising: at least one processor and memory;

the memory stores computer-executable instructions;

the at least one processor executes the computer-executable instructions stored in the memory, causing the at least one processor to perform the disaster recovery method of the service management platform according to any one of claims 1 to 8.

11. A computer-readable storage medium, wherein the computer-readable storage medium stores computer-executable instructions, and when the computer-executable instructions are executed by a processor, the method for disaster recovery of a service management platform according to any one of claims 1 to 8 is implemented.