CN108011737B - Fault switching method, device and system - Google Patents


Info

Publication number
CN108011737B
CN108011737B (application number CN201610964275.6A)
Authority
CN
China
Prior art keywords: level node, node, disaster recovery, recovery group, target
Prior art date
Legal status: Active
Application number
CN201610964275.6A
Other languages
Chinese (zh)
Other versions
CN108011737A (en)
Inventor
张书兵
孙艳
黄泽旭
黄凯耀
徐日东
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN201610964275.6A (published as CN108011737B)
Priority to PCT/CN2017/102802 (published as WO2018076972A1)
Publication of CN108011737A
Application granted
Publication of CN108011737B
Anticipated expiration

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 9/00: Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L 9/40: Network security protocols
    • H04L 41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L 41/06: Management of faults, events, alarms or notifications
    • H04L 41/0654: Management of faults, events, alarms or notifications using network fault recovery
    • H04L 41/0663: Performing the actions predefined by failover planning, e.g. switching to standby network elements
    • H04L 65/00: Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L 65/10: Architectures or entities
    • H04L 65/1016: IP multimedia subsystem [IMS]
    • H04L 65/1066: Session management
    • H04L 65/1073: Registration or de-registration

Landscapes

  • Engineering & Computer Science
  • Computer Networks & Wireless Communication
  • Signal Processing
  • Multimedia
  • Computer Security & Cryptography
  • Business, Economics & Management
  • General Business, Economics & Management
  • Telephonic Communication Services
  • Data Exchanges In Wide-Area Networks
  • Hardware Redundancy

Abstract

Embodiments of the invention provide a fault switching (failover) method, apparatus, and system in the field of communications technology, which can mitigate problems such as packet loss or surges caused by cross-DC data access. The method comprises the following steps: a first-level node obtains a target disaster recovery group identifier corresponding to a target user equipment, where the identifier indicates the correspondence between the main DC in which a second-level node is located and its backup DC (the first-level node being the front-end node of the second-level node), and both the main DC and the backup DC store the service data of the target user equipment; the first-level node receives a service request sent by the target user equipment; and if the second-level node in the main DC fails, the first-level node switches the service request to the second-level node in the backup DC according to the target disaster recovery group identifier.

Description

Fault switching method, device and system
Technical Field
The present invention relates to the field of communications technologies, and in particular, to a method, an apparatus, and a system for failover.
Background
Generally, multiple DCs (Data Centers) are deployed in an IMS (IP Multimedia Subsystem) network or an LTE (Long Term Evolution) network. To ensure that user equipment assigned to a DC can still carry out its services when that DC fails, the data in each DC (the main DC) can be backed up, in a disaster-tolerant manner, to any other DC (the backup DC).
As shown in fig. 1, the service data in DC1 may be backed up in DC2, the service data in DC2 in DC3, and the service data in DC3 in DC1. If DC1 fails, its services can be shared among the other DCs (DC2 and DC3); when service data originally held in DC1 is needed, the other DCs obtain it from DC1's backup DC (DC2), i.e., the backup DC is accessed across DCs, thereby implementing data access to the failed DC.
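The ring-style backup arrangement described above (DC1 backed up in DC2, DC2 in DC3, DC3 in DC1) can be sketched as a simple mapping. This is an illustrative sketch only, not part of the patent; the function name is hypothetical.

```python
def ring_backup_map(dcs):
    """Map each main DC to the next DC in the list as its backup DC.
    (Illustrative only: the patent allows any other DC as backup.)"""
    return {dc: dcs[(i + 1) % len(dcs)] for i, dc in enumerate(dcs)}

backups = ring_backup_map(["DC1", "DC2", "DC3"])
# backups == {"DC1": "DC2", "DC2": "DC3", "DC3": "DC1"}
```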
However, when the number of DCs is large, hundreds of DCs may initiate access to the same backup DC within the same time period; the network bandwidth occupancy then spikes, which may cause problems such as packet loss or surges.
Disclosure of Invention
Embodiments of the present invention provide a method, an apparatus, and a system for failover, which can reduce the problems of packet loss or surge caused by accessing data across a DC.
In order to achieve the above purpose, the embodiment of the invention adopts the following technical scheme:
In a first aspect, an embodiment of the present invention provides a failover method, including: a first-level node obtains a target disaster recovery group identifier corresponding to a target user equipment, where the identifier indicates the correspondence between the main DC in which a second-level node is located and its backup DC (the first-level node being the front-end node of the second-level node), and both the main DC and the backup DC store the service data of the target user equipment; the first-level node receives a service request sent by the target user equipment; and if the second-level node in the main DC fails, the first-level node switches the service request to the second-level node in the backup DC according to the target disaster recovery group identifier. That is, when a second-level node in the main DC fails, the first-level node at the front end can directly switch the service request of the target user equipment, according to the target disaster recovery group identifier, to the backup DC in which the service data of the target user equipment is already stored, and the second-level node in the backup DC executes the service request. This avoids having to access the service data required by the request across DCs, as in the prior art, and thereby reduces problems such as packet loss or surges caused by cross-DC data access.
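As a non-authoritative illustration of the routing decision described in the first aspect, the sketch below models a disaster recovery group as a (main DC, backup DC) pair selected by its identifier. All names (`DisasterGroup`, `route_request`) are hypothetical, not from the patent.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DisasterGroup:
    """Illustrative stand-in for a disaster recovery group identifier."""
    group_id: str
    main_dc: str
    backup_dc: str

def route_request(group: DisasterGroup, failed_dcs: set) -> str:
    """Return the DC whose second-level node should handle the request."""
    if group.main_dc in failed_dcs:
        # The second-level node in the main DC has failed: switch directly
        # to the backup DC, which already holds the user's service data.
        return group.backup_dc
    return group.main_dc

g = DisasterGroup("DC1-DC2", "DC1", "DC2")
# Normal flow goes to DC1; after DC1 fails, the request is switched to DC2.
```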
In a possible design manner, the obtaining, by the first-level node, of the target disaster recovery group identifier corresponding to the target user equipment includes: the first-level node receives a registration request sent by the target user equipment; the first-level node sends the registration request to the second-level node, so that the second-level node determines the target disaster recovery group identifier corresponding to the target user equipment; and the first-level node receives the target disaster recovery group identifier sent by the second-level node.
In a possible design, the determining, by the second-level node, of the target disaster recovery group identifier corresponding to the target user equipment includes: the second-level node determines, according to the registration request, one disaster recovery group from a plurality of disaster recovery groups as the target disaster recovery group of the second-level node, where each disaster recovery group comprises a main DC and a backup DC; the second-level node backs up the service data of the target user equipment in the main DC and the backup DC of the target disaster recovery group; and the second-level node sends the target disaster recovery group identifier to the first-level node.
In a possible design, when the first-level node is an S-CSCF node and the second-level node is an AS, the sending, by the second-level node, of the target disaster recovery group identifier to the first-level node includes: the AS sends a registration response message to the S-CSCF node, where the registration response message contains a first private parameter carrying the target disaster recovery group identifier.
In a possible design manner, the registration request carries DC information of the first-level node, where the DC information indicates the main DC and the backup DC in which the first-level node is located. The second-level node determining, according to the registration request, one disaster recovery group from the plurality of disaster recovery groups as its target disaster recovery group then includes: the second-level node, according to the registration request, takes the main DC of the first-level node as the main DC of the second-level node and the backup DC of the first-level node as the backup DC of the second-level node, so as to determine the target disaster recovery group.
In a possible design, when the first-level node is an S-CSCF node and the second-level node is an AS, the registration request includes a second private parameter, and the second private parameter is used to carry DC information of the S-CSCF node.
In a possible design, the determining, by the second level node, one disaster recovery group from the plurality of disaster recovery groups as a target disaster recovery group of the second level node according to the registration request includes: and the second-level node takes the current DC as the main DC of the second-level node, and takes any one DC except the main DC as the backup DC of the second-level node so as to determine the target disaster recovery group.
In a possible design, the determining, by the second-level node, of one disaster recovery group from the plurality of disaster recovery groups as the target disaster recovery group of the second-level node according to the registration request includes: the second-level node receives the special identification information sent by the first-level node; the second-level node takes the disaster recovery group corresponding to the special identification information as the target disaster recovery group, where the second-level node stores the correspondence between the special identification information and the target disaster recovery group.
In a possible design manner, after the first-level node obtains the target disaster recovery group identifier corresponding to the target user equipment, the method further includes: and if the second-level node in the main DC does not have a fault, the first-level node sends the service request to the second-level node in the main DC according to the disaster recovery group identifier.
In a possible design, after the second-level node determines, according to the registration request, one disaster recovery group from the plurality of disaster recovery groups as its target disaster recovery group, the method further includes: the second-level node records, in the HSS, the correspondence between the target disaster recovery group identifier and the target user equipment. After the first-level node switches the service request to the second-level node in the backup DC because the second-level node in the main DC has failed, the method further includes: the second-level node in the backup DC determines a new disaster recovery group identifier after the switch, where the correspondence between main DC and backup DC indicated by the new identifier is the reverse of that indicated by the target disaster recovery group identifier; and the second-level node in the backup DC updates, in the HSS, the recorded correspondence between the target disaster recovery group identifier and the target user equipment to the correspondence between the new identifier and the target user equipment.
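The identifier update described above, where the new identifier indicates the reverse main/backup correspondence, can be sketched as follows. This assumes, purely for illustration, that an identifier has the form "main-backup" (e.g. "DC1-DC2"); the patent does not prescribe this encoding, and the HSS is stood in for by a plain dict.

```python
def reversed_group_id(group_id: str) -> str:
    """'DC1-DC2' (main-backup) becomes 'DC2-DC1' after the switchover."""
    main_dc, backup_dc = group_id.split("-")
    return f"{backup_dc}-{main_dc}"

# Illustrative stand-in for the per-user record kept in the HSS.
hss_records = {"user-1": "DC1-DC2"}

# After the service request is switched to the backup DC, the record is
# updated to the new (reversed) disaster recovery group identifier.
hss_records["user-1"] = reversed_group_id(hss_records["user-1"])
# hss_records["user-1"] == "DC2-DC1"
```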
In a possible design, before the first-level node switches the service request to the second-level node in the backup DC according to the target disaster recovery group identifier, the method further includes: a GSC node monitors whether the nodes at each level in each DC have failed; if a failure of the second-level node in the main DC is detected, the GSC node sends a capacity expansion instruction to the second-level node in the backup DC, instructing it to perform a capacity expansion operation so that it can prepare the corresponding resources and process, on behalf of the failed second-level node, the corresponding service requests.
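A minimal sketch of the GSC node's monitoring-and-expansion behaviour described above; the data shapes and names are illustrative assumptions, not the patent's interfaces.

```python
def monitor_and_expand(node_status, groups):
    """node_status: {(dc, node): 'ok' | 'failed'};
    groups: list of (main_dc, backup_dc, node) tuples.
    Returns the capacity expansion instructions the GSC node would send."""
    instructions = []
    for main_dc, backup_dc, node in groups:
        if node_status.get((main_dc, node)) == "failed":
            # Tell the same-level node in the backup DC to expand capacity
            # so it can take over the failed node's service requests.
            instructions.append((backup_dc, node, "expand"))
    return instructions

status = {("DC1", "S-CSCF"): "failed", ("DC2", "S-CSCF"): "ok"}
# With group (DC1, DC2, S-CSCF), the GSC instructs DC2's S-CSCF to expand.
```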
In a second aspect, an embodiment of the present invention provides a first-level node, including: an obtaining unit, configured to obtain a target disaster recovery group identifier corresponding to a target user equipment, where the identifier indicates the correspondence between the main DC in which a second-level node is located and its backup DC, the first-level node is a front-end node of the second-level node, and both the main DC and the backup DC store the service data of the target user equipment, and further configured to receive a service request sent by the target user equipment; and a sending unit, configured to switch the service request to the second-level node in the backup DC according to the target disaster recovery group identifier if the second-level node in the main DC fails.
In a possible design, the obtaining unit is further configured to receive a registration request sent by the target user equipment; the sending unit is further configured to send the registration request to the second level node, so that the second level node determines a target disaster recovery group identifier corresponding to the target user equipment; the obtaining unit is further configured to receive the target disaster recovery group identifier sent by the second-level node.
In a possible design, the sending unit is further configured to send the service request to the second level node in the main DC according to the disaster recovery group identifier if the second level node in the main DC does not have a fault.
In a third aspect, an embodiment of the present invention provides a second-level node, including: a determining unit, configured to determine, according to a registration request sent by a first-level node, one disaster recovery group from a plurality of disaster recovery groups as the target disaster recovery group of the second-level node, where each disaster recovery group comprises a main DC and a backup DC, and the first-level node is a front-end node of the second-level node; a backup unit, configured to back up the service data of the target user equipment in the main DC and the backup DC of the target disaster recovery group; and a sending unit, configured to send the target disaster recovery group identifier to the first-level node.
In a possible design manner, when the first-level node is an S-CSCF node and the second-level node is an AS, the sending unit is specifically configured to send a registration response message to the S-CSCF node, where the registration response message includes a first private parameter, and the first private parameter is used to carry the target disaster recovery group identifier.
In a possible design manner, the registration request carries DC information of the first-level node, where the DC information indicates the main DC and the backup DC in which the first-level node is located; the determining unit is specifically configured to determine the target disaster recovery group by taking, according to the registration request, the main DC of the first-level node as the main DC of the second-level node and the backup DC of the first-level node as the backup DC of the second-level node.
In a possible design manner, the determining unit is specifically configured to take the DC in which the second-level node is currently located as the main DC of the second-level node, and any one DC other than the main DC as the backup DC of the second-level node, so as to determine the target disaster recovery group.
In a possible design manner, the second-level node further includes an obtaining unit, configured to receive the special identification information sent by the first-level node; the determining unit is specifically configured to take the disaster recovery group corresponding to the special identification information as the target disaster recovery group, where the second-level node stores the correspondence between the special identification information and the target disaster recovery group.
In a possible design manner, the second-level node further includes a recording unit, configured to record, in the HSS, the correspondence between the target disaster recovery group identifier and the target user equipment. The determining unit is further configured to determine a new disaster recovery group identifier after the service request is switched, where the correspondence between main DC and backup DC indicated by the new disaster recovery group identifier is the reverse of that indicated by the target disaster recovery group identifier, and to update the recorded correspondence between the target disaster recovery group identifier and the target user equipment to the correspondence between the new disaster recovery group identifier and the target user equipment.
In a fourth aspect, an embodiment of the present invention provides a first-level node, including: a processor, a memory, a bus, and a communication interface; the memory is used for storing computer execution instructions, the processor is connected with the memory through the bus, and when the first-level node runs, the processor executes the computer execution instructions stored in the memory, so that the first-level node executes any one of the fault switching methods.
In a fifth aspect, an embodiment of the present invention provides a second-level node, including: a processor, a memory, a bus, and a communication interface; the memory is used for storing computer execution instructions, the processor is connected with the memory through the bus, and when the second-level node runs, the processor executes the computer execution instructions stored in the memory, so that the second-level node executes any one of the fault switching methods.
In a sixth aspect, an embodiment of the present invention provides a GSC node, including: a monitoring unit, configured to monitor whether the nodes at each level in each DC have failed; and a sending unit, configured to, if a failure of the second-level node in the main DC is detected, send a capacity expansion instruction to the second-level node in the backup DC, where the capacity expansion instruction instructs the second-level node in the backup DC to perform a capacity expansion operation.
In a seventh aspect, an embodiment of the present invention provides a failover system, including the first-level node according to any design of the second aspect and the second-level node according to any design of the third aspect.
In a possible design, the failover system further includes a GSC node as described in the sixth aspect, where the GSC node is connected to both the first-level node and the second-level node.
In an eighth aspect, embodiments of the present invention provide a computer storage medium for storing computer software instructions for the above first-level node, and/or second-level node, and/or GSC node, where the instructions include a program designed to execute, for the first-level node and/or the second-level node, any one of the above aspects.
In the present invention, the names of the first-level node, the second-level node, and the GSC node do not limit the devices themselves; in actual implementation, these devices may appear under other names. As long as the function of each device is similar to that in the present invention, it falls within the scope of the claims of the present invention and their equivalents.
In addition, for technical effects brought by any one of the design manners in the second aspect to the eighth aspect, reference may be made to different design manners in the first aspect and technical effects detailed in subsequent embodiments, which are not described herein again.
These and other aspects of the invention are apparent from and will be elucidated with reference to the embodiments described hereinafter.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
FIG. 1 is a schematic diagram of an application scenario for performing failover in the prior art;
fig. 2 is a first schematic diagram of a failover system according to an embodiment of the present invention;
fig. 3 is a schematic diagram illustrating a second architecture of a failover system according to an embodiment of the present invention;
fig. 4 is a first interaction diagram of a failover method according to an embodiment of the present invention;
fig. 5 is a second interaction diagram of a failover method according to an embodiment of the present invention;
fig. 6 is a third interaction diagram of a failover method according to an embodiment of the present invention;
fig. 7 is a third schematic diagram of an architecture of a failover system according to an embodiment of the present invention;
fig. 8 is a fourth schematic diagram of a failover system according to an embodiment of the present invention;
fig. 9 is a schematic structural diagram of an NFV system according to an embodiment of the present invention;
fig. 10 is a schematic structural diagram of a first-level node according to an embodiment of the present invention;
fig. 11 is a schematic structural diagram of a second-level node according to an embodiment of the present invention;
fig. 12 is a first hardware structure diagram of a first level node/a second level node according to an embodiment of the present invention;
fig. 13 is a second hardware structure diagram of the first-level node/the second-level node according to the embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments.
In addition, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features referred to. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present invention, "a plurality" means two or more unless otherwise specified.
An embodiment of the present invention provides a failover method that can be applied to the failover system 100 shown in fig. 2. The failover system 100 includes at least a first-level node 21 and a second-level node 22, where the first-level node 21 is a front-end node of the second-level node 22. When there are multiple first-level nodes 21 and multiple second-level nodes 22, the first-level nodes 21 may be deployed in the same DC or in different DCs, and likewise the second-level nodes 22 may be deployed in the same DC or in different DCs.
For example, in the IMS system, when the first-stage node 21 is a P-CSCF (Proxy-Call Session Control Function) node, the second-stage node 22 may be an S-CSCF (Serving-Call Session Control Function) node at the back end of the P-CSCF node; when the first level node 21 is an S-CSCF node, the second level node 22 may be an AS (Application Server) at the back end of the S-CSCF node.
In the LTE system, when the first-level node 21 is a RAN (Radio Access Network) node, the second-level node 22 may be a control node such as an MME (Mobility Management Entity) at the back end of the RAN node; when the first-level node 21 is an MME, the second-level node 22 may be a forwarding device at the back end of the MME, such as an SGW (Serving Gateway) or a PGW (PDN Gateway).
Illustratively, as shown in fig. 3, DC1 contains P-CSCF node 1, S-CSCF node 1, and AS1; DC2 contains P-CSCF node 2, S-CSCF node 2, and AS2; and DC3 contains P-CSCF node 3, S-CSCF node 3, and AS3.
In an embodiment of the present invention, the service data (e.g., registration data and/or session data) of the different user equipments within each DC may be divided into multiple copies, which are backed up to other DCs. Taking DC1 in fig. 3 as the main DC, suppose the service data of 20 user equipments is stored in DC1, i.e., all 20 user equipments are assigned to DC1. Then, when each user equipment registers, the service data of user equipment 1 to user equipment 10 can be backed up into DC2, and the service data of user equipment 11 to user equipment 20 into DC3. Each level of node assigns a disaster recovery group identifier to the registering user equipment and sends the identifier to the front-end node (i.e., the first-level node 21) that interacts with it; that is, the disaster recovery group identifier obtained by the first-level node 21 indicates the correspondence between the main DC and the backup DC in which the second-level node 22 is located.
That is, in a failover system 100 composed of X (X > 1) DCs, the first-level node 21 in each DC stores, for different user equipments, the disaster recovery group identifier of the corresponding second-level node 22, and the service data of each user equipment is stored in both the main DC and the backup DC indicated by that identifier.
For example, taking the first-level node 21 shown in fig. 2 to be the P-CSCF node 1 shown in fig. 3: when P-CSCF node 1 receives a service request sent by user equipment 1 (i.e., the target user equipment), P-CSCF node 1 may determine, from the obtained disaster recovery group identifier corresponding to user equipment 1 (i.e., the target disaster recovery group identifier), that the main DC in which the second-level node 22 (the back-end node of P-CSCF node 1, i.e., the S-CSCF node) is located is DC1 and the backup DC is DC2. Then, when S-CSCF node 1 in the main DC (DC1) fails, P-CSCF node 1 may switch the service request directly to S-CSCF node 2 in the backup DC (DC2) according to the target disaster recovery group identifier; since the service data of user equipment 1 is stored in DC2, S-CSCF node 2 can use that data directly to execute the service request.
That is, with the failover method provided in the embodiment of the present invention, when the second-level node 22 in the main DC fails, the first-level node 21 may directly switch the service request of the target user equipment, according to the target disaster recovery group identifier, to the backup DC in which that equipment's service data is stored, and the second-level node 22 in the backup DC executes the request. This avoids having to access the required service data across DCs, as in the prior art, and thereby reduces problems such as packet loss or surges caused by cross-DC data access.
For example, when the failover system 100 is applied to an IMS system, the disaster recovery group identifier may be a domain name or a floating service IP address; when the failover system 100 is applied to a CS (Circuit Switched)/PS (Packet Switched) system, the disaster recovery group identifier may be a GT code, such as an MSC (Mobile Switching Center) number or an MME number, which is not limited in the embodiments of the present invention.
Based on the failover system 100 shown in fig. 3, the following takes the P-CSCF node as the first-level node 21 with the S-CSCF node as the second-level node 22, and then the S-CSCF node as the first-level node 21 with the AS as the second-level node 22, to explain in detail how the first-level node 21 obtains the target disaster recovery group identifier corresponding to the target user equipment during the registration process. As shown in fig. 4, the method includes:
101. the P-CSCF node 1 receives a first registration request sent by a target user equipment.
When any user equipment, e.g., the above target user equipment, accesses the IMS network, it may select a corresponding P-CSCF node according to a preset policy, e.g., P-CSCF node 1 within DC1, and send a first registration request to P-CSCF node 1.
102. The P-CSCF node 1 sends the first registration request to the S-CSCF node 1.
At this time, since the corresponding S-CSCF node has not been allocated to the target user equipment, the first registration request may be sent to a default S-CSCF node.
Preferably, the first registration request may be sent to the S-CSCF node in the local DC (i.e., DC1), namely S-CSCF node 1, so that the cross-DC data access generated when the first-level node interacts with the second-level node can be reduced.
103. The S-CSCF node 1 determines, according to the first registration request, one disaster recovery group from a plurality of disaster recovery groups (e.g., N disaster recovery groups, N > 1) as a first target disaster recovery group corresponding to the target user equipment.
Each disaster recovery group comprises one main DC and one backup DC. If the current IMS network comprises 3 DCs, it comprises 6 disaster recovery groups: DC1-DC2, DC1-DC3, DC2-DC1, DC2-DC3, DC3-DC1 and DC3-DC2.
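The count given above can be checked mechanically: the disaster recovery groups are the ordered (main, backup) pairs of distinct DCs, so 3 DCs yield 3 × 2 = 6 groups. A small illustrative sketch (the function name is hypothetical):

```python
from itertools import permutations

def disaster_groups(dcs):
    """Enumerate all ordered (main, backup) pairs of distinct DCs."""
    return [f"{main}-{backup}" for main, backup in permutations(dcs, 2)]

groups = disaster_groups(["DC1", "DC2", "DC3"])
# groups == ['DC1-DC2', 'DC1-DC3', 'DC2-DC1', 'DC2-DC3', 'DC3-DC1', 'DC3-DC2']
```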
In step 102, the Path header field may be used to carry the DC information of the first level node in the first registration request, where the DC information indicates the primary DC and the backup DC of the first level node, that is, of the P-CSCF node 1; for example, the primary DC of the P-CSCF node 1 is DC1 and the backup DC is DC2.
At this time, in step 103, according to the first registration request, the S-CSCF node 1 may determine the first target disaster recovery group by taking the primary DC of the P-CSCF node 1 as its own primary DC and the backup DC of the P-CSCF node 1 as its own backup DC. Thus, when the primary DC of the P-CSCF node 1 is DC1 and the backup DC is DC2, the primary DC of the S-CSCF node 1 is also DC1 and the backup DC is also DC2.
Thus, when the DC1 fails, the P-CSCF node 1 and the S-CSCF node 1 can be switched together to the same standby DC (i.e., DC2), and no cross-DC access operation is required during subsequent interaction between the P-CSCF node 2 and the S-CSCF node 2.
Alternatively, when the first registration request does not carry the DC information of the first level node, the second level node (S-CSCF node 1) may take the DC where it is currently located as its primary DC, and any DC other than the primary DC as its backup DC, to determine the first target disaster recovery group. For example, if the DC where the S-CSCF node 1 is currently located is DC1, then DC1 is the primary DC of the S-CSCF node 1, and the backup DC may be any DC other than DC1. This works because the P-CSCF node 1 is typically connected to an S-CSCF node within its own DC (i.e., S-CSCF node 1); if DC1 is also the primary DC of the S-CSCF node 1, then, at least in normal traffic flows where the primary DC has not failed, the S-CSCF node 1 and the P-CSCF node 1 will not generate cross-DC access operations.
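The two alternatives of steps 102-103 (inherit the first level node's DC pair if it was carried in the request, otherwise take the local DC as primary) can be sketched as follows; the function name and the (primary, backup) tuple representation are illustrative assumptions:

```python
def select_disaster_recovery_group(first_level_dc_info, own_dc, all_dcs):
    # first_level_dc_info: the (primary DC, backup DC) pair carried in the
    # registration request (e.g. via the Path header), or None if absent.
    if first_level_dc_info is not None:
        # Inherit the first level node's pair, so both nodes later fail
        # over to the same backup DC together.
        return first_level_dc_info
    # Otherwise: the DC this node currently sits in becomes the primary DC,
    # and any other DC may serve as the backup DC.
    backup = next(dc for dc in all_dcs if dc != own_dc)
    return (own_dc, backup)
```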
104. The S-CSCF node 1 backs up the service data of the target user equipment in the main DC and the backup DC in the first target disaster recovery group.
It can be seen that, compared with the prior art in which data backup is performed at the granularity of an entire DC, the embodiment of the present invention performs data backup at the granularity of individual user equipment. Thus, once a network element in a DC fails, the different user equipments attached to that network element can be migrated to different backup DCs, instead of all being migrated to the same backup DC, thereby reducing the load pressure on any single backup DC.
105. The S-CSCF node 1 records a correspondence between the first target disaster recovery group identifier and the target user equipment in an HSS (Home Subscriber Server).
Subsequently, the network element interacting with the S-CSCF node 1 may obtain, from the HSS, a correspondence between the main and standby DCs of the S-CSCF node 1 corresponding to the target user equipment.
106. The S-CSCF node 1 sends the first target disaster recovery group identity to the P-CSCF node 1.
Specifically, the S-CSCF node 1 may carry a Service-Route header field in the 200 OK message and transmit the first target disaster recovery group identifier in the Service-Route, so that the first level node (P-CSCF node 1) obtains the target disaster recovery group identifier of the second level node corresponding to the target user equipment. Subsequently, when the second level node in the primary DC fails, the first level node can switch the service request of the target user equipment to the second level node in the backup DC according to the target disaster recovery group identifier; because the second level node in the backup DC stores the service data of the target user equipment, it can execute the service request without any cross-DC operation.
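Purely for illustration, a sketch of extracting such an identifier from a 200 OK; it assumes the identifier rides in a hypothetical "drgroup" URI parameter of the Service-Route header (the parameter name, host names, and message layout are invented for this example, not defined by the embodiment):

```python
import re

def extract_drgroup(sip_200_ok):
    # Assumed convention: a "drgroup" URI parameter inside the
    # Service-Route header carries the disaster recovery group identifier.
    m = re.search(r"Service-Route:.*?;drgroup=([\w-]+)", sip_200_ok)
    return m.group(1) if m else None

msg = ("SIP/2.0 200 OK\r\n"
       "Service-Route: <sip:scscf1.dc1.example.invalid;lr;drgroup=DC1-DC2>\r\n"
       "\r\n")
```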
At this point, as shown in table 1, through steps 101-106, the P-CSCF node 1 may obtain the disaster recovery group identifier of the S-CSCF node corresponding to each registered user equipment.
TABLE 1

User equipment     Disaster recovery group identifier     Main DC     Backup DC
User equipment 1   01                                     DC1         DC2
User equipment 2   02                                     DC1         DC3
……                 ……                                     ……          ……
Further, the S-CSCF node 1 may also download the subscription information of the target user equipment from the HSS. If the subscription information includes iFC (initial Filter Criteria) data, the S-CSCF node 1 sends a third party registration request to the AS1; at this time, the S-CSCF node is the first level node 21 and the AS is the second level node 22. The method for the first level node 21 to obtain the target disaster recovery group identifier corresponding to the target user equipment, as shown in fig. 4, further includes the following steps 201-205:
201. the S-CSCF node 1 sends a second registration request to the AS 1.
202. The AS1 determines one disaster recovery group from the N disaster recovery groups AS a second target disaster recovery group corresponding to the target user equipment according to the second registration request.
Unlike step 102, in step 201, when the S-CSCF node 1 sends a registration request to the AS1, the existing standard may be extended, and a second private parameter is defined in the second registration request, for example, a contact parameter is defined in the second registration request AS the second private parameter, and the second private parameter may be used to carry DC information of the S-CSCF node 1 (i.e., the first-level node), so AS to implement transfer of DC information between the S-CSCF node 1 and the AS 1.
At this time, in step 202, the AS1 may determine the second target disaster recovery group by using the primary DC of the S-CSCF node 1 AS its primary DC and the backup DC of the S-CSCF node 1 AS its backup DC according to the second registration request. At this time, the main/standby DC relationship of the P-CSCF node, the main/standby DC relationship of the S-CSCF node, and the main/standby DC relationship of the AS are the same, so that the cross-DC access operation can be avoided to the greatest extent.
Or, when the second registration request does not carry the DC information of the S-CSCF node 1, the AS1 may use the current DC AS the primary DC of the second-level node, and use any one of the DCs except the primary DC AS the backup DC of the second-level node, so AS to determine the second target disaster tolerance group.
In addition, some special group services, such as family packages, require the user equipments within the group to perceive one another's service processing states, even though there is no necessary association between those user equipments. In this case, special identification information, such as a predefined special domain name, may be carried in the iFC criteria. After the S-CSCF node 1 acquires the special domain name carried in the iFC criteria, it sends the special domain name to the AS1 through the Request-URI or Route header field of the registration message. The AS1 (i.e., the second level node) stores the correspondence between the special domain name and the target disaster recovery group, so the AS1 may use the disaster recovery group corresponding to the received special domain name as the second target disaster recovery group. In this way, the main and standby DCs corresponding to the different user equipments in these special group services are the same.
Of course, the special identification information may also be a predefined character string or any other identification, and the second-level node stores a corresponding relationship between the special identification information and the target disaster recovery group, and it can be understood that a person skilled in the art may set the special identification information according to actual applications, and the embodiment of the present invention is not limited thereto.
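The mapping from special identification information to a shared disaster recovery group amounts to a lookup with a fallback to the normally determined group; the dictionary, domain name, and function name below are illustrative assumptions:

```python
# Hypothetical table kept by the second level node (AS): special domain
# name from the iFC criteria -> shared disaster recovery group.
special_group_map = {"family.group.example.invalid": ("DC1", "DC2")}

def resolve_group(special_domain, default_group):
    # All members of a special group service map to the same group; other
    # registrations keep the group determined by the normal procedure.
    return special_group_map.get(special_domain, default_group)
```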
203. The AS1 backs up the service data of the target user equipment in the main DC and the backup DC in the second target disaster recovery group.
204. The AS1 records the corresponding relationship between the second target disaster recovery group identifier and the target user equipment in the HSS.
205. The AS1 sends the second target disaster recovery group identity to the S-CSCF node 1.
Specifically, in step 205, since the SIP (Session Initiation Protocol) and 3GPP (3rd Generation Partnership Project) protocols between the S-CSCF node and the AS do not define a specific header field by which the AS can transfer its own information, such as the second target disaster recovery group identifier, to the S-CSCF node, a first private parameter may be defined in the 200 OK message (i.e., the registration response message). For example, a contact parameter may be defined in the 200 OK message as the first private parameter, which may be used to carry the target disaster recovery group identifier of the AS to the S-CSCF node.
Alternatively, without changing the existing standard, the Contact header field in a redirection message can be used to carry the target disaster recovery group identifier of the AS to the S-CSCF node. In this case, the S-CSCF node may record the received target disaster recovery group identifier of the AS and send a redirection message to the AS using it, that is, the redirection message carries the target disaster recovery group identifier; finally, the AS sends a 200 OK message to the S-CSCF node to end the registration process.
206. The S-CSCF node 1 sends a 200OK message to the target user equipment through the P-CSCF node 1 to complete the registration procedure.
At this point, through steps 101-206, the first level node has obtained, during the registration process of the target user equipment, the target disaster recovery group identifier of the second level node corresponding to the target user equipment.
Further, after the target user equipment completes registration, it may initiate a service request, for example a call request, to the P-CSCF node 1 in the primary DC. Because the P-CSCF node 1 (i.e., the first level node) has already obtained the target disaster recovery group identifier of the S-CSCF node (i.e., the second level node) during registration, the P-CSCF node may directly send the service request to the S-CSCF node 1 in the primary DC indicated by that identifier; and because the S-CSCF node 1 in the primary DC already stores the service data of the target user equipment, the S-CSCF node 1 may directly execute the service request according to that service data.
When the S-CSCF node 1 in the primary DC fails, the P-CSCF node may instead send the service request to the S-CSCF node 2 in the backup DC indicated by the target disaster recovery group identifier. Since the service data of the target user equipment was backed up in the S-CSCF node 2 in the backup DC during registration, the S-CSCF node 2 may directly execute the service request according to that service data, thereby avoiding any cross-DC data access operation.
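The first level node's routing decision just described can be sketched as follows; the function name and the representation of the identifier as a (primary DC, backup DC) pair are assumptions for illustration:

```python
def route_service_request(group, failed_dcs):
    # group: (primary DC, backup DC) decoded from the target disaster
    # recovery group identifier; failed_dcs: DCs whose second level node
    # is currently detected as failed.
    primary, backup = group
    if primary not in failed_dcs:
        return primary
    # The backup DC already holds the UE's service data (backed up at
    # registration), so it can execute the request without cross-DC access.
    return backup
```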
Further, in the subsequent interaction process between the S-CSCF node and the AS, the S-CSCF node is a first-level node, and the AS is a second-level node, and the specific process thereof may refer to the interaction process between the P-CSCF node and the S-CSCF node, so that the detailed description thereof is omitted herein.
It should be noted that, when the service request is a call service, the failover method may be performed in a steady-state call process or a new call process, and this is not limited in this embodiment of the present invention.
In the following, the above failover method will be described in detail by taking successive failures of a calling S-CSCF node and a called S-CSCF node in a new call as an example, and specifically, as shown in fig. 5, the method includes:
301. the calling P-CSCF node receives a new call service request, namely an invite request, sent by target user equipment.
In the registration process of the target user equipment, the calling P-CSCF node already acquires the target disaster recovery group identifier of the second-level node, that is, the calling S-CSCF node, and the target disaster recovery group identifier indicates the main DC and the backup DC where the calling S-CSCF node is located, for example, the calling S-CSCF node in the main DC is the calling S-CSCF node 1, and the calling S-CSCF node in the backup DC is the calling S-CSCF node 2.
Then, when the calling S-CSCF node 1 does not fail, the calling P-CSCF node may directly send the invite request to the calling S-CSCF node 1, and when the calling P-CSCF node detects that the calling S-CSCF node 1 fails, the following step 302 is performed.
302. If the calling S-CSCF node 1 fails, the calling P-CSCF node sends an invite request to a calling S-CSCF node 2 in the backup DC indicated by the target disaster recovery group identifier according to the target disaster recovery group identifier of the S-CSCF node.
Since the backup DC also stores the service data of the target user equipment, the calling S-CSCF node 2 in the backup DC can directly process the invite request according to the service data.
303. And the calling S-CSCF node 2 sends the invite request to the I-CSCF node.
304. The I-CSCF node queries the target disaster recovery group identification of the called S-CSCF node from the HSS.
The I-CSCF node is an entry point of the IMS network, during the calling process, a call destined to the IMS network is firstly routed to the I-CSCF node, and then the I-CSCF node acquires the address of the S-CSCF node registered by the user equipment from the HSS, so that the service request is routed to the corresponding S-CSCF node.
Therefore, in step 304, after receiving the invite request sent by the calling S-CSCF node 2, the I-CSCF node queries, from the HSS, a target disaster recovery group identifier of a second-level node, that is, a called S-CSCF node, where the target disaster recovery group identifier indicates a main DC and a backup DC where the called S-CSCF node is located, for example, the called S-CSCF node in the main DC is called S-CSCF node 1, and the called S-CSCF node in the backup DC is called S-CSCF node 2.
Then, when the called S-CSCF node 1 does not fail, the I-CSCF node may directly send the invite request to the called S-CSCF node 1, and when the I-CSCF node detects that the called S-CSCF node 1 fails, the following step 305 is performed.
305. And if the called S-CSCF node 1 fails, the I-CSCF node sends the invite request to the called S-CSCF node 2 in the backup DC indicated by the target disaster recovery group identifier according to the target disaster recovery group identifier of the called S-CSCF node.
306. And the called S-CSCF node 2 sends the updated new disaster tolerance group identification of the called S-CSCF node to the calling S-CSCF node 2.
307. And the calling S-CSCF node 2 sends the updated new disaster tolerance group identification of the calling S-CSCF node to the calling P-CSCF node.
Specifically, taking the current call process as an example, in order to enable a subsequent network element that directly interacts with the called S-CSCF node 2 (for example, the calling S-CSCF node 2) to directly send received service requests (for example, an update or bye message) to the called S-CSCF node 2, which has not failed, in step 306 the called S-CSCF node 2 may send its updated new disaster recovery group identifier to the calling S-CSCF node 2 through the I-CSCF node. At this time, the primary/backup DC correspondence indicated by the new disaster recovery group identifier of the called S-CSCF node is the reverse of that indicated by the target disaster recovery group identifier of the called S-CSCF node.
Similarly, since the calling S-CSCF node 1 has also failed, in order that the calling P-CSCF node may directly send subsequent service requests of this call to the calling S-CSCF node 2, which has not failed, in step 307 the primary/backup DC relationship indicated in the target disaster recovery group identifier of the calling S-CSCF node may be switched, so as to obtain the updated new disaster recovery group identifier of the calling S-CSCF node.
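The update in steps 306-307 amounts to swapping the primary and backup roles; a sketch under the assumption that an identifier is modeled as a (primary DC, backup DC) pair (function name invented for illustration):

```python
def updated_group_after_failover(group):
    # The new identifier indicates the reverse primary/backup correspondence
    # of the original target disaster recovery group identifier.
    primary, backup = group
    return (backup, primary)
```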
Further, after the execution of the current call flow shown in 301-307 is completed, the target disaster recovery group identifier may be updated to the new disaster recovery group identifier in the peripheral network element by using a standard protocol through a re-registration flow of the target user equipment, so that when the peripheral network element needs to interact with the second-level node, the peripheral network element may directly interact with the second-level node in the backup DC without a fault according to the new disaster recovery group identifier of the second-level node.
Exemplarily, taking the S-CSCF node 1 as an example of failure, the re-registration process of the target user equipment is shown in fig. 6, where the re-registration process includes:
401. the target user equipment initiates a re-registration request to the P-CSCF node.
402. And the P-CSCF node initiates a re-registration request of the target user equipment to the I-CSCF node.
403. And if the S-CSCF node 1 fails, the I-CSCF node sends the re-registration request to an S-CSCF node 2 in the backup DC according to the target disaster recovery group identification of the S-CSCF node.
404. The S-CSCF node 2 instructs the HSS to update the recorded correspondence between the target disaster recovery group identity and the target user equipment to the correspondence between the new disaster recovery group identity and the target user equipment.
That is to say, in the re-registration process, the second level node in the backup DC (i.e. the S-CSCF node 2) may update, in the HSS, the target disaster tolerance group identifier of the second level node (i.e. the S-CSCF node) that has been recorded in step 105 to the new disaster tolerance group identifier, so that a network element interacting with the second level node in the future may obtain, from the HSS, the new disaster tolerance group identifier corresponding to the target user equipment, so as to interact with the updated second level node in the backup DC according to the new disaster tolerance group identifier.
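Steps 105 and 404 together amount to writing and later overwriting a per-user-equipment record in the HSS; a minimal stand-in (the class, method names, and key format are invented for illustration):

```python
class HssStub:
    # Minimal stand-in for the HSS's per-UE disaster recovery group record.
    def __init__(self):
        self.records = {}

    def record(self, ue, group):      # step 105: initial registration
        self.records[ue] = group

    def update(self, ue, new_group):  # step 404: re-registration after failover
        self.records[ue] = new_group

hss = HssStub()
hss.record("ue-1", ("DC1", "DC2"))
hss.update("ue-1", ("DC2", "DC1"))
```

Network elements that later query the HSS for "ue-1" see only the updated pair.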
405. And the S-CSCF node 2 carries the new disaster tolerance group identification in a 200OK message and sends the message to the P-CSCF node.
Therefore, when subsequently receiving a service request sent by the target user equipment, the P-CSCF node can directly send it to the S-CSCF node 2, which has not failed, according to the new disaster recovery group identifier. Since the service data of the target user equipment is stored in the DC where the S-CSCF node 2 is located (i.e., the backup DC indicated by the original target disaster recovery group identifier), the S-CSCF node 2 can directly execute the service request according to that service data, thereby avoiding cross-DC data access operations and reducing risks such as packet loss or surge caused by cross-DC data access.
406. The P-CSCF node sends a 200OK message to the target user equipment to complete the registration procedure.
It should be noted that the failover system 100 and any failover method provided in the foregoing embodiments may be applied to an IMS network or an LTE network, and the embodiments of the present invention are not limited in this respect. For example, when the failover system 100 is applied to an LTE network, as shown in fig. 7, each DC may be provided with one or more network elements such as a RAN (Radio Access Network) node, an MME (Mobility Management Entity), an SGW (Serving Gateway) or a PGW (Packet Data Network Gateway).
For example, taking the registration process of a user equipment in an EPC (Evolved Packet Core, 4G core network) network as an example, as shown in fig. 7, when the user equipment initiates an attach request to the core network through the RAN node 1, the MME 1 may allocate a corresponding SGW 1 and PGW 1 to the user equipment. At this time, the SGW 1 may serve as the first level node and the PGW 1 as the second level node, so that the PGW 1 may determine, from the N disaster recovery groups, one disaster recovery group as the target disaster recovery group of the PGW 1, and carry the target disaster recovery group identifier in the response message to the establish-bearer request sent to the SGW 1; further, the SGW 1 sends the target disaster recovery group identifier of the PGW 1 to the RAN node 1 through an attach response message.
The target disaster recovery group identifier may be a floating service IP address, where the floating service IP address refers to: the IP address will float across the DC to the backup DC indicated by its target disaster recovery group to take effect after the primary DC fails.
Of course, the SGW 1 may also store the target disaster recovery group identifier of the PGW 1 locally. Subsequently, when the user equipment executes a service request such as data or voice, once the SGW 1 detects that the PGW 1 has failed, the SGW 1 may switch the service of the user equipment to the backup DC (for example, DC2) indicated by the target disaster recovery group identifier of the PGW 1, where the PGW 2 in DC2 executes the service request. Because DC2 has already backed up the service data of the user equipment, no cross-DC data access operation is caused after the service is switched to DC2, reducing the risk of packet loss or surge caused by cross-DC data access.
Of course, in the above embodiment, only the SGW 1 is taken as the first-level node, and the PGW1 is taken as the second-level node for example, it is understood that the first-level node may be any network element in the EPC network, such as an MME, a RAN node, or a PGW, and when the first-level node is determined, the second-level node is a backend network element of the first-level node.
Further, based on any of the above-mentioned failover systems 100 and failover methods, as shown in fig. 8, a GSC (Global Service Control) node 23 may be added to the failover system 100 provided in the embodiment of the present invention, where the GSC node 23 is connected to each DC in the failover system 100. Specifically, the GSC node 23 may be configured to send an elastic capacity expansion indication to the second level node 22 in the backup DC that is to take over after a failure, so that, before failover, the second level node 22 in the backup DC can prepare the corresponding resources and process the corresponding service requests on behalf of the second level node in the failed DC.
Further, any of the above-mentioned failover methods can also be applied to the NFV (Network Functions Virtualization) system shown in fig. 9, where the NFV system includes: a Network Functions Virtualization Orchestrator (NFVO), a Virtual Network Function Manager (VNFM), a Virtualized Infrastructure Manager (VIM), an Operation Support System (OSS) or Business Support System (BSS), an Element Management (EM) node, VNF nodes, and the NFV Infrastructure (NFVI).
In the NFV system, the NFVO, the VNFM, and the VIM constitute the NFV Management and Orchestration (NFV-MANO) domain of the NFV system. Specifically, the VNFM is responsible for lifecycle management of VNF instances, such as instantiation, scaling out/in, querying, updating, and termination. The VIM is the management entry point for infrastructure and resources: it provides resource management for VNF instances, including configuration maintenance, resource monitoring, alarming, and performance management of infrastructure-related hardware and virtualized resources. The NFVO performs scheduling functions such as operation, management, and coordination of the VIMs, and is connected with all VIMs and VNFMs in the NFV system.
Specifically, as still shown in fig. 9, in the embodiment of the present invention, a GSC node 23 may be additionally arranged in a conventional NFV system, where the GSC node 23 may be specifically deployed on a VNF, a NFV-MANO, an EM, or an independent network node, and the GSC node 23 may monitor and maintain a state of the VNF, where the VNF may be regarded as any node in the DC, for example, the first-level node or the second-level node.
For example, an interface may be added between the GSC node 23 and the VNFM, so that after detecting a failure of the second level node in the main DC, the GSC node 23 instructs the VNFM to perform elastic capacity expansion on the second level node in the backup DC.
For example, when step 302 is executed, if the calling S-CSCF node 1 (i.e., the second level node in the main DC) fails, the GSC node 23 may send, to the VNFM, a capacity expansion instruction for the calling S-CSCF node 2 (i.e., the second level node in the backup DC), so that the VNFM performs a capacity expansion operation on the VNF where the calling S-CSCF node 2 is located in the backup DC. In this way, the calling S-CSCF node 2 can prepare the corresponding resources in advance and take over the service requests of the failed calling S-CSCF node 1.
It may be understood that a person skilled in the art may set information such as the specific capacity expansion policy and capacity expansion size according to the actual application scenario. For example, the GSC node 23 may carry the number of target user devices in the capacity expansion instruction, so that the VNFM may determine the specific capacity expansion policy and size according to the number of target user devices; or the GSC node 23 may carry the number of VMs (Virtual Machines) that need to be expanded, so that the VNFM may determine the specific capacity expansion policy and size according to the number of VMs, which is not limited in this embodiment of the present invention.
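One possible sizing rule for the VNFM, assuming a hypothetical fixed per-VM capacity (the function, constant, and per-UE sizing approach are illustrative assumptions, not defined by the embodiment):

```python
import math

def vms_needed(num_target_ues, ues_per_vm=1000):
    # Hypothetical: the expansion instruction carries the number of affected
    # UEs, and the VNFM derives how many VMs to scale out from that count.
    return math.ceil(num_target_ues / ues_per_vm)
```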
In addition, the GSC node 23 may also expose an interface to the NFVO, so as to provide the NFVO with information about each DC in the VNF, so as to implement orchestration of the entire network resources by the NFVO, which is not limited in this embodiment of the present invention.
In summary, an embodiment of the present invention provides a failover method in which a first level node serving as a front-end node may obtain, during the registration process of a target user equipment, a target disaster recovery group identifier corresponding to that user equipment, where the identifier indicates the correspondence between a main DC and a backup DC of the second level node serving as the back-end node, and both the main DC and the backup DC store the service data of the target user equipment. Then, after the first level node receives a service request sent by the target user equipment, if it detects that the second level node in the main DC has failed, the first level node may directly switch the service request to the second level node in the backup DC according to the target disaster recovery group identifier. Since the service data of the target user equipment is stored in the second level node in the backup DC, that node may directly execute the service request, avoiding the need, as in the prior art, to access the service data required for the service request across DCs, and thereby reducing problems such as packet loss or surge caused by cross-DC data access.
The foregoing embodiments mainly introduce the solutions provided in the embodiments of the present invention from the perspective of interaction between network elements. It will be appreciated that each network element, such as the first level node 21, the second level node 22 and the GSC node 23, for implementing the above functions, comprises corresponding hardware structures and/or software modules for performing each function. Those of skill in the art will readily appreciate that the present invention can be implemented in hardware or a combination of hardware and computer software, with the exemplary elements and algorithm steps described in connection with the embodiments disclosed herein. Whether a function is performed as hardware or computer software drives hardware depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiment of the present invention, the first-level node 21 and the second-level node 22 may be divided into functional modules according to the above method examples, for example, each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. It should be noted that, the division of the modules in the embodiment of the present invention is schematic, and is only a logic function division, and there may be another division manner in actual implementation.
In the case of dividing the functional modules by corresponding functions, fig. 10 shows a possible structural diagram of the first-level node 21 in the above embodiment, where the first-level node 21 includes: an acquisition unit 31 and a transmission unit 32. Wherein, the obtaining unit 31 is configured to support the first level node 21 to perform the process 101 in fig. 4 and the processes 301 and 304 in fig. 5, and the sending unit 32 is configured to support the first level node 21 to perform the process 102 in fig. 4 and the processes 302 and 305 in fig. 5. All relevant contents of each step related to the above method embodiment may be referred to the functional description of the corresponding functional module, and are not described herein again.
Accordingly, in the case of dividing the functional modules by corresponding functions, fig. 11 shows a possible structural diagram of the second-level node 22 involved in the above embodiment, where the second-level node 22 includes: an acquisition unit 41, a transmission unit 42, a backup unit 43, a recording unit 44, and a determination unit 45. The obtaining unit 41 is configured to support the second level node 22 to perform the processes 101 and 201 in fig. 4 and the process 302 in fig. 5, and the sending unit 42 is configured to support the second level node 22 to perform the processes 106 and 206 in fig. 4 and the processes 303, 306, and 307 in fig. 5; backup unit 43 is used to support second level node 22 to execute processes 104 and 203 in fig. 4; the recording unit 44 is configured to support the second level node 22 to perform the processes 105, 204 in fig. 4 and the process 308 in fig. 5; the determination unit 45 is used to support the second level node 22 to perform the processes 103 and 202 in fig. 4. All relevant contents of each step related to the above method embodiment may be referred to the functional description of the corresponding functional module, and are not described herein again.
In the case of integrated units, fig. 12 shows a possible structural diagram of the first level node 21/second level node 22 involved in the above embodiment. The first level node 21/second level node 22 includes: a processing module 1302 and a communication module 1303. The processing module 1302 is configured to control and manage the actions of the first level node 21/the second level node 22, for example, the processing module 1302 is configured to support the node 21 to execute the processes 101 and 106 and 201 and 205 in fig. 4, the process 301 and 307 in fig. 5, the process 401 and 406 in fig. 6, and/or other processes for the technologies described herein. The communication module 1303 is configured to support communication between the first level node 21/the second level node 22 and other network entities, and the first level node 21/the second level node 22 may further include a storage module 1301 configured to store program codes and data of the first level node 21/the second level node 22.
The processing module 1302 may be a processor or a controller, such as a Central Processing Unit (CPU), a general purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or perform the various illustrative logical blocks, modules, and circuits described in connection with this disclosure. The processor may also be a combination of components providing a computing function, for example, a combination of one or more microprocessors, or a combination of a DSP and a microprocessor. The communication module 1303 may be a transceiver, a transceiver circuit, a communication interface, or the like. The storage module 1301 may be a memory.
When the processing module 1302 is a processor, the communication module 1303 is a communication interface, and the storage module 1301 is a memory, the node according to the embodiment of the present invention may be the first-level node 21/the second-level node 22 shown in fig. 13.
Referring to fig. 13, the first level node 21/second level node 22 includes: a processor 1312, a communication interface 1313, a memory 1311, and a bus 1314. Wherein the communication interface 1313, the processor 1312, and the memory 1311 are connected to each other through a bus 1314; the bus 1314 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 13, but this is not intended to represent only one bus or type of bus.
The steps of a method or algorithm described in connection with the disclosure herein may be embodied in hardware, or in software instructions executed by a processor. The software instructions may consist of corresponding software modules, which may be stored in Random Access Memory (RAM), flash memory, Read Only Memory (ROM), Erasable Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), registers, a hard disk, a removable disk, a Compact Disc Read Only Memory (CD-ROM), or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an ASIC. Additionally, the ASIC may reside in a core network interface device. Of course, the processor and the storage medium may also reside as discrete components in a core network interface device.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in this invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on, or transmitted as one or more instructions or code over, a computer-readable medium. Computer-readable media include both computer storage media and communication media, the latter including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer.
The foregoing describes the objects, technical solutions, and advantages of the present invention in further detail. It should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention and are not intended to limit its scope; any modifications, equivalent substitutions, improvements, and the like made on the basis of the technical solutions of the present invention shall fall within the scope of the present invention.

Claims (19)

1. A failover method, comprising:
a first-level node receives a registration request sent by target user equipment;
the first level node sends the registration request to a second level node, so that the second level node determines a target disaster recovery group identifier corresponding to the target user equipment, where the target disaster recovery group identifier is used to indicate: the corresponding relation between the main data center DC where the second-level node is located and the backup DC, wherein the first-level node is a front-end node of the second-level node, and the main DC and the backup DC both store the service data of the target user equipment;
the first-level node receives the target disaster recovery group identification sent by the second-level node;
the first-level node receives a service request sent by the target user equipment;
and if the second-level node in the main DC fails, the first-level node switches the service request to the second-level node in the backup DC according to the target disaster recovery group identifier.
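The routing decision recited in claim 1 can be sketched in a few lines of Python. This is an illustrative sketch only: the class and method names (`DisasterRecoveryGroup`, `FirstLevelNode`, `route_service_request`) are hypothetical and not prescribed by the claims.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DisasterRecoveryGroup:
    """Target disaster recovery group identifier: the correspondence
    between the primary (main) DC and the backup DC of the second-level node."""
    primary_dc: str
    backup_dc: str

class FirstLevelNode:
    def __init__(self):
        # Per-UE mapping learned during registration (claim 1, first three steps).
        self.drg_by_ue = {}

    def on_registration_response(self, ue_id, drg):
        # Store the identifier returned by the second-level node.
        self.drg_by_ue[ue_id] = drg

    def route_service_request(self, ue_id, primary_failed):
        # Claim 1, last step: switch to the backup DC on failure;
        # otherwise (claim 8) keep routing to the primary DC.
        drg = self.drg_by_ue[ue_id]
        return drg.backup_dc if primary_failed else drg.primary_dc
```

A usage sketch: after registration of `"ue-1"` with a group whose primary is `"DC1"` and backup is `"DC2"`, `route_service_request("ue-1", primary_failed=True)` returns `"DC2"`.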
2. The method of claim 1, wherein the determining, by the second level node, a target disaster recovery group identity corresponding to the target user equipment comprises:
the second-level node determines a disaster tolerance group from a plurality of disaster tolerance groups as a target disaster tolerance group of the second-level node according to the registration request, wherein each disaster tolerance group comprises a main DC and a backup DC;
the second-level node backs up the service data of the target user equipment in a main DC and a backup DC in the target disaster recovery group;
and the second-level node sends the target disaster recovery group identification to the first-level node.
3. The method of claim 2, wherein when the first level node is a serving call session control function (S-CSCF) node and the second level node is an Application Server (AS), the second level node sending a target disaster recovery group identifier to the first level node, comprising:
and the AS sends a registration response message to the S-CSCF node, wherein the registration response message contains a first private parameter, and the first private parameter is used for carrying the target disaster recovery group identifier.
4. The method according to claim 2 or 3, wherein the registration request carries DC information of the first-level node, and the DC information is used for indicating the main DC and the backup DC where the first-level node is located;
wherein, the second level node determining, according to the registration request, one disaster recovery group from a plurality of disaster recovery groups as a target disaster recovery group of the second level node, includes:
and the second-level node takes the main DC of the first-level node as the main DC of the second-level node and takes the backup DC of the first-level node as the backup DC of the second-level node according to the registration request so as to determine the target disaster recovery group.
5. The method of claim 4, wherein when the first level node is an S-CSCF node and the second level node is an AS, the registration request comprises a second private parameter, and the second private parameter is used for carrying DC information of the S-CSCF node.
6. The method according to claim 2 or 3, wherein the determining, by the second level node, one disaster recovery group from the plurality of disaster recovery groups as the target disaster recovery group of the second level node according to the registration request comprises:
and the second-level node takes the current DC as the main DC of the second-level node, and takes any one DC except the main DC as the backup DC of the second-level node so as to determine the target disaster recovery group.
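The selection rule of claim 6 (take the current DC as the main DC and any one other DC as the backup DC) admits a direct sketch; `select_target_group` and its arguments are hypothetical names introduced here for illustration.

```python
def select_target_group(current_dc, all_dcs):
    """Claim 6 sketch: the second-level node takes the DC it currently
    resides in as primary, and any one other DC as backup."""
    candidates = [dc for dc in all_dcs if dc != current_dc]
    if not candidates:
        raise ValueError("no DC available to act as backup")
    # "any one DC other than the main DC": this sketch simply takes the first.
    return (current_dc, candidates[0])
```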
7. The method according to claim 2 or 3, wherein the determining, by the second level node, one disaster recovery group from the plurality of disaster recovery groups as the target disaster recovery group of the second level node according to the registration request comprises:
the second-level node receives the special identification information sent by the first-level node;
and the second-level node takes the disaster recovery group corresponding to the special identification information as the target disaster recovery group, and the corresponding relation between the special identification information and the target disaster recovery group is stored in the second-level node.
8. The method according to any one of claims 1-3, wherein after the first level node obtains the target disaster recovery group identity corresponding to the target user equipment, the method further comprises:
and if the second-level node in the main DC does not have a fault, the first-level node sends the service request to the second-level node in the main DC according to the disaster recovery group identification.
9. The method according to claim 2 or 3, wherein after the second level node determines, according to the registration request, one disaster recovery group from a plurality of disaster recovery groups as a target disaster recovery group of the second level node, the method further comprises:
the second-level node records the corresponding relation between the target disaster recovery group identification and the target user equipment in a Home Subscriber Server (HSS);
wherein, if a second level node in the main DC fails, the first level node switches the service request to the second level node in the backup DC according to the target disaster recovery group identifier, and then further includes:
a second-level node in the backup DC determines a new disaster recovery group identifier after the service request is switched, where a correspondence between the main DC and the backup DC indicated by the new disaster recovery group identifier is opposite to a correspondence between the main DC and the backup DC indicated by the target disaster recovery group identifier;
and the second-level node in the backup DC updates the recorded corresponding relation between the target disaster recovery group identifier and the target user equipment into the corresponding relation between the new disaster recovery group identifier and the target user equipment in the HSS.
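The identifier update of claim 9 reduces to swapping the main/backup correspondence after switchover and rewriting the stored record. In this sketch a plain dict stands in for the Home Subscriber Server (HSS), and all names are illustrative assumptions.

```python
def invert_group(drg):
    # The new identifier indicates the opposite main/backup correspondence
    # to the one indicated by the target disaster recovery group identifier.
    primary_dc, backup_dc = drg
    return (backup_dc, primary_dc)

def on_switchover(hss, ue_id):
    # Replace the recorded correspondence for this UE with the new one (claim 9).
    hss[ue_id] = invert_group(hss[ue_id])
    return hss[ue_id]
```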
10. The method according to any one of claims 1-3, wherein before the first-level node switches the service request to the second-level node in the backup DC according to the target disaster recovery group identifier, the method further comprises:
a global service control (GSC) node monitors whether the nodes at each level in each DC have failed;
and if detecting that a second-level node in the main DC fails, the GSC node sends a capacity expansion instruction to the second-level node in the backup DC, wherein the capacity expansion instruction is used for instructing the second-level node in the backup DC to perform a capacity expansion operation.
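The monitoring step of claim 10 can be sketched as a check that the GSC node runs per disaster recovery group; the health map and `send_instruction` callback are assumptions for illustration, not part of the claim.

```python
def gsc_check(node_health, drg, send_instruction):
    """Return True if a failure of the second-level node in the main DC was
    detected and a capacity expansion instruction was sent to the backup DC."""
    primary_dc, backup_dc = drg
    if node_health.get(primary_dc, True):
        return False                        # second-level node in main DC is healthy
    send_instruction(backup_dc, "expand")   # instruct the backup DC to expand capacity
    return True
```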
11. A first level node, comprising:
the device comprises an acquisition unit, a registration unit and a registration unit, wherein the acquisition unit is used for receiving a registration request sent by target user equipment;
a sending unit, configured to send the registration request to a second level node, so that the second level node determines a target disaster recovery group identifier corresponding to the target user equipment, where the target disaster recovery group identifier is used to indicate: the corresponding relation between the main DC and the backup DC where the second-level node is located, wherein the first-level node is a front-end node of the second-level node, and the main DC and the backup DC both store the service data of the target user equipment; receiving a service request sent by the target user equipment;
the obtaining unit is further configured to receive the target disaster recovery group identifier sent by the second-level node;
and the sending unit is used for switching the service request to the second level node in the backup DC according to the target disaster recovery group identifier if the second level node in the main DC fails.
12. The first level node of claim 11,
the sending unit is further configured to send the service request to a second-level node in the main DC according to the disaster recovery group identifier if the second-level node in the main DC does not have a fault.
13. A second level node, comprising:
a determining unit, configured to determine, according to a registration request sent by a first-level node, a disaster recovery group from multiple disaster recovery groups as a target disaster recovery group of the second-level node, where each disaster recovery group includes a primary DC and a backup DC, and the first-level node is a front-end node of the second-level node;
the backup unit is used for backing up the service data of the target user equipment in the main DC and the backup DC in the target disaster recovery group;
a sending unit, configured to send a target disaster recovery group identifier to the first-level node, so that if the second-level node in the main DC fails, the first-level node switches a service request to the second-level node in the backup DC.
14. The second level node of claim 13, wherein when the first level node is a serving Call Session control function (S-CSCF) node and the second level node is an Application Server (AS),
the sending unit is specifically configured to send a registration response message to the S-CSCF node, where the registration response message includes a first private parameter, and the first private parameter is used to carry the target disaster recovery group identifier.
15. The second-level node according to claim 13 or 14, wherein the registration request carries DC information of the first-level node, and the DC information is used to indicate a main DC and a backup DC where the first-level node is located;
the determining unit is specifically configured to determine the target disaster recovery group by using the primary DC of the first-level node as the primary DC of the second-level node and using the backup DC of the first-level node as the backup DC of the second-level node according to the registration request.
16. The second-level node according to claim 13 or 14, wherein
the determining unit is specifically configured to use the current DC as the main DC of the second-level node, and use any one DC other than the main DC as the backup DC of the second-level node, so as to determine the target disaster recovery group.
17. The second level node according to claim 13 or 14, further comprising an acquisition unit, wherein,
the acquiring unit is used for receiving the special identification information sent by the first-level node;
the determining unit is specifically configured to use the disaster recovery group corresponding to the special identifier information as the target disaster recovery group, and the second-level node stores a corresponding relationship between the special identifier information and the target disaster recovery group.
18. The second level node of claim 13 or 14, further comprising a recording unit, wherein,
the recording unit is configured to record a corresponding relationship between the target disaster recovery group identifier and the target user equipment in the HSS;
the determining unit is further configured to determine a new disaster recovery group identifier after the service request is switched, where a correspondence between the main DC and the backup DC indicated by the new disaster recovery group identifier is opposite to a correspondence between the main DC and the backup DC indicated by the target disaster recovery group identifier; and updating the recorded corresponding relation between the target disaster recovery group identification and the target user equipment into the corresponding relation between the new disaster recovery group identification and the target user equipment.
19. A failover system, comprising the first-level node according to claim 11 or 12 and the second-level node according to any one of claims 13-18.
CN201610964275.6A 2016-10-28 2016-10-28 Fault switching method, device and system Active CN108011737B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201610964275.6A CN108011737B (en) 2016-10-28 2016-10-28 Fault switching method, device and system
PCT/CN2017/102802 WO2018076972A1 (en) 2016-10-28 2017-09-21 Failover method, device and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610964275.6A CN108011737B (en) 2016-10-28 2016-10-28 Fault switching method, device and system

Publications (2)

Publication Number Publication Date
CN108011737A CN108011737A (en) 2018-05-08
CN108011737B true CN108011737B (en) 2021-06-01

Family

ID=62024356

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610964275.6A Active CN108011737B (en) 2016-10-28 2016-10-28 Fault switching method, device and system

Country Status (2)

Country Link
CN (1) CN108011737B (en)
WO (1) WO2018076972A1 (en)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109450604A (en) * 2018-09-25 2019-03-08 国家电网有限公司客户服务中心 A kind of strange land dual-active system business rank division method standby towards calamity
CN109995765A (en) * 2019-03-11 2019-07-09 河北远东通信系统工程有限公司 A kind of dual-homed register method of IMS network AGCF
CN112199240B (en) * 2019-07-08 2024-01-30 华为云计算技术有限公司 Method for switching nodes during node failure and related equipment
CN112311566B (en) * 2019-07-25 2023-10-17 中国移动通信集团有限公司 Service disaster recovery method, device, equipment and medium
CN112954264B (en) * 2019-12-10 2023-04-18 浙江宇视科技有限公司 Platform backup protection method and device
CN113535464B (en) * 2020-04-17 2024-02-02 海能达通信股份有限公司 Disaster recovery backup method, server, cluster system and storage device
CN114079612B (en) * 2020-08-03 2024-06-04 阿里巴巴集团控股有限公司 Disaster recovery system and management and control method, device, equipment and medium thereof
CN112099990A (en) * 2020-08-31 2020-12-18 新华三信息技术有限公司 Disaster recovery backup method, device, equipment and machine readable storage medium
CN112615903A (en) * 2020-11-30 2021-04-06 中科热备(北京)云计算技术有限公司 Intelligent redundancy method for cloud backup
CN113268378A (en) * 2021-05-18 2021-08-17 Oppo广东移动通信有限公司 Data disaster tolerance method and device, storage medium and electronic equipment
CN114095342B (en) * 2021-10-21 2023-12-26 新华三大数据技术有限公司 Backup realization method and device
CN114422331B (en) * 2022-01-21 2024-04-05 中国工商银行股份有限公司 Disaster recovery switching method, device and system
CN115102872B (en) * 2022-07-05 2024-02-27 广东长天思源环保科技股份有限公司 Environment-friendly monitoring data self-proving system based on industrial Internet identification analysis
CN115396296B (en) * 2022-08-18 2023-06-27 中电金信软件有限公司 Service processing method, device, electronic equipment and computer readable storage medium
CN115277376B (en) * 2022-09-29 2022-12-23 深圳华锐分布式技术股份有限公司 Disaster recovery switching method, device, equipment and medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101094237A (en) * 2007-07-30 2007-12-26 中兴通讯股份有限公司 Method for sharing in load among net elements in IP multimedia sub system
CN101447890A (en) * 2008-04-15 2009-06-03 中兴通讯股份有限公司 Improved application server disaster tolerance system of next generation network and method thereof
CN102546544A (en) * 2010-12-20 2012-07-04 中兴通讯股份有限公司 Networking structure of application servers in IMS (Information Management System) network
CN104461779A (en) * 2014-11-28 2015-03-25 华为技术有限公司 Distributed data storage method, device and system

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7783618B2 (en) * 2005-08-26 2010-08-24 Hewlett-Packard Development Company, L.P. Application server (AS) database with class of service (COS)
CN101459533B (en) * 2008-04-16 2011-10-26 中兴通讯股份有限公司 System and method for improved application server disaster tolerance in next generation network


Also Published As

Publication number Publication date
WO2018076972A1 (en) 2018-05-03
CN108011737A (en) 2018-05-08


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant