WO2021000647A1 - Service protection method, network device, distributed service processing system, and storage medium - Google Patents
Service protection method, network device, distributed service processing system, and storage medium
- Publication number
- WO2021000647A1 (PCT/CN2020/088318)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- cpu
- information
- protected
- tunnel
- service
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L12/00—Data switching networks
- H04L12/28—Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]
- H04L12/46—Interconnection of networks
- H04L12/4633—Interconnection of networks using encapsulation techniques, e.g. tunneling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L12/00—Data switching networks
- H04L12/28—Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]
- H04L12/46—Interconnection of networks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L12/00—Data switching networks
- H04L12/28—Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]
- H04L12/46—Interconnection of networks
- H04L12/4641—Virtual LANs, VLANs, e.g. virtual private networks [VPN]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0654—Management of faults, events, alarms or notifications using network fault recovery
- H04L41/0663—Performing the actions predefined by failover planning, e.g. switching to standby network elements
Definitions
- The present invention relates to the field of communications, and in particular to a service protection method, a network device, a distributed service processing system, and a storage medium.
- L2TP: Layer 2 Tunneling Protocol
- LAC: L2TP Access Concentrator
- AAA: Authentication, Authorization, and Accounting
- LNS: L2TP Network Server
- The server completes the final authorization and verification of users, receives tunnel and connection requests from the LAC, and establishes a PPP channel connecting the LNS and the users.
- The L2TP services carried by LNS devices and LAC devices are steadily increasing.
- L2TP multi-core service processing based on a distributed architecture has therefore been proposed.
- The multi-core service boards take much of the load off the main control board.
- The boards have strong pressure-bearing and load capacity.
- However, if a multi-core service board or a chip on it fails, the L2TP service is interrupted, which causes losses to telecommunications operators.
- The service protection method, network device, distributed service processing system, and storage medium provided by the embodiments of the present invention address, to a certain extent, the following technical problem: in the related art, L2TP multi-core service processing based on a distributed architecture is easily interrupted by the failure of a multi-core service board or of a chip on that board, which degrades the user's communication service experience.
- an embodiment of the present invention provides a service protection method, including:
- tunnel information is information of a tunnel carried by the CPU
- session information is information of a session carried by the tunnel
- the tunnel information and the session information are sent to the main control board, which uses them to deliver the services on the CPU to the standby CPU so that the standby CPU continues processing those services.
- the embodiment of the present invention also provides a service protection method, including:
- tunnel information is information about a tunnel carried by the protected CPU
- session information is information about a session carried by the tunnel
- the embodiment of the present invention also provides a service protection method, including:
- the embodiment of the present invention also provides a network device, which includes a processor, a memory, and a communication bus;
- the communication bus is used to realize connection and communication between the processor and the memory
- The processor is configured to execute a first service protection program stored in the memory to implement the steps of the foregoing first service protection method; or the processor is configured to execute a second service protection program stored in the memory to implement the steps of the foregoing second service protection method; or the processor is configured to execute a third service protection program stored in the memory to implement the steps of the foregoing third service protection method.
- An embodiment of the present invention also provides a distributed service processing system, which includes a main control board and a plurality of CPUs.
- The main control board is a network device whose processor executes the second service protection program.
- Of the plurality of CPUs, some are network devices whose processor executes the first service protection program, and some are network devices whose processor executes the third service protection program.
- An embodiment of the present invention also provides a storage medium that stores at least one of a first service protection program, a second service protection program, and a third service protection program.
- The first service protection program can be executed by one or more processors to implement the steps of the first service protection method; the second service protection program can be executed by one or more processors to implement the steps of the second service protection method; the third service protection program can be executed by one or more processors to implement the steps of the third service protection method.
- FIG. 1 is an interaction flowchart of the service protection method provided in Embodiment 1 of the present invention.
- FIG. 2 is an architecture diagram of a distributed service processing system provided in Embodiment 1 of the present invention.
- FIG. 3 is a flowchart of the interaction between the LAC device and a CPU on the LNS device side, as shown in Embodiment 1 of the present invention.
- FIG. 4 is a flowchart of determining a backup CPU in the first solution shown in Embodiment 1 of the present invention.
- FIG. 5 is a flowchart of determining a backup CPU in the third solution shown in Embodiment 1 of the present invention.
- FIG. 6 is a flowchart of the protected-CPU side of the service protection method provided in Embodiment 2 of the present invention.
- FIG. 7 is a flow chart on the main control board side of the service protection method provided in Embodiment 2 of the present invention.
- FIG. 8 is a schematic diagram of a hardware structure of a network device provided in Embodiment 3 of the present invention.
- LNS devices and LAC devices carry more and more services, and their load pressure keeps increasing.
- Take the LNS as an example: one LNS device may establish VPN tunnels with multiple LAC devices, so the LNS device carries a large number of L2TP services.
- Traditional centralized L2TP is deployed on the main control board, but because the main control board has limited CPU resources and low processing efficiency, it cannot meet the increasing service load.
- The LNS device can instead be deployed in a distributed manner, with multiple distributed service boards carrying the services that the main control board originally had to handle, thereby relieving the pressure on the main control board.
- S102 The protected CPU collects tunnel information and session information on the CPU.
- CPUs are divided into protected CPUs and standby CPUs.
- A protected CPU is a CPU whose services are taken over and continue to be processed by a standby CPU.
- In practice, a protected CPU is usually a CPU that fails during service processing.
- The standby CPU (also called the backup CPU or spare CPU) is the CPU that continues to process the services of the protected CPU. Any CPU may serve as a protected CPU in some scenarios and as a backup CPU in others.
- The tunnel information is the information about the tunnels carried by the protected CPU.
- The session information is the information about the sessions carried in each tunnel.
- One CPU can carry multiple tunnels, and one tunnel can carry multiple sessions.
- The protected CPU collects the tunnel information and session information on the CPU when it fails and cannot continue to process the services it carries. For example, in some examples of this embodiment, if a CPU determines that it needs to be reset to deal with the current failure, it can determine that it is a protected CPU and therefore collect its own tunnel information and session information.
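- The containment relationship described above (one CPU carries several tunnels, one tunnel carries several sessions) can be pictured with a small data model. This is only an illustrative sketch: the class names, the field names, and the shape of the snapshot returned by `collect_protection_info` are assumptions made here, not structures defined in the patent.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Session:
    session_id: int  # hypothetical identifier of one session


@dataclass
class Tunnel:
    tunnel_id: int  # hypothetical identifier of one tunnel
    sessions: List[Session] = field(default_factory=list)


@dataclass
class CpuState:
    cpu_id: str  # hypothetical CPU label, e.g. "L1N2"
    tunnels: List[Tunnel] = field(default_factory=list)


def collect_protection_info(cpu: CpuState) -> dict:
    """Snapshot the tunnel and session information a failing CPU would send."""
    return {
        "cpu_id": cpu.cpu_id,
        "tunnels": [
            {
                "tunnel_id": t.tunnel_id,
                "session_ids": [s.session_id for s in t.sessions],
            }
            for t in cpu.tunnels
        ],
    }
```

- A CPU that decides it must reset would call `collect_protection_info` on its own state and send the result to the main control board.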
- S104 The protected CPU sends the tunnel information and the session information to the main control board.
- FIG. 2 shows a distributed service processing system:
- The distributed service processing system 2 includes a main control board 21, a first service board 22, a second service board 23, and a message transceiving processing board 24.
- The main control board 21 is responsible for system management, protocol message processing, and routing management of the entire distributed service processing system 2.
- The message transceiving processing board 24 is responsible for interface traffic management, message forwarding, and switching traffic management, and distributes L2TP messages to each service board according to the processing rules set by the main control board 21.
- The service boards are independent of one another and perform distributed processing in parallel, thereby improving the throughput of the system.
- The first service board 22 includes multiple CPUs.
- The second service board 23 also includes multiple CPUs. It is assumed that the distributed service processing system 2 is a distributed LNS device. Please refer to the schematic diagram of the L2TP establishment process shown in FIG. 3:
- The LAC initiates tunnel establishment by sending an SCCRQ message to the LNS.
- After receiving the response from the LNS, the LAC returns a confirmation SCCCN message to the LNS.
- The session is then established.
- The LNS can then perform a PPP (Point to Point Protocol) interaction with the user and assign an IP address to the user, after which the user can access the network.
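- The establishment process can be summarized as two three-message exchanges. The text above names only SCCRQ and SCCCN; the intermediate SCCRP reply and the session-level ICRQ/ICRP/ICCN messages in the sketch below are taken from the L2TP specification (RFC 2661) rather than from this document, and the helper `handshake_progress` is a hypothetical illustration of tracking how far an exchange has advanced.

```python
# (sender, message) pairs for L2TP control-connection (tunnel) establishment.
TUNNEL_HANDSHAKE = [("LAC", "SCCRQ"), ("LNS", "SCCRP"), ("LAC", "SCCCN")]

# (sender, message) pairs for session establishment of an incoming call.
SESSION_HANDSHAKE = [("LAC", "ICRQ"), ("LNS", "ICRP"), ("LAC", "ICCN")]


def handshake_progress(observed, handshake):
    """Count how many leading steps of `handshake` were observed, in order."""
    done = 0
    for got, want in zip(observed, handshake):
        if got != want:
            break
        done += 1
    return done


def is_established(observed, handshake):
    """True once every step of the handshake has been observed in order."""
    return handshake_progress(observed, handshake) == len(handshake)
```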
- the main control board determines the backup CPU corresponding to the protected CPU.
- After the main control board receives the tunnel information and session information sent by the protected CPU, it can determine that the protected CPU temporarily cannot continue to process its services. Therefore, the main control board needs to determine a backup CPU for the protected CPU.
- The backup CPU can process the services that can no longer be processed on the protected CPU, thereby protecting these services and avoiding their interruption.
- The backup CPU selected by the main control board for the protected CPU is also a CPU in the distributed service processing system, and each CPU in the system already carries some services of its own. Therefore, the backup CPU uses the resources that remain while it processes its own services to protect the services of the protected CPU. For this reason, when determining the backup CPU corresponding to the protected CPU, the main control board refers to the resource vacancy information of each CPU, that is, the ability of a CPU to handle additional services while processing its own services.
- the main control board may determine its resource vacancy information according to at least one of the CPU utilization rate, the memory usage rate, the number of available tunnels, and the number of available sessions of the CPU.
- PRI can characterize the vacancy of CPU resources. The higher the PRI value, the more vacant CPU resources, and vice versa, the less vacant CPU resources are. Therefore, the higher the PRI value of a CPU, the higher the probability of the CPU being selected as the backup CPU.
- CPU Rate is the CPU utilization, and W1 is the weight of the CPU vacancy rate; Mem Rate is the memory usage rate, and W2 is the weight of the memory remaining rate; T represents the number of available tunnels, and W3 is the weight of the number of available tunnels; S represents the number of available sessions, and W4 is the weight of the number of available sessions.
- The number of available sessions and the number of available tunnels are not measured on the same scale as the CPU vacancy rate and the memory remaining rate. Therefore, in this embodiment, the number of available sessions and the number of available tunnels need to be normalized so that all four measures are on a consistent scale.
- The value of T can be the ratio of the number of available tunnels to the rated total number of tunnels in the distributed service processing system, with a value range of (0,1); the value of S can be the ratio of the number of available sessions to the rated total number of sessions, also with a value range of (0,1).
- Whether a CPU can be used as a backup CPU does not depend on its spare resources alone.
- The state of the CPU itself is also very important. For example, in some cases a CPU still has many processing resources remaining, enough to process many additional services, but if the CPU itself is abnormal, it cannot be used as a backup CPU. Therefore, in some examples of this embodiment, the PRI calculation also takes the running status of the CPU into account.
- Stat represents the running status of the CPU. If the value of Stat is 1, the running status of the CPU is normal; when the running status of the CPU is abnormal, the value of Stat is 0. Therefore, regardless of a CPU's vacancy rate, memory remaining rate, number of available tunnels, and number of available sessions, as long as its running status is abnormal, its PRI value is 0.
- In some examples, the main control board may determine the resource vacancy information of a CPU based only on its CPU vacancy rate or its memory remaining rate.
- The main control board can also determine the resource vacancy information of a CPU based only on its number of available tunnels or its number of available sessions.
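- The PRI formula itself is not reproduced in this text, so the following is only a sketch under a stated assumption: that PRI is a weighted sum of the four normalized measures, multiplied by Stat. The term definitions (1 - CPU Rate as the CPU vacancy rate, 1 - Mem Rate as the memory remaining rate, T and S as ratios to the rated totals, Stat as 0 or 1) follow the descriptions above; the additive combination, the function names, and the default weights are assumptions.

```python
def normalize(available, rated_total):
    """Ratio of available tunnels (or sessions) to the rated total."""
    return available / rated_total


def pri(cpu_rate, mem_rate, t, s, stat=1, weights=(0.25, 0.25, 0.25, 0.25)):
    """Assumed weighted-sum form of PRI.

    cpu_rate, mem_rate: utilization ratios in [0, 1]
    t, s: normalized available tunnels / sessions
    stat: 1 if the CPU runs normally, 0 if it is abnormal
    weights: (W1, W2, W3, W4); the default values are placeholders
    """
    w1, w2, w3, w4 = weights
    score = (1 - cpu_rate) * w1 + (1 - mem_rate) * w2 + t * w3 + s * w4
    return stat * score  # an abnormal CPU always scores 0
```

- Under this assumed form, the single-metric variants mentioned above correspond to weight vectors such as `(1, 0, 0, 0)` (CPU vacancy rate only) or `(0, 0, 1, 0)` (available tunnels only).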
- Before the main control board determines the backup CPU corresponding to the protected CPU, it should first obtain the resource vacancy information of each CPU in the distributed service processing system.
- Several solutions by which the main control board obtains the resource vacancy information of each CPU and determines the backup CPU are described below:
- Solution 1: the main control board periodically obtains the resource vacancy information of each CPU in the distributed service processing system and periodically determines the backup CPU for each CPU. Please refer to FIG. 4:
- the main control board periodically determines the resource vacancy information of each CPU in the distributed service processing system.
- Each CPU in the distributed service processing system periodically reports its own resource vacancy information to the main control board.
- The report information that a CPU sends to the main control board to indicate its own resource vacancy includes its CPU utilization, memory utilization, number of available tunnels, and number of available sessions.
- A CPU can uniquely identify itself with an L(a)N(b) identifier, where L stands for "Location" and indicates the service board where the CPU is located, a is the number of that service board, N indicates the serial number of the CPU on that service board, and b is the unique identifier of the CPU on the board.
- In this way, the main control board can determine which CPU on which service board the resource vacancy information carried in a received report belongs to.
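- As an illustration of how the main control board might attribute a report to a specific CPU, the sketch below parses an L(a)N(b) identifier into a (board number, CPU number) pair. The concrete text encoding (for example `L1N2` for CPU 2 on service board 1) and the function name `parse_cpu_id` are assumptions; the patent only states what a and b denote.

```python
import re

# Assumed textual form: "L<board number>N<cpu number>", e.g. "L1N2".
_CPU_ID = re.compile(r"^L(\d+)N(\d+)$")


def parse_cpu_id(identifier):
    """Split an L(a)N(b) identifier into (service board number, CPU number)."""
    match = _CPU_ID.match(identifier)
    if match is None:
        raise ValueError("not an L(a)N(b) identifier: %r" % identifier)
    return int(match.group(1)), int(match.group(2))
```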
- the main control board determines the corresponding standby CPU for each CPU according to the resource vacancy information of each CPU acquired last time, and stores the mapping relationship between each CPU and the corresponding standby CPU.
- Whenever the main control board reacquires the resource vacancy information of each CPU, it determines a backup CPU for each CPU again.
- the main control board selects a backup CPU from other CPUs in the distributed service processing system to which the CPU belongs. Therefore, a CPU and its backup CPU are both CPUs belonging to the same distributed service processing system.
- the main control board determines a backup CPU for each CPU in the distributed service processing system in the following manner:
- The main control board calculates the PRI value of each CPU according to the resource vacancy information of each CPU; the calculation method was introduced above and is not repeated here. After calculating the PRI value of each CPU, the main control board determines the CPU with the highest PRI value and uses it as the backup CPU for all CPUs in the distributed service processing system other than that CPU itself. In addition, the main control board needs to select a backup CPU for the CPU with the highest PRI value. In one example, the main control board can use the CPU with the next highest PRI value as the backup CPU for the CPU with the highest PRI value.
- The mapping relationship is stored so that it can be used if a CPU needs to be protected before the mapping relationship is next updated.
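- The assignment rule just described (the highest-PRI CPU backs up every other CPU, and the next-highest backs up the highest) can be sketched as follows. The function name and the dictionary-based representation of the mapping relationship are illustrative assumptions.

```python
def build_backup_mapping(pri_by_cpu):
    """Map each CPU to its backup CPU from a dict of CPU id -> PRI value."""
    if len(pri_by_cpu) < 2:
        raise ValueError("at least two CPUs are needed to assign backups")
    # Rank CPU ids from highest to lowest PRI.
    ranked = sorted(pri_by_cpu, key=pri_by_cpu.get, reverse=True)
    best, runner_up = ranked[0], ranked[1]
    # Every CPU except the best one is backed up by the best one.
    mapping = {cpu: best for cpu in ranked[1:]}
    # The best CPU is backed up by the next best one.
    mapping[best] = runner_up
    return mapping
```

- Storing the returned dictionary in place of the previous one on every reporting period gives the overwrite behaviour of the stored mapping relationship.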
- S406 The main control board queries the backup CPU corresponding to the protected CPU according to the mapping relationship.
- the main control board can query the standby CPU corresponding to the CPU according to the pre-stored mapping relationship.
- The main control board does not need to obtain the resource vacancy information of each CPU in the distributed service processing system on the spot, nor does it need to perform calculations on the spot, so it can determine the backup CPU more quickly.
- The services of the protected CPU are therefore switched to the backup CPU sooner, which avoids the service impact caused by the time spent selecting a backup CPU and helps reduce the user's perception of the failure of the protected CPU.
- Because the main control board periodically determines the backup CPU for each CPU in the distributed service processing system, the backup CPU determined in a given cycle may never be used: it is possible that, in that cycle, every CPU in the distributed service processing system is normal and there is no protected CPU.
- Solution 2: the main control board periodically obtains the resource vacancy information of each CPU in the distributed service processing system, but determines the backup CPU corresponding to the protected CPU only when it is needed.
- As in solution 1, the main control board can periodically obtain the resource vacancy information of each CPU in the distributed service processing system. For example, the main control board periodically sends a vacancy information request to each CPU in the distributed service processing system, and each CPU reports its own resource vacancy information in response to the request.
- Alternatively, the main control board does not need to send vacancy information requests periodically; each CPU monitors the reporting period by itself and, when the period arrives, automatically reports its own resource vacancy information to the main control board. This reduces the burden on the main control board.
- In this solution, the main control board does not determine a backup CPU for each CPU every time it obtains the resource vacancy information of each CPU.
- Instead, the CPU resource vacancy information is stored, and a backup CPU is determined only for the failed CPU, that is, the protected CPU, when a failure occurs.
- In this way, the main control board does not frequently determine backup CPUs for every CPU in the distributed service processing system, and when it determines the backup CPU for the protected CPU, it does not need to determine backup CPUs for the other CPUs, so it consumes fewer of its own processing resources.
- Since the main control board periodically obtains the resource vacancy information of each CPU in the distributed service processing system, and always determines the backup CPU based on the most recently acquired information, it can overwrite the previous resource vacancy information of each CPU with the latest information in order to reduce the consumption of storage resources.
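- A minimal sketch of solution 2, assuming for brevity that each report has already been reduced to a single PRI value: the store keeps only the latest value per CPU (overwriting the previous one) and picks a backup only when a protected CPU appears. The class and method names are hypothetical.

```python
class VacancyStore:
    """Keeps only the most recent resource vacancy value of each CPU."""

    def __init__(self):
        self._latest = {}

    def report(self, cpu_id, pri_value):
        # Overwrite the previous value so that storage does not grow.
        self._latest[cpu_id] = pri_value

    def pick_backup(self, protected_cpu):
        """Choose a backup on demand from the stored values, or None."""
        candidates = {
            cpu: value
            for cpu, value in self._latest.items()
            if cpu != protected_cpu and value > 0  # skip unusable CPUs
        }
        if not candidates:
            return None
        return max(candidates, key=candidates.get)
```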
- Solution 3: in the two preceding solutions, the main control board periodically obtains the resource vacancy information of each CPU in the distributed service processing system. In this solution, the main control board obtains the resource vacancy information of each CPU on demand, only when it needs to determine a backup CPU for a particular protected CPU.
- Please refer to FIG. 5, a flowchart in which the main control board determines the backup CPU for the protected CPU:
- S502 Send a vacancy information request to the CPUs other than the protected CPU in the distributed service processing system.
- S504 Receive the resource vacancy information that each CPU reports in response to the vacancy information request.
- S506 Determine a corresponding backup CPU for the protected CPU according to the resource vacancy information of each CPU.
- After acquiring the resource vacancy information of the CPUs other than the protected CPU, the main control board can determine the backup CPU of the protected CPU according to that information. Since the main control board only needs to select a backup CPU for the protected CPU at this time, it can directly select the CPU whose resource vacancy information indicates the best resource vacancy. Of course, in some other examples of this embodiment, the main control board may also select a CPU with merely good resource vacancy as the backup CPU instead of the optimal one.
- For example, suppose the main control board determines through calculation that 3 CPUs have good resource vacancy and that each of them is sufficient to carry all the services of the protected CPU. In this case, the main control board can choose any one of these 3 CPUs as the backup CPU of the protected CPU. It can even choose the one with the least spare resources among the three, because this reserves the other two CPUs, which have more spare resources, for a later failure of a CPU that carries a larger amount of traffic and then needs a backup CPU.
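- The last option in the example above, choosing the sufficient CPU with the least spare resources, is a best-fit selection. The sketch below assumes that spare capacity and the protected CPU's load are both expressed as a number of sessions; that unit and the function name are illustrative assumptions.

```python
def pick_smallest_sufficient(spare_by_cpu, required):
    """Return the CPU with the least spare capacity that still covers `required`.

    spare_by_cpu: dict of CPU id -> spare capacity (assumed unit: sessions)
    required: load of the protected CPU in the same unit
    """
    sufficient = {
        cpu: spare for cpu, spare in spare_by_cpu.items() if spare >= required
    }
    if not sufficient:
        return None  # no single CPU can carry all the services
    return min(sufficient, key=sufficient.get)
```

- With spare capacities of 500, 300, and 800 sessions and a protected load of 250 sessions, this picks the 300-session CPU and keeps the two larger ones in reserve.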
- S108 The main control board sends the services of the protected CPU to the standby CPU according to the tunnel information and the session information.
- After the main control board determines the backup CPU of the protected CPU, it can deliver the services of the protected CPU to the backup CPU according to the tunnel information and session information reported by the protected CPU, so that the backup CPU can process those services.
- In some cases, the backup CPU selected by the main control board for the protected CPU has a large amount of spare resources, enough to carry all the services of the protected CPU.
- In this case, the main control board can directly deliver all the services of the protected CPU to the standby CPU.
- In other cases, the backup CPU selected by the main control board has few spare resources and, while processing its own services, may be able to handle only part of the services on the protected CPU. In this case, the main control board needs to screen the services of the protected CPU and deliver only the selected part to the standby CPU.
- The main control board may randomly select a part of the services of the protected CPU, according to the spare resources of the backup CPU, and deliver it to the backup CPU.
- When the main control board screens the services of the protected CPU, it can also select them according to the importance of the services.
- In some examples, the main control board selects the services of the protected CPU in units of tunnels: if a tunnel is selected by the main control board, all services carried on the tunnel are delivered to the standby CPU; if a tunnel is filtered out, all services carried on the tunnel are interrupted.
- The main control board can determine the protection sensitivity of each tunnel according to the tunnel state Ts, the tunnel keep-alive time Tk, and the session volume Tn in the tunnel: Sent(Tid) = Ts × Tk × Tn
- Tid is the tunnel number.
- Sent is the protection sensitivity corresponding to a tunnel. As the formula shows, the protection sensitivity of a tunnel equals the product of its tunnel state, its tunnel keep-alive time, and its session volume. In some other examples of this embodiment, the main control board may also use other methods to calculate the protection sensitivity of each tunnel, or use other methods to screen the services of the protected CPU.
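- The product described above can be sketched as follows, together with a ranking of tunnels from most to least sensitive. The numeric encoding of the tunnel state (1 for a usable tunnel, 0 otherwise), the unit of the keep-alive time, and the function names are assumptions made for illustration.

```python
def protection_sensitivity(ts, tk, tn):
    """Protection sensitivity of one tunnel: state x keep-alive time x sessions."""
    return ts * tk * tn


def rank_tunnels(tunnels):
    """Order tunnel ids by protection sensitivity, most sensitive first.

    tunnels: dict of tunnel id -> (ts, tk, tn)
    """
    return sorted(
        tunnels,
        key=lambda tid: protection_sensitivity(*tunnels[tid]),
        reverse=True,
    )
```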
- After the standby CPU receives the services of the protected CPU delivered by the main control board, it can process these services. Since the backup CPU has its own services to process, it is in effect using its own redundant resources to protect the services of the protected CPU. The service protection scheme provided by this embodiment is therefore a redundancy protection scheme.
- Once the protected CPU has been restored, the main control board can switch the services delivered to the standby CPU back to the protected CPU, allowing the protected CPU to continue processing its original services. From then on, the protection relationship between the protected CPU and the standby CPU is released.
- When a CPU fails, the main control board selects a backup CPU for it so that all or part of the services of the failed CPU continue to be processed, thereby reducing the impact of the failure on user services and improving the user experience.
- Because the selection of the backup CPU is based on the resource vacancy of each CPU, a CPU with more spare resources can be selected as the backup CPU, so that it can take over as many services of the failed CPU as possible.
- In addition, the main control board can screen the services of the failed CPU that the backup CPU is to carry, which avoids overloading the backup CPU and affecting its own services.
- The CPU determines that it currently cannot continue to perform service processing.
- S604 The CPU collects its own tunnel information and session information.
- The tunnel information collected by the CPU is the information about the tunnels carried by the CPU.
- The session information is the information about the sessions carried in each tunnel.
- One CPU can carry multiple tunnels, and one tunnel can carry multiple sessions.
- S606 The CPU sends its own tunnel information and session information to the main control board.
- After the CPU collects its own tunnel information and session information, it can send this information to the main control board.
- When the CPU has not failed, it reports its own resource vacancy information to the main control board periodically, or reports it at the request of the main control board.
- The resource vacancy information reported by the CPU includes, but is not limited to, the L(a)N(b) identifier of the CPU itself, its running status flag, CPU utilization, memory utilization, available tunnel resource data, and available session resource data.
- The main control board can use the resource vacancy information reported by the CPU to determine, after another CPU fails, whether this CPU is suitable as the backup CPU of the failed CPU.
- S702 The main control board receives resource vacancy information regularly reported by each CPU in the distributed service processing system.
- The main control board determines the backup CPU corresponding to each CPU in the distributed service processing system using the method of solution 1 in the first embodiment. Therefore, the main control board periodically obtains the resource vacancy information reported by each CPU.
- The main control board determines a backup CPU for each CPU in the distributed service processing system according to the latest reported resource vacancy information.
- After the main control board obtains the resource vacancy information reported by each CPU in the distributed service processing system, it can determine the PRI value of each CPU according to one or more of the CPU utilization, memory utilization, number of available tunnels, and number of available sessions. Then, based on the PRI value of each CPU, it determines the standby CPU corresponding to each CPU.
- The main control board stores the latest mapping relationship between each CPU and the corresponding standby CPU.
- After determining the mapping relationship between each CPU and the corresponding standby CPU, the main control board can store the mapping relationship. Every time the CPUs report their resource vacancy information, the main control board determines a new mapping relationship. Because the main control board always relies on the latest mapping relationship when it determines the backup CPU for a protected CPU, it can store the mapping relationship by overwriting, that is, always replace the previous mapping relationship with the latest one, which reduces the storage resources that the mapping relationship occupies on the main control board.
- S708 The main control board receives the tunnel information and the session information sent by the protected CPU.
- When the main control board receives the tunnel information and session information sent by a CPU, it can determine that the CPU has failed and cannot continue to process its own services. Therefore, the main control board determines that the CPU is the current protected CPU.
- S710 The main control board queries the backup CPU corresponding to the protected CPU according to the stored mapping relationship.
- Since the main control board has already determined the backup CPU corresponding to each CPU in the distributed service processing system, once it determines the protected CPU, it can find the backup CPU corresponding to the protected CPU by querying the stored mapping relationship.
- S712 The main control board judges whether the spare CPU resources are sufficient to carry all the services of the protected CPU.
- the main control board After querying the backup CPU corresponding to the protected CPU, the main control board can determine whether the spare CPU resources are enough to carry all the services of the protected CPU. If the judgment result is yes, then go to S714, otherwise, go to S716.
- the main control board delivers all services of the protected CPU to the standby CPU according to the tunnel information and session information of the protected CPU.
- the main control board can directly send all the services of the protected CPU to Standby CPU.
- S716 The main control board determines the protection sensitivity corresponding to each tunnel on the protected CPU.
- If the resource vacancy of the standby CPU is insufficient, the main control board needs to select only part of the services of the protected CPU and deliver that part to the standby CPU.
- The main control board selects, from the services of the protected CPU, the part that the standby CPU can carry, in order of tunnel protection sensitivity and according to the resource vacancy of the standby CPU.
- The main control board can select the part that the standby CPU can carry from the services of the protected CPU, in order of tunnel protection sensitivity and according to the resource vacancy of the standby CPU. Since the number of sessions carried in each tunnel on the protected CPU is not fixed, the main control board cannot directly determine how many tunnels to select. In an example of this embodiment, the main control board may first select the tunnel with the highest protection sensitivity value on the protected CPU and determine whether the standby CPU has resources left after carrying all the services in that tunnel.
- If so, the main control board then selects the tunnel with the second-highest protection sensitivity value and determines whether the standby CPU still has free resources to carry the services of other tunnels after also carrying the services in that tunnel, and so on, until the standby CPU has no resources left or its resource vacancy is not enough to carry the services in some tunnel.
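The selection procedure in steps S712 to S716 and the tunnel-by-tunnel selection that follows can be sketched as a greedy loop. The sketch assumes, purely for illustration, that the cost of carrying a tunnel is its session count and that the standby CPU's vacancy is a number of free sessions; the `Tunnel` type and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tunnel:
    tunnel_id: str
    sensitivity: float   # protection sensitivity of the tunnel
    sessions: int        # sessions carried; used here as the tunnel's cost


def select_tunnels(tunnels, free_sessions):
    """Return the tunnels to deliver to the standby CPU.

    If everything fits, all tunnels are returned unchanged. Otherwise
    tunnels are taken in descending sensitivity order, and selection
    stops at the first tunnel the remaining vacancy cannot carry.
    """
    if sum(t.sessions for t in tunnels) <= free_sessions:
        return list(tunnels)

    chosen = []
    remaining = free_sessions
    for tunnel in sorted(tunnels, key=lambda t: t.sensitivity, reverse=True):
        if tunnel.sessions > remaining:
            break  # vacancy is not enough for this tunnel: stop selecting
        chosen.append(tunnel)
        remaining -= tunnel.sessions
    return chosen
```

Stopping at the first tunnel that does not fit, rather than skipping it and trying smaller tunnels, follows the wording of the text: selection continues until the vacancy "is not enough to carry the services in some tunnel".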
- S720 The main control board delivers the selected service to the standby CPU.
- After the main control board selects the services, it delivers them to the standby CPU, which can then process the delivered services.
- S722 The main control board monitors whether the protected CPU is restored.
- After the main control board delivers all or part of the services of the protected CPU to the standby CPU, it can monitor the state of the protected CPU to determine whether that state has been restored. If so, it proceeds to S724; otherwise, it continues to execute S722.
- The main control board may periodically send status query information to the protected CPU and determine the status of the protected CPU from its response.
- The protected CPU may also actively report to the main control board that its state has been restored.
- S724 The main control board switches the services of the protected CPU back to the protected CPU.
- When the main control board determines that the state of the protected CPU is restored, it can switch the services that originally belonged to the protected CPU back to the protected CPU for processing.
- These services include both the services carried by the standby CPU and the services that the main control board did not deliver to the standby CPU because the standby CPU's resources were limited.
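Steps S722 and S724 can be sketched as a polling loop followed by a switch-back. The function names, the polling interval, and the representation of services as plain lists are assumptions for illustration; the active-report variant, in which the protected CPU notifies the board itself, is not shown.

```python
import time
from typing import Callable, Optional


def wait_for_recovery(
    is_recovered: Callable[[], bool],
    interval_s: float = 1.0,
    max_polls: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll the protected CPU until it reports a restored state (S722).

    Returns True once the CPU is restored, or False if max_polls
    queries were made without a positive answer.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        if is_recovered():
            return True
        polls += 1
        sleep(interval_s)
    return False


def services_to_switch_back(carried_by_standby, never_delivered):
    """Services returned to the protected CPU (S724).

    Covers both the services the standby CPU carried and those that
    were never delivered because the standby CPU lacked resources.
    """
    return list(carried_by_standby) + list(never_delivered)
```

Passing `sleep` as a parameter keeps the loop testable without real delays.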
- The main control board determines the standby CPU for each CPU in the distributed service processing system in advance. Therefore, when a CPU fails, the main control board can quickly look up that CPU's standby CPU, so that the services on the failed CPU are migrated as soon as possible, avoiding prolonged service interruption and a degraded user experience.
- This embodiment provides a storage medium that can store one or more computer programs that can be read, compiled, and executed by one or more processors.
- The storage medium can store at least one of the first service protection program, the second service protection program, and the third service protection program, wherein the first service protection program can be executed by one or more processors to implement any of the service protection methods introduced in the foregoing embodiments
- The network device 80 includes a processor 81, a memory 82, and a communication bus 83 for connecting the processor 81 and the memory 82, where the memory 82 may be the aforementioned storage medium.
- When the memory 82 is a storage medium storing the first service protection program, the processor 81 may read the first service protection program, compile it, and execute the process on the protected CPU side in the service protection method introduced in the foregoing embodiment:
- The processor 81 collects tunnel information and session information on the local CPU, where the tunnel information is information about the tunnels carried by the local CPU, and the session information is information about the sessions carried by those tunnels;
- The processor 81 sends the tunnel information and session information to the main control board.
- The tunnel information and the session information are used by the main control board to send the services on the local CPU to the standby CPU, so that the standby CPU continues processing those services.
- The processor 81 may collect the tunnel information and session information on the local CPU when the local CPU needs to be reset.
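The protected-CPU side described above can be sketched as building one report that nests session information under each tunnel. The record fields and the JSON encoding are hypothetical; the text does not specify a message format.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class SessionInfo:
    session_id: str          # illustrative field


@dataclass
class TunnelInfo:
    tunnel_id: str           # illustrative field
    sessions: list = field(default_factory=list)   # SessionInfo entries


def build_report(cpu_id: str, tunnels: list) -> str:
    """Serialize the tunnel and session information of one CPU.

    Intended to be called when the CPU is about to be reset, so that
    the main control board can hand the services to a standby CPU.
    """
    payload = {"cpu": cpu_id, "tunnels": [asdict(t) for t in tunnels]}
    return json.dumps(payload)
```

A caller would pass the resulting string to whatever transport connects the CPU to the main control board.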
- The processor 81 may also read the second service protection program, compile it, and execute the process on the main control board side in the service protection method introduced in the foregoing embodiment:
- The processor 81 receives the tunnel information and session information sent by the protected CPU, then determines the standby CPU corresponding to the protected CPU, and sends the services of the protected CPU to the standby CPU according to the tunnel information and the session information.
- The standby CPU is determined according to the resource vacancy information of each CPU in the distributed service processing system, where the resource vacancy information represents the resource vacancy of a CPU.
- The resource vacancy information of a CPU is determined according to one or more of the CPU utilization rate, the memory usage rate, the number of available tunnels, and the number of available sessions.
- Before the processor 81 receives the tunnel information and session information sent by the protected CPU, it also periodically determines the resource vacancy information of each CPU in the distributed service processing system, determines the corresponding standby CPU for each CPU according to the most recently obtained resource vacancy information, and stores the mapping relationship between each CPU and its corresponding standby CPU. When it needs to determine the standby CPU corresponding to the protected CPU, it queries the mapping relationship.
- Before the processor 81 receives the tunnel information and session information sent by the protected CPU, it also periodically determines the resource vacancy information of each CPU in the distributed service processing system. When it needs to determine the standby CPU corresponding to the protected CPU, it determines the standby CPU for the protected CPU according to the most recently obtained resource vacancy information of each CPU.
- When the processor 81 determines the standby CPU corresponding to the protected CPU, it sends a vacancy information request to the CPUs in the distributed service processing system other than the protected CPU, receives the resource vacancy information that each CPU reports in response to the request, and determines the corresponding standby CPU for the protected CPU according to that information.
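The on-demand variant, in which vacancy information is requested only once a protected CPU is known, can be sketched as below. The `request_vacancy` callback and the use of a single numeric vacancy score per CPU are assumptions for the example.

```python
from typing import Callable, Iterable, Optional


def pick_standby_on_demand(
    protected: str,
    cpus: Iterable[str],
    request_vacancy: Callable[[str], float],
) -> Optional[str]:
    """Query every CPU except the protected one and pick the most vacant.

    request_vacancy stands in for sending a vacancy information request
    and receiving the reply; a larger value means more free resources.
    Returns None when there is no other CPU to ask.
    """
    reports = {cpu: request_vacancy(cpu) for cpu in cpus if cpu != protected}
    if not reports:
        return None
    return max(reports, key=lambda cpu: reports[cpu])
```

Compared with the periodic approaches, this trades extra latency at failure time for not having to collect reports continuously.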
- After the processor 81 sends the services of the protected CPU to the standby CPU according to the tunnel information and the session information, and once the protected CPU returns to the normal operating state, it switches the services belonging to the protected CPU back to the protected CPU.
- The processor 81 determines, according to the resource vacancy information of the standby CPU, whether the resource vacancy of the standby CPU is sufficient to carry all the services on the protected CPU. If not, it screens the services of the protected CPU and delivers the services retained by the screening to the standby CPU. If the resource vacancy of the standby CPU is sufficient to carry all the services on the protected CPU, the processor 81 directly delivers all the services of the protected CPU to the standby CPU.
- The processor 81 determines the protection sensitivity of each tunnel, taking the tunnel as the unit.
- The protection sensitivity represents the degree to which the services in the tunnel need protection: the higher the protection sensitivity, the greater that need. After the protection sensitivity is determined, the processor 81 selects the services to retain in order of protection sensitivity from high to low, according to the resource vacancy of the standby CPU.
- The processor 81 may determine the protection sensitivity of each tunnel according to the tunnel state Ts of the tunnel, the tunnel keep-alive time Tk, and the session volume Tn in the tunnel.
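This passage names the three inputs Ts, Tk, and Tn but does not state how they are combined. The sketch below is therefore purely hypothetical: it assumes Ts is 1 for an established tunnel and 0 otherwise, that a longer keep-alive time and a larger session volume both raise the sensitivity, and that the two are combined as a weighted sum with made-up weights `w_k` and `w_n`.

```python
def protection_sensitivity(
    ts: int,
    tk: float,
    tn: int,
    w_k: float = 1.0,
    w_n: float = 1.0,
) -> float:
    """Hypothetical protection sensitivity of one tunnel.

    ts: tunnel state, assumed 1 (established) or 0 (not established)
    tk: tunnel keep-alive time
    tn: number of sessions in the tunnel
    A tunnel that is not established scores zero; otherwise the score
    is a weighted sum of keep-alive time and session volume.
    """
    if ts == 0:
        return 0.0
    return w_k * tk + w_n * tn
```

Any monotone scoring function could be substituted; the selection step only needs the resulting values to order the tunnels.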
- The processor 81 may also read the third service protection program, compile it, and execute the process on the standby CPU side in the service protection method introduced in the foregoing embodiment:
- The processor 81 reports the resource vacancy information of the local CPU to the main control board, then receives the services of the protected CPU sent by the main control board and processes them.
- This embodiment also provides a distributed service processing system, which includes a main control board and multiple CPUs.
- The main control board is a network device whose processor 81 executes the second service protection program. Some of the multiple CPUs are network devices whose processor 81 executes the first service protection program, and some are network devices whose processor 81 executes the third service protection program.
- When a CPU cannot continue to process services, it can send its own tunnel information and session information to the main control board, allowing the main control board to determine a standby CPU for it from among the other CPUs in the distributed service processing system and to let the standby CPU continue processing its services. This avoids interrupting all the services the CPU carries because of its own fault, which would affect the user experience. It thereby improves the disaster tolerance of the distributed service processing system, enhances system stability, and improves the user's service experience.
- The functional modules/units in the system and the device can be implemented as software (which can be implemented by program code executable by a computing device), firmware, hardware, and appropriate combinations thereof.
- The division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be executed cooperatively by several physical components.
- Some or all of the physical components can be implemented as software executed by a processor, such as a central processing unit, a digital signal processor, or a microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit.
- The computer-readable medium may include computer storage media (or non-transitory media) and communication media (or transitory media).
- Computer storage media include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules, or other data).
- Computer storage media include but are not limited to RAM, ROM, EEPROM, flash memory or other memory technologies, CD-ROM, digital versatile disk (DVD) or other optical disk storage, magnetic cassette, tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer.
- Communication media usually contain computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transmission mechanism, and may include any information delivery media. Therefore, the present invention is not limited to any specific combination of hardware and software.
Description
CPU | Spare CPU |
---|---|
CPU1 | CPU3 |
CPU2 | CPU3 |
CPU3 | CPU4 |
CPU4 | CPU3 |
CPU5 | CPU3 |
Claims (17)
- A service protection method, comprising: collecting tunnel information and session information on the local CPU, where the tunnel information is information about the tunnels carried by the local CPU, and the session information is information about the sessions carried by the tunnels; and sending the tunnel information and the session information to a main control board, where the tunnel information and the session information are used by the main control board to send the services on the local CPU to a standby CPU, so that the standby CPU continues processing the services on the local CPU.
- The service protection method according to claim 1, wherein collecting the tunnel information and session information on the local CPU comprises: collecting the tunnel information and session information on the local CPU when the local CPU needs to be reset.
- A service protection method, comprising: receiving tunnel information and session information sent by a protected CPU, where the tunnel information is information about the tunnels carried by the protected CPU, and the session information is information about the sessions carried by the tunnels; determining a standby CPU corresponding to the protected CPU, where the standby CPU and the protected CPU belong to the same distributed service processing system; and sending the services of the protected CPU to the standby CPU according to the tunnel information and the session information.
- The service protection method according to claim 3, wherein the standby CPU is determined according to the resource vacancy information of each CPU in the distributed service processing system, and the resource vacancy information represents the resource vacancy of a CPU.
- The service protection method according to claim 4, wherein the resource vacancy information of a CPU is determined according to one or more of the CPU utilization rate, the memory usage rate, the number of available tunnels, and the number of available sessions of the CPU.
- The service protection method according to claim 4, wherein before receiving the tunnel information and session information sent by the protected CPU, the method further comprises: periodically determining the resource vacancy information of each CPU in the distributed service processing system; and determining the corresponding standby CPU for each CPU according to the most recently obtained resource vacancy information of each CPU, and storing the mapping relationship between each CPU and its corresponding standby CPU; and wherein determining the standby CPU corresponding to the protected CPU comprises: querying the standby CPU corresponding to the protected CPU according to the mapping relationship.
- The service protection method according to claim 4, wherein before receiving the tunnel information and session information sent by the protected CPU, the method further comprises: periodically determining the resource vacancy information of each CPU in the distributed service processing system; and wherein determining the standby CPU corresponding to the protected CPU comprises: determining the corresponding standby CPU for the protected CPU according to the most recently obtained resource vacancy information of each CPU.
- The service protection method according to claim 4, wherein determining the standby CPU corresponding to the protected CPU comprises: sending a vacancy information request to the CPUs in the distributed service processing system other than the protected CPU; receiving the resource vacancy information that each CPU reports in response to the vacancy information request; and determining the corresponding standby CPU for the protected CPU according to the resource vacancy information of each CPU.
- The service protection method according to claim 3, wherein after sending the services of the protected CPU to the standby CPU according to the tunnel information and the session information, the method further comprises: after the protected CPU returns to the normal operating state, switching the services belonging to the protected CPU back to the protected CPU.
- The service protection method according to any one of claims 3-9, wherein sending the services of the protected CPU to the standby CPU according to the tunnel information and the session information comprises: determining, according to the resource vacancy information of the standby CPU, whether the resource vacancy of the standby CPU is sufficient to carry all the services on the protected CPU; and if not, screening the services of the protected CPU and delivering the services retained by the screening to the standby CPU.
- The service protection method according to claim 10, wherein screening the services of the protected CPU comprises: determining the protection sensitivity of each tunnel, taking the tunnel as the unit, where the protection sensitivity represents the degree to which the services in the tunnel need protection, and the higher the protection sensitivity, the greater that need; and selecting the services to retain in order of protection sensitivity from high to low, according to the resource vacancy of the standby CPU.
- The service protection method according to claim 11, wherein determining the protection sensitivity of each tunnel, taking the tunnel as the unit, comprises: determining the protection sensitivity of each tunnel according to the tunnel state Ts of the tunnel, the tunnel keep-alive time Tk, and the session volume Tn in the tunnel.
- The service protection method according to claim 10, wherein if it is determined that the resource vacancy of the standby CPU is sufficient to carry all the services on the protected CPU, the service protection method further comprises: directly delivering all the services of the protected CPU to the standby CPU.
- A service protection method, comprising: reporting the resource vacancy information of the local CPU to a main control board; receiving the services of a protected CPU sent by the main control board; and processing the services of the protected CPU.
- A network device, comprising a processor, a memory, and a communication bus, wherein: the communication bus is used to realize connection and communication between the processor and the memory; and the processor is configured to execute a first service protection program stored in the memory to implement the steps of the service protection method according to claim 1 or 2; or the processor is configured to execute a second service protection program stored in the memory to implement the steps of the service protection method according to any one of claims 3-13; or the processor is configured to execute a third service protection program stored in the memory to implement the steps of the service protection method according to claim 14.
- A distributed service processing system, comprising a main control board and multiple CPUs, wherein the main control board is the network device of claim 15 whose processor executes the second service protection program, some of the multiple CPUs are network devices of claim 15 whose processors execute the first service protection program, and some are network devices of claim 15 whose processors execute the third service protection program.
- A storage medium storing at least one of a first service protection program, a second service protection program, and a third service protection program, wherein the first service protection program can be executed by one or more processors to implement the steps of the service protection method according to claim 1 or 2; the second service protection program can be executed by one or more processors to implement the steps of the service protection method according to any one of claims 3-13; and the third service protection program can be executed by one or more processors to implement the steps of the service protection method according to claim 14.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910586539.2 | 2019-07-01 | ||
CN201910586539.2A CN112187494A (en) | 2019-07-01 | 2019-07-01 | Service protection method, network equipment and distributed service processing system |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021000647A1 true WO2021000647A1 (en) | 2021-01-07 |
Family
ID=73914262
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/088318 WO2021000647A1 (en) | 2019-07-01 | 2020-04-30 | Service protection method, network device, distributed service processing system, and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN112187494A (en) |
WO (1) | WO2021000647A1 (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104025522A (en) * | 2012-01-09 | 2014-09-03 | 瑞典爱立信有限公司 | Expanding network functionalities for openflow based split-architecture networks |
US20150006953A1 (en) * | 2013-06-28 | 2015-01-01 | Hugh W. Holbrook | System and method of a hardware shadow for a network element |
US20180013588A1 (en) * | 2015-11-02 | 2018-01-11 | International Business Machines Corporation | Distributed virtual gateway appliance |
CN109716293A (en) * | 2016-09-21 | 2019-05-03 | 高通股份有限公司 | Distributed branch is executed using fusion treatment device core in a processor-based system to predict |
- 2019-07-01: CN application CN201910586539.2A, published as CN112187494A, status: active, pending
- 2020-04-30: WO application PCT/CN2020/088318, published as WO2021000647A1, status: active, application filing
Non-Patent Citations (1)
Title |
---|
XIANG, ZHEN: "Research of Key Technologies for Building Distributed System Based on Multi-core Processors", CHINA MASTER’S THESES FULL-TEXT DATABASE, 15 April 2012 (2012-04-15), XP055772557 * |
Also Published As
Publication number | Publication date |
---|---|
CN112187494A (en) | 2021-01-05 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 20834400; Country of ref document: EP; Kind code of ref document: A1 |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | Ep: pct application non-entry in european phase | Ref document number: 20834400; Country of ref document: EP; Kind code of ref document: A1 |
 | 32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 23.05.2022) |
 | 122 | Ep: pct application non-entry in european phase | Ref document number: 20834400; Country of ref document: EP; Kind code of ref document: A1 |