WO2021000647A1

WO2021000647A1 - Service protection method, network device, distributed service processing system, and storage medium

Info

Publication number: WO2021000647A1
Application number: PCT/CN2020/088318
Authority: WO
Inventors: 张健
Original assignee: 中兴通讯股份有限公司
Priority date: 2019-07-01
Filing date: 2020-04-30
Publication date: 2021-01-07
Also published as: CN112187494A

Abstract

Embodiments of the present invention provide a service protection method, a network device, a distributed service processing system, and a storage medium. The service protection method comprises: collecting tunnel information and session information on a present CPU, wherein the tunnel information is the information of a tunnel carried by the present CPU, and the session information is the information of a session carried by the tunnel; and transmitting the tunnel information and the session information to a main control board, wherein the tunnel information and the session information are used for transmitting a service on the present CPU to a standby CPU by the main control board, so that the standby CPU continues to process the service on the present CPU.

Description

Service protection method, network device, distributed service processing system, and storage medium

Cross references to related applications

This application is filed based on a Chinese patent application with an application number of 201910586539.2 and an application date of July 1, 2019, and claims the priority of the Chinese patent application. The entire content of the Chinese patent application is hereby incorporated into this application by reference.

Technical field

The present invention relates to the field of communications, in particular to a service protection method, network equipment, distributed service processing system and storage medium.

Background art

L2TP (Layer 2 Tunneling Protocol) is a layer 2 tunneling technology and one of the basic technologies for establishing a secure VPN (Virtual Private Network). The LAC (L2TP Access Concentrator) is the access device for L2TP. It provides AAA (Authentication, Authorization, and Accounting) services for various kinds of user access, initiates tunnels and session connections, and performs proxy authentication for VPN users. The LNS (L2TP Network Server) is the VPN server on the enterprise side of L2TP. It completes the final authorization and verification of users, receives tunnel and connection requests from the LAC, and establishes the PPP channel connecting the LNS and the users.

In practical applications, the L2TP services carried by LNS devices and LAC devices keep increasing. To meet the growing service load, L2TP multi-core service processing based on a distributed architecture has been proposed. The multi-core service boards greatly relieve the pressure on the main control board and provide strong load capacity. However, once a multi-core service board suffers a fault that requires a restart to recover, or an unrecoverable fault caused by a chip or other component, the L2TP services are interrupted, which causes losses to telecommunications operators.

Summary of the invention

The service protection method, network device, distributed service processing system, and storage medium provided by the embodiments of the present invention solve, at least to a certain extent, the following technical problem: in the related art, L2TP multi-core services based on a distributed architecture are easily interrupted by a failure of a multi-core service board or of a chip on the multi-core service board, which degrades the user's communication service experience.

In order to solve the foregoing technical problems to at least a certain extent, an embodiment of the present invention provides a service protection method, including:

Collecting tunnel information and session information on the present CPU, where the tunnel information is information about a tunnel carried by the present CPU, and the session information is information about a session carried by the tunnel;

Sending the tunnel information and the session information to the main control board, where the tunnel information and the session information are used by the main control board to send the services on the present CPU to a standby CPU, so that the standby CPU continues to process the services on the present CPU.

The embodiment of the present invention also provides a service protection method, including:

Receiving tunnel information and session information sent by a protected CPU, where the tunnel information is information about a tunnel carried by the protected CPU, and the session information is information about a session carried by the tunnel;

Determining a backup CPU corresponding to the protected CPU, where the backup CPU and the protected CPU belong to the same distributed service processing system;

Sending the services of the protected CPU to the backup CPU according to the tunnel information and the session information.

The embodiment of the present invention also provides a service protection method, including:

Reporting the resource vacancy information of the present CPU to the main control board;

Receiving the services of a protected CPU delivered by the main control board;

Processing the services of the protected CPU.

The embodiment of the present invention also provides a network device, which includes a processor, a memory, and a communication bus;

The communication bus is used to realize connection and communication between the processor and the memory;

The processor is configured to execute a first service protection program stored in the memory to implement the steps of the first service protection method described above; or the processor is configured to execute a second service protection program stored in the memory to implement the steps of the second service protection method described above; or the processor is configured to execute a third service protection program stored in the memory to implement the steps of the third service protection method described above.

The embodiment of the present invention also provides a distributed service processing system, which includes a main control board and a plurality of CPUs. The main control board is the above network device whose processor executes the second service protection program. Some of the CPUs are network devices whose processor executes the first service protection program, and others are network devices whose processor executes the third service protection program.

An embodiment of the present invention also provides a storage medium that stores at least one of a first service protection program, a second service protection program, and a third service protection program. The first service protection program can be executed by one or more processors to implement the steps of the first service protection method described above; the second service protection program can be executed by one or more processors to implement the steps of the second service protection method described above; and the third service protection program can be executed by one or more processors to implement the steps of the third service protection method described above.

Other features and corresponding beneficial effects of the present invention are described in the latter part of the specification, and it should be understood that at least some of the beneficial effects will become apparent from the description in the specification of the present invention.

Description of the drawings

FIG. 1 is an interaction flowchart of the service protection method provided in Embodiment 1 of the present invention;

FIG. 2 is an architecture diagram of a distributed service processing system provided in Embodiment 1 of the present invention;

FIG. 3 is a flowchart of the interaction between an LAC device and a CPU on the LNS device side shown in Embodiment 1 of the present invention;

FIG. 4 is a flow chart of determining a backup CPU in the first solution shown in the first embodiment of the present invention;

FIG. 5 is a flowchart of determining a backup CPU in the third solution shown in the first embodiment of the present invention;

FIG. 6 is a flowchart on the protected CPU side of the service protection method provided in Embodiment 2 of the present invention;

FIG. 7 is a flow chart on the main control board side of the service protection method provided in Embodiment 2 of the present invention;

FIG. 8 is a schematic diagram of a hardware structure of a network device provided in Embodiment 3 of the present invention.

Detailed description

In order to make the objectives, technical solutions, and advantages of the present invention clearer, the following further describes the embodiments of the present invention in detail through specific implementations in conjunction with the accompanying drawings. It should be understood that the specific embodiments described here are only used to explain the present invention, but not to limit the present invention.

Embodiment 1:

As communication applications expand, LNS devices and LAC devices carry more and more services, and their load pressure keeps growing. Take the LNS as an example: one LNS device may establish VPN tunnels with multiple LAC devices, so the LNS device carries a large number of L2TP services. Traditional centralized L2TP is deployed on the main control board, but because the main control board has limited CPU resources and low processing efficiency, it cannot meet the growing service load. To solve this problem, the LNS device can be deployed in a distributed manner, with multiple distributed service boards carrying the services that the main control board originally had to handle, thereby relieving the pressure on the main control board. On a service board, multiple CPUs are responsible for processing LNS services. However, when one of the CPUs fails, the services it carries are affected and forcibly interrupted, which degrades the user experience. To address this, this embodiment provides a service protection method. Refer to the interaction flowchart of the service protection method shown in FIG. 1:

S102: The protected CPU collects tunnel information and session information on the CPU.

In this embodiment, CPUs are divided into protected CPUs and standby CPUs. A protected CPU is a CPU whose services are taken over and processed by a standby CPU; it is usually a CPU that has failed and has had to interrupt its service processing. A standby CPU is a CPU that continues to process the services of a protected CPU. Any CPU may serve as a protected CPU in some scenarios and as a standby CPU in others.

The tunnel information refers to the information of the tunnel carried by the protected CPU, and the session information refers to the information of the session carried in each tunnel. Generally, one CPU can carry multiple tunnels, and one tunnel can carry multiple sessions. Generally, the protected CPU collects tunnel information and session information on the CPU when it fails and cannot continue to process the services carried by it. For example, in some examples of this embodiment, if a CPU determines that it needs to be reset to deal with the current failure, the CPU can determine that it is a protected CPU, and thus collect its own tunnel information and session information.
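The relationship described above, one CPU carrying several tunnels and each tunnel carrying several sessions, can be sketched as a nested data structure. The following Python sketch is illustrative only: the names `SessionInfo`, `TunnelInfo`, `collect_protection_info`, `tunnel_id`, and `session_id` are assumptions and do not come from the specification, which does not define the fields of the tunnel information or session information.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SessionInfo:
    session_id: int  # hypothetical field: identifier of a session within its tunnel


@dataclass
class TunnelInfo:
    tunnel_id: int  # hypothetical field: identifier of a tunnel carried by this CPU
    sessions: List[SessionInfo] = field(default_factory=list)


def collect_protection_info(tunnels: Dict[int, List[int]]) -> List[TunnelInfo]:
    """Gather every tunnel carried by this CPU together with the sessions it carries.

    `tunnels` maps a tunnel id to the ids of the sessions carried in that tunnel.
    """
    return [
        TunnelInfo(tunnel_id=tid, sessions=[SessionInfo(sid) for sid in sids])
        for tid, sids in sorted(tunnels.items())
    ]
```

A CPU that decides it must be reset could call `collect_protection_info` once and hand the resulting list to whatever transport delivers it to the main control board.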

S104: The protected CPU sends the tunnel information and the session information to the main control board.

After the protected CPU collects its own tunnel information and session information, it can send this information to the main control board. FIG. 2 shows a distributed service processing system:

The distributed service processing system 2 includes a main control board 21, a first service board 22, a second service board 23, and a message transceiving processing board 24. The main control board 21 is responsible for system management, protocol message processing, and routing management of the entire distributed service processing system 2. The message transceiving processing board 24 is responsible for interface traffic management, message forwarding, and switching traffic management, and can distribute L2TP messages to the service boards according to the processing rules set by the main control board 21. In this embodiment, the service boards are independent of one another and perform distributed processing in parallel, which improves the throughput of the system. The first service board 22 includes multiple CPUs, and the second service board 23 also includes multiple CPUs. Assume that the distributed service processing system 2 is a distributed LNS device. Refer to the schematic diagram of the L2TP establishment process shown in FIG. 3:

S302: LAC initiates a tunnel establishment request SCCRQ message;

S304: The CPU responds with an SCCRP message;

S306: LAC returns a confirmation SCCCN message to LNS after receiving the response;

At this point, the tunnel between the LAC and the CPU is established.

S308: The LAC initiates a session establishment request ICRQ message;

S310: The CPU returns a response ICRP message after receiving the request;

S312: LAC returns a confirmation ICCN message after receiving the response;

At this point, the session is established. After the session is established, the LNS can perform a PPP (Point to Point Protocol) interaction process with the user, and assign an IP address to the user, and then the user can access the network.
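The message order in steps S302 to S312 can be checked with a small sequence validator. This is a minimal sketch under simplifying assumptions: it models a single session on a single tunnel, it ignores acknowledgements, retransmissions, and teardown, and the function name `handshake_state` and the state labels are hypothetical.

```python
# Control-message order from FIG. 3: tunnel establishment, then session establishment.
TUNNEL_HANDSHAKE = ["SCCRQ", "SCCRP", "SCCCN"]
SESSION_HANDSHAKE = ["ICRQ", "ICRP", "ICCN"]


def handshake_state(messages):
    """Return the state reached after the given control messages, in order.

    Raises ValueError if a message arrives out of the expected order.
    """
    expected = TUNNEL_HANDSHAKE + SESSION_HANDSHAKE
    if len(messages) > len(expected):
        raise ValueError("more messages than the handshake defines")
    for position, message in enumerate(messages):
        if message != expected[position]:
            raise ValueError(
                f"unexpected {message!r}, expected {expected[position]!r}"
            )
    count = len(messages)
    if count == 0:
        return "idle"
    if count < len(TUNNEL_HANDSHAKE):
        return "tunnel_pending"
    if count == len(TUNNEL_HANDSHAKE):
        return "tunnel_established"  # reached after S306
    if count < len(expected):
        return "session_pending"
    return "session_established"  # reached after S312
```

For instance, `handshake_state(["SCCRQ", "SCCRP", "SCCCN"])` reports that the tunnel is established, while a session request sent before the tunnel exists raises an error.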

S106: The main control board determines the backup CPU corresponding to the protected CPU.

In this embodiment, after the main control board receives the tunnel information and session information sent by the protected CPU, it can determine that the protected CPU is temporarily unable to continue processing its services. Therefore, the main control board needs to determine a backup CPU for the protected CPU. The backup CPU processes the services that the protected CPU can no longer process, thereby protecting these services and avoiding their interruption.

In this embodiment, the backup CPU that the main control board selects for the protected CPU is also a CPU in the distributed service processing system, and each CPU in the system already carries some services of its own. Therefore, the backup CPU uses its remaining resources to protect the services of the protected CPU while processing its own services. Accordingly, when determining the backup CPU for the protected CPU, the main control board refers to the resource vacancy information of each CPU, that is, each CPU's capacity to handle additional services while processing its own.

In some examples of this embodiment, for a CPU, the main control board may determine its resource vacancy information according to at least one of the CPU utilization rate, the memory usage rate, the number of available tunnels, and the number of available sessions of the CPU. For example:

PRI=W1*(1-CPURate)+W2*(1-MemRate)+W3*T+W4*S;

Here, PRI characterizes how vacant a CPU's resources are: the higher the PRI value, the more vacant the resources, and the lower the value, the less vacant. Therefore, the higher a CPU's PRI value, the more likely the CPU is to be selected as the backup CPU. CPURate is the CPU utilization, and W1 is the weight of the CPU vacancy rate; MemRate is the memory usage rate, and W2 is the weight of the memory remaining rate; T represents the number of available tunnels, and W3 is the weight of the number of available tunnels; S represents the number of available sessions, and W4 is the weight of the number of available sessions.

It should be understood that the number of available sessions and the number of available tunnels are not measured on the same scale as the CPU vacancy rate and the memory remaining rate. Therefore, in this embodiment, the number of available sessions and the number of available tunnels need to be normalized so that all four quantities are on a consistent scale. For example, the value of T can be the ratio of the number of available tunnels to the rated total number of tunnels in the distributed service processing system, with a value range of (0,1); the value of S can be the ratio of the number of available sessions to the rated total number of sessions, with a value range of (0,1).

It is understandable that whether a CPU can serve as a backup CPU depends not only on its current resource vacancy but also on the state of the CPU itself. For example, a CPU may still have plenty of processing resources remaining, enough to process many additional services, but if the CPU itself is abnormal, it cannot be used as a backup CPU. Therefore, in some examples of this embodiment:

PRI=Stat*[W1*(1-CPURate)+W2*(1-MemRate)+W3*T+W4*S];

The other symbols have the same meaning as before. Stat represents the running status of the CPU: Stat is 1 when the CPU is running normally and 0 when the running status of the CPU is abnormal. Therefore, regardless of a CPU's vacancy rate, memory remaining rate, number of available tunnels, and number of available sessions, its PRI value is 0 as long as its running status is abnormal.
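The PRI formula, including the normalization of T and S and the Stat factor, can be sketched as follows. The function name `pri`, its parameter names, and the equal default weights are assumptions made for illustration; the specification does not give values for W1 to W4.

```python
def pri(cpu_rate, mem_rate, avail_tunnels, rated_tunnels,
        avail_sessions, rated_sessions, stat=1,
        weights=(0.25, 0.25, 0.25, 0.25)):
    """Compute PRI = Stat * [W1*(1-CPURate) + W2*(1-MemRate) + W3*T + W4*S].

    cpu_rate and mem_rate are utilizations in [0, 1]; stat is 1 for a CPU
    that is running normally and 0 for an abnormal one.
    """
    w1, w2, w3, w4 = weights  # assumed equal weights, not taken from the source
    # Normalize the counts so that T and S are on the same scale as the two rates.
    t = avail_tunnels / rated_tunnels
    s = avail_sessions / rated_sessions
    return stat * (w1 * (1 - cpu_rate) + w2 * (1 - mem_rate) + w3 * t + w4 * s)
```

With these assumed weights, a lightly loaded CPU scores higher than a heavily loaded one, and any CPU reported as abnormal scores 0 no matter how idle it is.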

In some other examples of this embodiment, the main control board may determine the resource vacancy information of a CPU only according to its CPU vacancy rate or its memory remaining rate. The main control board can also determine the resource vacancy information of a CPU only according to its number of available tunnels or its number of available sessions.

Before the main control board determines the backup CPU corresponding to the protected CPU, it must first obtain the resource vacancy information of each CPU in the distributed service processing system. Several solutions by which the main control board obtains the resource vacancy information of each CPU and determines the backup CPU are given below for reference:

Solution 1:

The main control board periodically obtains the resource vacancy information of each CPU in the distributed service processing system and periodically determines the backup CPU for each CPU. Refer to FIG. 4:

S402: The main control board periodically determines the resource vacancy information of each CPU in the distributed service processing system.

In this solution, each CPU in the distributed service processing system periodically reports its own resource vacancy information to the main control board. In some examples of this embodiment, the resource vacancy information that a CPU reports to the main control board includes its CPU utilization, memory utilization, number of available tunnels, and number of available sessions.

In this embodiment, the distributed service processing system contains many CPUs, and essentially all of them report their resource vacancy information to the main control board. To ensure that, on receiving a report, the main control board can determine which CPU the resource vacancy information belongs to, each CPU includes in its report information that uniquely identifies it within the distributed service processing system. For example, in one example of this embodiment, a CPU uniquely identifies itself with an L(a)N(b) identifier, where L stands for "Location" and indicates the service board where the CPU is located, a is the number of that service board, N indicates the serial number of the CPU on that service board, and b is the unique identifier of the CPU on that service board. Through the L(a)N(b) identifier, the main control board can determine which CPU on which service board the resource vacancy information in a received report belongs to.
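One possible encoding of the L(a)N(b) identifier is a short string such as `L2N5` for CPU 5 on service board 2. The specification does not fix a textual format, so the string layout and the helper names `format_cpu_id` and `parse_cpu_id` below are assumptions for illustration.

```python
import re

# Assumed textual form: "L<board number>N<CPU number>", for example "L2N5".
_CPU_ID = re.compile(r"^L(\d+)N(\d+)$")


def format_cpu_id(board: int, cpu: int) -> str:
    """Build the identifier a CPU attaches to its vacancy report."""
    return f"L{board}N{cpu}"


def parse_cpu_id(identifier: str):
    """Recover (board number, CPU number) from a reported identifier."""
    match = _CPU_ID.match(identifier)
    if match is None:
        raise ValueError(f"not an L(a)N(b) identifier: {identifier!r}")
    return int(match.group(1)), int(match.group(2))
```

The main control board could use the parsed pair as the key under which it stores each CPU's latest report.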

S404: The main control board determines the corresponding standby CPU for each CPU according to the resource vacancy information of each CPU acquired last time, and stores the mapping relationship between each CPU and the corresponding standby CPU.

In this example, whenever the main control board reacquires the resource vacancy information of each CPU, it will reconfigure a spare CPU for each CPU. For a CPU, the main control board selects a backup CPU from other CPUs in the distributed service processing system to which the CPU belongs. Therefore, a CPU and its backup CPU are both CPUs belonging to the same distributed service processing system. For example, in an example of this embodiment, the main control board determines a backup CPU for each CPU in the distributed service processing system in the following manner:

The main control board calculates the PRI value of each CPU according to its resource vacancy information; the calculation method is described above and is not repeated here. After calculating the PRI values, the main control board determines the CPU with the highest PRI value and uses it as the backup CPU for every other CPU in the distributed service processing system. The main control board also needs to select a backup CPU for the CPU with the highest PRI value. In one example, it uses the CPU with the second-highest PRI value as the backup CPU for the CPU with the highest PRI value.

Assume that a distributed service processing system contains five CPUs: CPU1, CPU2, CPU3, CPU4, and CPU5. After calculation, CPU3 has the highest PRI value, followed by CPU4. The main control board can therefore determine the mapping relationship between each CPU and its standby CPU, as shown in Table 1:

Table 1

CPU	Standby CPU
CPU1	CPU3
CPU2	CPU3
CPU3	CPU4
CPU4	CPU3
CPU5	CPU3

After the main control board determines the mapping relationship between each CPU and its standby CPU, the mapping relationship can be stored for use when a CPU needs to be protected before the mapping relationship is updated next time.
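The mapping rule of S404 (the CPU with the highest PRI backs up every other CPU, and the CPU with the second-highest PRI backs up the highest one) can be sketched as below. The function name `build_standby_map` and the PRI values in the usage note are hypothetical; only the ordering, with CPU3 highest and CPU4 second, comes from the example above.

```python
def build_standby_map(pri_by_cpu):
    """Map every CPU to its standby CPU from the latest PRI values.

    The CPU with the highest PRI is the standby for all other CPUs, and the
    CPU with the second-highest PRI is the standby for the highest one.
    """
    if len(pri_by_cpu) < 2:
        raise ValueError("at least two CPUs are needed to assign a standby CPU")
    ranked = sorted(pri_by_cpu, key=pri_by_cpu.get, reverse=True)
    best, second = ranked[0], ranked[1]
    return {cpu: (second if cpu == best else best) for cpu in pri_by_cpu}
```

Called with hypothetical values such as `{"CPU1": 0.2, "CPU2": 0.3, "CPU3": 0.9, "CPU4": 0.7, "CPU5": 0.1}`, the function yields the same pairs as Table 1. Rebuilding the dictionary on every reporting cycle corresponds to the periodic update of the stored mapping relationship.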

S406: The main control board queries the backup CPU corresponding to the protected CPU according to the mapping relationship.

After a CPU reports its own tunnel information and session information to the main control board, the main control board can query the standby CPU corresponding to that CPU in the pre-stored mapping relationship. In this solution, when a CPU fails, the main control board neither needs to obtain the resource vacancy information of each CPU on the spot nor needs to perform any calculation on the spot. This speeds up the determination of the backup CPU, so the services of the protected CPU are switched to the backup CPU sooner, the services are not affected by the time spent selecting a backup CPU, and users are less likely to notice the failure of the protected CPU.

It is understandable that because the main control board periodically determines a backup CPU for each CPU in the distributed service processing system, a backup CPU determined in a given cycle may never be put to use: all CPUs in the system may run normally throughout that cycle, so that no protected CPU appears.

Solution 2:

The main control board periodically obtains the resource vacancy information of each CPU in the distributed service processing system, but determines the backup CPU corresponding to the protected CPU only when it is needed. In this solution, the main control board can obtain the resource vacancy information periodically in the same way as in Solution 1. For example, the main control board periodically sends a vacancy information request to each CPU in the distributed service processing system, and each CPU reports its own resource vacancy information in response. Of course, in both Solution 1 and this solution, the main control board does not have to send vacancy information requests periodically. Instead, each CPU can keep track of the reporting period itself and automatically report its resource vacancy information to the main control board when the period expires, which reduces the burden on the main control board.

Unlike in Solution 1, however, in Solution 2 the main control board does not determine a backup CPU for each CPU every time it obtains the resource vacancy information; it only stores that information. When a CPU fails, the main control board then determines the backup CPU for the failed CPU, that is, the protected CPU, on demand. Because the main control board does not repeatedly determine backup CPUs for every CPU in the distributed service processing system, and does not need to determine backup CPUs for the other CPUs when determining one for the protected CPU, it consumes fewer of its own processing resources.

It is understandable that in both this solution and Solution 1, the main control board periodically obtains the resource vacancy information of each CPU and always determines the backup CPU from the most recently obtained information. Therefore, to reduce storage consumption, the main control board can overwrite the previous resource vacancy information of each CPU with the latest.
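The behavior of Solution 2 (keep only the latest report per CPU, and choose a backup CPU only when a failure is reported) can be sketched as a small store. The class name `VacancyStore` and its methods are hypothetical, and the sketch assumes that each report has already been reduced to a single PRI-like score.

```python
class VacancyStore:
    """Keeps only the most recent vacancy score reported by each CPU."""

    def __init__(self):
        self._latest = {}

    def report(self, cpu_id, score):
        # A newer report overwrites the previous one, so storage does not grow.
        self._latest[cpu_id] = score

    def pick_backup(self, protected_cpu):
        """Choose a backup on demand: the highest-scoring CPU other than the protected one."""
        candidates = {
            cpu: score for cpu, score in self._latest.items() if cpu != protected_cpu
        }
        if not candidates:
            raise LookupError("no other CPU has reported vacancy information")
        return max(candidates, key=candidates.get)
```

Nothing is computed when reports arrive; the comparison runs only inside `pick_backup`, which mirrors the reduced processing load described for this solution.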

Solution 3:

In the first two solutions, the main control board periodically obtains the resource vacancy information of each CPU in the distributed service processing system. In this solution, the main control board obtains that information on demand, only when it needs to determine a backup CPU for a protected CPU. Refer to the flowchart in FIG. 5, which shows how the main control board determines the backup CPU for the protected CPU:

S502: Send a vacancy information request to the CPUs other than the protected CPU in the distributed service processing system.

In this solution, the other CPUs in the distributed service processing system cannot know when one of the CPUs fails, so they cannot proactively report their resource vacancy information to the main control board once a protected CPU appears. Therefore, in this embodiment, when the main control board receives the tunnel information and session information sent by a protected CPU and determines that a backup CPU is needed, it sends a vacancy information request to the CPUs in the distributed service processing system other than the protected CPU. The request notifies these CPUs to report their own resource vacancy information.

S504: Receive the resource vacancy information that each CPU reports about itself in response to the vacancy information request.

After other CPUs in the distributed service processing system receive the vacant information request, they will report their own resource vacancy information to the main control board according to the vacant information request. Therefore, the main control board will receive the resource vacancy information sent by these CPUs.

S506: Determine a corresponding backup CPU for the protected CPU according to the resource vacancy information of each CPU.

After obtaining the resource vacancy information of the CPUs other than the protected CPU, the main control board can determine the backup CPU of the protected CPU from that information. Since the main control board only needs to select a backup CPU for the protected CPU at this point, it can directly select the CPU whose resource vacancy information indicates the most vacant resources. In some other examples of this embodiment, the main control board may instead select a CPU whose resource vacancy is merely good enough rather than the best. For example, if the main control board determines through calculation that three CPUs have ample vacant resources, each sufficient to carry all the services of the protected CPU, it can choose any one of the three as the backup CPU. It can even choose the one with the fewest vacant resources among the three, because the other two CPUs, which have more vacant resources, are then held in reserve in case a CPU carrying a larger amount of traffic fails later and needs a backup CPU.
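The last variant above, choosing the least vacant CPU among those that can still carry all the services of the protected CPU, can be sketched as follows. The function name `pick_smallest_sufficient`, the use of a single capacity number per CPU, and the fallback to the most vacant CPU when none is sufficient are assumptions made for illustration.

```python
def pick_smallest_sufficient(vacancy_by_cpu, required):
    """Pick the backup CPU that is sufficient but leaves larger CPUs in reserve.

    vacancy_by_cpu maps a CPU id to its spare capacity, and required is the
    capacity needed by the protected CPU (both in the same hypothetical unit,
    for example a number of sessions).
    """
    if not vacancy_by_cpu:
        raise ValueError("no candidate CPUs")
    sufficient = {
        cpu: spare for cpu, spare in vacancy_by_cpu.items() if spare >= required
    }
    if not sufficient:
        # Assumed fallback: no CPU can take everything, so take the most vacant one.
        return max(vacancy_by_cpu, key=vacancy_by_cpu.get)
    return min(sufficient, key=sufficient.get)
```

Selecting the smallest sufficient candidate keeps the most vacant CPUs free for a later failure of a more heavily loaded CPU, at the cost of leaving the chosen backup CPU with less headroom.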

S108: The main control board sends the services of the protected CPU to the standby CPU according to the tunnel information and the session information.

After the main control board determines the backup CPU of the protected CPU, it can send the services of the protected CPU to the backup CPU according to the tunnel information and session information reported by the protected CPU, so that the backup CPU can process those services. In some cases, the backup CPU selected by the main control board has ample vacant resources, enough to carry all the services of the protected CPU. The main control board can then deliver all the services of the protected CPU directly to the backup CPU. In other cases, the selected backup CPU has few vacant resources and, while processing its own services, may be able to handle only part of the services of the protected CPU. The main control board then needs to screen the services of the protected CPU and deliver only the selected ones to the backup CPU.

In some examples of this embodiment, the main control board may randomly select, according to the vacant resources of the backup CPU, a portion of the services of the protected CPU and deliver it to the backup CPU. However, any service on the protected CPU that is not selected and delivered to the backup CPU is interrupted, which inevitably affects the experience of the corresponding user. Therefore, in this embodiment, the main control board can screen the services of the protected CPU according to their importance. In some examples of this embodiment, the main control board screens the services of the protected CPU in units of tunnels: if a tunnel is selected, all services carried on it are delivered to the backup CPU; if a tunnel is screened out, all services carried on it are interrupted.

The following introduces a scheme for selecting services based on tunnels:

The main control board can determine the protection sensitivity of each tunnel according to the tunnel state Ts, the tunnel keep-alive time Tk, and the number of sessions Tn in the tunnel. For example:

Sent(Tid)=Ts*Tk*Tn;

Here, Tid is the tunnel number and Sent is the protection sensitivity of a tunnel. According to the above formula, the protection sensitivity of a tunnel equals the product of its tunnel state, its tunnel keep-alive time, and the session volume in the tunnel. In some other examples of this embodiment, the main control board may also calculate the protection sensitivity of each tunnel in other ways, or screen the services of the protected CPU in other ways.
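The formula above can be illustrated with a minimal sketch. The Tunnel record, its field names, the 0/1 encoding of the tunnel state Ts, and the use of seconds for Tk are assumptions made only for this illustration; the text does not specify how these quantities are encoded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tunnel:
    tid: int    # tunnel number (Tid)
    ts: int     # tunnel state Ts; assumed encoding: 1 = established, 0 = not established
    tk: float   # tunnel keep-alive time Tk; assumed to be in seconds
    tn: int     # session volume Tn carried in the tunnel


def sensitivity(t: Tunnel) -> float:
    """Sent(Tid) = Ts * Tk * Tn, as given in the text."""
    return t.ts * t.tk * t.tn


tunnels = [
    Tunnel(tid=1, ts=1, tk=60.0, tn=10),
    Tunnel(tid=2, ts=1, tk=30.0, tn=50),
    Tunnel(tid=3, ts=0, tk=120.0, tn=40),
]

# Map each tunnel number to its protection sensitivity.
sent = {t.tid: sensitivity(t) for t in tunnels}
print(sent)
```

Under the assumed 0/1 encoding, a tunnel that is not established gets a sensitivity of zero regardless of its keep-alive time or session volume, so it is ranked last during screening.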

S110: The backup CPU processes the services of the protected CPU.

After the standby CPU receives the services of the protected CPU delivered by the main control board, it can process these services. It is understandable that, since the standby CPU also has its own services to process, the standby CPU in this embodiment actually uses its own redundant resources to protect the services of the protected CPU. Therefore, the service protection scheme provided by this embodiment is actually a redundancy protection scheme.

It is understandable that after the protected CPU recovers from the failure, the main control board can switch the services delivered to the standby CPU back to the recovered protected CPU, so that the protected CPU continues to process its original services. After that, the protection relationship between the protected CPU and the standby CPU can be released.

In the service protection method provided by the embodiment of the present invention, when a CPU cannot continue to process the services it carries, the main control board selects a standby CPU for that CPU to continue processing all or part of the services of the failed CPU, thereby reducing the impact of the CPU failure on user services and enhancing the user experience.

Since the selection of the backup CPU is based on the resource vacancy of each CPU, a CPU with more resources can be selected as the backup CPU, so that the backup CPU can undertake as much business on the failed CPU as possible.

In addition, when the backup CPU cannot carry all the services on the faulty CPU, the main control board can filter the services of the faulty CPU that the backup CPU needs to carry to avoid the problem of excessive load on the backup CPU and affecting the backup CPU's own services.

Embodiment two:

This embodiment will continue to introduce the foregoing service protection method in combination with some examples. Please refer to a flowchart of the service protection method shown in FIG. 6:

S602: The CPU determines that it cannot continue processing services.

In this embodiment, if the CPU has a fault that requires a reset, or a fault from which it cannot recover for the time being, the CPU can determine that it currently cannot continue to process services.

S604: The CPU collects its own tunnel information and session information.

In this embodiment, the tunnel information collected by the CPU refers to the information of the tunnel carried by the CPU, and the session information refers to the information of the session carried in each tunnel. One CPU can carry multiple tunnels, and one tunnel can carry multiple sessions.

S606: The CPU sends its own tunnel information and session information to the main control board.

After the CPU collects its own tunnel information and session information, it can send this information to the main control board.

In some examples of this embodiment, when the CPU has not failed, it reports its own resource vacancy information to the main control board periodically, or reports it at the request of the main control board. The resource vacancy information reported by the CPU includes, but is not limited to, the L(a)N(b) identifier of the CPU itself, a running status flag, CPU utilization, memory utilization, available tunnel resource data, and available session resource data. The main control board can use the resource vacancy information reported by a CPU to determine, after another CPU fails, whether this CPU is suitable as the standby CPU of the failed CPU.
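The fields listed above can be pictured as a simple report record. This is only a sketch: the class name, field names, value ranges, and the example identifier string "L1N2" are hypothetical, since the text names the fields but not their encoding.

```python
from dataclasses import dataclass, asdict


@dataclass
class VacancyReport:
    cpu_id: str                # L(a)N(b) identifier; the "L1N2" spelling is an assumption
    running: bool              # running status flag
    cpu_utilization: float     # assumed range 0.0 .. 1.0
    memory_utilization: float  # assumed range 0.0 .. 1.0
    available_tunnels: int     # available tunnel resource data
    available_sessions: int    # available session resource data


def build_report(cpu_id: str, cpu_util: float, mem_util: float,
                 free_tunnels: int, free_sessions: int) -> VacancyReport:
    """Assemble the report a healthy CPU would send to the main control board."""
    return VacancyReport(cpu_id, True, cpu_util, mem_util, free_tunnels, free_sessions)


report = build_report("L1N2", 0.35, 0.50, 80, 800)
payload = asdict(report)  # plain dict, ready to be serialized for transmission
print(payload)
```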

Next, refer to the flow on the main control board side of the service protection method shown in FIG. 7:

S702: The main control board receives resource vacancy information regularly reported by each CPU in the distributed service processing system.

In this embodiment, the main control board determines the standby CPU corresponding to each CPU in the distributed service processing system by using scheme one of the first embodiment. Therefore, the main control board periodically obtains the resource vacancy information reported by each CPU.

S704: The main control board determines a backup CPU for each CPU in the distributed service processing system according to the latest reported resource vacancy information.

After the main control board obtains the resource vacancy information reported by each CPU in the distributed service processing system, it can determine the PRI value of each CPU according to one or more of the CPU utilization rate, memory utilization rate, number of available tunnels, and number of available sessions. Then, based on the PRI value of each CPU, it determines the standby CPU corresponding to each CPU.
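Step S704 can be sketched as follows. The PRI calculation itself is described in the first embodiment and is not repeated in this passage, so the equal-weight score below is only a stand-in assumption, as are the capacity limits MAX_TUNNELS and MAX_SESSIONS, the function names, and the CPU identifiers.

```python
MAX_TUNNELS = 100    # assumed per-CPU tunnel capacity (hypothetical)
MAX_SESSIONS = 1000  # assumed per-CPU session capacity (hypothetical)


def pri(report: dict) -> float:
    """Stand-in PRI: equal-weight average of four idle fractions (an assumption)."""
    return ((1.0 - report["cpu_util"])
            + (1.0 - report["mem_util"])
            + report["free_tunnels"] / MAX_TUNNELS
            + report["free_sessions"] / MAX_SESSIONS) / 4.0


def choose_standbys(reports: dict) -> dict:
    """For every CPU, pick the other CPU with the highest PRI as its standby."""
    mapping = {}
    for cpu_id in reports:
        candidates = [other for other in reports if other != cpu_id]
        mapping[cpu_id] = max(candidates, key=lambda other: pri(reports[other]))
    return mapping


reports = {
    "L1N1": {"cpu_util": 0.9, "mem_util": 0.8, "free_tunnels": 10, "free_sessions": 100},
    "L1N2": {"cpu_util": 0.2, "mem_util": 0.3, "free_tunnels": 80, "free_sessions": 800},
    "L2N1": {"cpu_util": 0.5, "mem_util": 0.5, "free_tunnels": 50, "free_sessions": 500},
}
standby_map = choose_standbys(reports)
print(standby_map)
```

In this sketch a CPU is never chosen as its own standby, and the least loaded remaining CPU wins; a real implementation may weight the four inputs differently or use only some of them.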

S706: The main control board stores the latest mapping relationship between each CPU and the corresponding standby CPU.

After determining the mapping relationship between each CPU and the corresponding standby CPU, the main control board can store the mapping relationship. It is understandable that the main control board determines a new mapping relationship every time the CPUs report their resource vacancy information. However, when the main control board determines the standby CPU for a protected CPU, it always relies on the latest mapping relationship. Therefore, the main control board can store the mapping relationship by overwriting, that is, always overwrite the previous mapping relationship with the latest one, which reduces the storage resources that the mapping relationship occupies on the main control board side.
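The overwriting storage described above, together with the lookup used later in S710, can be sketched as a store that keeps only the latest mapping. The class name, method names, and CPU identifiers are hypothetical.

```python
class StandbyMap:
    """Holds only the most recent CPU -> standby CPU mapping."""

    def __init__(self):
        self._mapping = {}

    def overwrite(self, new_mapping: dict) -> None:
        # Replace the previous mapping wholesale; no history is retained.
        self._mapping = dict(new_mapping)

    def standby_of(self, cpu_id: str):
        # Returns None when no standby is recorded for this CPU.
        return self._mapping.get(cpu_id)


store = StandbyMap()
store.overwrite({"L1N1": "L1N2", "L1N2": "L2N1", "L2N1": "L1N2"})  # first reporting round
store.overwrite({"L1N1": "L2N1", "L1N2": "L2N1", "L2N1": "L1N1"})  # next round replaces it

print(store.standby_of("L1N1"))
```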

S708: The main control board receives the tunnel information and the session information sent by the protected CPU.

When the main control board receives the tunnel information and session information sent by a CPU, it can determine that the CPU should be faulty and cannot continue to process its own services. Therefore, the main control board determines that the CPU is the current protected CPU.

S710: The main control board queries the backup CPU corresponding to the protected CPU according to the stored mapping relationship.

Since the main control board has already determined the standby CPU corresponding to each CPU in the distributed service processing system, once it has identified the protected CPU, it can find out which CPU is the standby CPU of the protected CPU by querying the stored mapping relationship.

S712: The main control board judges whether the spare CPU resources are sufficient to carry all the services of the protected CPU.

After querying the backup CPU corresponding to the protected CPU, the main control board can determine whether the spare CPU resources are enough to carry all the services of the protected CPU. If the judgment result is yes, then go to S714, otherwise, go to S716.

S714: The main control board delivers all services of the protected CPU to the standby CPU according to the tunnel information and session information of the protected CPU.

If the standby CPU selected by the main control board for the protected CPU has a large resource vacancy and enough resources to carry all the services of the protected CPU, the main control board can directly deliver all the services of the protected CPU to the standby CPU.

S716: The main control board determines the protection sensitivity corresponding to each tunnel on the protected CPU.

If the standby CPU selected by the main control board has insufficient resources and can only handle part of the services of the protected CPU while processing its own services, the main control board needs to screen the services of the protected CPU and deliver only the selected part to the standby CPU.

In this embodiment, the main control board selects the service to be issued to the standby CPU based on the protection sensitivity corresponding to each tunnel on the protected CPU. Therefore, when the main control board determines that the spare CPU resources are insufficient to carry all the services of the protected CPU, the main control board calculates the protection sensitivity corresponding to each tunnel on the protected CPU. For example, the main control board calculates the protection sensitivity corresponding to each tunnel according to the formula Sent(Tid)=Ts*Tk*Tn.

S718: The main control board selects the part that can be carried by the standby CPU from the services of the protected CPU in the order of tunnel protection sensitivity according to the resource vacancy of the standby CPU.

After determining the protection sensitivity corresponding to each tunnel, the main control board can select, in the order of tunnel protection sensitivity and according to the resource vacancy of the standby CPU, the part of the services of the protected CPU that the standby CPU can carry. Since the number of sessions carried in each tunnel on the protected CPU is not fixed, the main control board cannot directly determine how many tunnels to select based on the amount of services carried by each tunnel. In an example of this embodiment, the main control board may first select the tunnel with the highest protection sensitivity on the protected CPU and determine whether the standby CPU still has resources left after carrying all the services in that tunnel. If so, the main control board further selects the tunnel with the second highest protection sensitivity and determines whether the standby CPU still has free resources to carry the services of other tunnels after also carrying the services in this tunnel, and so on, until the standby CPU has no resources left or its resource vacancy is not enough to carry the services in a certain tunnel.
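The selection loop of S718 can be sketched as a greedy pass over the tunnels sorted by protection sensitivity. The resource model is an assumption made for illustration: each selected tunnel is taken to consume one free tunnel slot and Tn free session slots on the standby CPU, and the function and field names are hypothetical.

```python
from typing import List, NamedTuple


class Tunnel(NamedTuple):
    tid: int    # tunnel number
    ts: int     # tunnel state Ts (assumed 1 = established)
    tk: float   # tunnel keep-alive time Tk
    tn: int     # session volume Tn


def select_tunnels(tunnels: List[Tunnel], free_tunnels: int,
                   free_sessions: int) -> List[int]:
    """Pick tunnels from highest to lowest Sent = Ts * Tk * Tn until one no longer fits."""
    selected = []
    ranked = sorted(tunnels, key=lambda t: t.ts * t.tk * t.tn, reverse=True)
    for t in ranked:
        # Stop at the first tunnel the standby CPU cannot carry, as the text describes.
        if free_tunnels < 1 or t.tn > free_sessions:
            break
        selected.append(t.tid)
        free_tunnels -= 1
        free_sessions -= t.tn
    return selected


tunnels = [
    Tunnel(tid=1, ts=1, tk=60.0, tn=10),   # Sent = 600
    Tunnel(tid=2, ts=1, tk=30.0, tn=50),   # Sent = 1500
    Tunnel(tid=3, ts=1, tk=120.0, tn=40),  # Sent = 4800
]

partial = select_tunnels(tunnels, free_tunnels=5, free_sessions=70)
full = select_tunnels(tunnels, free_tunnels=5, free_sessions=100)
print(partial, full)
```

With 70 free sessions only tunnel 3 is selected: tunnel 2 does not fit in the remaining 30 sessions, and the loop stops there even though the smaller tunnel 1 would still fit. A variant that skips a tunnel that does not fit and keeps trying lower-ranked ones is conceivable, but it goes beyond the procedure described in the text.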

S720: The main control board delivers the selected service to the standby CPU.

After the main control board selects the services, it delivers the selected services to the standby CPU, and the standby CPU can process the delivered services.

S722: The main control board monitors whether the protected CPU is restored.

After the main control board delivers all or part of the services of the protected CPU to the standby CPU, it can monitor the state of the protected CPU to determine whether the protected CPU has recovered. If the judgment result is yes, go to S724; otherwise, continue to execute S722.

In some examples of this embodiment, the main control board may periodically send the status query information to the protected CPU, and determine the status of the protected CPU according to the feedback of the protected CPU. In some other examples of this embodiment, the protected CPU may actively report the information of its state restoration to the main control board after its state is restored.

S724: The main control board switches the services of the protected CPU back to the protected CPU.

When the main control board determines that the protected CPU has recovered, it can switch the services that originally belonged to the protected CPU back to the protected CPU for processing. These services include the services carried by the standby CPU as well as the services that the main control board did not deliver to the standby CPU because of the limited resources of the standby CPU.

In the service protection method provided in this embodiment, the main control board determines the standby CPU for each CPU in the distributed service processing system in advance. Therefore, when a CPU fails, the main control board can quickly look up the standby CPU of that CPU, so that the services on the failed CPU are migrated as soon as possible after the failure, avoiding long service interruptions and the resulting impact on user experience.
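Steps S722 and S724 can be sketched as a polling loop followed by a switchback of both groups of services. The function name, the callback is_recovered, the poll limit, and the use of tunnel numbers to stand for services are assumptions for illustration; a real main control board would wait between polls or react to a recovery report from the protected CPU.

```python
from typing import Callable, List, Optional, Set


def monitor_and_switch_back(is_recovered: Callable[[], bool],
                            delivered: Set[int],
                            withheld: Set[int],
                            max_polls: int) -> Optional[List[int]]:
    """Poll the protected CPU; on recovery, return every tunnel to switch back."""
    for _ in range(max_polls):
        if is_recovered():
            # Switch back the tunnels carried by the standby CPU together with
            # those that were never delivered for lack of standby resources.
            return sorted(delivered | withheld)
    return None  # still not recovered after max_polls status queries


# Simulated status replies: the protected CPU recovers on the third query.
states = iter([False, False, True])
restored = monitor_and_switch_back(lambda: next(states),
                                   delivered={3, 2}, withheld={1}, max_polls=5)
print(restored)
```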

Embodiment three:

This embodiment provides a storage medium that can store one or more computer programs that can be read, compiled, and executed by one or more processors. In this embodiment, the storage medium can store at least one of a first service protection program, a second service protection program, and a third service protection program. The first service protection program can be executed by one or more processors to implement the flow on the protected CPU side of any of the service protection methods introduced in the foregoing embodiments; the second service protection program can be executed by one or more processors to implement the flow on the main control board side of any of the service protection methods introduced in the foregoing embodiments; and the third service protection program can be executed by one or more processors to implement the flow on the standby CPU side of any of the service protection methods introduced in the foregoing embodiments.

In addition, this embodiment provides a network device, as shown in FIG. 8: the network device 80 includes a processor 81, a memory 82, and a communication bus 83 for connecting the processor 81 and the memory 82, where the memory 82 may be the aforementioned storage medium storing the first service protection program. The processor 81 may read the first service protection program, compile it, and execute the flow on the protected CPU side of the service protection method introduced in the foregoing embodiments:

The processor 81 collects tunnel information and session information on the CPU, where the tunnel information is the information of the tunnel carried by the CPU, and the session information is the information of the session carried by the tunnel;

The processor 81 sends the tunnel information and session information to the main control board. The tunnel information and the session information are used by the main control board to send the services on the CPU to the standby CPU for the standby CPU to continue processing the services on the CPU.

In some examples of this embodiment, the processor 81 may collect tunnel information and session information on the local CPU when the local CPU needs to be reset.

The processor 81 may also read the second service protection program, compile and execute the process on the main control board side in the service protection method introduced in the foregoing embodiment:

The processor 81 receives the tunnel information and session information sent by the protected CPU, then determines the backup CPU corresponding to the protected CPU, and sends the services of the protected CPU to the backup CPU according to the tunnel information and the session information.

In some examples of this embodiment, the standby CPU is determined according to the resource vacancy information of each CPU in the distributed service processing system, and the resource vacancy information can represent the resource vacancy of the CPU.

The resource vacancy information of a CPU is determined according to one or more of the CPU utilization rate, the memory usage rate, the number of available tunnels, and the number of available sessions.

In an example of this embodiment, before the processor 81 receives the tunnel information and session information sent by the protected CPU, it also periodically determines the resource vacancy information of each CPU in the distributed service processing system, and then according to the latest obtained The resource vacancy information of each CPU determines the corresponding backup CPU for each CPU, and stores the mapping relationship between each CPU and the corresponding backup CPU. When it is necessary to determine the backup CPU corresponding to the protected CPU, query the backup CPU corresponding to the protected CPU according to the mapping relationship.

In an example of this embodiment, before the processor 81 receives the tunnel information and session information sent by the protected CPU, it also periodically determines the resource vacancy information of each CPU in the distributed service processing system. When it is necessary to determine the backup CPU corresponding to the protected CPU, the corresponding backup CPU is determined for the protected CPU according to the resource vacancy information of each CPU obtained last time.

In another example of this embodiment, when the processor 81 determines the standby CPU corresponding to the protected CPU, it sends a vacancy information request to the CPUs in the distributed service processing system other than the protected CPU, then receives the resource vacancy information that each CPU reports in response to the vacancy information request, and determines the corresponding standby CPU for the protected CPU according to the resource vacancy information of each CPU.

In addition, after the processor 81 has sent the services of the protected CPU to the standby CPU according to the tunnel information and the session information, and once the protected CPU returns to the normal operating state, the processor 81 switches the services belonging to the protected CPU back to the protected CPU.

In this embodiment, the processor 81 determines, according to the resource vacancy information of the standby CPU, whether the resource vacancy of the standby CPU is sufficient to carry all the services on the protected CPU; if not, it screens the services of the protected CPU and delivers the services retained by the screening to the standby CPU. If it is determined that the resources of the standby CPU are sufficient to carry all the services on the protected CPU, the processor 81 directly delivers all the services of the protected CPU to the standby CPU.

In some embodiments, the processor 81 determines the protection sensitivity corresponding to each tunnel in units of tunnels. The protection sensitivity represents the degree of protection required for the services in the tunnel: the higher the protection sensitivity, the higher the degree of protection required for the services in the tunnel. After the protection sensitivity is determined, the processor 81 selects the services to be retained in descending order of protection sensitivity according to the resource vacancy of the standby CPU.

For example, the processor 81 may determine the protection sensitivity of each tunnel according to the tunnel state Ts of the tunnel, the tunnel keep-alive time Tk, and the session volume Tn in the tunnel.

The processor 81 may also read the third service protection program, compile and execute the process on the standby CPU side in the service protection method introduced in the foregoing embodiment:

The processor 81 reports the resource vacancy information of the CPU to the main control board, and then receives the services of the protected CPU sent by the main control board, and processes the services of the protected CPU.

This embodiment also provides a distributed service processing system, which includes a main control board and multiple CPUs. The main control board is a network device whose processor 81 executes the second service protection program; some of the multiple CPUs are network devices whose processor 81 executes the first service protection program, and others are network devices whose processor 81 executes the third service protection program.

In the network device and distributed service processing system provided in this embodiment, when a CPU cannot continue to process services, it can send its own tunnel information and session information to the main control board, so that the main control board determines a standby CPU for it from among the other CPUs in the distributed service processing system and the standby CPU continues to process its services. This avoids the interruption of all services carried by the CPU due to the CPU's own failure and the resulting impact on user experience, which helps improve the disaster tolerance of the distributed service processing system, enhances system stability, and improves the user's service experience.

Obviously, those skilled in the art should understand that all or some of the steps in the method disclosed above, and the functional modules/units in the system and the device, can be implemented as software (which can be implemented by program code executable by a computing device), firmware, hardware, and appropriate combinations thereof. In hardware implementations, the division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be executed cooperatively by several physical components. Some or all physical components can be implemented as software executed by a processor, such as a central processing unit, a digital signal processor, or a microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on a computer-readable medium and executed by a computing device, and in some cases, the steps shown or described may be executed in an order different from the one given here. The computer-readable medium may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules, or other data). Computer storage media include but are not limited to RAM, ROM, EEPROM, flash memory or other memory technologies, CD-ROM, digital versatile disk (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer. In addition, as is well known to those of ordinary skill in the art, communication media usually contain computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transmission mechanism, and may include any information delivery media. Therefore, the present invention is not limited to any specific combination of hardware and software.

The above content is a further detailed description of the embodiments of the present invention in combination with specific implementations, and it cannot be considered that the specific implementation of the present invention is limited to these descriptions. For those of ordinary skill in the technical field to which the present invention belongs, several simple deductions or substitutions can be made without departing from the concept of the present invention, which should be regarded as falling within the protection scope of the present invention.

Claims

A service protection method, comprising:

Collecting tunnel information and session information on the present CPU, where the tunnel information is information of a tunnel carried by the present CPU, and the session information is information of a session carried by the tunnel;

Sending the tunnel information and the session information to the main control board, where the tunnel information and the session information are used by the main control board to send the services on the present CPU to a standby CPU, so that the standby CPU continues to process the services on the present CPU.
The service protection method according to claim 1, wherein the collecting tunnel information and session information on the CPU includes:

When the CPU needs to be reset, collect tunnel information and session information on the CPU.
A service protection method, comprising:

Receiving tunnel information and session information sent by a protected CPU, where the tunnel information is information about a tunnel carried by the protected CPU, and the session information is information about a session carried by the tunnel;

Determine a backup CPU corresponding to the protected CPU, where the backup CPU and the protected CPU belong to the same distributed service processing system;

Sending the service of the protected CPU to the standby CPU according to the tunnel information and the session information.
The service protection method according to claim 3, wherein the standby CPU is determined according to the resource vacancy information of each CPU in the distributed service processing system, and the resource vacancy information can represent the resource vacancy of the CPU.
The service protection method of claim 4, wherein the resource vacancy information of a CPU is determined according to one or more of the CPU utilization rate, the memory usage rate, the number of available tunnels, and the number of available sessions of the CPU.
The service protection method according to claim 4, wherein before receiving the tunnel information and session information sent by the protected CPU, the method further comprises:

Periodically determining the resource vacancy information of each CPU in the distributed service processing system;

Determine the corresponding spare CPU for each CPU according to the resource vacancy information of each CPU obtained last time, and store the mapping relationship between each CPU and the corresponding spare CPU;

The determining the backup CPU corresponding to the protected CPU includes:

Query the backup CPU corresponding to the protected CPU according to the mapping relationship.
The service protection method according to claim 4, wherein before receiving the tunnel information and session information sent by the protected CPU, the method further comprises:

Periodically determining the resource vacancy information of each CPU in the distributed service processing system;

The determining the backup CPU corresponding to the protected CPU includes:

The corresponding spare CPU is determined for the protected CPU according to the resource vacancy information of each CPU acquired last time.
The service protection method according to claim 4, wherein said determining the backup CPU corresponding to the protected CPU comprises:

Sending a vacancy information request to the CPUs other than the protected CPU in the distributed service processing system;

Receiving the resource vacancy information that each CPU reports about itself in response to the vacancy information request;

Determining the corresponding standby CPU for the protected CPU according to the resource vacancy information of each CPU.
The service protection method according to claim 3, wherein after the sending the service of the protected CPU to the backup CPU according to the tunnel information and the session information, the method further comprises:

After the protected CPU returns to the normal operating state, the services belonging to the protected CPU are switched back to the protected CPU.
The service protection method according to any one of claims 3-9, wherein the sending the service of the protected CPU to the backup CPU according to the tunnel information and the session information comprises:

Determining, according to the resource vacancy information of the backup CPU, whether the resource vacancy of the backup CPU is sufficient to carry all the services on the protected CPU;

If not, the services of the protected CPU are screened, and the services reserved by the screening are delivered to the standby CPU.
The service protection method according to claim 10, wherein the screening of the service of the protected CPU comprises:

The protection sensitivity corresponding to each tunnel is determined in units of tunnels, where the protection sensitivity represents the degree of protection required for the services in the tunnel, and the higher the protection sensitivity, the higher the degree of protection required for the services in the tunnel;

The reserved services are selected in the order of protection sensitivity from high to low according to the resource vacancy of the standby CPU.
The service protection method according to claim 11, wherein said determining the protection sensitivity corresponding to each tunnel by using a tunnel as a unit comprises:

The protection sensitivity of each tunnel is determined according to the tunnel state Ts, the tunnel keep-alive time Tk, and the session volume Tn in the tunnel.
The service protection method according to claim 10, wherein if it is determined that the resources of the standby CPU are sufficient to carry all the services on the protected CPU, the service protection method further comprises: directly delivering all the services of the protected CPU to the standby CPU.
A service protection method, comprising:

Reporting the resource vacancy information of the present CPU to the main control board;

Receiving the protected CPU service sent by the main control board;

Process the services of the protected CPU.
A network device including a processor, a memory and a communication bus, in which:

The communication bus is used to realize connection and communication between the processor and the memory;

The processor is configured to execute the first service protection program stored in the memory to implement the steps of the service protection method according to claim 1 or 2; or, the processor is configured to execute the second service protection program stored in the memory to implement the steps of the service protection method according to any one of claims 3-13; or, the processor is configured to execute the third service protection program stored in the memory to implement the steps of the service protection method according to claim 14.
A distributed service processing system, comprising a main control board and multiple CPUs, wherein the main control board is a network device according to claim 15 whose processor executes the second service protection program, some of the multiple CPUs are network devices according to claim 15 whose processor executes the first service protection program, and others are network devices according to claim 15 whose processor executes the third service protection program.
A storage medium that stores at least one of a first service protection program, a second service protection program, and a third service protection program, wherein the first service protection program can be executed by one or more processors to implement the steps of the service protection method according to claim 1 or 2; the second service protection program can be executed by one or more processors to implement the steps of the service protection method according to any one of claims 3-13; and the third service protection program can be executed by one or more processors to implement the steps of the service protection method according to claim 14.