WO2021000647A1 - Service protection method, network device, distributed service processing system, and storage medium - Google Patents


Info

Publication number
WO2021000647A1
Authority
WO
WIPO (PCT)
Prior art keywords
cpu
information
protected
tunnel
service
Application number
PCT/CN2020/088318
Other languages
French (fr)
Chinese (zh)
Inventor
张健 (Zhang Jian)
Original Assignee
中兴通讯股份有限公司 (ZTE Corporation)
Application filed by 中兴通讯股份有限公司 (ZTE Corporation)
Publication of WO2021000647A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00 Data switching networks
    • H04L12/28 Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]
    • H04L12/46 Interconnection of networks
    • H04L12/4633 Interconnection of networks using encapsulation techniques, e.g. tunneling
    • H04L12/4641 Virtual LANs, VLANs, e.g. virtual private networks [VPN]
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0654 Management of faults, events, alarms or notifications using network fault recovery
    • H04L41/0663 Performing the actions predefined by failover planning, e.g. switching to standby network elements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring

Definitions

  • The present invention relates to the field of communications, and in particular to a service protection method, a network device, a distributed service processing system, and a storage medium.
  • L2TP: Layer 2 Tunneling Protocol
  • LAC: L2TP Access Concentrator
  • AAA: Authentication, Authorization, and Accounting
  • LNS: L2TP Network Server
  • The LNS performs the final authorization and verification of users, accepts tunnel and connection requests from the LAC, and establishes a PPP channel between the LNS and the user.
  • The L2TP services carried by LNS and LAC devices are steadily increasing.
  • To handle this load, L2TP multi-core service processing based on a distributed architecture has been proposed.
  • Multi-core service boards take over much of the processing that was previously concentrated on the main control board.
  • As a result, each multi-core service board carries a heavy load.
  • If such a board, or a chip on it, fails, the L2TP services it carries are interrupted, causing losses to telecom operators.
  • The service protection method, network device, distributed service processing system, and storage medium provided by the embodiments of the present invention solve, at least to some extent, the following technical problem in the related art: L2TP multi-core services based on a distributed architecture are easily interrupted by the failure of a multi-core service board or of a chip on it, which degrades the user's communication experience.
  • An embodiment of the present invention provides a service protection method, including:
  • the tunnel information is information about the tunnels carried by the CPU;
  • the session information is information about the sessions carried by each tunnel;
  • the tunnel information and the session information are sent to the main control board, which uses them to deliver the services on the CPU to a standby CPU, so that the standby CPU continues processing the services of this CPU.
  • An embodiment of the present invention further provides a second service protection method, including:
  • the tunnel information is information about the tunnels carried by the protected CPU;
  • the session information is information about the sessions carried by each tunnel;
  • An embodiment of the present invention further provides a third service protection method, including:
  • An embodiment of the present invention further provides a network device, which includes a processor, a memory, and a communication bus;
  • the communication bus is used to connect the processor and the memory and enable communication between them;
  • the processor is configured to execute a first service protection program stored in the memory to implement the steps of the first service protection method described above; or the processor is configured to execute a second service protection program stored in the memory to implement the steps of the second service protection method described above; or the processor is configured to execute a third service protection program stored in the memory to implement the steps of the third service protection method described above.
  • An embodiment of the present invention further provides a distributed service processing system, which includes a main control board and a plurality of CPUs.
  • The main control board is a network device whose processor executes the second service protection program described above.
  • Some of the CPUs are network devices whose processor executes the first service protection program, and the others are network devices whose processor executes the third service protection program.
  • An embodiment of the present invention further provides a storage medium that stores at least one of the first service protection program, the second service protection program, and the third service protection program.
  • The first service protection program may be executed by one or more processors to implement the steps of the first service protection method described above.
  • The second service protection program may be executed by one or more processors to implement the steps of the second service protection method described above; the third service protection program may be executed by one or more processors to implement the steps of the third service protection method described above.
  • FIG. 1 is an interaction flowchart of the service protection method provided in Embodiment 1 of the present invention.
  • FIG. 2 is an architecture diagram of a distributed service processing system provided in Embodiment 1 of the present invention.
  • FIG. 3 is a flowchart of the interaction between an LAC device and a CPU on the LNS device side in Embodiment 1 of the present invention.
  • FIG. 4 is a flowchart of determining a backup CPU in Scheme 1 of Embodiment 1 of the present invention.
  • FIG. 5 is a flowchart of determining a backup CPU in Scheme 3 of Embodiment 1 of the present invention.
  • FIG. 6 is a flowchart, on the protected CPU side, of the service protection method provided in Embodiment 2 of the present invention.
  • FIG. 7 is a flowchart, on the main control board side, of the service protection method provided in Embodiment 2 of the present invention.
  • FIG. 8 is a schematic diagram of a hardware structure of a network device provided in Embodiment 3 of the present invention.
  • LNS and LAC devices carry more and more services, and their load keeps increasing.
  • Take the LNS as an example: one LNS device may establish VPN tunnels with multiple LAC devices, so an LNS device carries a large number of L2TP services.
  • Traditionally, L2TP is deployed centrally on the main control board; however, because the main control board has limited CPU resources and low processing efficiency, it cannot meet the growing service load.
  • The LNS device can therefore be deployed in a distributed manner, with multiple distributed service boards carrying the services that the main control board originally had to handle, thereby relieving the load on the main control board.
  • S102 The protected CPU collects the tunnel information and session information carried on it.
  • In this embodiment, CPUs are divided into protected CPUs and standby CPUs.
  • A protected CPU is a CPU whose services are taken over and processed by a standby CPU.
  • In practice, a protected CPU is usually a CPU that has failed and can no longer process services.
  • A standby CPU (also called a backup or spare CPU below) is a CPU that continues processing the services of a protected CPU. Any CPU may act as a protected CPU in some scenarios and as a standby CPU in others.
  • The tunnel information is the information of the tunnels carried by the protected CPU.
  • The session information is the information of the sessions carried in each tunnel.
  • One CPU can carry multiple tunnels, and one tunnel can carry multiple sessions.
  • The protected CPU collects its tunnel information and session information when it fails and can no longer process the services it carries. For example, in some examples of this embodiment, if a CPU determines that it must be reset to recover from the current failure, it can determine that it is a protected CPU and collect its own tunnel information and session information.
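As a minimal sketch of the data the protected CPU gathers in S102, the nesting described above (one CPU carries several tunnels, each tunnel carries several sessions) can be modeled in Python as follows. The class names, the integer identifiers, and the `needs_reset` flag standing in for "the CPU has decided it must be reset" are all hypothetical; the source does not define a data format.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Session:
    session_id: int


@dataclass
class Tunnel:
    tunnel_id: int
    sessions: list[Session] = field(default_factory=list)


@dataclass
class Cpu:
    cpu_id: str  # e.g. an L(a)N(b) identifier
    tunnels: list[Tunnel] = field(default_factory=list)
    needs_reset: bool = False  # hypothetical "must reset to recover" flag


def collect_protection_info(cpu: Cpu) -> dict | None:
    """S102: gather tunnel and session info only if this CPU needs protection."""
    if not cpu.needs_reset:
        return None  # healthy CPU: nothing to hand over
    return {
        "cpu": cpu.cpu_id,
        "tunnels": [t.tunnel_id for t in cpu.tunnels],
        # session information, grouped by the tunnel that carries it
        "sessions": {t.tunnel_id: [s.session_id for s in t.sessions] for t in cpu.tunnels},
    }
```

The returned dictionary is what S104 would hand to the main control board in this sketch; grouping sessions by tunnel keeps the tunnel-level selection described later straightforward.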
  • S104 The protected CPU sends the tunnel information and the session information to the main control board.
  • FIG. 2 shows a distributed service processing system:
  • The distributed service processing system 2 includes a main control board 21, a first service board 22, a second service board 23, and a packet transceiving and processing board 24.
  • The main control board 21 is responsible for system management, protocol packet processing, and routing management of the entire distributed service processing system 2.
  • The packet transceiving and processing board 24 is responsible for interface traffic management, packet forwarding, and switching traffic management, and distributes L2TP packets to the service boards according to the processing rules set by the main control board 21.
  • The service boards are independent of each other and process packets in parallel, which improves the throughput of the system.
  • The first service board 22 includes multiple CPUs.
  • The second service board 23 also includes multiple CPUs. Assume that the distributed service processing system 2 is a distributed LNS device. FIG. 3 shows the L2TP establishment process:
  • The LAC initiates a tunnel establishment request (SCCRQ message) to the LNS.
  • After receiving the LNS's response (SCCRP message), the LAC returns a confirmation (SCCCN message) to the LNS.
  • The session is then established.
  • The LNS then performs a PPP (Point-to-Point Protocol) interaction with the user and assigns the user an IP address, after which the user can access the network.
  • PPP: Point-to-Point Protocol
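For reference, the control-message order of the establishment process can be sketched as below. The message names come from RFC 2661 (L2TPv2); the source names only SCCRQ and SCCCN, and using the incoming-call exchange (ICRQ, ICRP, ICCN) for "the session is established" is an assumption. The `is_valid_prefix` helper is hypothetical and only checks message ordering, not message contents.

```python
from __future__ import annotations

# (sender, receiver, message type) in RFC 2661 order.
TUNNEL_SETUP = [
    ("LAC", "LNS", "SCCRQ"),  # Start-Control-Connection-Request
    ("LNS", "LAC", "SCCRP"),  # Start-Control-Connection-Reply
    ("LAC", "LNS", "SCCCN"),  # Start-Control-Connection-Connected
]
# Assumed: an incoming call (user dial-in) triggers the session setup.
SESSION_SETUP = [
    ("LAC", "LNS", "ICRQ"),  # Incoming-Call-Request
    ("LNS", "LAC", "ICRP"),  # Incoming-Call-Reply
    ("LAC", "LNS", "ICCN"),  # Incoming-Call-Connected
]
EXPECTED = [msg for _, _, msg in TUNNEL_SETUP + SESSION_SETUP]


def is_valid_prefix(trace: list[str]) -> bool:
    """True if the observed message types follow the expected order so far."""
    return trace == EXPECTED[: len(trace)]


if __name__ == "__main__":
    for sender, receiver, msg in TUNNEL_SETUP + SESSION_SETUP:
        print(f"{sender} -> {receiver}: {msg}")
```

Once the last message is exchanged, PPP negotiation with the user proceeds over the established session.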
  • The main control board then determines the backup CPU corresponding to the protected CPU.
  • When the main control board receives the tunnel information and session information sent by the protected CPU, it can conclude that the protected CPU temporarily cannot continue processing its services, and therefore determines a backup CPU for the protected CPU.
  • The backup CPU can process the services that the protected CPU cannot, thereby protecting these services from interruption.
  • The backup CPU selected by the main control board is itself a CPU in the distributed service processing system and already carries some services of its own. It therefore protects the services of the protected CPU with the resources left over after processing its own services. For this reason, when determining the backup CPU, the main control board refers to the resource vacancy information of each CPU, that is, the capacity of a CPU to handle additional services while processing its own.
  • The main control board may determine a CPU's resource vacancy information from at least one of its CPU utilization, memory usage, number of available tunnels, and number of available sessions.
  • The PRI value characterizes how much of a CPU's resources are vacant: the higher the PRI value, the more resources are vacant, and the lower the PRI value, the fewer. Therefore, the higher a CPU's PRI value, the more likely it is to be selected as the backup CPU.
  • In the PRI calculation, CPU Rate is the CPU utilization and W1 is the weight of the CPU idle rate; Mem Rate is the memory usage and W2 is the weight of the remaining-memory rate; T represents the number of available tunnels and W3 is its weight; S represents the number of available sessions and W4 is its weight.
  • The numbers of available sessions and tunnels are not measured on the same scale as the CPU idle rate and the remaining-memory rate. In this embodiment, they are therefore normalized so that all four measures are comparable.
  • T is the ratio of the number of available tunnels to the rated total number of tunnels of the distributed service processing system, and S is the ratio of the number of available sessions to the rated total number of sessions; both values lie between 0 and 1.
  • Whether a CPU can be used as a backup CPU does not depend on its vacant resources alone.
  • The state of the CPU itself also matters. For example, a CPU may still have enough processing resources left to handle many extra services, but if the CPU itself is abnormal, it cannot be used as a backup CPU. Therefore, in some examples of this embodiment, the running status is included in the PRI calculation.
  • Stat represents the running status of the CPU: Stat is 1 when the CPU is running normally and 0 when it is abnormal. Therefore, whatever a CPU's CPU idle rate, remaining-memory rate, number of available tunnels, and number of available sessions, its PRI value is 0 as long as its running status is abnormal.
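The PRI formula itself does not survive in this text. Assuming it is the weighted sum implied by the definitions above, gated by the running status, that is PRI = Stat × (W1 × (1 - CPU Rate) + W2 × (1 - Mem Rate) + W3 × T + W4 × S), the calculation can be sketched as follows. The weight values and the rated totals are hypothetical placeholders; the source gives no numbers.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Vacancy:
    cpu_rate: float  # CPU utilization, 0..1
    mem_rate: float  # memory usage rate, 0..1
    free_tunnels: int  # number of available tunnels
    free_sessions: int  # number of available sessions
    stat: int  # running status: 1 = normal, 0 = abnormal


# Hypothetical weights and rated totals (not given in the source).
W1, W2, W3, W4 = 0.4, 0.3, 0.15, 0.15
RATED_TUNNELS, RATED_SESSIONS = 1000, 64000


def pri(v: Vacancy) -> float:
    """Assumed weighted-sum PRI; an abnormal CPU (stat == 0) always scores 0."""
    t = v.free_tunnels / RATED_TUNNELS  # normalized T in [0, 1]
    s = v.free_sessions / RATED_SESSIONS  # normalized S in [0, 1]
    score = W1 * (1 - v.cpu_rate) + W2 * (1 - v.mem_rate) + W3 * t + W4 * s
    return v.stat * score
```

Normalizing T and S against rated totals is what lets the four terms share one scale before weighting, as the text requires.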
  • In some other examples, the main control board may determine a CPU's resource vacancy information only from its CPU idle rate or its memory idle rate.
  • The main control board may also determine a CPU's resource vacancy information only from its number of available tunnels or its number of available sessions.
  • Before determining the backup CPU corresponding to the protected CPU, the main control board first obtains the resource vacancy information of each CPU in the distributed service processing system.
  • The following schemes describe how the main control board obtains the resource vacancy information of each CPU and determines the backup CPU:
  • Scheme 1: The main control board periodically obtains the resource vacancy information of each CPU in the distributed service processing system and periodically determines a backup CPU for each CPU, as shown in FIG. 4:
  • The main control board periodically obtains the resource vacancy information of each CPU in the distributed service processing system.
  • Each CPU in the distributed service processing system periodically reports its own resource vacancy information to the main control board.
  • The resource vacancy information that a CPU reports to the main control board includes its CPU utilization, memory usage, number of available tunnels, and number of available sessions.
  • Each CPU is uniquely identified by an L(a)N(b) identifier. L stands for "Location" and identifies the service board on which the CPU is located, with a being the number of that service board; N identifies the CPU's serial number on that service board, with b being the unique identifier of the CPU on the board.
  • From this identifier, the main control board can determine which CPU on which service board the resource vacancy information in a received report belongs to.
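A sketch of handling the L(a)N(b) identifier, assuming it is carried as a plain string such as `L3N2` (service board 3, CPU 2). The textual encoding and the helper names are hypothetical; the source only defines the meaning of L, N, a, and b.

```python
from __future__ import annotations

import re

# Assumed textual form: "L<board>N<cpu>", e.g. "L3N2".
_CPU_ID = re.compile(r"L(\d+)N(\d+)")


def parse_cpu_id(text: str) -> tuple[int, int]:
    """Return (service board number a, CPU number b) from an L(a)N(b) string."""
    match = _CPU_ID.fullmatch(text)
    if match is None:
        raise ValueError(f"not an L(a)N(b) identifier: {text!r}")
    return int(match.group(1)), int(match.group(2))


def format_cpu_id(board: int, cpu: int) -> str:
    """Build the identifier a CPU would attach to its vacancy report."""
    return f"L{board}N{cpu}"
```

With the parsed (a, b) pair, the main control board could key its stored reports by board and CPU.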
  • The main control board determines a backup CPU for each CPU according to the most recently acquired resource vacancy information of each CPU, and stores the mapping between each CPU and its backup CPU.
  • Whenever the main control board reacquires the resource vacancy information of the CPUs, it reassigns a backup CPU to each CPU.
  • The main control board selects a CPU's backup from the other CPUs of the same distributed service processing system, so a CPU and its backup CPU always belong to the same distributed service processing system.
  • The main control board determines a backup CPU for each CPU in the distributed service processing system as follows:
  • The main control board calculates the PRI value of each CPU from its resource vacancy information, as described above. It then identifies the CPU with the highest PRI value and uses that CPU as the backup CPU for every other CPU in the distributed service processing system. The CPU with the highest PRI value also needs a backup CPU; in one example, the main control board uses the CPU with the second-highest PRI value as its backup.
  • The mapping is stored so that it can be used if a CPU needs protection before the mapping is next updated.
  • S406 The main control board queries the backup CPU corresponding to the protected CPU from the mapping.
  • When a CPU needs protection, the main control board can look up its backup CPU in the pre-stored mapping.
  • Because the main control board does not need to obtain the resource vacancy information of each CPU on the spot or perform any calculation at that moment, it can determine the backup CPU faster and switch the services of the protected CPU to the backup CPU sooner.
  • This avoids service impact during backup CPU selection and helps reduce the user's perception of the protected CPU's failure.
  • Because the main control board determines backup CPUs periodically, some of these determinations may never be used: in a given period, every CPU in the distributed service processing system may run normally, so there is no protected CPU.
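The Scheme 1 rule (every CPU is backed up by the highest-PRI CPU, which is in turn backed up by the second-highest) and the S406 lookup can be sketched as follows. The function names are hypothetical, and PRI values are assumed to have been computed already from the latest reports.

```python
from __future__ import annotations


def build_backup_map(pri_by_cpu: dict[str, float]) -> dict[str, str]:
    """Map every CPU to its backup CPU using the Scheme 1 rule."""
    if len(pri_by_cpu) < 2:
        return {}  # a lone CPU has nothing to fail over to
    ranked = sorted(pri_by_cpu, key=pri_by_cpu.__getitem__, reverse=True)
    best, runner_up = ranked[0], ranked[1]
    mapping = {cpu: best for cpu in ranked if cpu != best}
    mapping[best] = runner_up  # the best CPU needs a backup too
    return mapping


def lookup_backup(mapping: dict[str, str], protected: str) -> str | None:
    """S406: read the precomputed backup instead of recalculating."""
    return mapping.get(protected)
```

The whole map is rebuilt each period, so the lookup at failure time is a single dictionary read.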
  • Scheme 2: The main control board periodically obtains the resource vacancy information of each CPU in the distributed service processing system, but determines a backup CPU only when a protected CPU needs one.
  • The main control board can obtain the resource vacancy information periodically as in Scheme 1. For example, it periodically sends a vacancy information request to each CPU in the distributed service processing system, and each CPU reports its own resource vacancy information in response to the request.
  • Alternatively, the main control board does not send periodic requests; instead, each CPU monitors its own reporting period and reports to the main control board automatically when the period arrives. This reduces the burden on the main control board.
  • The main control board does not determine a backup CPU for each CPU every time it obtains the CPUs' resource vacancy information.
  • Instead, it stores the CPUs' resource vacancy information and determines a backup CPU only for the failed CPU, that is, the protected CPU.
  • Because the main control board does not repeatedly determine backup CPUs for every CPU in the distributed service processing system, and does not need to determine backup CPUs for the other CPUs when determining one for the protected CPU, it uses fewer of its own processing resources.
  • Because the main control board obtains the resource vacancy information periodically and always determines the backup CPU from the most recently acquired information, it can overwrite the previous resource vacancy information of each CPU with the latest, which reduces the consumption of storage resources.
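Scheme 2 can be sketched as a store that keeps only the latest report per CPU and selects a backup only when a protected CPU appears. Storing a precomputed PRI value instead of the raw report, and skipping CPUs whose PRI is 0 (abnormal), are simplifying assumptions; the `VacancyStore` class is hypothetical.

```python
from __future__ import annotations


class VacancyStore:
    """Keeps the newest PRI per CPU; older reports are overwritten."""

    def __init__(self) -> None:
        self._latest: dict[str, float] = {}

    def on_report(self, cpu_id: str, pri_value: float) -> None:
        self._latest[cpu_id] = pri_value  # overwrite to save storage

    def pick_backup(self, protected: str) -> str | None:
        """Choose a backup only for the protected CPU, on demand."""
        candidates = {
            cpu: p for cpu, p in self._latest.items() if cpu != protected and p > 0
        }
        if not candidates:
            return None
        return max(candidates, key=candidates.__getitem__)
```

Reports cost only an overwrite; the comparison work happens once per failure rather than once per period.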
  • Scheme 3: In Schemes 1 and 2, the main control board obtains the resource vacancy information of each CPU periodically. In this scheme, the main control board obtains that information only on demand, when it needs to determine a backup CPU for a particular protected CPU. FIG. 5 shows the flow by which the main control board determines the backup CPU for the protected CPU:
  • S502 The main control board sends a vacancy information request to the CPUs in the distributed service processing system other than the protected CPU.
  • S504 The main control board receives the resource vacancy information that each CPU reports in response to the request.
  • S506 The main control board determines a backup CPU for the protected CPU according to the resource vacancy information of each CPU.
  • After acquiring the resource vacancy information of the CPUs other than the protected CPU, the main control board determines the backup CPU of the protected CPU from this information. Because it only needs to select a backup for the protected CPU at this point, it can directly select the CPU with the best resource vacancy as the backup CPU. In some other examples of this embodiment, the main control board may instead select any CPU with sufficiently good resource vacancy rather than the optimal one.
  • For example, suppose the main control board determines through calculation that three CPUs have good resource vacancy and each could carry all the services of the protected CPU. The main control board can then choose any one of the three as the backup CPU. It may even choose the one with the fewest spare resources, which keeps the other two in reserve in case a CPU carrying more traffic fails later and needs a backup CPU.
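Scheme 3 (S502 to S506) can be sketched as below, including the option of picking the tightest CPU that still fits. Reducing each CPU's reply to a single spare-capacity number, the `query` callback standing in for the request/reply exchange, and falling back to the roomiest CPU when none fits are assumptions for illustration.

```python
from __future__ import annotations

from typing import Callable, Iterable


def choose_backup_on_demand(
    protected: str,
    cpus: Iterable[str],
    query: Callable[[str], float],
    required: float,
    prefer_tightest: bool = False,
) -> str | None:
    # S502/S504: ask every other CPU for its current spare capacity.
    replies = {cpu: query(cpu) for cpu in cpus if cpu != protected}
    if not replies:
        return None
    # S506: keep the CPUs that could carry all of the protected CPU's services.
    sufficient = {cpu: cap for cpu, cap in replies.items() if cap >= required}
    if not sufficient:
        # Nobody fits everything: take the roomiest CPU; it gets part of the load.
        return max(replies, key=replies.__getitem__)
    if prefer_tightest:
        # Use the smallest CPU that still fits, keeping larger ones in reserve.
        return min(sufficient, key=sufficient.__getitem__)
    return max(sufficient, key=sufficient.__getitem__)
```

Choosing the tightest fit trades a little headroom on this backup for keeping larger CPUs available for a later, heavier failure.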
  • S108 The main control board sends the services of the protected CPU to the standby CPU according to the tunnel information and the session information.
  • After determining the backup CPU of the protected CPU, the main control board sends the services of the protected CPU to the backup CPU according to the tunnel information and session information reported by the protected CPU, so that the backup CPU can process them.
  • In some cases, the backup CPU selected by the main control board has ample spare resources, enough to carry all the services of the protected CPU.
  • In that case, the main control board can deliver all the services of the protected CPU directly to the backup CPU.
  • In other cases, the selected backup CPU has few spare resources and, while processing its own services, can handle only part of the protected CPU's services. The main control board must then select from the protected CPU's services and deliver only some of them to the backup CPU.
  • The main control board may randomly select part of the protected CPU's services, according to the backup CPU's spare resources, and deliver them to the backup CPU.
  • Alternatively, when selecting among the protected CPU's services, the main control board can select them according to their importance.
  • In some examples, the main control board selects the protected CPU's services in units of tunnels: if a tunnel is selected, all services carried on it are delivered to the backup CPU; if a tunnel is not selected, all services carried on it are interrupted.
  • The main control board can determine the protection sensitivity of each tunnel from the tunnel state Ts, the tunnel keep-alive time Tk, and the session volume Tn in the tunnel:
  • Sent(Tid) = Ts × Tk × Tn, where Tid is the tunnel number and Sent(Tid) is the protection sensitivity of that tunnel.
  • That is, the protection sensitivity of a tunnel equals the product of its tunnel state, its tunnel keep-alive time, and the session volume in the tunnel. In some other examples of this embodiment, the main control board may calculate the protection sensitivity of each tunnel in other ways, or may select among the protected CPU's services by other methods.
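A sketch of tunnel-level selection using Sent(Tid) = Ts × Tk × Tn. Encoding Ts as 1 (up) or 0 (down), measuring the backup CPU's spare capacity in sessions, and filling it greedily with the most sensitive tunnels first are assumptions; the source does not fix the units or the selection order.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TunnelInfo:
    tid: int  # tunnel number (Tid)
    ts: int  # tunnel state Ts; assumed 1 = up, 0 = down
    tk: float  # tunnel keep-alive time Tk
    tn: int  # session volume Tn in the tunnel


def sensitivity(t: TunnelInfo) -> float:
    """Sent(Tid) = Ts * Tk * Tn."""
    return t.ts * t.tk * t.tn


def select_tunnels(tunnels: list[TunnelInfo], spare_sessions: int) -> list[int]:
    """Greedily pick whole tunnels, most sensitive first, that fit the backup CPU."""
    chosen: list[int] = []
    remaining = spare_sessions
    for t in sorted(tunnels, key=sensitivity, reverse=True):
        if t.tn <= remaining:  # a tunnel moves whole or not at all
            chosen.append(t.tid)
            remaining -= t.tn
    return chosen
```

Because a tunnel that does not fit is skipped rather than split, a smaller, less sensitive tunnel can still use the leftover capacity.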
  • After receiving the services of the protected CPU delivered by the main control board, the backup CPU processes them. Because the backup CPU also has its own services to process, it protects the protected CPU's services with its redundant resources; the service protection scheme of this embodiment is therefore a redundant protection scheme.
  • After the protected CPU recovers, the main control board can switch the services delivered to the backup CPU back to the protected CPU, which then resumes processing its original services. The protection relationship between the protected CPU and the backup CPU is then released.
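The switch-back step can be sketched as bookkeeping on the main control board: record which tunnels were moved to the backup CPU, then move exactly those back once the protected CPU recovers. Tracking placement per tunnel (matching the tunnel-level selection above) and the `Placement` class are assumptions for illustration.

```python
from __future__ import annotations


class Placement:
    """Tracks which CPU processes each tunnel and any active protection."""

    def __init__(self, owner: dict[int, str]) -> None:
        self.owner = dict(owner)  # tunnel id -> CPU currently processing it
        self.protection: dict[str, tuple[str, list[int]]] = {}

    def fail_over(self, protected: str, backup: str, tunnel_ids: list[int]) -> None:
        for tid in tunnel_ids:
            self.owner[tid] = backup
        self.protection[protected] = (backup, list(tunnel_ids))

    def switch_back(self, protected: str) -> None:
        # Popping the entry releases the protection relationship.
        backup, tunnel_ids = self.protection.pop(protected)
        for tid in tunnel_ids:
            if self.owner.get(tid) == backup:
                self.owner[tid] = protected
```

Only tunnels that are still on the backup CPU are returned, so the backup CPU's own tunnels are never touched.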
  • In this embodiment, when a CPU fails, the main control board selects a backup CPU to continue processing all or part of the failed CPU's services, which reduces the impact of the failure on user services and improves the user experience.
  • Because the backup CPU is selected according to the resource vacancy of each CPU, a CPU with more vacant resources can be chosen, so that the backup CPU takes over as many of the failed CPU's services as possible.
  • The main control board can also select which of the failed CPU's services the backup CPU carries, which prevents the backup CPU from being overloaded and its own services from being affected.
  • The CPU determines that it currently cannot continue processing services.
  • S604 The CPU collects its own tunnel information and session information.
  • The tunnel information collected by the CPU is the information of the tunnels carried by the CPU.
  • The session information is the information of the sessions carried in each tunnel.
  • One CPU can carry multiple tunnels, and one tunnel can carry multiple sessions.
  • S606 The CPU sends its own tunnel information and session information to the main control board.
  • After collecting its tunnel information and session information, the CPU sends them to the main control board.
  • When the CPU has not failed, it reports its own resource vacancy information to the main control board periodically, or at the request of the main control board.
  • The resource vacancy information reported by the CPU includes, but is not limited to, its L(a)N(b) identifier, running status flag, CPU utilization, memory usage, available tunnel resource data, and available session resource data.
  • When another CPU fails, the main control board uses the reported resource vacancy information to determine whether this CPU is suitable as the backup CPU of the failed CPU.
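The two reporting paths (periodic, and on request from the main control board) can be sketched as follows. The report fields mirror the list above; the `Reporter` class, the `sample` callback that reads the CPU's current figures, and the caller-supplied clock value are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class VacancyReport:
    cpu_id: str  # L(a)N(b) identifier
    stat: int  # running status flag: 1 normal, 0 abnormal
    cpu_rate: float  # CPU utilization
    mem_rate: float  # memory usage
    free_tunnels: int  # available tunnel resources
    free_sessions: int  # available session resources


class Reporter:
    def __init__(self, period_s: float, sample: Callable[[], VacancyReport]) -> None:
        self.period_s = period_s
        self.sample = sample  # reads the CPU's current resource figures
        self._next_due = 0.0

    def tick(self, now: float) -> VacancyReport | None:
        """Periodic path: return a report once the period has elapsed."""
        if now >= self._next_due:
            self._next_due = now + self.period_s
            return self.sample()
        return None

    def on_request(self) -> VacancyReport:
        """Request path: answer a vacancy information request immediately."""
        return self.sample()
```

Passing the clock in as `now` keeps the sketch deterministic; a real CPU would drive `tick` from its own timer.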
  • S702 The main control board receives the resource vacancy information periodically reported by each CPU in the distributed service processing system.
  • In this embodiment, the main control board determines the backup CPU corresponding to each CPU in the distributed service processing system using Scheme 1 of Embodiment 1, so it periodically obtains the resource vacancy information reported by each CPU.
  • The main control board determines a backup CPU for each CPU in the distributed service processing system according to the most recently reported resource vacancy information.
  • After obtaining the resource vacancy information reported by each CPU in the distributed service processing system, the main control board calculates the PRI value of each CPU from one or more of its CPU utilization, memory usage, number of available tunnels, and number of available sessions, and then determines the backup CPU corresponding to each CPU based on these PRI values.
  • The main control board stores the latest mapping between each CPU and its backup CPU.
  • After determining the mapping between each CPU and its backup CPU, the main control board stores it. Each round of reports produces a new mapping, but the main control board always determines a protected CPU's backup from the latest mapping. It can therefore store mappings by overwriting, always replacing the previous mapping with the latest one, which reduces the storage resources the mappings occupy on the main control board.
  • S708 The main control board receives the tunnel information and the session information sent by the protected CPU.
  • When the main control board receives tunnel information and session information from a CPU, it can conclude that the CPU has failed and cannot continue processing its own services, and therefore treats that CPU as the current protected CPU.
  • S710 The main control board queries the backup CPU corresponding to the protected CPU according to the stored mapping relationship.
  • Because the main control board has already determined the standby CPU for every CPU in the distributed service processing system, once it identifies the protected CPU it can find the corresponding standby CPU simply by querying the stored mapping.
  • S712 The main control board determines whether the standby CPU has enough free resources to carry all services of the protected CPU.
  • After finding the standby CPU for the protected CPU, the main control board determines whether the standby CPU has enough free resources to carry all services of the protected CPU. If so, the process goes to S714; otherwise, it goes to S716.
  • S714 The main control board delivers all services of the protected CPU to the standby CPU according to the tunnel information and session information of the protected CPU.
  • In this case, the main control board can deliver all services of the protected CPU directly to the standby CPU.
  • S716 The main control board determines the protection sensitivity corresponding to each tunnel on the protected CPU.
  • If the standby CPU's resources are insufficient, the main control board must select only part of the protected CPU's services and deliver them to the standby CPU.
  • S718 According to the free resources of the standby CPU, the main control board selects, in descending order of tunnel protection sensitivity, the part of the protected CPU's services that the standby CPU can carry.
  • Based on the standby CPU's free resources, the main control board selects the services of the protected CPU that the standby CPU can carry, in descending order of tunnel protection sensitivity. Because the number of sessions carried by each tunnel on the protected CPU varies, the main control board cannot directly compute how many tunnels to select from the traffic each tunnel carries. In one example of this embodiment, the main control board first selects the tunnel with the highest protection sensitivity on the protected CPU and checks whether the standby CPU still has resources left after carrying all services in that tunnel.
  • If so, the main control board then selects the tunnel with the second-highest protection sensitivity and checks whether, after also carrying the services in that tunnel, the standby CPU still has free resources for the services of further tunnels, and so on, until the standby CPU has no resources left or its free resources are insufficient to carry the services of the next tunnel.
  • S720 The main control board delivers the selected services to the standby CPU.
  • After selecting the services, the main control board delivers them to the standby CPU, which then processes them.
  • S722 The main control board monitors whether the protected CPU is restored.
  • After delivering all or part of the protected CPU's services to the standby CPU, the main control board monitors the state of the protected CPU to determine whether it has recovered. If it has, the process goes to S724; otherwise, S722 continues.
  • For example, the main control board may periodically send status queries to the protected CPU and determine its state from the responses.
  • Alternatively, the protected CPU may actively report to the main control board once its state has recovered.
  • S724 The main control board switches the services of the protected CPU back to the protected CPU.
  • Once the main control board determines that the protected CPU has recovered, it can switch the services that originally belonged to the protected CPU back to it for processing.
  • These services include both the services carried by the standby CPU and the services that the main control board did not deliver to the standby CPU because of the standby CPU's limited resources.
  • Because the main control board determines a standby CPU for each CPU in the distributed service processing system in advance, it can quickly look up the standby CPU when a CPU fails, so that the services on the failed CPU are migrated as soon as possible, avoiding prolonged service interruption and a degraded user experience.
  • This embodiment provides a storage medium that can store one or more computer programs, which can be read, compiled, and executed by one or more processors.
  • The storage medium can store at least one of a first service protection program, a second service protection program, and a third service protection program, where the first service protection program can be executed by one or more processors to implement any of the service protection methods introduced in the foregoing embodiments
  • The network device 80 includes a processor 81, a memory 82, and a communication bus 83 connecting the processor 81 and the memory 82, where the memory 82 may be the aforementioned storage medium storing the first service protection program.
  • The processor 81 may read, compile, and execute the first service protection program to perform the protected-CPU-side process of the service protection method introduced in the foregoing embodiments:
  • The processor 81 collects the tunnel information and session information on its own CPU, where the tunnel information describes the tunnels carried by the CPU and the session information describes the sessions carried by those tunnels;
  • The processor 81 sends the tunnel information and session information to the main control board.
  • The main control board uses the tunnel information and session information to deliver the services on this CPU to a standby CPU, which continues processing them.
  • The processor 81 may collect the tunnel information and session information on its own CPU when that CPU needs to be reset.
  • The processor 81 may also read, compile, and execute the second service protection program to perform the main-control-board-side process of the service protection method introduced in the foregoing embodiments:
  • The processor 81 receives the tunnel information and session information sent by the protected CPU, determines the standby CPU for the protected CPU, and delivers the protected CPU's services to the standby CPU according to the tunnel information and session information.
  • The standby CPU is determined from the resource vacancy information of each CPU in the distributed service processing system; this information represents how many free resources each CPU has.
  • The resource vacancy information of a CPU is determined from one or more of its CPU utilization, memory usage, number of available tunnels, and number of available sessions.
  • In one example, before receiving the tunnel information and session information from the protected CPU, the processor 81 periodically determines the resource vacancy information of each CPU in the distributed service processing system, determines a standby CPU for each CPU from the most recent resource vacancy information, and stores the mapping between each CPU and its standby CPU. When the standby CPU for a protected CPU is needed, the processor 81 looks it up in this mapping.
  • In another example, before receiving the tunnel information and session information from the protected CPU, the processor 81 periodically determines the resource vacancy information of each CPU in the distributed service processing system. When the standby CPU for the protected CPU is needed, it determines one from the most recently obtained resource vacancy information.
  • In yet another example, when determining the standby CPU for the protected CPU, the processor 81 sends a vacancy information request to the CPUs in the distributed service processing system other than the protected CPU, receives the resource vacancy information each CPU reports in response, and determines the standby CPU for the protected CPU from that information.
  • After delivering the protected CPU's services to the standby CPU according to the tunnel information and session information, the processor 81 switches the services belonging to the protected CPU back to it once the protected CPU returns to normal operation.
  • The processor 81 determines, from the standby CPU's resource vacancy information, whether the standby CPU has enough free resources to carry all services on the protected CPU. If not, it screens the protected CPU's services and delivers the retained services to the standby CPU; if so, it delivers all services of the protected CPU directly to the standby CPU.
  • When screening, the processor 81 determines the protection sensitivity of each tunnel, using the tunnel as the unit.
  • The protection sensitivity represents how much protection the services in a tunnel require: the higher the protection sensitivity, the more protection the tunnel's services require. After determining the protection sensitivities, the processor 81 selects the services to retain in descending order of protection sensitivity, according to the free resources of the standby CPU.
  • The processor 81 may determine the protection sensitivity of each tunnel from the tunnel state Ts, the tunnel keep-alive time Tk, and the number of sessions Tn in the tunnel.
  • The processor 81 may also read, compile, and execute the third service protection program to perform the standby-CPU-side process of the service protection method introduced in the foregoing embodiments:
  • The processor 81 reports its CPU's resource vacancy information to the main control board, receives the protected CPU's services delivered by the main control board, and processes them.
  • This embodiment also provides a distributed service processing system, which includes a main control board and multiple CPUs.
  • The main control board is a network device whose processor 81 executes the second service protection program; some of the CPUs are network devices whose processor 81 executes the first service protection program, and others are network devices whose processor 81 executes the third service protection program.
  • With this approach, when a CPU cannot continue processing its services, it sends its own tunnel information and session information to the main control board, so that the main control board can select a standby CPU for it from the other CPUs in the distributed service processing system and let the standby CPU continue processing its services. This prevents all services carried by a CPU from being interrupted by that CPU's own failure and degrading the user experience, which improves the disaster tolerance of the distributed service processing system, enhances system stability, and improves the user's service experience.
  • The functional modules/units in the system and the device can be implemented as software (program code executable by a computing device), firmware, hardware, or an appropriate combination thereof.
  • The division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed cooperatively by several physical components.
  • Some or all physical components can be implemented as software executed by a processor, such as a central processing unit, a digital signal processor, or a microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit.
  • The computer-readable medium may include computer storage media (or non-transitory media) and communication media (or transitory media).
  • Computer storage media include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules, or other data).
  • Computer storage media include but are not limited to RAM, ROM, EEPROM, flash memory or other memory technologies, CD-ROM, digital versatile disk (DVD) or other optical disk storage, magnetic cassette, tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and can be accessed by a computer.
  • Communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media. Therefore, the present invention is not limited to any specific combination of hardware and software.
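The main-control-board flow in steps S702 to S724 (overwrite-stored standby mapping, the sufficiency check, and tunnel selection in descending protection sensitivity) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `Tunnel`, `MainControlBoard`, `cost`, and `free_capacity` are hypothetical, a standby CPU's resource vacancy is reduced to one integer capacity, each tunnel's sessions are reduced to one integer `cost`, and the protection sensitivity is taken as a precomputed number because its formula in terms of Ts, Tk, and Tn is not given here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Tunnel:
    tunnel_id: int
    sensitivity: float  # protection sensitivity (assumed precomputed)
    cost: int           # hypothetical resources needed to carry all sessions in the tunnel


@dataclass
class MainControlBoard:
    # Latest "CPU -> standby CPU" mapping; each refresh overwrites the previous one.
    standby_of: Dict[str, str] = field(default_factory=dict)
    # Hypothetical free capacity of each CPU, in the same units as Tunnel.cost.
    free_capacity: Dict[str, int] = field(default_factory=dict)

    def store_mapping(self, mapping: Dict[str, str]) -> None:
        # Overwrite rather than append, so only the newest mapping is kept.
        self.standby_of = dict(mapping)

    @staticmethod
    def select_by_sensitivity(tunnels: List[Tunnel], capacity: int) -> List[Tunnel]:
        """S716-S718: take tunnels from highest to lowest sensitivity while they fit."""
        selected = []
        for t in sorted(tunnels, key=lambda x: x.sensitivity, reverse=True):
            if t.cost > capacity:
                break  # stop at the first tunnel the standby CPU cannot carry
            selected.append(t)
            capacity -= t.cost
        return selected

    def fail_over(self, protected_cpu: str, tunnels: List[Tunnel]) -> Tuple[str, List[Tunnel]]:
        """S710-S720: find the standby CPU and decide which tunnels to deliver to it."""
        standby = self.standby_of[protected_cpu]          # S710: query the stored mapping
        capacity = self.free_capacity[standby]
        if sum(t.cost for t in tunnels) <= capacity:      # S712: enough for everything?
            delivered = list(tunnels)                     # S714: deliver all services
        else:
            delivered = self.select_by_sensitivity(tunnels, capacity)  # S716-S718
        # S720: the delivered services now consume part of the standby CPU's capacity.
        self.free_capacity[standby] = capacity - sum(t.cost for t in delivered)
        return standby, delivered
```

The selection loop stops at the first tunnel that does not fit instead of skipping ahead to smaller ones, which follows the "until the standby CPU has no resources left or cannot carry the next tunnel" wording; a first-fit variant that keeps scanning would pack more tunnels but is not what the text describes. Tunnels left out of `delivered` are the ones switched back, together with the delivered ones, in S724.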

Abstract

Embodiments of the present invention provide a service protection method, a network device, a distributed service processing system, and a storage medium. The service protection method comprises: collecting tunnel information and session information on a present CPU, wherein the tunnel information is the information of a tunnel carried by the present CPU, and the session information is the information of a session carried by the tunnel; and transmitting the tunnel information and the session information to a main control board, wherein the tunnel information and the session information are used for transmitting a service on the present CPU to a standby CPU by the main control board, so that the standby CPU continues to process the service on the present CPU.

Description

Service protection method, network device, distributed service processing system, and storage medium
Cross-Reference to Related Applications
This application is based on and claims priority to Chinese patent application No. 201910586539.2, filed on July 1, 2019, the entire content of which is incorporated herein by reference.
Technical Field
The present invention relates to the field of communications, and in particular to a service protection method, a network device, a distributed service processing system, and a storage medium.
Background
L2TP (Layer 2 Tunneling Protocol) is a Layer 2 tunneling protocol and one of the basic technologies for building a secure VPN (Virtual Private Network). The LAC (L2TP Access Concentrator) is the L2TP access device: it provides AAA (Authentication, Authorization, and Accounting) services for various types of user access, initiates tunnel and session connections, and performs proxy authentication for VPN users. The LNS (L2TP Network Server) is the VPN server on the enterprise side of L2TP: it performs final authorization and authentication of users, accepts tunnel and connection requests from the LAC, and establishes the PPP channel between the LNS and the user.
In practice, the L2TP services carried by both LNS and LAC devices keep growing. To meet the increasing service load, L2TP multi-core service processing based on a distributed architecture has been proposed; multi-core service boards greatly relieve the load on the main control board and offer strong load capacity. However, once a multi-core service board suffers a fault that can only be cleared by a restart, or an unrecoverable fault caused by a chip or other component, the L2TP services are interrupted, causing losses to telecom operators.
Summary
The service protection method, network device, distributed service processing system, and storage medium provided by embodiments of the present invention solve, at least to some extent, the following technical problem: in the related art, distributed L2TP multi-core services are easily interrupted by a failure of a multi-core service board or of a chip on that board, degrading the user's communication service experience.
To solve the above technical problem at least to some extent, an embodiment of the present invention provides a service protection method, including:
collecting tunnel information and session information on the present CPU, where the tunnel information is information about the tunnels carried by the present CPU, and the session information is information about the sessions carried by those tunnels; and
sending the tunnel information and the session information to a main control board, which uses them to deliver the services on the present CPU to a standby CPU, so that the standby CPU continues processing those services.
An embodiment of the present invention further provides a service protection method, including:
receiving tunnel information and session information sent by a protected CPU, where the tunnel information is information about the tunnels carried by the protected CPU, and the session information is information about the sessions carried by those tunnels;
determining a standby CPU corresponding to the protected CPU, where the standby CPU and the protected CPU belong to the same distributed service processing system; and
delivering the services of the protected CPU to the standby CPU according to the tunnel information and the session information.
An embodiment of the present invention further provides a service protection method, including:
reporting resource vacancy information of the present CPU to a main control board;
receiving the services of a protected CPU delivered by the main control board; and
processing the services of the protected CPU.
An embodiment of the present invention further provides a network device, including a processor, a memory, and a communication bus;
the communication bus is configured to implement connection and communication between the processor and the memory; and
the processor is configured to execute a first service protection program stored in the memory to implement the steps of the first service protection method above; or to execute a second service protection program stored in the memory to implement the steps of the second service protection method above; or to execute a third service protection program stored in the memory to implement the steps of the third service protection method above.
An embodiment of the present invention further provides a distributed service processing system, including a main control board and multiple CPUs. The main control board is a network device whose processor executes the second service protection program; some of the CPUs are network devices whose processor executes the first service protection program, and others are network devices whose processor executes the third service protection program.
An embodiment of the present invention further provides a storage medium storing at least one of a first service protection program, a second service protection program, and a third service protection program. The first service protection program can be executed by one or more processors to implement the steps of the first service protection method above; the second service protection program can be executed by one or more processors to implement the steps of the second service protection method above; and the third service protection program can be executed by one or more processors to implement the steps of the third service protection method above.
Other features and corresponding beneficial effects of the present invention are set out later in the specification, and it should be understood that at least some of the beneficial effects will become apparent from the description in the specification.
Brief Description of the Drawings
FIG. 1 is an interaction flowchart of the service protection method provided in Embodiment 1 of the present invention;
FIG. 2 is an architecture diagram of the distributed service processing system provided in Embodiment 1 of the present invention;
FIG. 3 is an interaction flowchart of session establishment between the LAC device and a CPU on the LNS device side, shown in Embodiment 1 of the present invention;
FIG. 4 is a flowchart of determining a standby CPU in Scheme 1 shown in Embodiment 1 of the present invention;
FIG. 5 is a flowchart of determining a standby CPU in Scheme 3 shown in Embodiment 1 of the present invention;
FIG. 6 is a flowchart of the protected CPU side of the service protection method provided in Embodiment 2 of the present invention;
FIG. 7 is a flowchart of the main control board side of the service protection method provided in Embodiment 2 of the present invention;
FIG. 8 is a schematic diagram of a hardware structure of the network device provided in Embodiment 3 of the present invention.
Detailed Description
To make the objectives, technical solutions, and advantages of the present invention clearer, embodiments of the present invention are described in further detail below through specific implementations with reference to the accompanying drawings. It should be understood that the specific embodiments described here are merely intended to explain the present invention, not to limit it.
Embodiment 1:
As communication applications expand, LNS and LAC devices carry more and more services and face increasing load. Take the LNS as an example: one LNS device may establish VPN tunnels with multiple LAC devices, so it carries a large amount of L2TP traffic. Traditionally, L2TP is deployed centrally on the main control board, but because the main control board has limited CPU resources and modest processing efficiency, this cannot keep up with the growing service load. To solve this problem, the LNS device can be deployed in a distributed manner, using multiple distributed service boards to carry the services that the main control board would otherwise handle and thereby relieve its load. On a service board, multiple CPUs handle LNS services; when one of them fails, the services it carries are affected and forcibly interrupted, degrading the user experience. To address this, this embodiment provides a service protection method. Refer to the interaction flowchart of the service protection method shown in FIG. 1:
S102: The protected CPU collects its own tunnel information and session information.
In this embodiment, CPUs are divided into protected CPUs and standby CPUs. A protected CPU is one whose services are protected by having a standby CPU continue processing them; it is usually a faulty CPU that has had to stop processing services because of some fault. A standby CPU is the CPU that continues processing the services of the protected CPU. Any CPU may act as a protected CPU in some scenarios and as a standby CPU in others.
The tunnel information is information about the tunnels carried by the protected CPU, and the session information is information about the sessions carried in each tunnel. Generally, one CPU can carry multiple tunnels, and one tunnel can carry multiple sessions. The protected CPU usually collects its tunnel information and session information when it has failed and can no longer process the services it carries. For example, in some examples of this embodiment, if a CPU determines that it needs to be reset to handle a current fault, it can determine that it is a protected CPU and therefore collect its own tunnel information and session information.
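The tunnel/session hierarchy and step S102 can be sketched as a small data model. This is a minimal sketch under assumptions: the names `Session`, `Tunnel`, `ProtectionReport`, and `collect_report` are hypothetical, only IDs are recorded because the description does not say which tunnel or session attributes are collected, and the reset decision is passed in as a flag rather than derived from real fault handling. Step S104 would then send the returned report to the main control board over an internal channel that is not modeled here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Session:
    session_id: int  # hypothetical: local L2TP session ID


@dataclass
class Tunnel:
    tunnel_id: int  # hypothetical: local L2TP tunnel ID
    sessions: List[Session] = field(default_factory=list)  # one tunnel carries many sessions


@dataclass
class ProtectionReport:
    cpu_id: str
    tunnel_info: List[int]              # tunnels carried by this CPU
    session_info: Dict[int, List[int]]  # tunnel ID -> sessions carried in that tunnel


def collect_report(cpu_id: str, tunnels: List[Tunnel], needs_reset: bool) -> Optional[ProtectionReport]:
    """S102: a CPU that must reset treats itself as protected and collects its state."""
    if not needs_reset:
        return None  # healthy CPU: nothing to report
    return ProtectionReport(
        cpu_id=cpu_id,
        tunnel_info=[t.tunnel_id for t in tunnels],
        session_info={t.tunnel_id: [s.session_id for s in t.sessions] for t in tunnels},
    )
```

A real report would also have to carry whatever per-session state the standby CPU needs to resume service; the ID-only layout here just shows the one-CPU, many-tunnels, many-sessions nesting.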
S104: The protected CPU sends the tunnel information and the session information to the main control board.
After collecting its tunnel information and session information, the protected CPU can send this information to the main control board. FIG. 2 shows a distributed service processing system; refer to FIG. 2:
The distributed service processing system 2 includes a main control board 21, a first service board 22, a second service board 23, and a packet transceiving and processing board 24. The main control board 21 is responsible for system management, protocol packet processing, and route management of the entire distributed service processing system 2. The packet transceiving and processing board 24 is responsible for interface traffic management, packet forwarding, and switching traffic management, and distributes L2TP packets to the service boards according to the processing rules set by the main control board 21. In this embodiment, the service boards are independent of each other and process in parallel, which increases system throughput. The first service board 22 includes multiple CPUs, as does the second service board 23. Assume that the distributed service processing system 2 is a distributed LNS device, and refer to the schematic diagram of the L2TP establishment process shown in FIG. 3:
S302: The LAC sends an SCCRQ message to request tunnel establishment;
S304: The CPU replies with an SCCRP message;
S306: After receiving the reply, the LAC returns an SCCCN acknowledgment message to the LNS;
At this point, the tunnel between the LAC and the CPU is established.
S308: The LAC sends an ICRQ message to request session establishment;
S310: After receiving the request, the CPU replies with an ICRP message;
S312: After receiving the reply, the LAC returns an ICCN acknowledgment message;
At this point, the session is established. The LNS can then carry out a PPP (Point-to-Point Protocol) exchange with the user and assign the user an IP address, after which the user can access the network.
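The ordering in S302 to S312 (tunnel first, then session) can be checked with a small replay function. This sketch only models message order: the six message names are the L2TP control messages from RFC 2661, while AVPs, sequence numbers, retransmission, and acknowledgments are deliberately omitted, and `replay` and its state names are hypothetical. "LNS" stands for the service-board CPU answering on the LNS side.

```python
from typing import List, Tuple

Message = Tuple[str, str]  # (sender, message type)

# Control-message order from FIG. 3.
TUNNEL_SETUP: List[Message] = [("LAC", "SCCRQ"), ("LNS", "SCCRP"), ("LAC", "SCCCN")]
SESSION_SETUP: List[Message] = [("LAC", "ICRQ"), ("LNS", "ICRP"), ("LAC", "ICCN")]


def replay(messages: List[Message]) -> str:
    """Replay messages against the FIG. 3 order and return the state reached."""
    expected = TUNNEL_SETUP + SESSION_SETUP
    state = "idle"
    for i, msg in enumerate(messages):
        if i >= len(expected) or msg != expected[i]:
            return "error"  # out-of-order or unexpected message
        if i + 1 == len(TUNNEL_SETUP):
            state = "tunnel_established"   # S306 completed
        elif i + 1 == len(expected):
            state = "session_established"  # S312 completed; PPP negotiation may follow
    return state
```

For example, replaying only the three tunnel messages leaves the state at `tunnel_established`, while an ICRQ sent before the tunnel exists yields `error`.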
S106:主控板确定被保护CPU对应的备用CPU。S106: The main control board determines the backup CPU corresponding to the protected CPU.
在本实施例中,主控板接收到被保护CPU发送的隧道信息与会话信息之后,可以确定 被保护CPU暂时不能继续处理其业务,因此主控板需要为被保护CPU确定备用CPU,以便备用CPU能够对被保护CPU上不能进行的业务进行处理,从而实现对这些业务的保护,避免这些业务被中断。In this embodiment, after the main control board receives the tunnel information and session information sent by the protected CPU, it can be determined that the protected CPU cannot continue to process its services temporarily. Therefore, the main control board needs to determine a backup CPU for the protected CPU for backup. The CPU can process services that cannot be performed on the protected CPU, thereby protecting these services and avoiding interruption of these services.
In this embodiment, the backup CPU that the main control board selects for the protected CPU is itself a CPU in the distributed service processing system, and such CPUs already carry services of their own. The backup CPU therefore protects the services of the protected CPU using the resources left over after processing its own services. Accordingly, when determining the backup CPU for the protected CPU, the main control board refers to the resource vacancy information of each CPU, that is, how much additional service a CPU can carry while still processing its own services.
In some examples of this embodiment, for a given CPU, the main control board may determine its resource vacancy information according to at least one of the CPU's CPU utilization, memory usage, number of available tunnels, and number of available sessions, for example as a priority value PRI computed as follows:
PRI=W1*(1-CPURate)+W2*(1-MemRate)+W3*T+W4*S;
Here, PRI characterizes how much of the CPU's resources are free: the higher the PRI value, the more free resources the CPU has, and the lower the value, the fewer. Therefore, the higher a CPU's PRI value, the more likely it is to be selected as a backup CPU. CPURate is the CPU utilization and W1 is the weight of the CPU free rate (1-CPURate); MemRate is the memory usage and W2 is the weight of the memory free rate (1-MemRate); T represents the number of available tunnels and W3 is its weight; S represents the number of available sessions and W4 is its weight.
Note that the numbers of available sessions and available tunnels are not on the same scale as the CPU free rate and the memory free rate. In this embodiment, the numbers of available sessions and tunnels are therefore normalized so that all four terms are on a consistent scale. For example, T may be the ratio of the number of available tunnels to the rated total number of tunnels in the distributed service processing system, with a value in (0,1); S may be the ratio of the number of available sessions to the rated total number of sessions in the distributed service processing system, with a value in (0,1).
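As a minimal sketch of the PRI formula with the normalization described above, the function below assumes utilizations are given as fractions in [0, 1] and uses hypothetical weights (0.4, 0.3, 0.15, 0.15); the text does not specify values for W1 to W4.

```python
# Hypothetical default weights W1..W4; the source does not specify values.
DEFAULT_WEIGHTS = (0.4, 0.3, 0.15, 0.15)


def pri(cpu_rate, mem_rate, avail_tunnels, avail_sessions,
        rated_tunnels, rated_sessions, weights=DEFAULT_WEIGHTS):
    """PRI = W1*(1-CPURate) + W2*(1-MemRate) + W3*T + W4*S.

    cpu_rate and mem_rate are utilizations in [0, 1]; T and S are the
    available tunnel/session counts normalized by the rated totals.
    """
    w1, w2, w3, w4 = weights
    t = avail_tunnels / rated_tunnels    # normalized T
    s = avail_sessions / rated_sessions  # normalized S
    return w1 * (1 - cpu_rate) + w2 * (1 - mem_rate) + w3 * t + w4 * s
```

With these assumed weights, a lightly loaded CPU scores higher than a busy one: pri(0.2, 0.3, 800, 8000, 1000, 10000) is about 0.77, while pri(0.9, 0.8, 100, 1000, 1000, 10000) is about 0.13.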
Beyond its current free resources, the state of the CPU itself also matters when deciding whether it can serve as a backup CPU. For example, a CPU may still have plenty of spare processing resources, enough to handle a lot of extra services, yet if the CPU itself is not operating normally it cannot serve as a backup CPU. Therefore, in some examples of this embodiment:
PRI=Stat*[W1*(1-CPURate)+W2*(1-MemRate)+W3*T+W4*S];
The other symbols keep their meaning. Stat represents the running state of the CPU: Stat = 1 indicates that the CPU is running normally, and Stat = 0 indicates that it is running abnormally. Consequently, whatever a CPU's CPU free rate, memory free rate, number of available tunnels, and number of available sessions, its PRI value is 0 whenever its running state is abnormal.
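The running-state gate can be sketched the same way. In this hypothetical helper, T and S are passed in already normalized and the default weights are again placeholders; the point it illustrates is that Stat = 0 forces PRI to 0 no matter how idle the CPU is.

```python
def pri_with_stat(stat, cpu_rate, mem_rate, t, s,
                  weights=(0.4, 0.3, 0.15, 0.15)):
    """PRI = Stat * [W1*(1-CPURate) + W2*(1-MemRate) + W3*T + W4*S].

    stat is 1 for a normally running CPU and 0 for an abnormal one;
    t and s are already-normalized available tunnel/session ratios.
    The default weights are hypothetical.
    """
    if stat not in (0, 1):
        raise ValueError("stat must be 0 or 1")
    w1, w2, w3, w4 = weights
    free_score = w1 * (1 - cpu_rate) + w2 * (1 - mem_rate) + w3 * t + w4 * s
    return stat * free_score
```

An abnormal but completely idle CPU, pri_with_stat(0, 0.0, 0.0, 1.0, 1.0), therefore scores 0 and can never be chosen over a healthy CPU with any free capacity.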
In some other examples of this embodiment, the main control board may determine a CPU's resource vacancy information from its CPU free rate or memory free rate alone. Alternatively, the main control board may determine it from the CPU's number of available tunnels or number of available sessions alone.
Clearly, before determining the backup CPU for the protected CPU, the main control board must first obtain the resource vacancy information of each CPU in the distributed service processing system. The following schemes, given for reference, describe how the main control board can obtain this information and determine backup CPUs:
Scheme 1:
The main control board periodically obtains the resource vacancy information of each CPU in the distributed service processing system and periodically determines a backup CPU for every CPU. Refer to Fig. 4:
S402: The main control board periodically determines the resource vacancy information of each CPU in the distributed service processing system.
In this scheme, each CPU in the distributed service processing system periodically reports its own resource vacancy information to the main control board. In some examples of this embodiment, the information a CPU reports to characterize its free resources includes its CPU utilization, memory utilization, number of available tunnels, and number of available sessions.
In this embodiment, the distributed service processing system contains many CPUs, and essentially all of them report their resource vacancy information to the main control board. So that the main control board can tell which CPU a received report belongs to, each CPU includes in its report information that uniquely identifies it within the distributed service processing system. For example, in one example of this embodiment, a CPU uniquely identifies itself with an L(a)N(b) identifier, where L stands for "Location" and indicates the service board on which the CPU resides, a is the number of that service board, N indicates the CPU's serial number on that board, and b is the CPU's unique identifier on that board. From the L(a)N(b) identifier, the main control board can determine which CPU on which service board the resource vacancy information in a received report belongs to.
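To illustrate how a report can be attributed to a specific CPU, the sketch below assumes the identifier is carried as a plain string such as "L2N3" (board 2, CPU 3). The source does not define a wire format, so the string form, the report dictionary, and the helper names are all hypothetical.

```python
import re

# Assumed textual form of the L(a)N(b) identifier, e.g. "L2N3".
_CPU_ID = re.compile(r"L(\d+)N(\d+)")


def make_cpu_id(board, cpu):
    """Build an L(a)N(b) identifier from a board number and CPU number."""
    return f"L{board}N{cpu}"


def parse_cpu_id(cpu_id):
    """Return (board, cpu) for an identifier such as "L2N3"."""
    match = _CPU_ID.fullmatch(cpu_id)
    if match is None:
        raise ValueError(f"not an L(a)N(b) identifier: {cpu_id!r}")
    return int(match.group(1)), int(match.group(2))


def record_report(table, report):
    """File a report under the (board, cpu) key its identifier names.

    A newer report from the same CPU replaces the older one.
    """
    key = parse_cpu_id(report["id"])
    table[key] = report
    return key
```

Keying the table by (board, cpu) rather than by the raw string means "L2N3" and a report from board 2, CPU 3 built with make_cpu_id(2, 3) land in the same slot.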
S404: The main control board determines a backup CPU for each CPU according to the most recently obtained resource vacancy information of each CPU, and stores the mapping between each CPU and its backup CPU.
In this example, every time the main control board re-obtains the resource vacancy information of the CPUs, it reconfigures a backup CPU for every CPU. For a given CPU, the main control board selects the backup CPU from the other CPUs of the same distributed service processing system, so a CPU and its backup CPU always belong to the same distributed service processing system. For example, in one example of this embodiment, the main control board determines the backup CPUs for the CPUs in the distributed service processing system as follows:
The main control board calculates the PRI value of each CPU from its resource vacancy information, as described above. It then identifies the CPU with the highest PRI value and uses it as the backup CPU for every other CPU in the distributed service processing system. The main control board also needs to select a backup CPU for the highest-PRI CPU itself; in one example, it uses the CPU with the second-highest PRI value.
Suppose a distributed service processing system contains five CPUs, CPU1 to CPU5, and that after calculation CPU3 has the highest PRI value, followed by CPU4. The main control board then determines the mapping between each CPU and its backup CPU shown in Table 1:
Table 1
CPU     Backup CPU
CPU1    CPU3
CPU2    CPU3
CPU3    CPU4
CPU4    CPU3
CPU5    CPU3
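The selection rule in S404 can be expressed as a short function. The PRI values below are hypothetical, chosen only so that CPU3 ranks first and CPU4 second; with them the function reproduces Table 1.

```python
def backup_mapping(pri_by_cpu):
    """Map every CPU to its backup CPU using the rule described above.

    The highest-PRI CPU backs up all other CPUs; the second-highest-PRI
    CPU backs up the highest-PRI CPU.
    """
    if len(pri_by_cpu) < 2:
        raise ValueError("at least two CPUs are needed")
    ranked = sorted(pri_by_cpu, key=pri_by_cpu.get, reverse=True)
    best, runner_up = ranked[0], ranked[1]
    return {cpu: (runner_up if cpu == best else best) for cpu in pri_by_cpu}


# Hypothetical PRI values chosen so that CPU3 ranks first and CPU4 second.
example_pri = {"CPU1": 0.35, "CPU2": 0.20, "CPU3": 0.82,
               "CPU4": 0.64, "CPU5": 0.10}
table_1 = backup_mapping(example_pri)
```

Because the whole mapping depends only on the top two PRI values, recomputing it each period is cheap: one sort over the reported CPUs.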
After determining the mapping between each CPU and its backup CPU, the main control board stores it, so that it can be used if a CPU needs protection before the mapping is next updated.
S406: The main control board looks up the backup CPU corresponding to the protected CPU in the mapping.
When a CPU reports its tunnel information and session information to the main control board, the main control board can look up that CPU's backup CPU in the pre-stored mapping. In this scheme, when a CPU fails, the main control board neither has to fetch the resource vacancy information of each CPU on the spot nor perform any on-the-spot calculation. The backup CPU is therefore determined faster, the protected CPU's services can be switched to the backup CPU more quickly, the services are not affected by the time spent selecting a backup CPU, and users are less likely to notice the failure of the protected CPU.
Because the main control board determines backup CPUs for all CPUs periodically, a backup assignment may sometimes never take effect: in a given period, every CPU in the distributed service processing system may run normally and no CPU needs protection.
Scheme 2:
The main control board periodically obtains the resource vacancy information of each CPU in the distributed service processing system, but determines the backup CPU for a protected CPU only when it is needed. As in Scheme 1, the main control board can obtain this information periodically; for example, it periodically sends a vacancy information request to each CPU, and each CPU reports its resource vacancy information in response. Of course, in both Scheme 1 and this scheme, the main control board need not send vacancy information requests periodically; instead, each CPU can track the reporting period itself and automatically report its resource vacancy information to the main control board when the period elapses, which reduces the load on the main control board.
Unlike Scheme 1, however, in Scheme 2 the main control board does not determine a backup CPU for every CPU each time it obtains the resource vacancy information. Instead, it stores the newly obtained information, and when a CPU fails, it determines a backup CPU on the spot for that failed CPU, that is, the protected CPU. Because the main control board does not have to determine backup CPUs for all CPUs frequently, and when it does determine a backup CPU for the protected CPU it does not need to do so for the other CPUs, the load on its own processing resources is reduced.
In both this scheme and Scheme 1, the main control board periodically obtains the resource vacancy information of each CPU but always determines backup CPUs from the most recently obtained information. To save storage, the main control board can therefore overwrite each CPU's previous resource vacancy information with its latest report.
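A sketch of Scheme 2 under simplifying assumptions: each report is reduced to a precomputed PRI value, the cache overwrites older reports as described above, and a backup CPU is picked only when a protected CPU appears. The class and method names are hypothetical.

```python
class LatestVacancyCache:
    """Scheme 2 sketch: keep only each CPU's latest report and pick a
    backup CPU only when a protected CPU actually appears."""

    def __init__(self):
        self._latest_pri = {}  # cpu -> PRI computed from its latest report

    def update(self, cpu, pri):
        # Overwrite instead of appending, so storage stays one entry per CPU.
        self._latest_pri[cpu] = pri

    def backup_for(self, protected_cpu):
        """Return the CPU with the highest latest PRI, excluding the
        protected CPU itself, or None if no other CPU has reported."""
        candidates = {c: p for c, p in self._latest_pri.items()
                      if c != protected_cpu}
        if not candidates:
            return None
        return max(candidates, key=candidates.get)

    def __len__(self):
        return len(self._latest_pri)
```

If CPU2 first reports a PRI of 0.8 and later 0.2, only the 0.2 is kept, so a later failure of CPU1 is covered by whichever CPU is currently the least loaded rather than by a stale favourite.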
Scheme 3:
In the first two schemes, the main control board obtains the resource vacancy information of each CPU periodically. In this scheme, the main control board obtains it only on demand, when it needs to determine a backup CPU for a protected CPU. Refer to the flowchart in Fig. 5, which shows how the main control board determines a backup CPU for the protected CPU:
S502: Send a vacancy information request to the CPUs in the distributed service processing system other than the protected CPU.
In this scheme, the other CPUs in the distributed service processing system cannot know when some CPU will fail, so they cannot proactively report their resource vacancy information to the main control board as soon as a protected CPU appears. Therefore, in this embodiment, when the main control board receives the tunnel information and session information sent by a protected CPU and determines that a backup CPU is needed, it sends a vacancy information request to the other CPUs in the distributed service processing system; the request instructs those CPUs to report their resource vacancy information.
S504: Receive the resource vacancy information that each CPU reports in response to the vacancy information request.
After receiving the vacancy information request, each of the other CPUs in the distributed service processing system reports its resource vacancy information to the main control board, which receives it.
S506: Determine the backup CPU for the protected CPU according to the resource vacancy information of each CPU.
After obtaining the resource vacancy information of the CPUs other than the protected CPU, the main control board determines the protected CPU's backup CPU from it. Since the main control board only needs to select a backup CPU for the protected CPU at this point, it can simply select the CPU whose resource vacancy information shows the most free resources. In some other examples of this embodiment, however, the main control board may select any CPU with sufficiently good free resources rather than the best one. For example, if the main control board calculates that three CPUs have good free resources and each could carry all of the protected CPU's services, it may choose any of the three as the backup CPU. It may even choose the one of the three with the fewest free resources, keeping the two CPUs with more free resources in reserve in case a CPU carrying a heavier service load fails later and a backup CPU must be selected for it.
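The two selection strategies described for S506 can be compared in a small sketch. Spare capacity and demand are expressed in one hypothetical unit (for example, sessions), and the policy names most_free and best_fit are illustrative labels, not terms from the source.

```python
def choose_backup(spare_by_cpu, demand, policy="most_free"):
    """Pick a backup CPU for one protected CPU (Scheme 3, step S506).

    spare_by_cpu maps each candidate CPU (protected CPU already excluded)
    to its spare capacity; demand is the protected CPU's load in the same
    hypothetical unit.
      "most_free": the CPU with the most spare capacity.
      "best_fit":  among CPUs that can carry the whole demand, the one with
                   the least spare capacity, keeping roomier CPUs in reserve.
    """
    if not spare_by_cpu:
        return None
    most_free = max(spare_by_cpu, key=spare_by_cpu.get)
    if policy == "most_free":
        return most_free
    if policy == "best_fit":
        fits = {c: s for c, s in spare_by_cpu.items() if s >= demand}
        # If no CPU can carry everything, fall back to the roomiest one;
        # only part of the services can then be moved (see S108).
        return min(fits, key=fits.get) if fits else most_free
    raise ValueError(f"unknown policy: {policy!r}")
```

With spare capacities {CPU1: 300, CPU2: 800, CPU4: 500, CPU5: 100} and a demand of 400, most_free picks CPU2 while best_fit picks CPU4, leaving CPU2 in reserve for a heavier failure later.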
S108: The main control board sends the services of the protected CPU to the backup CPU according to the tunnel information and session information.
After determining the backup CPU for the protected CPU, the main control board sends the protected CPU's services to the backup CPU according to the tunnel information and session information reported by the protected CPU, so that the backup CPU can process them. In some cases, the backup CPU selected by the main control board has enough free resources to carry all of the protected CPU's services, and the main control board can deliver all of them directly to the backup CPU. In other cases, the selected backup CPU has few free resources and, while processing its own services, can take on only part of the protected CPU's services; the main control board then has to select only some of the protected CPU's services and deliver those to the backup CPU.
In some examples of this embodiment, the main control board may randomly select, according to the backup CPU's free resources, a subset of the protected CPU's services to deliver to the backup CPU. However, any service on the protected CPU that is not selected for delivery to the backup CPU will be interrupted, which naturally degrades the experience of the corresponding user. Therefore, in this embodiment, the main control board can select the protected CPU's services according to factors such as their importance. In some examples of this embodiment, the main control board selects the protected CPU's services tunnel by tunnel: if a tunnel is selected, all services carried on it are delivered to the backup CPU; if a tunnel is filtered out, all services carried on it are interrupted.
The following describes a scheme for selecting services tunnel by tunnel:
The main control board can determine the protection sensitivity of each tunnel from its tunnel state Ts, tunnel keep-alive time Tk, and number of sessions in the tunnel Tn. For example,
Sent(Tid)=Ts*Tk*Tn;
Here, Tid is the tunnel number and Sent is the protection sensitivity of that tunnel. As the formula shows, a tunnel's protection sensitivity equals the product of its tunnel state, its keep-alive time, and the number of sessions in it. In other examples of this embodiment, the main control board may compute the protection sensitivity of each tunnel in other ways, or filter the protected CPU's services in other ways.
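Tunnel-by-tunnel selection based on Sent(Tid) can be sketched as below. The source fixes neither the ordering nor the capacity model, so this assumes that more sensitive tunnels are protected first, that the backup CPU's spare capacity is a session budget, and that Ts is encoded numerically (1 for up, 0 for down); all field and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Tunnel:
    tid: int    # tunnel number (Tid)
    ts: float   # tunnel state Ts (hypothetical encoding: 1 = up, 0 = down)
    tk: float   # tunnel keep-alive time Tk
    tn: int     # number of sessions in the tunnel Tn


def sensitivity(tunnel):
    """Sent(Tid) = Ts * Tk * Tn."""
    return tunnel.ts * tunnel.tk * tunnel.tn


def select_tunnels(tunnels, session_budget):
    """Greedily keep whole tunnels, most sensitive first, while the sum of
    their sessions fits in the backup CPU's spare session budget.
    Tunnels that do not fit are skipped, i.e. their services are dropped."""
    chosen, used = [], 0
    for tunnel in sorted(tunnels, key=sensitivity, reverse=True):
        if used + tunnel.tn <= session_budget:
            chosen.append(tunnel.tid)
            used += tunnel.tn
    return chosen
```

A greedy pass like this skips a tunnel that does not fit and keeps trying the remaining ones, so a less sensitive but smaller tunnel may still be rescued; it does not guarantee the maximum total sensitivity.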
S110: The backup CPU processes the services of the protected CPU.
After receiving the protected CPU's services from the main control board, the backup CPU processes them. Because the backup CPU still has its own services to process, it is in effect protecting the protected services with its spare resources; the service protection scheme of this embodiment is therefore a redundancy-based protection scheme.
After the protected CPU recovers from its fault, the main control board can switch the services delivered to the backup CPU back to the recovered protected CPU, which then resumes processing its original services. At that point, the protection relationship between the protected CPU and the backup CPU can be released.
In the service protection method provided by the embodiments of the present invention, when a CPU cannot continue to process the services it carries, the main control board selects a backup CPU for it to continue processing all or part of the failed CPU's services. This reduces the impact of the CPU failure on user services and improves user experience.
Because the backup CPU is selected based on the free resources of each CPU, a CPU with more free resources can be chosen as the backup CPU, allowing it to take over as many of the failed CPU's services as possible.
In addition, when the backup CPU cannot carry all services of the failed CPU, the main control board can filter the failed CPU's services that the backup CPU is to carry, preventing the backup CPU from being overloaded and its own services from being affected.
Embodiment 2:
This embodiment further describes the foregoing service protection method with some examples. Refer to the flowchart of the service protection method shown in Fig. 6:
S602: The CPU determines that it cannot continue processing services.
In this embodiment, if the CPU encounters a fault that requires a reset, or a fault from which it cannot recover for the time being, the CPU determines that it currently cannot continue processing services.
S604: The CPU collects its own tunnel information and session information.
In this embodiment, the tunnel information collected by the CPU is information about the tunnels the CPU carries, and the session information is information about the sessions carried in each of those tunnels. One CPU can carry multiple tunnels, and one tunnel can carry multiple sessions.
S606: The CPU sends its tunnel information and session information to the main control board.
After collecting its tunnel information and session information, the CPU sends this information to the main control board.
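Steps S604 and S606 amount to packaging a two-level structure (tunnels, each holding sessions) into one message. The sketch below is a hypothetical representation: the field names peer and user, and the dictionary message format, are assumptions rather than the encoding used by the patent.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    sid: int    # session ID
    user: str   # user carried in the session (hypothetical field)


@dataclass
class TunnelRecord:
    tid: int                                      # tunnel ID
    peer: str                                     # LAC address (hypothetical)
    sessions: list = field(default_factory=list)  # Session objects


def build_takeover_message(cpu_id, tunnels):
    """S604/S606 sketch: gather the tunnels this CPU carries and the
    sessions inside each tunnel into one message for the main control
    board."""
    return {
        "cpu": cpu_id,
        "tunnels": [
            {
                "tid": t.tid,
                "peer": t.peer,
                "sessions": [{"sid": s.sid, "user": s.user} for s in t.sessions],
            }
            for t in tunnels
        ],
    }
```

Keeping sessions nested under their tunnel lets the main control board later move or drop whole tunnels at once, which is the granularity S108 uses.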
In some examples of this embodiment, while the CPU is operating normally, it also reports its resource vacancy information to the main control board periodically, or reports it at the main control board's request. The reported resource vacancy information includes, but is not limited to, the CPU's L(a)N(b) identifier, running state flag, CPU utilization, memory utilization, available tunnel resource data, and available session resource data. When another CPU fails, the main control board can use this information to decide whether this CPU is suitable as the failed CPU's backup CPU.
Continue with the main control board side of the service processing method shown in Fig. 7:
S702: The main control board receives the resource vacancy information periodically reported by each CPU in the distributed service processing system.
In this embodiment, the main control board determines the backup CPU of each CPU in the distributed service processing system using Scheme 1 of Embodiment 1, so it periodically obtains the resource vacancy information reported by each CPU.
S704: The main control board determines a backup CPU for each CPU in the distributed service processing system according to the most recently reported resource vacancy information.
After obtaining the resource vacancy information reported by each CPU in the distributed service processing system, the main control board determines each CPU's PRI value from one or more of its CPU utilization, memory usage, number of available tunnels, and number of available sessions, and then determines each CPU's backup CPU based on the PRI values.
S706: The main control board stores the latest mapping between each CPU and its backup CPU.
After determining the mapping between each CPU and its backup CPU, the main control board stores it. Each time the CPUs report their resource vacancy information, the main control board determines a new mapping; but because it always determines the backup CPU for a protected CPU from the current latest mapping, it can store the mapping by overwriting, that is, always replacing the previous mapping with the latest one. This reduces the storage the mapping occupies on the main control board.
S708: The main control board receives the tunnel information and session information sent by the protected CPU.
When the main control board receives tunnel information and session information from a CPU, it concludes that the CPU has failed and cannot continue processing its own services, and therefore treats that CPU as the current protected CPU.
S710: The main control board looks up the backup CPU corresponding to the protected CPU in the stored mapping.
Because the main control board has already determined the backup CPU of each CPU in the distributed service processing system, once it has identified the protected CPU it can find that CPU's backup CPU by querying the stored mapping.
S712: The main control board determines whether the backup CPU has enough free resources to carry all services of the protected CPU.
查询到被保护CPU对应的备用CPU之后,主控板可以确定备用CPU的资源空余是否足以承载被保护CPU的全部业务,若判断结果为是,则进入S714,否则,进入S716。After querying the backup CPU corresponding to the protected CPU, the main control board can determine whether the spare CPU resources are enough to carry all the services of the protected CPU. If the judgment result is yes, then go to S714, otherwise, go to S716.
S714:主控板根据被保护CPU的隧道信息与会话信息将被保护CPU的全部业务均下发给备用CPU。S714: The main control board delivers all services of the protected CPU to the standby CPU according to the tunnel information and session information of the protected CPU.
如果主控板为被保护CPU所选择的备用CPU的资源空余较大,备用CPU的资源空余足以承载被保护CPU的全部业务,那么主控板可以直接将被保护CPU的全部业务均下发给备用CPU。If the main control board has a large spare CPU selected for the protected CPU, and the spare CPU has enough resources to carry all the services of the protected CPU, the main control board can directly send all the services of the protected CPU to Standby CPU.
S716:主控板确定被保护CPU上各隧道对应的保护敏感度。S716: The main control board determines the protection sensitivity corresponding to each tunnel on the protected CPU.
如果主控板所选择出来的备用CPU的资源空余不多,只能在处理其自身业务的同时承担被保护CPU上部分业务的处理,那么主控板就需要从被保护CPU的业务中进行挑选,仅筛选出部分业务下发给备用CPU。If the spare CPU selected by the main control board has insufficient resources and can only handle part of the business on the protected CPU while processing its own business, the main control board needs to select from the business of the protected CPU , Only select part of the business and send it to the standby CPU.
在本实施例中,主控板基于被保护CPU上各隧道对应的保护敏感度来选择下发给备用CPU的业务。所以,当主控板确定备用CPU的资源空余不足以承载被保护CPU的全部业务后,主控板会计算被保护CPU上各隧道对应的保护敏感度。例如,主控板根据Sent(Tid)=Ts*Tk*Tn公式计算各隧道对应的保护敏感度。In this embodiment, the main control board selects the service to be issued to the standby CPU based on the protection sensitivity corresponding to each tunnel on the protected CPU. Therefore, when the main control board determines that the spare CPU resources are insufficient to carry all the services of the protected CPU, the main control board calculates the protection sensitivity corresponding to each tunnel on the protected CPU. For example, the main control board calculates the protection sensitivity corresponding to each tunnel according to the formula Sent(Tid)=Ts*Tk*Tn.
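As a hedged illustration of the formula Sent(Tid) = Ts*Tk*Tn, the Python sketch below computes the sensitivity of each tunnel and orders the tunnels from highest to lowest. How the tunnel state Ts is encoded as a number is not specified in this section; the sketch assumes a numeric weight (for example 1.0 for an active tunnel and 0.0 otherwise), and the `Tunnel` type is hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Tunnel:
    tid: int    # tunnel identifier Tid
    ts: float   # tunnel state Ts as a numeric weight (assumed encoding)
    tk: float   # tunnel keep-alive time Tk
    tn: int     # number of sessions Tn carried in the tunnel


def sensitivity(t: Tunnel) -> float:
    """Protection sensitivity Sent(Tid) = Ts * Tk * Tn."""
    return t.ts * t.tk * t.tn


def rank_by_sensitivity(tunnels: List[Tunnel]) -> List[Tunnel]:
    """Return tunnels sorted from highest to lowest protection sensitivity."""
    return sorted(tunnels, key=sensitivity, reverse=True)


tunnels = [
    Tunnel(tid=1, ts=1.0, tk=30.0, tn=10),  # Sent = 300
    Tunnel(tid=2, ts=0.0, tk=60.0, tn=50),  # Sent = 0 (state weight 0)
    Tunnel(tid=3, ts=1.0, tk=60.0, tn=8),   # Sent = 480
]
print([t.tid for t in rank_by_sensitivity(tunnels)])  # [3, 1, 2]
```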
S718: Based on the standby CPU's free resources, the main control board selects, in descending order of tunnel protection sensitivity, the portion of the protected CPU's services that the standby CPU can carry.
After determining the protection sensitivity of each tunnel, the main control board selects, in descending order of tunnel protection sensitivity and according to the standby CPU's free resources, the portion of the protected CPU's services that the standby CPU can carry. Because the number of sessions carried by each tunnel on the protected CPU is not fixed, the main control board cannot determine in advance how many tunnels' services to select. In one example of this embodiment, the main control board first selects the tunnel with the highest protection sensitivity on the protected CPU and checks whether the standby CPU still has free resources after taking over all services in that tunnel. If it does, the main control board selects the tunnel with the next-highest protection sensitivity and checks whether, after also taking over that tunnel's services, the standby CPU still has free resources for other tunnels. This repeats until the standby CPU has no free resources left, or its remaining resources are insufficient to carry the services of the next tunnel.
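The S712 check and the S718 selection loop can be sketched as a greedy pass over tunnels already ranked by protection sensitivity. The function name is hypothetical, and measuring the standby CPU's free resources as a number of sessions is an assumption made only to keep the example concrete.

```python
from typing import List, Tuple


def select_tunnels(ranked: List[Tuple[int, int]], free_sessions: int) -> List[int]:
    """Pick tunnels to hand over to the standby CPU.

    ranked: (tunnel_id, session_count) pairs already sorted by protection
            sensitivity, highest first.
    free_sessions: spare capacity of the standby CPU, measured in sessions
                   (an assumed resource unit).
    """
    # S712/S714: if everything fits, hand over every tunnel.
    if sum(count for _, count in ranked) <= free_sessions:
        return [tid for tid, _ in ranked]

    # S718: walk tunnels from highest to lowest sensitivity.
    selected: List[int] = []
    remaining = free_sessions
    for tid, count in ranked:
        if count > remaining:
            break  # next tunnel no longer fits: stop, as described above
        selected.append(tid)
        remaining -= count
        if remaining == 0:
            break  # no free resources left
    return selected


print(select_tunnels([(3, 8), (1, 10), (2, 50)], free_sessions=20))   # [3, 1]
print(select_tunnels([(3, 8), (1, 10), (2, 50)], free_sessions=100))  # [3, 1, 2]
```

Stopping at the first tunnel that does not fit follows the text literally. A variant that skips an oversized tunnel and keeps trying lower-ranked ones could pack more sessions onto the standby CPU, but that is not what this embodiment describes.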
S720: The main control board delivers the selected services to the standby CPU.
After selecting the services, the main control board delivers them to the standby CPU, which then processes them.
S722: The main control board monitors whether the protected CPU has recovered.
After delivering all or part of the protected CPU's services to the standby CPU, the main control board monitors the state of the protected CPU to determine whether it has recovered. If so, the process proceeds to S724; otherwise, S722 continues.
In some examples of this embodiment, the main control board periodically sends status query messages to the protected CPU and determines its state from the responses. In other examples, the protected CPU actively reports to the main control board once its state has recovered.
S724: The main control board switches the protected CPU's services back to the protected CPU.
Once the main control board determines that the protected CPU has recovered, it can switch the services that originally belonged to the protected CPU back to it for processing. These include both the services that the standby CPU took over and the services that the main control board did not deliver to the standby CPU because the standby CPU's free resources were limited.
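The polling variant of S722 and the switch-back in S724 can be sketched as follows. The callbacks `is_recovered` and `switch_back`, the polling interval, and the retry limit are all hypothetical; a real main control board would send status query messages over its internal channel rather than call a Python function.

```python
import time
from typing import Callable, List


def wait_and_switch_back(
    is_recovered: Callable[[], bool],
    switch_back: Callable[[], None],
    interval_s: float = 1.0,
    max_polls: int = 10,
) -> bool:
    """Poll the protected CPU; once it reports recovery, move its services back.

    Returns True if the services were switched back, False if the CPU did not
    recover within max_polls queries.
    """
    for _ in range(max_polls):
        if is_recovered():      # S722: periodic status query
            switch_back()       # S724: return services to the protected CPU
            return True
        time.sleep(interval_s)  # wait before the next query
    return False


# Simulated protected CPU that reports recovery on the third query.
replies = iter([False, False, True])
log: List[str] = []
ok = wait_and_switch_back(lambda: next(replies), lambda: log.append("switched back"), interval_s=0.0)
print(ok, log)  # True ['switched back']
```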
In the service protection method of this embodiment, the main control board determines a standby CPU for each CPU in the distributed service processing system in advance. When a CPU fails, the main control board can therefore quickly look up that CPU's standby CPU and migrate the failed CPU's services as soon as possible, avoiding prolonged service interruptions that degrade the user experience.
Embodiment 3:
This embodiment provides a storage medium that can store one or more computer programs that can be read, compiled, and executed by one or more processors. In this embodiment, the storage medium can store at least one of a first service protection program, a second service protection program, and a third service protection program. The first service protection program can be executed by one or more processors to implement the protected-CPU-side process of any service protection method described in the foregoing embodiments; the second service protection program can be executed by one or more processors to implement the main-control-board-side process of any of those methods; and the third service protection program can be executed by one or more processors to implement the standby-CPU-side process of any of those methods.
In addition, this embodiment provides a network device. As shown in FIG. 8, the network device 80 includes a processor 81, a memory 82, and a communication bus 83 connecting the processor 81 and the memory 82. The memory 82 may be the aforementioned storage medium storing the first service protection program. The processor 81 can read the first service protection program, compile it, and execute it to implement the protected-CPU-side process of the service protection method described in the foregoing embodiments:
The processor 81 collects the tunnel information and session information on its CPU, where the tunnel information is information about the tunnels carried by the CPU and the session information is information about the sessions carried by those tunnels;
The processor 81 sends the tunnel information and session information to the main control board, which uses them to transfer the services on this CPU to a standby CPU so that the standby CPU can continue processing them.
In some examples of this embodiment, the processor 81 collects the tunnel information and session information on the local CPU when the local CPU needs to be reset.
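On the protected-CPU side, collecting and reporting amount to taking a snapshot of the tunnels and their sessions and sending it to the main control board. The sketch below serializes such a snapshot as JSON; the message layout and field names are assumptions for illustration, since this section does not define a message format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class SessionInfo:
    session_id: int  # hypothetical per-session field


@dataclass
class TunnelInfo:
    tunnel_id: int
    sessions: List[SessionInfo] = field(default_factory=list)


def build_handover_report(cpu_id: str, tunnels: List[TunnelInfo]) -> str:
    """Snapshot tunnel and session information (e.g. before a reset) as JSON
    for the main control board. asdict() also converts nested dataclasses."""
    return json.dumps({"cpu": cpu_id, "tunnels": [asdict(t) for t in tunnels]})


report = build_handover_report("cpu1", [TunnelInfo(7, [SessionInfo(1), SessionInfo(2)])])
print(report)
```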
The processor 81 can also read the second service protection program, compile it, and execute it to implement the main-control-board-side process of the service protection method described in the foregoing embodiments:
The processor 81 receives the tunnel information and session information sent by the protected CPU, determines the standby CPU corresponding to the protected CPU, and sends the protected CPU's services to the standby CPU according to the tunnel information and session information.
In some examples of this embodiment, the standby CPU is determined according to the resource vacancy information of each CPU in the distributed service processing system; this information characterizes how many free resources a CPU has.
The resource vacancy information of a CPU is determined from one or more of its CPU utilization, memory usage, number of available tunnels, and number of available sessions.
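This section lists the inputs to a CPU's resource vacancy information but does not say how they are combined. One simple possibility, shown below purely as an assumption, is to average four headroom ratios so that a larger score means more free resources.

```python
from dataclasses import dataclass


@dataclass
class CpuStatus:
    cpu_util: float     # CPU utilization, 0.0 to 1.0
    mem_util: float     # memory usage, 0.0 to 1.0
    free_tunnels: int   # available tunnels
    max_tunnels: int
    free_sessions: int  # available sessions
    max_sessions: int


def vacancy_score(s: CpuStatus) -> float:
    """Hypothetical vacancy score: mean of four headroom ratios in [0, 1]."""
    parts = [
        1.0 - s.cpu_util,
        1.0 - s.mem_util,
        s.free_tunnels / s.max_tunnels,
        s.free_sessions / s.max_sessions,
    ]
    return sum(parts) / len(parts)


print(vacancy_score(CpuStatus(0.5, 0.5, 50, 100, 500, 1000)))  # 0.5
```

Equal weighting is only one choice; a deployment that is mostly session-bound might weight the available-session ratio more heavily.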
In one example of this embodiment, before receiving the tunnel information and session information from the protected CPU, the processor 81 periodically determines the resource vacancy information of each CPU in the distributed service processing system, determines a standby CPU for each CPU from the most recently obtained resource vacancy information, and stores the mapping between each CPU and its standby CPU. When the standby CPU for the protected CPU is needed, the processor 81 looks it up in the mapping.
In another example of this embodiment, before receiving the tunnel information and session information from the protected CPU, the processor 81 periodically determines the resource vacancy information of each CPU in the distributed service processing system. When the standby CPU for the protected CPU is needed, the processor 81 determines it from the most recently obtained resource vacancy information of each CPU.
In yet another example of this embodiment, when determining the standby CPU for the protected CPU, the processor 81 sends a vacancy information request to every CPU in the distributed service processing system other than the protected CPU, receives the resource vacancy information that each CPU reports in response, and determines the standby CPU for the protected CPU from that information.
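All three variants end with the same decision: picking a standby CPU from the other CPUs' resource vacancy information. The sketch assumes, for illustration only, that the CPU with the largest vacancy score other than the protected CPU is chosen; the first variant simply runs this selection for every CPU ahead of time and stores the result as a mapping. The function names are hypothetical.

```python
from typing import Dict, Optional


def choose_standby(protected: str, vacancy: Dict[str, float]) -> Optional[str]:
    """Return the CPU with the most free resources, excluding the protected CPU."""
    candidates = {cpu: score for cpu, score in vacancy.items() if cpu != protected}
    if not candidates:
        return None  # no other CPU available
    return max(candidates, key=lambda cpu: candidates[cpu])


def build_mapping(vacancy: Dict[str, float]) -> Dict[str, Optional[str]]:
    """Precompute a standby CPU for every CPU (the periodic-mapping variant)."""
    return {cpu: choose_standby(cpu, vacancy) for cpu in vacancy}


latest = {"cpu1": 0.2, "cpu2": 0.7, "cpu3": 0.5}
print(build_mapping(latest))  # {'cpu1': 'cpu2', 'cpu2': 'cpu3', 'cpu3': 'cpu2'}
```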
In addition, after sending the protected CPU's services to the standby CPU according to the tunnel information and session information, the processor 81 switches the services belonging to the protected CPU back to it once the protected CPU returns to normal operation.
In this embodiment, the processor 81 determines from the standby CPU's resource vacancy information whether the standby CPU's free resources are sufficient to carry all services on the protected CPU. If not, it screens the protected CPU's services and delivers the services retained by the screening to the standby CPU. If they are sufficient, the processor 81 delivers all services of the protected CPU to the standby CPU directly.
In some embodiments, the processor 81 determines a protection sensitivity for each tunnel, tunnel by tunnel. The protection sensitivity represents how strongly the services in a tunnel need to be protected: the higher the sensitivity, the greater the need. After determining the protection sensitivities, the processor 81 selects the services to retain, in descending order of protection sensitivity, according to the standby CPU's free resources.
For example, the processor 81 can determine the protection sensitivity of each tunnel from its tunnel state Ts, tunnel keep-alive time Tk, and number of sessions Tn in the tunnel.
The processor 81 can also read the third service protection program, compile it, and execute it to implement the standby-CPU-side process of the service protection method described in the foregoing embodiments:
The processor 81 reports its CPU's resource vacancy information to the main control board, then receives the protected CPU's services sent by the main control board and processes them.
This embodiment also provides a distributed service processing system that includes a main control board and multiple CPUs. The main control board is a network device whose processor 81 executes the second service protection program; some of the CPUs are network devices whose processor 81 executes the first service protection program, and some are network devices whose processor 81 executes the third service protection program.
With the network device and distributed service processing system of this embodiment, when a CPU can no longer process services, it can send its tunnel information and session information to the main control board, which selects a standby CPU for it from the other CPUs in the distributed service processing system. The standby CPU then continues processing that CPU's services, so a fault in one CPU does not interrupt all the services it carries and degrade the user experience. This improves the disaster tolerance of the distributed service processing system, enhances system stability, and improves users' service experience.
Those skilled in the art will understand that all or some of the steps of the methods disclosed above, and the functional modules/units of the systems and devices, may be implemented as software (which may be implemented as program code executable by a computing device), firmware, hardware, or an appropriate combination thereof. In a hardware implementation, the division into functional modules/units mentioned above does not necessarily correspond to the division into physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all physical components may be implemented as software executed by a processor such as a central processing unit, digital signal processor, or microprocessor, as hardware, or as an integrated circuit such as an application-specific integrated circuit. Such software may be distributed on computer-readable media and executed by a computing device, and in some cases the steps shown or described may be performed in an order different from that described here. Computer-readable media may include computer storage media (or non-transitory media) and communication media (or transitory media).
As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules, or other data). Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technologies, CD-ROM, digital versatile discs (DVD) or other optical disc storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and can be accessed by a computer. In addition, as is well known to those of ordinary skill in the art, communication media typically contain computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media. Therefore, the present invention is not limited to any specific combination of hardware and software.
The foregoing is a further detailed description of the embodiments of the present invention with reference to specific implementations, and the specific implementation of the present invention should not be regarded as limited to these descriptions. Those of ordinary skill in the art to which the present invention belongs may make a number of simple deductions or substitutions without departing from the concept of the present invention, and all of these shall be regarded as falling within the protection scope of the present invention.

Claims (17)

  1. A service protection method, comprising:
    collecting tunnel information and session information on a local CPU, wherein the tunnel information is information about tunnels carried by the local CPU, and the session information is information about sessions carried by the tunnels; and
    sending the tunnel information and the session information to a main control board, wherein the tunnel information and the session information are used by the main control board to send services on the local CPU to a standby CPU, so that the standby CPU continues processing the services on the local CPU.
  2. The service protection method according to claim 1, wherein the collecting tunnel information and session information on a local CPU comprises:
    collecting the tunnel information and the session information on the local CPU when the local CPU needs to be reset.
  3. A service protection method, comprising:
    receiving tunnel information and session information sent by a protected CPU, wherein the tunnel information is information about tunnels carried by the protected CPU, and the session information is information about sessions carried by the tunnels;
    determining a standby CPU corresponding to the protected CPU, wherein the standby CPU and the protected CPU belong to the same distributed service processing system; and
    sending services of the protected CPU to the standby CPU according to the tunnel information and the session information.
  4. The service protection method according to claim 3, wherein the standby CPU is determined according to resource vacancy information of each CPU in the distributed service processing system, the resource vacancy information characterizing the free resources of a CPU.
  5. The service protection method according to claim 4, wherein the resource vacancy information of a CPU is determined according to one or more of the CPU utilization, memory usage, number of available tunnels, and number of available sessions of the CPU.
  6. The service protection method according to claim 4, further comprising, before the receiving tunnel information and session information sent by a protected CPU:
    periodically determining the resource vacancy information of each CPU in the distributed service processing system; and
    determining a corresponding standby CPU for each CPU according to the most recently obtained resource vacancy information of each CPU, and storing a mapping between each CPU and its corresponding standby CPU;
    wherein the determining a standby CPU corresponding to the protected CPU comprises:
    querying the standby CPU corresponding to the protected CPU according to the mapping.
  7. The service protection method according to claim 4, further comprising, before the receiving tunnel information and session information sent by a protected CPU:
    periodically determining the resource vacancy information of each CPU in the distributed service processing system;
    wherein the determining a standby CPU corresponding to the protected CPU comprises:
    determining the corresponding standby CPU for the protected CPU according to the most recently obtained resource vacancy information of each CPU.
  8. The service protection method according to claim 4, wherein the determining a standby CPU corresponding to the protected CPU comprises:
    sending a vacancy information request to CPUs other than the protected CPU in the distributed service processing system;
    receiving the resource vacancy information reported by each CPU in response to the vacancy information request; and
    determining the corresponding standby CPU for the protected CPU according to the resource vacancy information of each CPU.
  9. The service protection method according to claim 3, further comprising, after the sending services of the protected CPU to the standby CPU according to the tunnel information and the session information:
    after the protected CPU returns to a normal operating state, switching the services belonging to the protected CPU back to the protected CPU.
  10. The service protection method according to any one of claims 3 to 9, wherein the sending services of the protected CPU to the standby CPU according to the tunnel information and the session information comprises:
    determining, according to resource vacancy information of the standby CPU, whether the free resources of the standby CPU are sufficient to carry all services on the protected CPU; and
    if not, screening the services of the protected CPU, and delivering the services retained by the screening to the standby CPU.
  11. The service protection method according to claim 10, wherein the screening the services of the protected CPU comprises:
    determining, tunnel by tunnel, a protection sensitivity corresponding to each tunnel, wherein the protection sensitivity represents the degree to which services in the tunnel need to be protected, and a higher protection sensitivity indicates a greater need for the services in the tunnel to be protected; and
    selecting the services to be retained in descending order of protection sensitivity according to the free resources of the standby CPU.
  12. The service protection method according to claim 11, wherein the determining, tunnel by tunnel, a protection sensitivity corresponding to each tunnel comprises:
    determining the protection sensitivity of each tunnel according to the tunnel state Ts, the tunnel keep-alive time Tk, and the number of sessions Tn in the tunnel.
  13. The service protection method according to claim 10, further comprising: if it is determined that the free resources of the standby CPU are sufficient to carry all services on the protected CPU, delivering all services of the protected CPU directly to the standby CPU.
  14. A service protection method, comprising:
    reporting resource vacancy information of a local CPU to a main control board;
    receiving services of a protected CPU sent by the main control board; and
    processing the services of the protected CPU.
  15. A network device, comprising a processor, a memory, and a communication bus, wherein:
    the communication bus is configured to implement connection and communication between the processor and the memory; and
    the processor is configured to execute a first service protection program stored in the memory to implement the steps of the service protection method according to claim 1 or 2; or the processor is configured to execute a second service protection program stored in the memory to implement the steps of the service protection method according to any one of claims 3 to 13; or the processor is configured to execute a third service protection program stored in the memory to implement the steps of the service protection method according to claim 14.
  16. A distributed service processing system, comprising a main control board and multiple CPUs, wherein the main control board is the network device according to claim 15 whose processor executes the second service protection program, some of the multiple CPUs are network devices according to claim 15 whose processor executes the first service protection program, and some are network devices according to claim 15 whose processor executes the third service protection program.
  17. A storage medium storing at least one of a first service protection program, a second service protection program, and a third service protection program, wherein the first service protection program is executable by one or more processors to implement the steps of the service protection method according to claim 1 or 2; the second service protection program is executable by one or more processors to implement the steps of the service protection method according to any one of claims 3 to 13; and the third service protection program is executable by one or more processors to implement the steps of the service protection method according to claim 14.
PCT/CN2020/088318 2019-07-01 2020-04-30 Service protection method, network device, distributed service processing system, and storage medium WO2021000647A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910586539.2 2019-07-01
CN201910586539.2A CN112187494A (en) 2019-07-01 2019-07-01 Service protection method, network equipment and distributed service processing system

Publications (1)

Publication Number Publication Date
WO2021000647A1 true WO2021000647A1 (en) 2021-01-07

Family

ID=73914262

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/088318 WO2021000647A1 (en) 2019-07-01 2020-04-30 Service protection method, network device, distributed service processing system, and storage medium

Country Status (2)

Country Link
CN (1) CN112187494A (en)
WO (1) WO2021000647A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104025522A (en) * 2012-01-09 2014-09-03 瑞典爱立信有限公司 Expanding network functionalities for openflow based split-architecture networks
US20150006953A1 (en) * 2013-06-28 2015-01-01 Hugh W. Holbrook System and method of a hardware shadow for a network element
US20180013588A1 (en) * 2015-11-02 2018-01-11 International Business Machines Corporation Distributed virtual gateway appliance
CN109716293A (en) * 2016-09-21 2019-05-03 高通股份有限公司 Performing distributed branch prediction using fused processor cores in a processor-based system

Non-Patent Citations (1)

Title
XIANG, ZHEN: "Research of Key Technologies for Building Distributed System Based on Multi-core Processors", CHINA MASTER’S THESES FULL-TEXT DATABASE, 15 April 2012 (2012-04-15), XP055772557 *

Also Published As

Publication number Publication date
CN112187494A (en) 2021-01-05

Similar Documents

Publication Publication Date Title
CN109344014B (en) Main/standby switching method and device and communication equipment
US20030037165A1 (en) Dynamic load sharing system using a virtual router
TWI701916B (en) Method and device for self-recovering management ability in distributed system
CN107508694B (en) Node management method and node equipment in cluster
US11546215B2 (en) Method, system, and device for data flow metric adjustment based on communication link state
WO2012083669A1 (en) Method and apparatus for switching between primary-standby devices based on access gateway
CN111355649A (en) Flow reinjection method, device and system
WO2019148716A1 (en) Data transmission method, server, and storage medium
EP3622670B1 (en) Connectivity monitoring for data tunneling between network device and application server
KR101586354B1 (en) Communication failure recover method of parallel-connecte server system
WO2018103665A1 (en) L2tp-based device management method, apparatus and system
US8614943B2 (en) Method and apparatus for protecting subscriber access network
EP3618350A1 (en) Protection switching method, device and system
US7519855B2 (en) Method and system for distributing data processing units in a communication network
US11611816B2 (en) Service data processing method and device
US10205630B2 (en) Fault tolerance method for distributed stream processing system
US8370897B1 (en) Configurable redundant security device failover
WO2021000647A1 (en) Service protection method, network device, distributed service processing system, and storage medium
EP1867081A1 (en) Distributed redundancy capacity licensing in a telecommunication network element
CN112995054B (en) Flow distribution method and device, electronic equipment and computer readable medium
EP3435615B1 (en) Network service implementation method, service controller, and communication system
WO2021057350A1 (en) Flexible ethernet link failure response method, apparatus, device and medium
CN112948177A (en) Disaster recovery backup method and device, electronic equipment and storage medium
WO2022083503A1 (en) Data processing method and device
CN111984376B (en) Protocol processing method, device, equipment and computer readable storage medium

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 20834400

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 20834400

Country of ref document: EP

Kind code of ref document: A1

32PN EP: Public notification in the EP bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 23.05.2022)