CN105703950B - Fault-tolerant method for equipment out-of-service caused by control plane abnormality - Google Patents
Fault-tolerant method for equipment out-of-service caused by control plane abnormality
- Publication number
- CN105703950B (application CN201610051450.2A)
- Authority
- CN
- China
- Prior art keywords
- management
- configuration
- network element
- security mode
- service
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0654—Management of faults, events, alarms or notifications using network fault recovery
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The invention discloses a fault-tolerant method for equipment falling out of management because of a control plane abnormality. The method comprises the following steps: adding, in the network element equipment, a management security mode for accessing the management communication network; decoupling the management configuration from the service configuration; and, when a triggering condition of the management security mode is met, having the network element equipment enter the management security mode, skip loading the service configuration at startup, and load only the management configuration, so that the system recovers the network element equipment's ability to communicate over the management communication network. By introducing the management security mode and defining the conditions that trigger it, the invention separates the management configuration from the service configuration in a system that integrates control and management. After a system failure, the network element's management configuration and its communication capability on the management communication network can be recovered, so the network element does not fall out of management and the maintenance cost incurred when a network element is out of management is reduced.
Description
Technical Field
The invention relates to management communication networks, and in particular to a fault-tolerant method for equipment falling out of management because of a control plane abnormality.
Background
A management communication network uses an independent logical path to let an operator manage equipment remotely. Each piece of equipment acts as a network element that must communicate with adjacent network elements in the management communication network, and it reaches the network management server through routes advertised hop by hop.
Conventional network element equipment can implement management network communication on an independent management plane, using a separate protocol instance or process. In that case, because the management plane is independent, a control plane failure does not affect it, and the risk of falling out of management is reduced. However, the control plane and the management plane have to share and exchange a large amount of information, so some functions are complex to implement. Meanwhile, as communication networks move toward reduced code maintenance and unified protocol stacks and platforms, the gradual convergence of the management plane and the control plane has become a trend. With converged planes, when the control plane has a problem and the system cannot recover by restarting, the management plane is affected as well, and the remotely managed network element stays out of management. The network element then has to be maintained on site, which increases operation and maintenance cost.
Disclosure of Invention
The technical problem to be solved by the invention is that, when the management plane and the control plane are converged, a control plane failure leaves the remotely managed network element persistently out of management and unable to recover.
To solve this technical problem, the invention provides a fault-tolerant method for equipment falling out of management because of a control plane abnormality, comprising the following steps:
adding, in the network element equipment, a management security mode for accessing the management communication network;
decoupling the management configuration from the service configuration;
when a triggering condition of the management security mode is met, the network element equipment enters the management security mode, skips loading the service configuration at startup, and loads only the management configuration, so that the system recovers the network element equipment's ability to communicate over the management communication network.
In the above method, the triggering conditions of the management security mode include, but are not limited to:
the control plane restarts because of a software exception, and the number of restarts exceeds a threshold;
the service configuration check fails after the control plane starts.
In the above method, after the network element equipment enters the management security mode, no configuration check is performed against the service single disk.
In the above method, when the system of the network element equipment starts normally, the service configuration and the management configuration are loaded in sequence.
By introducing the management security mode and defining the conditions that trigger it, the invention separates the management configuration from the service configuration in a system that integrates control and management. After a system failure, only the management configuration is loaded, so the network element's management configuration and its communication capability on the management communication network are recovered without affecting the existing service configuration. This keeps the network element from falling out of management and minimizes the cost of on-site maintenance after a loss of management. Because only the management configuration file is loaded after the system starts, control plane restarts caused by errors that the service configuration would otherwise trigger are also greatly reduced.
Drawings
FIG. 1 is a flow chart of the fault-tolerant method of the present invention for equipment falling out of management because of a control plane abnormality;
FIG. 2 is a schematic diagram of loading the service configuration and the management configuration in sequence in the non-management security mode according to the present invention;
FIG. 3 is a schematic diagram of the configuration loading sequence in the management security mode according to the present invention;
FIG. 4 is a diagram illustrating a conventional configuration checking process;
FIG. 5 is a flow chart of the present invention in an embedded implementation environment.
Detailed Description
The invention is described in detail below with reference to the figures and specific examples.
The invention provides a fault-tolerant method for equipment falling out of management because of a control plane abnormality which, as shown in FIG. 1, comprises the following steps:
First, a management security mode for accessing the management communication network is added to the network element equipment.
Because each operator has its own management communication network protocol standard, the invention does not specify which protocol standards must be supported. The management security mode abstracted in the network element equipment can recover the network element's management communication network configuration and its communication capability on the management communication network.
Second, the configuration data of the common plane shared by control and management is decoupled into management configuration and service configuration. In general, the service configuration and the management configuration of network element equipment are partially interdependent; for example, the management configuration depends on the default MCC interface joining the VPN. If the interworking capability of the management communication network is to be recovered by applying only the management configuration and no service configuration, the design must decouple the management configuration from the service configuration.
When the system starts normally (in the non-management security mode), the completely decoupled service configuration and management configuration are loaded in sequence, and each is recovered separately (as shown in FIG. 2).
Third, when a triggering condition of the management security mode is met, the network element equipment enters the management security mode, skips loading the service configuration, and loads only the management configuration, restoring the management configuration to its state before the restart. Once each service entity resumes work according to its management network configuration, the management network communication capability of the network element equipment is restored (as shown in FIG. 3). Because only the management configuration file is loaded, control plane restarts caused by service configuration errors (including processing errors after a triggered service alarm is reported, abnormal misconfiguration, insufficient memory, and the like) are greatly reduced.
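The difference between the two loading modes can be illustrated with a minimal Python sketch. The class `NetworkElementConfig`, the function `load_configuration`, and the example configuration keys are hypothetical names chosen for illustration; the sketch also assumes that, in a normal start, the service configuration is loaded before the management configuration, which the text does not state explicitly.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class NetworkElementConfig:
    """Hypothetical container for the two decoupled configuration sets."""
    management: Dict[str, str] = field(default_factory=dict)
    service: Dict[str, str] = field(default_factory=dict)


def load_configuration(config: NetworkElementConfig,
                       management_security_mode: bool) -> List[str]:
    """Return the configuration items that were loaded, in load order."""
    loaded: List[str] = []
    if not management_security_mode:
        # Normal start (assumed order): service configuration first.
        loaded.extend(f"service:{key}" for key in config.service)
    # The management configuration is loaded in both modes.
    loaded.extend(f"management:{key}" for key in config.management)
    return loaded


if __name__ == "__main__":
    cfg = NetworkElementConfig(
        management={"mcc_interface": "default"},  # illustrative values only
        service={"vpn_1": "l3vpn"},
    )
    print(load_configuration(cfg, management_security_mode=False))
    print(load_configuration(cfg, management_security_mode=True))
```

Because the two sets are held separately, skipping the service set in the management security mode leaves the stored service configuration untouched.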
Triggering conditions of the management security mode include, but are not limited to:
(1) The control plane restarts because of a software exception, and the number of restarts exceeds a threshold. In practice, an externally triggered control plane software defect may cause the control plane system to restart abnormally. If the system cannot recover after the restart and keeps restarting, downstream network element equipment falls out of management. The number of restarts exceeding a threshold can therefore be set as a trigger condition for entering the management security mode.
(2) The service configuration check fails after the control plane starts. Network element equipment that accesses the management communication network carries management configuration specific to that network, and this configuration depends on the basic system service configuration. If the service configuration is inconsistent with what the management configuration expects, the network element may fall out of management. Therefore, when the check finds that the service configuration is inconsistent, the management security mode must be entered.
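The two triggering conditions listed above can be sketched as a single predicate. This is only an illustration: `ControlPlaneStatus`, `should_enter_management_security_mode`, and the default threshold of 3 are hypothetical, since the text does not specify a threshold value or how the restart count is stored.

```python
from dataclasses import dataclass


@dataclass
class ControlPlaneStatus:
    """Hypothetical snapshot of the state examined at startup."""
    abnormal_restart_count: int        # restarts caused by software exceptions
    service_config_check_passed: bool  # result of the service configuration check


def should_enter_management_security_mode(status: ControlPlaneStatus,
                                          restart_threshold: int = 3) -> bool:
    """Return True if either listed triggering condition is met."""
    # Condition (1): the number of abnormal restarts exceeds the threshold.
    if status.abnormal_restart_count > restart_threshold:
        return True
    # Condition (2): the service configuration check failed after startup.
    if not status.service_config_check_passed:
        return True
    return False
```

Since the conditions are described as non-exhaustive, further checks could be added to the same predicate without changing its callers.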
In the present invention, because the network element equipment loads only the management configuration after entering the management security mode, the system lacks a complete service configuration. To avoid affecting services, the conventional configuration check against the service single disk must not be performed in this mode. The conventional configuration check process is shown in FIG. 4.
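A minimal sketch of this guard, under the assumption that the conventional check is available as a callable, could look as follows. The function `run_configuration_check`, the parameter `check_with_service_single_disk`, and the string results are hypothetical; the actual check procedure of FIG. 4 is not reproduced here.

```python
from typing import Callable


def run_configuration_check(
    management_security_mode: bool,
    check_with_service_single_disk: Callable[[], bool],
) -> str:
    """Run the conventional check only outside the management security mode."""
    if management_security_mode:
        # Only the management configuration is loaded, so comparing against
        # the service single disk could disturb running services: skip it.
        return "skipped"
    return "passed" if check_with_service_single_disk() else "failed"
```

The check callable is never invoked in the management security mode, so the service single disk is not touched while the service configuration is incomplete.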
As shown in FIG. 5, in an embedded implementation environment, the specific workflow of the fault-tolerant method provided by the present invention is as follows:
Step S101: start the system;
Step S102: judge whether a triggering condition for entering the management security mode is met; if so, execute step S104; otherwise, execute step S103;
Step S103: load the service configuration and the management configuration respectively, and then execute step S107;
Step S104: the equipment system enters the management security mode and loads only the management configuration;
Step S105: judge whether the system is currently in the management security mode; if so, execute step S107; otherwise, execute step S106;
Step S106: after configuration recovery is finished, perform the configuration check against the single disk as required;
Step S107: end the program.
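The workflow can be traced with a short Python sketch that records which steps are visited. One point is an interpretation rather than something the text states: as listed, step S103 proceeds directly to S107, which would leave S106 unreachable, so the sketch assumes that a normal load continues to the S105 decision so that the configuration check of S106 can run. The function name `startup_workflow` is hypothetical.

```python
from typing import List


def startup_workflow(trigger_met: bool) -> List[str]:
    """Return the sequence of step labels visited during one startup."""
    steps = ["S101"]  # start the system
    steps.append("S102")  # evaluate the triggering condition
    if trigger_met:
        steps.append("S104")  # enter the mode, load management config only
        in_security_mode = True
    else:
        steps.append("S103")  # load service and management configuration
        in_security_mode = False
    # Assumption: both branches reach the S105 decision (see lead-in).
    steps.append("S105")
    if not in_security_mode:
        steps.append("S106")  # configuration check against the single disk
    steps.append("S107")  # end
    return steps


if __name__ == "__main__":
    print(startup_workflow(trigger_met=True))
    print(startup_workflow(trigger_met=False))
```

Under this reading, the only difference between the two paths after loading is whether the configuration check of S106 runs.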
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.
Claims (2)
1. A fault-tolerant method for equipment falling out of management because of a control plane abnormality, characterized by comprising the following steps:
adding, in the network element equipment, a management security mode for accessing a management communication network, wherein the management security mode can recover the management communication network configuration of the network element and the network element's communication capability on the management communication network;
wherein the service configuration and the management configuration of the common plane in which control and management are converged are partially interdependent; decoupling the configuration data of the converged control and management common plane into management configuration and service configuration; when the system of the network element equipment starts normally, loading the service configuration and the management configuration in sequence so that each is recovered separately;
when a triggering condition of the management security mode is met, the network element equipment enters the management security mode, skips loading the service configuration, and loads only the management configuration, restoring the management configuration to its state before the restart; after each service entity resumes work according to its management network configuration, the management network communication capability of the network element equipment is restored, so that the system recovers the network element equipment's ability to communicate over the management communication network;
wherein the triggering conditions of the management security mode include, but are not limited to:
the control plane restarts because of a software exception, and the number of restarts exceeds a threshold;
the service configuration check fails after the control plane starts.
2. The method of claim 1, wherein after the network element equipment enters the management security mode, no configuration check is performed against the service single disk.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610051450.2A CN105703950B (en) | 2016-01-26 | 2016-01-26 | Fault-tolerant method for equipment out-of-service caused by control plane abnormality |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610051450.2A CN105703950B (en) | 2016-01-26 | 2016-01-26 | Fault-tolerant method for equipment out-of-service caused by control plane abnormality |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105703950A (en) | 2016-06-22 |
CN105703950B (en) | 2020-04-21 |
Family
ID=56228609
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610051450.2A (active) CN105703950B (en) | 2016-01-26 | 2016-01-26 | Fault-tolerant method for equipment out-of-service caused by control plane abnormality |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105703950B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102255799A (en) * | 2011-06-23 | 2011-11-23 | 中国人民解放军国防科学技术大学 | Internal network interface mapping method and device supporting separation of forwarding and control |
CN104270341A (en) * | 2014-09-03 | 2015-01-07 | 烽火通信科技股份有限公司 | A data protocol forwarding system and method in an IPRAN |
CN204291012U (en) * | 2014-12-19 | 2015-04-22 | 深圳市邦彦信息技术有限公司 | The MicroTCA platform that a kind of business datum is separated with management data |
CN204859198U (en) * | 2015-08-13 | 2015-12-09 | 国网智能电网研究院 | OLT device management system |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8990365B1 (en) * | 2004-09-27 | 2015-03-24 | Alcatel Lucent | Processing management packets |
US8050219B2 (en) * | 2007-11-15 | 2011-11-01 | Telefonaktiebolaget Lm Ericsson (Publ) | Logical protocol architecture for wireless metropolitan area networks |
Non-Patent Citations (1)
Title |
---|
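Separated design of the switch data plane, control plane, and management plane: an important technology for network equipment stability (white paper); 福建星网锐捷网络有限公司; 《道客巴巴》; 20120715; full text * |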
Also Published As
Publication number | Publication date |
---|---|
CN105703950A (en) | 2016-06-22 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||