CN105703950B - Fault-tolerant method for equipment out-of-service caused by control plane abnormality - Google Patents

Fault-tolerant method for equipment out-of-service caused by control plane abnormality

Info

Publication number
CN105703950B
Authority
CN
China
Prior art keywords
management
configuration
network element
security mode
service
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610051450.2A
Other languages
Chinese (zh)
Other versions
CN105703950A (en)
Inventor
丁毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fiberhome Telecommunication Technologies Co Ltd
Original Assignee
Fiberhome Telecommunication Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fiberhome Telecommunication Technologies Co Ltd filed Critical Fiberhome Telecommunication Technologies Co Ltd
Priority to CN201610051450.2A priority Critical patent/CN105703950B/en
Publication of CN105703950A publication Critical patent/CN105703950A/en
Application granted granted Critical
Publication of CN105703950B publication Critical patent/CN105703950B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0654 Management of faults, events, alarms or notifications using network fault recovery

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a fault-tolerant method for equipment going out of management because of a control plane abnormality. The method comprises: adding, in the network element equipment, a management security mode for accessing the management communication network; decoupling the management configuration from the service configuration; and, when a trigger condition of the management security mode is met, having the network element equipment enter the management security mode, skip loading of the service configuration, and load only the management configuration, so that the system recovers the network element equipment's ability to communicate over the management communication network. By introducing the management security mode and defining the conditions that trigger it, the invention separates the management configuration from the service configuration in a system whose control and management planes are integrated. After a system failure, the network element's management configuration and its communication capability on the management communication network can be recovered, which keeps the network element from going out of management and reduces the maintenance cost incurred when a network element goes out of management.

Description

Fault-tolerant method for equipment out-of-service caused by control plane abnormality
Technical Field
The invention relates to a management communication network, in particular to a fault-tolerant method for equipment disconnection caused by control plane abnormality.
Background
The management communication network uses an independent logical path so that an operator can manage equipment remotely. Each piece of equipment acts as a network element that must communicate with adjacent network elements in the management communication network, and it reaches the network management server remotely through routes advertised hop by hop.
Conventional network element equipment can implement management network communication with an independent management plane, realized through a separate protocol instance or process. In that case, because the management plane is independent, it is not affected when the control plane fails, and the risk of the equipment going out of management is reduced. However, the control plane and the management plane then have a great deal of information to share and exchange, so some functions are complex to implement. Meanwhile, as communication networks move toward less code to maintain and toward unified protocol stacks and platforms, gradual integration of the management plane and the control plane has become a trend. Under such integration, when the control plane fails and the system cannot recover by restarting, the management plane is affected as well: the remotely managed network element stays out of management, so maintenance personnel have to go to the site, and operation and maintenance costs rise.
Disclosure of Invention
The technical problem to be solved by the invention is that, when the management plane and the control plane are integrated, a failure of the control plane leaves the remotely managed network element persistently out of management and unable to recover.
In order to solve the technical problem, the technical scheme adopted by the invention is to provide a fault-tolerant method for equipment disconnection caused by control plane abnormality, which comprises the following steps:
adding a management security mode for accessing a management communication network in network element equipment;
decoupling management configuration and service configuration;
when a trigger condition of the management security mode is met, the network element equipment enters the management security mode, skips loading of the service configuration, and loads only the management configuration, so that the system recovers the network element equipment's ability to communicate over the management communication network.
In the above method, the trigger conditions of the management security mode include, but are not limited to:
the control plane restarts because of a software exception, and the number of restarts exceeds a threshold;
after the control plane starts, the service configuration check fails.
In the above method, after the network element equipment enters the management security mode, no configuration check is performed with the service single disk.
In the above method, when the system of the network element equipment starts normally, the service configuration and the management configuration are loaded separately, in sequence.
By introducing the management security mode and defining the conditions that trigger it, the invention separates the management configuration from the service configuration in a system whose control and management planes are integrated. After a system failure only the management configuration is loaded, so the network element's management configuration and its communication capability on the management communication network are recovered without affecting the existing service configuration. This keeps the network element from going out of management and minimizes the cost of on-site maintenance after such a loss. In addition, because only the management configuration file is loaded after startup, control plane restarts caused by the various errors that service configuration can trigger are greatly reduced.
Drawings
FIG. 1 is a flow chart of the fault-tolerant method for equipment out-of-service caused by control plane abnormality according to the present invention;
FIG. 2 is a schematic diagram of the process of sequentially loading the service configuration and the management configuration when not in the management security mode according to the present invention;
FIG. 3 is a schematic diagram of the configuration loading process in the management security mode according to the present invention;
FIG. 4 is a schematic diagram of the conventional configuration checking process;
FIG. 5 is a flow chart of the present invention in an embedded implementation environment.
Detailed Description
The invention is described in detail below with reference to the figures and specific examples.
The invention provides a fault-tolerant method for equipment disconnection caused by control plane abnormality, which comprises the following steps as shown in figure 1:
First, a management security mode for accessing the management communication network is added to the network element equipment.
Because each operator has its own management communication network protocol standards, the invention does not specify which protocol standards must be supported. The management security mode abstracted in the network element equipment is able to recover the network element's management communication network configuration and its communication capability on the management communication network.
Second, the configuration data of the common control and management plane is decoupled into a management configuration and a service configuration. In general, the service configuration and the management configuration of network element equipment are partially dependent on each other; for example, the management configuration depends on a default MCC interface joining the VPN. If the interworking capability of the management communication network is to be recovered by applying only the management configuration and no service configuration, the design must decouple the management configuration from the service configuration.
When the system starts normally (that is, not in the management security mode), the fully decoupled service configuration and management configuration are loaded in sequence, and each is recovered separately (as shown in fig. 2).
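The decoupling requirement can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and does not come from the patent: the `ConfigEntry` type, the entry names such as `mcc_vpn_join`, and the rule that a management entry must not depend on a service entry. The normal-startup order (service configuration, then management configuration) follows the order in which the text names the two configurations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfigEntry:
    name: str               # hypothetical entry identifier
    plane: str              # "management" or "service"
    depends_on: tuple = ()  # names of entries this entry requires


def split_config(entries):
    """Partition combined configuration data into management and service parts."""
    management = [e for e in entries if e.plane == "management"]
    service = [e for e in entries if e.plane == "service"]
    return management, service


def coupling_violations(entries):
    """List (management entry, service entry) pairs that break the decoupling."""
    service_names = {e.name for e in entries if e.plane == "service"}
    return [
        (e.name, dep)
        for e in entries
        if e.plane == "management"
        for dep in e.depends_on
        if dep in service_names
    ]


def load_normal(entries):
    """Normal startup: load the service configuration, then the management one."""
    management, service = split_config(entries)
    return [e.name for e in service] + [e.name for e in management]


# Coupled layout: the management IP setup depends on a service-side entry.
coupled = [
    ConfigEntry("mcc_vpn_join", "service"),
    ConfigEntry("mgmt_ip", "management", ("mcc_vpn_join",)),
]

# Decoupled layout: the MCC/VPN dependency now lives in the management part.
decoupled = [
    ConfigEntry("mcc_vpn_join", "management"),
    ConfigEntry("mgmt_ip", "management", ("mcc_vpn_join",)),
    ConfigEntry("l2vpn_service", "service"),
]
```

In this sketch, `coupling_violations(coupled)` reports the dependency that would prevent the management configuration from being applied on its own, while the decoupled layout reports none and can therefore be loaded with or without the service part.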
Third, when a trigger condition of the management security mode is met, the network element equipment enters the management security mode, skips loading of the service configuration, and loads only the management configuration, restoring the management configuration to its state before the restart. Once each service entity resumes work according to the corresponding management network configuration, the management network communication capability of the network element equipment is restored (as shown in fig. 3). Because only the management configuration file is loaded, control plane restarts caused by service configuration errors (including processing errors after a triggered service alarm is reported, abnormal erroneous configuration, insufficient memory, and the like) are greatly reduced.
The trigger conditions of the management security mode include, but are not limited to:
(1) The control plane restarts because of a software exception, and the number of restarts exceeds a threshold. In practice, a control plane software defect triggered by an external event may cause the control plane system to restart abnormally. If the system cannot recover after restarting and keeps restarting, the downstream network element equipment goes out of management; the number of restarts exceeding a threshold can therefore be set as a trigger condition for entering the management security mode.
(2) After the control plane starts, the service configuration check fails. Network element equipment that accesses a management communication network carries management configuration specific to that network, and this configuration necessarily depends on the basic system service configuration. If the service configuration is inconsistent with what the management configuration expects, the network element may go out of management; therefore, when the check finds such an inconsistency, the management security mode needs to be entered.
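The two trigger conditions can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the threshold value, the JSON file used to keep the restart count across reboots, and the names `RestartCounter` and `should_enter_management_security_mode` are all hypothetical. The patent states only that the restart count is compared with a threshold and that a failed service configuration check is also a trigger.

```python
import json
import os

RESTART_THRESHOLD = 3  # hypothetical value; the patent does not fix a number


class RestartCounter:
    """Keep the abnormal-restart count in a file so it survives a reboot."""

    def __init__(self, path):
        self.path = path

    def read(self):
        # A missing or unreadable file is treated as "no abnormal restarts".
        try:
            with open(self.path, encoding="utf-8") as f:
                return int(json.load(f)["abnormal_restarts"])
        except (OSError, ValueError, KeyError, TypeError):
            return 0

    def record_abnormal_restart(self):
        count = self.read() + 1
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump({"abnormal_restarts": count}, f)
        return count

    def reset(self):
        # Assumed policy: clear the count once the system has run stably.
        try:
            os.remove(self.path)
        except FileNotFoundError:
            pass


def should_enter_management_security_mode(
    restart_count, service_check_ok, threshold=RESTART_THRESHOLD
):
    """Condition (1): restarts exceed the threshold; condition (2): check failed."""
    return restart_count > threshold or not service_check_ok
```

Persisting the count is a design choice of this sketch: a counter held only in memory would be lost on each of the restarts it is meant to count.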
In the present invention, because the network element equipment loads only the management configuration after entering the management security mode, the system lacks a complete service configuration. To avoid affecting services, the conventional configuration check with the service single disk must not be performed in this mode; the conventional configuration check process is shown in fig. 4.
As shown in fig. 5, in an embedded implementation environment, the specific workflow of the fault-tolerant method provided by the present invention for equipment out-of-service caused by control plane abnormality is as follows:
Step S101, starting the system;
Step S102, judging whether a trigger condition for entering the management security mode is met; if so, executing step S104; otherwise, executing step S103;
Step S103, loading the service configuration and the management configuration respectively, and then executing step S107;
Step S104, the equipment system enters the management security mode and loads only the management configuration;
Step S105, judging whether the system is currently in the management security mode; if so, executing step S107; otherwise, executing step S106;
Step S106, after configuration recovery is finished, performing the configuration check with the single disk as required;
Step S107, ending the program.
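The workflow can be sketched in Python as a single startup pass. The function and parameter names are hypothetical, and the loaders and the check are passed in as callables so that the sketch stays self-contained. One assumption deserves emphasis: as listed, step S103 jumps directly to step S107, which would leave the "otherwise" branch of step S105 and step S106 unreachable. The sketch therefore assumes that the normal path continues through step S105 to the configuration check in step S106.

```python
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"
    MANAGEMENT_SECURITY = "management_security"


def boot(trigger_met, load_service, load_management, check_with_single_disk):
    """Run one startup pass and return (mode, list of visited step labels)."""
    trace = ["S101"]  # S101: system start
    trace.append("S102")  # S102: evaluate the trigger conditions
    if trigger_met:
        trace.append("S104")  # S104: load the management configuration only
        load_management()
        mode = Mode.MANAGEMENT_SECURITY
    else:
        trace.append("S103")  # S103: load service, then management
        load_service()
        load_management()
        mode = Mode.NORMAL
    trace.append("S105")  # S105: is the system in management security mode?
    if mode is Mode.NORMAL:
        # Assumption: the normal path reaches S106 (see the note above).
        trace.append("S106")
        check_with_single_disk()
    trace.append("S107")  # S107: end
    return mode, trace
```

Called with `trigger_met=True`, the sketch never invokes `load_service` or `check_with_single_disk`, which mirrors the statement that the service configuration is skipped and no configuration check is made with the service single disk in the management security mode.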
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (2)

1. A fault-tolerant method for equipment disconnection caused by control plane abnormality is characterized by comprising the following steps:
adding, in network element equipment, a management security mode for accessing a management communication network, wherein the management security mode can recover the management communication network configuration of the network element and the network element's communication capability on the management communication network;
the service configuration and the management configuration of the common plane in which control and management converge are partially dependent on each other; decoupling the configuration data of the common plane in which control and management converge into a management configuration and a service configuration; when the system of the network element equipment starts normally, the service configuration and the management configuration can be loaded separately in sequence, completing the respective recovery of the service configuration and the management configuration;
when a trigger condition of the management security mode is met, the network element equipment enters the management security mode, skips loading of the service configuration, loads only the management configuration, and restores the management configuration to its state before the restart; after each service entity resumes work according to the corresponding management network configuration, the management communication network communication capability of the network element equipment is restored;
the trigger conditions of the management security mode include, but are not limited to:
the control plane restarts because of a software exception, and the number of restarts exceeds a threshold;
after the control plane starts, the service configuration check fails.
2. The method of claim 1, wherein after the network element equipment enters the management security mode, no configuration check is performed with the service single disk.
CN201610051450.2A 2016-01-26 2016-01-26 Fault-tolerant method for equipment out-of-service caused by control plane abnormality Active CN105703950B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610051450.2A CN105703950B (en) 2016-01-26 2016-01-26 Fault-tolerant method for equipment out-of-service caused by control plane abnormality

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610051450.2A CN105703950B (en) 2016-01-26 2016-01-26 Fault-tolerant method for equipment out-of-service caused by control plane abnormality

Publications (2)

Publication Number Publication Date
CN105703950A CN105703950A (en) 2016-06-22
CN105703950B true CN105703950B (en) 2020-04-21

Family

ID=56228609

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610051450.2A Active CN105703950B (en) 2016-01-26 2016-01-26 Fault-tolerant method for equipment out-of-service caused by control plane abnormality

Country Status (1)

Country Link
CN (1) CN105703950B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102255799A (en) * 2011-06-23 2011-11-23 中国人民解放军国防科学技术大学 Internal network interface mapping method and device supporting separation of forwarding and control
CN104270341A (en) * 2014-09-03 2015-01-07 烽火通信科技股份有限公司 A data protocol forwarding system and method in an IPRAN
CN204291012U (en) * 2014-12-19 2015-04-22 深圳市邦彦信息技术有限公司 The MicroTCA platform that a kind of business datum is separated with management data
CN204859198U (en) * 2015-08-13 2015-12-09 国网智能电网研究院 OLT device management system

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8990365B1 (en) * 2004-09-27 2015-03-24 Alcatel Lucent Processing management packets
US8050219B2 (en) * 2007-11-15 2011-11-01 Telefonaktiebolaget Lm Ericsson (Publ) Logical protocol architecture for wireless metropolitan area networks

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
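Separated design of the switch data plane, control plane, and management plane: an important technology for network equipment stability (white paper); 福建星网锐捷网络有限公司; 《道客巴巴》; 20120715; full text *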

Also Published As

Publication number Publication date
CN105703950A (en) 2016-06-22

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant