CN110708245B - SDN data plane fault monitoring and recovery method under multi-controller architecture


Info

Publication number
CN110708245B
CN110708245B
Authority
CN
China
Prior art keywords
switch
domain
fault
network
sdn
Prior art date
Legal status
Active
Application number
CN201910933770.4A
Other languages
Chinese (zh)
Other versions
CN110708245A (en)
Inventor
陆以勤
金冬子
覃健诚
程喆
Current Assignee
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN201910933770.4A
Publication of CN110708245A
Application granted
Publication of CN110708245B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/28 Routing or path finding of packets in data switching networks using route fault recovery
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0677 Localisation of faults
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/12 Shortest path evaluation

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention provides an SDN data plane fault monitoring and recovery method under a multi-controller architecture, which comprises the following steps: S1, the SDN controllers synchronize the global topology and construct and update the topology of the intra-domain network; S2, it is judged whether an SDN data plane link failure or a switch node failure caused by a port abnormality has occurred; S3, the controller resolves the fault; S4, the SDN controllers cooperatively determine a routing path for the arriving data flow and then issue flow tables to the switches on the path to complete the routing of the data flow. The invention improves the detection rate while adding only a small amount of network load, balances the flexibility and accuracy of detection, and reduces the fault recovery time.

Description

SDN data plane fault monitoring and recovery method under multi-controller architecture
Technical Field
The invention relates to the field of network reliability research under the SDN architecture, and in particular to a fault monitoring and recovery method for the SDN data plane under a multi-controller architecture.
Background
A conventional network integrates control and forwarding in the same physical device, in a tightly coupled relationship. SDN (software-defined networking) decouples logic control from data forwarding: the control plane remotely controls the forwarding of the data devices through an API, so the two planes can be expanded independently and flexibly. At the same time, because control over the SDN network is logically centralized, the controller can acquire a global view of the whole network, which improves the convenience of network control and management. The SDN architecture includes a data plane, a control plane, and an application plane. The data plane is composed of network forwarding devices; the SDN switch is responsible only for data flow forwarding. The control plane is formed by logically centralized controllers that control and manage the network devices of the data plane and maintain network topology and state information. The application plane is composed of various SDN services.
SDN, like conventional networks, is inevitably threatened by various failures, which degrade network performance or paralyze the network. Common network fault monitoring methods are divided into active and passive monitoring, briefly introduced as follows:
The first prior art: active network fault monitoring.
Principle: fault information is obtained by actively sending probe messages into the network.
Disadvantage: probing affects the traffic in the network and increases the network load.
The second prior art: passive network fault monitoring.
Principle: faults are inferred and located by passively collecting fault information from the network.
Disadvantages: symptom loss and false symptoms in the network make this kind of fault monitoring inaccurate; in a large-scale distributed network there is a large delay in collecting and processing the information, so the monitoring mechanism lacks real-time performance.
Common failure recovery mechanisms can also be classified as active or passive. In a passive recovery mechanism, the controller is notified after the network fails, reacquires the topology, and reroutes the data flows to recover from the failure. In an active recovery mechanism, redundancy is provided: the controller installs backup paths in advance, and a failure is resolved by switching to a backup path. Briefly introduced as follows:
The first prior art: passive fault recovery mechanism.
Principle: the controller is notified after the network fails, recalculates the routes, and issues flow table entries to the affected switches.
Disadvantages: recovery takes more time, the load on the controllers is higher, and the 50 ms fault recovery time required by operators cannot be met.
The second prior art: active fault recovery mechanism.
Principle: redundancy is provided; the controller installs a backup path in advance, so when a failure occurs the switch does not need to request a new path from the controller but switches directly to the backup path.
Disadvantages: the failures in a network are diverse, and backup paths cannot solve every failure problem, so flexibility and applicability are limited.
In summary, the prior art either increases the network load to achieve high detection accuracy or detects faults inaccurately and with large delay, so the requirements on flexibility and fault recovery time cannot be met during recovery, and the application scenario is inherently limited by the SDN architecture and lacks scalability.
Disclosure of Invention
The SDN data plane fault monitoring and recovery method under a multi-controller architecture is provided to solve the problems of the prior art, in which either the network load is increased to achieve a high detection accuracy or fault detection is inaccurate and its delay is large, so that the requirements on flexibility and fault recovery time cannot be met during recovery, and the application scenario is limited by the inherent scalability of the SDN architecture, making the prior methods unsuitable for large networks.
The invention is realized by at least one of the following technical solutions.
The SDN data plane fault monitoring and recovery method under the multi-controller architecture comprises the following steps:
s1, synchronizing global topology among the SDN controllers, and constructing and updating a topological structure of an intra-domain network;
s2, the SDN controller judges whether SDN network data plane link failure and switch node failure caused by Port abnormity occur through monitoring Port-status messages (Port state messages) and Echo messages;
s3, the SDN controller solves the fault, when the SDN controller detects the fault, active fault recovery is adopted, and when the active fault recovery fails, the fault is solved by passive fault recovery;
s4, the SDN controllers cooperate to determine a routing path of the arriving data flow, and then a flow table is issued to the switches on the path to complete the routing of the data flow.
Further, step S1 specifically includes: a plurality of SDN controllers periodically send LLDP packets through Packet_out messages to all switches connected to them, thereby constructing and updating the topology of the network within each SDN controller's domain;
the SDN controllers synchronize the global topology through east-west interface communication, ensuring consistency of the underlying network and of the service processing logic; only updated information is transmitted during global topology synchronization, which maintains the global topology while saving network bandwidth and reducing network load.
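As an illustration of this synchronization step, the following Python sketch shows one way an intra-domain topology store could emit only the links added or removed since the last east-west synchronization; the class and method names (TopologyStore, diff_for_sync) are illustrative assumptions and not part of the claimed method.

```python
# Minimal sketch (not the patented implementation): an intra-domain topology
# store that emits only the links added or removed since the last east-west
# synchronization, so controllers exchange updates rather than full dumps.
# All class, field, and method names are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Link:
    src_dpid: str   # datapath id of the source switch
    src_port: int
    dst_dpid: str
    dst_port: int

@dataclass
class TopologyStore:
    links: set = field(default_factory=set)        # current intra-domain links
    last_synced: set = field(default_factory=set)  # snapshot already sent to peers

    def apply_lldp_discovery(self, discovered_links):
        """Replace the local view with the links discovered via LLDP."""
        self.links = set(discovered_links)

    def diff_for_sync(self):
        """Return only what changed since the last east-west synchronization."""
        added = self.links - self.last_synced
        removed = self.last_synced - self.links
        self.last_synced = set(self.links)
        return {"added": sorted(added, key=str), "removed": sorted(removed, key=str)}

if __name__ == "__main__":
    store = TopologyStore()
    store.apply_lldp_discovery({Link("s1", 1, "s2", 1), Link("s2", 2, "s3", 1)})
    print(store.diff_for_sync())                           # first sync: everything is "added"
    store.apply_lldp_discovery({Link("s1", 1, "s2", 1)})   # the s2-s3 link disappears
    print(store.diff_for_sync())                           # second sync: only the removal
```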
Further, the SDN controller in step S2 determines whether a data plane link failure and a switch node failure caused by a Port exception occur by monitoring a Port-status message (Port status message) and an Echo message, and specifically includes:
1) judging the failure of the data plane link: the SDN controller captures a Port-status message sent by a data plane for fault monitoring; when the SDN controller analyzes the Port-status message to know that a certain Port of the switch is deleted, the SDN controller judges whether the Port of the switch is contained in the network according to the local network topology, and if the Port of the switch is contained in the network, the SDN controller considers that a data plane link fault caused by the switch Port fault occurs and needs to solve the fault; otherwise, the deletion of the port is considered to belong to normal network topology change;
2) judging the failure of the switch node: the SDN controller actively monitors switch nodes of a data plane, and judges whether the switch nodes have faults or not through receiving and sending Echo messages; when the SDN controller cannot receive Echo-reply messages of a certain switch for the first time, the SDN controller immediately sends the Echo-request messages to the switch again, and if the SDN controller cannot receive the Echo-reply messages of the switch, the switch node is considered to be in fault, and the fault is solved; if the SDN controller receives an Echo-reply message of the switch, the data plane operates normally.
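The two judgments above can be summarized by the following Python sketch; it is a simplified illustration under assumed message and topology representations, not the claimed implementation.

```python
# Minimal sketch of the two judgments of step S2 (assumed message and topology
# representations, not the patented code): a deleted port is treated as a link
# fault only if it is part of the local topology, and a switch node is declared
# faulty only after two missed Echo replies. send_echo_request() is a placeholder
# for the real OpenFlow control channel.
def is_link_fault(port_status_msg, local_topology_ports):
    """port_status_msg: dict such as {"dpid": "s1", "port": 3, "reason": "delete"}.
    local_topology_ports: set of (dpid, port) pairs known to carry links."""
    if port_status_msg["reason"] != "delete":
        return False
    key = (port_status_msg["dpid"], port_status_msg["port"])
    # A deleted port that belongs to the known topology means a link fault;
    # otherwise the deletion is an ordinary topology change.
    return key in local_topology_ports

def is_node_fault(dpid, send_echo_request, retries=1):
    """Declare a node fault only if the first missed reply is confirmed by an
    immediately re-sent Echo-request that also goes unanswered."""
    if send_echo_request(dpid):
        return False
    for _ in range(retries):
        if send_echo_request(dpid):
            return False           # late reply: the data plane is healthy
    return True                    # two misses in a row: switch node fault

if __name__ == "__main__":
    topo = {("s1", 1), ("s1", 2), ("s2", 1)}
    print(is_link_fault({"dpid": "s1", "port": 2, "reason": "delete"}, topo))  # True
    print(is_link_fault({"dpid": "s1", "port": 9, "reason": "delete"}, topo))  # False
    print(is_node_fault("s2", send_echo_request=lambda dpid: False))           # True
```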
Further, the step S3 of the SDN controller resolving the fault specifically includes the following steps:
(1) active failure recovery: the SDN controller first issues a corresponding group table entry to the SDN switch, the group entry containing the port number of the main forwarding path of the packet flow and the port number of the backup forwarding path; when the SDN controller detects that the main path has failed and is unavailable, it sends an OFPGC_MODIFY instruction to the switch on the failed path, and the switch executes the action instruction corresponding to the group entry, selecting the group entry action instruction of the next priority according to priority; meanwhile, the controller checks, against the known fault information, whether the backup forwarding path has failed, and if the backup forwarding path has not failed, the flow is switched to the backup forwarding path to recover from the fault;
(2) passive failure recovery: when the backup forwarding path of active fault recovery cannot resolve the data plane link fault, the SDN controller reacquires the global topology of the data plane through the LLDP protocol, performs rerouting, and issues a new forwarding path to the switches, thereby completing fault recovery and restoring normal operation;
the rerouting performed by the SDN controller includes performing domain division again, performing pre-routing again, and determining a forwarding path for the data flow again.
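The order of the two recovery modes can be illustrated by the short Python sketch below; the callables passed in (backup_is_faulty, switch_to_backup, reroute) are placeholders standing in for the controller operations described above.

```python
# Minimal sketch of the recovery order of step S3: try the pre-installed backup
# path first (active recovery) and fall back to rerouting (passive recovery)
# only when the backup path is itself known to be faulty. The callables passed
# in are illustrative placeholders for the controller operations.
def recover(failed_path, backup_is_faulty, switch_to_backup, reroute):
    if not backup_is_faulty(failed_path):
        switch_to_backup(failed_path)     # active recovery: small delay
        return "active"
    reroute(failed_path)                  # passive recovery: flexible fallback
    return "passive"

if __name__ == "__main__":
    log = []
    mode = recover(
        failed_path=("s1", "s2", "s4"),
        backup_is_faulty=lambda path: False,
        switch_to_backup=lambda path: log.append(("group-mod", path)),
        reroute=lambda path: log.append(("reroute", path)),
    )
    print(mode, log)   # -> active [('group-mod', ('s1', 's2', 's4'))]
```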
Further, the domain division divides the SDN data plane into a plurality of domains according to network key elements specified manually in advance; the network key elements include IP address prefixes specified in an IP network;
different SDN controllers manage and control different domains, and the switches are likewise divided into boundary switches at the boundaries between domains and core switches within the domains.
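As an illustration, the following Python sketch divides hosts into domains by specified IP address prefixes and classifies switches as boundary or core switches from the inter-domain links; the prefix map and link list are assumed example inputs, not data from the patent.

```python
# Illustrative sketch of domain division by pre-specified IP address prefixes
# and of splitting switches into boundary and core switches from the
# inter-domain links. The prefix map and link list are assumed example inputs.
import ipaddress

def domain_of(ip, prefix_to_domain):
    """Map a host IP to its domain via the longest matching specified prefix."""
    addr = ipaddress.ip_address(ip)
    best = None
    for prefix, domain in prefix_to_domain.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0]):
            best = (net.prefixlen, domain)
    return best[1] if best else None

def classify_switches(links, switch_domain):
    """A switch with at least one link into another domain is a boundary switch;
    the rest are core switches. links: iterable of (dpid_a, dpid_b) pairs."""
    boundary = {a for a, b in links if switch_domain[a] != switch_domain[b]}
    boundary |= {b for a, b in links if switch_domain[a] != switch_domain[b]}
    core = set(switch_domain) - boundary
    return boundary, core

if __name__ == "__main__":
    prefixes = {"10.1.0.0/16": "D1", "10.2.0.0/16": "D2"}
    print(domain_of("10.1.3.7", prefixes))            # -> D1
    links = [("s1", "s2"), ("s2", "s3")]
    domains = {"s1": "D1", "s2": "D1", "s3": "D2"}
    print(classify_switches(links, domains))          # boundary {'s2', 's3'}, core {'s1'}
```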
Further, pre-routing computes routes and issues flow tables in advance, before network data flows arrive;
in the pre-routing process, the hop count of a path is used as the routing cost, so the optimal path is the path with the minimum cost; pre-routing is carried out domain by domain: for each domain, the SDN controller uses a routing algorithm to obtain the optimal paths from all switches in the domain to the boundary switches, and then adds a flow table entry to each switch in the domain.
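The following Python sketch illustrates this per-domain pre-routing step with the Floyd-Warshall algorithm over hop counts, producing for every switch the next hop toward each boundary switch (the information a pre-installed flow entry would encode); the example topology and switch names are assumptions.

```python
# Illustrative sketch of the per-domain pre-routing step: Floyd-Warshall over
# hop counts, then, for every intra-domain switch, the next hop toward each
# boundary switch (the information a pre-installed flow entry would encode).
# The example topology and switch names are assumptions.
INF = float("inf")

def floyd_warshall(nodes, edges):
    dist = {u: {v: (0 if u == v else INF) for v in nodes} for u in nodes}
    nxt = {u: {v: (v if u == v else None) for v in nodes} for u in nodes}
    for u, v in edges:                        # undirected links, unit (hop) cost
        dist[u][v] = dist[v][u] = 1
        nxt[u][v], nxt[v][u] = v, u
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
                    nxt[i][j] = nxt[i][k]
    return dist, nxt

def preroute(nodes, edges, boundary_switches):
    """For each switch, the next hop toward every boundary switch of its domain."""
    _, nxt = floyd_warshall(nodes, edges)
    return {s: {b: nxt[s][b] for b in boundary_switches if nxt[s][b] is not None}
            for s in nodes}

if __name__ == "__main__":
    nodes = ["s1", "s2", "s3", "b1"]          # b1 is the domain's boundary switch
    edges = [("s1", "s2"), ("s2", "s3"), ("s3", "b1"), ("s1", "b1")]
    print(preroute(nodes, edges, ["b1"]))
    # e.g. {'s1': {'b1': 'b1'}, 's2': {'b1': 's1'}, ...}: the pre-installed entry
    # on s2 toward b1 forwards via s1 (both the s1 and s3 routes are two hops).
```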
Further, determining a forwarding path for a data flow specifically includes:
if the two network key elements are the same, the sending and receiving ends are in the same domain and intra-domain routing applies, and the data forwarding path is determined according to the OpenFlow network routing mode;
if the network key elements are different, cross-domain routing is performed; cross-domain routing is divided into three steps: calculating the inter-domain optimal route, routing the data to the destination domain, and routing the data within the destination domain.
Further, the calculating of the inter-domain optimal route specifically includes:
the SDN controller determines source and destination domains of the data flow according to the source and destination network key elements of the data flow, and obtains an optimal route between the source domain and the destination domain based on a boundary switch by using a global topology and a routing algorithm.
Further, the data is routed to the destination domain, and the process includes: the SDN controller sends a flow table to a source switch according to the obtained inter-domain optimal route, the flow table modifies a target network key element of the flow into a network key element of a first boundary switch on a path, then the flow is routed from the source switch to the boundary switch of a source domain by virtue of pre-routing, meanwhile, the controller sends a flow table to each boundary switch on the optimal inter-domain route, and the action of the flow table item modifies the target network key element of the matched flow into a next boundary switch network key element on the optimal inter-domain route, so that the flow can be routed to the last boundary switch;
and changing the destination network key element of the data flow back to the destination network key element of the original data flow at the last boundary switch.
The data routing process in the destination domain comprises: and calculating the optimal path in the domain according to the routing mode in the OpenFlow network, and issuing the corresponding flow table item to each switch on the path, so that the data flow completes routing.
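The cross-domain step can be illustrated by the following Python sketch, which generates the destination rewrites that the issued flow entries would perform along the boundary switches of the inter-domain optimal route; the dictionary-based flow-entry representation and the example addresses are assumptions, not an OpenFlow message format.

```python
# Illustrative sketch of the cross-domain routing step above: given the
# inter-domain optimal route as an ordered list of boundary switches, generate
# the destination rewrites that the issued flow entries would perform. The
# dict-based flow-entry representation is an assumption, not an OpenFlow format.
def cross_domain_flow_entries(src_switch, boundary_route, original_dst, boundary_key):
    """boundary_route: ordered boundary switches on the optimal inter-domain route.
    boundary_key: maps a boundary switch to its network key element (e.g. an IP)."""
    entries = []
    # Source switch: point the flow at the first boundary switch on the route.
    entries.append({"switch": src_switch, "match_dst": original_dst,
                    "set_dst": boundary_key[boundary_route[0]]})
    # Each boundary switch hands the flow to the next boundary switch.
    for here, nxt in zip(boundary_route, boundary_route[1:]):
        entries.append({"switch": here, "match_dst": boundary_key[here],
                        "set_dst": boundary_key[nxt]})
    # Last boundary switch: restore the original destination for intra-domain routing.
    entries.append({"switch": boundary_route[-1],
                    "match_dst": boundary_key[boundary_route[-1]],
                    "set_dst": original_dst})
    return entries

if __name__ == "__main__":
    route = ["b1", "b2", "b3"]
    keys = {"b1": "10.1.255.1", "b2": "10.2.255.1", "b3": "10.3.255.1"}
    for entry in cross_domain_flow_entries("s1", route, "10.3.0.42", keys):
        print(entry)
```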
Further, step S4 specifically includes: the SDN data plane first performs domain division on the switches, then performs pre-routing on the data plane switches, and finally determines the forwarding path of the data flow and routes the data flow.
Compared with the prior art, the invention has the following beneficial effects: the SDN controller monitors switch port faults by parsing the Port-status messages sent by the data plane in combination with local network topology information, and monitors switch node faults by immediately re-sending an Echo-request message to any switch that does not return an Echo-reply, so the fault detection delay is small while high accuracy is guaranteed;
The method performs fault recovery by combining the active and passive modes: it retains the small-delay advantage of active recovery, which switches directly to the backup path, and uses the passive mode as a fallback to make up for the inflexibility of relying on backup paths in the active mode; the network is divided into independent domains managed by the SDN controllers, and a single SDN controller is responsible for the faults within its own domain, which reduces the complexity of fault recovery; fault monitoring and recovery for a boundary switch are completed by the previous SDN controller in the direction of the data flow;
The method routes data flows through the cooperation of multiple controllers, and pre-routing greatly reduces the load borne by the controllers for routing-decision processing, so the method has the advantage of being applicable to large SDN networks.
Drawings
Fig. 1 is a general structure of a fault monitoring and recovering method for an SDN data plane under a multi-controller architecture in this embodiment;
fig. 2 is a data flow routing flowchart of the SDN data plane fault monitoring and recovery method under the multi-controller architecture in this embodiment;
fig. 3 is a flowchart of a SDN data plane fault monitoring and recovery method under the multi-controller architecture in this embodiment.
Detailed Description
The present embodiment will be described below with reference to the accompanying drawings.
As shown in fig. 1, the SDN data plane fault monitoring and recovering method under the multi-controller architecture of the present embodiment includes the following steps:
s1, synchronizing global topology among the SDN controllers, and constructing and updating a topological structure of the network in the domain, specifically as follows:
the method comprises the steps of coordinating routing and information synchronization, determining a routing path of a data flow by a plurality of SDN controllers in a coordinated mode, then issuing a flow table to a part of switches on the path to complete routing of the data flow, simultaneously enabling the plurality of SDN controllers to be in a parallel relation, controlling to acquire and update intra-domain topology information by periodically sending LLDP data packets, enabling the SDN controllers to communicate through east and west interfaces (such as AMQP protocol) to synchronize global topology, ensuring consistency of underlying network and service processing logic, and only transmitting update information in the process of synchronizing the global topology. The arrows in fig. 1 represent the data flows (data flow of the control plane, data flow of the data plane, and data flow of the control plane and the data plane).
S2, the SDN controller judges whether SDN network data plane link failure and switch node failure caused by Port abnormality occur or not by monitoring Port-status messages and Echo messages;
the system comprises a Port-status message and an SDN controller, wherein the Port-status message is used for monitoring a link fault caused by a switch Port fault occurring on a data plane, and the judgment principle is that the switch triggers the Port-status message when the Port state changes and informs the SDN controller of the change of the Port state, the SDN controller receives and analyzes the Port-status message, and the monitoring of the link fault caused by the Port fault is realized by combining local network topology information;
specifically, the judgment of the data plane link failure is that the SDN controller captures a Port-status message sent by the data plane for failure monitoring; when the SDN controller analyzes a Port-status message to know that a certain Port of the switch is deleted, the SDN controller judges whether the Port of the switch is contained in a network or not according to a local network topology (the network topology is formed by points and lines, wherein the switch is represented by one point, if the topology does not contain the point, the switch does not belong to the network topology), and if the Port of the switch is contained in the network, the SDN controller considers that a data plane link fault caused by the switch Port fault occurs and needs to solve the fault; otherwise, the deletion of the port is considered to belong to normal network topology change;
After a switch and the SDN controller are connected, they periodically exchange Echo-request and Echo-reply messages to keep the connection alive; switch node fault monitoring with higher accuracy is therefore achieved by immediately re-sending an Echo-request message to a switch that does not return an Echo-reply; a node fault is declared only after Echo-reply messages are missed twice, which improves the accuracy of fault monitoring.
Specifically, the judgment of the switch node fault is that the SDN controller actively monitors the switch node of the data plane, and whether the switch node has the fault is judged through receiving and sending Echo messages; when the SDN controller cannot receive Echo-reply messages of a certain switch for the first time (when the controller receives a certain data packet from a network, the controller checks a flag bit by itself), the SDN controller immediately sends the Echo-request messages to the switch again, and if the SDN controller cannot receive the Echo-reply messages of the switch, the switch node is considered to be in fault, and the fault is solved; if the SDN controller receives an Echo-reply message of the switch, the data plane operates normally.
S3, the SDN controller resolves the fault: when the SDN controller detects a fault, active fault recovery is adopted first, and when active fault recovery fails, the fault is resolved by passive fault recovery;
active failure recovery, which relies on providing redundant backup paths to replace failed paths to resolve failures;
specifically, active failure recovery: the method comprises the steps that an SDN controller firstly sends a corresponding group entry to an SDN switch, wherein the group entry comprises a port number of a packet flow forwarding main path and a port number of a backup forwarding path, when the SDN controller detects that the main path fails, an OFPGC _ MODIFY instruction is sent to the switch on the failed path, the switch executes an action instruction corresponding to the group entry, the group entry action instruction of the next priority is selected according to the priority, meanwhile, the controller detects whether the backup forwarding path fails according to the existing fault message by using a step S2, and if the backup forwarding path fails, the backup forwarding path is switched to the backup forwarding path to recover the fault; the failure is actively recovered by switching the available backup paths, which has the advantage of small failure recovery delay.
Passive fault recovery, which relies on an SDN controller to reacquire the topology and reroute the data stream to resolve the fault; passive fault recovery recovers a fault by reacquiring the global topology and rerouting, which has the advantage of fault recovery flexibility.
Specifically, passive failure recovery: when the backup forwarding path of active fault recovery cannot resolve the data plane link fault, the SDN controller reacquires the global topology of the data plane through the LLDP protocol, performs rerouting, and issues a new forwarding path to the switches, thereby completing fault recovery and restoring normal operation;
the rerouting performed by the SDN controller includes performing domain division again, performing pre-routing again, and determining a forwarding path for the data flow again.
The domain division divides the SDN data plane into a plurality of domains according to network key elements specified manually in advance; the network key elements include IP address prefixes specified in an IP network;
different SDN controllers manage and control different domains, and the switches are likewise divided into boundary switches at the boundaries between domains and core switches within the domains.
Pre-routing computes routes and issues flow tables in advance, before network data flows arrive;
in the pre-routing process, the hop count of a path is used as the routing cost, so the optimal path is the path with the minimum cost; pre-routing is carried out domain by domain: for each domain, the SDN controller uses a routing algorithm (for example, the Floyd-Warshall algorithm) to obtain the optimal paths from all switches in the domain to the boundary switches, and then adds a flow table entry to each switch in the domain. Pre-routing reduces the load on the controller and, to a certain extent, alleviates the scalability problem of a single controller.
Determining a forwarding path for a data flow specifically includes:
if the two network key elements are the same, the sending and receiving ends are in the same domain and intra-domain routing applies, and the data forwarding path is determined according to the OpenFlow network routing mode;
if the network key elements are different, cross-domain routing is performed; cross-domain routing is divided into three steps: calculating the inter-domain optimal route, routing the data to the destination domain, and routing the data within the destination domain.
The calculating of the inter-domain optimal route specifically comprises the following steps:
the SDN controller determines source and destination domains of the data flow according to the source and destination network key elements of the data flow, and obtains an optimal route between the source domain and the destination domain based on a boundary switch by using a global topology and a routing algorithm.
The data is routed to a destination domain, and the process comprises the following steps: the SDN controller sends a flow table to a source switch according to the obtained inter-domain optimal route, the flow table modifies a target network key element of the flow into a network key element of a first boundary switch on a path, then the flow is routed from the source switch to the boundary switch of a source domain by virtue of pre-routing, meanwhile, the SDN controller sends a flow table to each boundary switch on the optimal inter-domain route, and the action of the flow table item modifies the target network key element of the matched flow into a next boundary switch network key element on the optimal inter-domain route, so that the flow can be routed to the last boundary switch;
and changing the destination network key element of the data flow back to the destination network key element of the original data flow at the last boundary switch.
The data routing process in the destination domain comprises: and calculating the optimal path in the domain according to the routing mode in the OpenFlow network, and issuing the corresponding flow table item to each switch on the path, so that the data flow completes routing.
S4, a plurality of SDN controllers cooperatively determine the routing path of an arriving data flow and then issue flow tables to the switches on the path to complete the routing of the data flow; specifically, the SDN data plane first performs domain division on the switches, then performs pre-routing on the data plane switches, and finally determines the forwarding path of the data flow and routes it.
Cooperative routing under multiple SDN controller architectures as shown in fig. 2, comprising the steps of:
step 201, each SDN controller periodically sends LLDP packets through Packet_out messages to all switches connected to it; upon receiving the message, a switch sends the LLDP packet out of all its ports; a switch receiving such an LLDP packet encapsulates the link information between the two switches and sends it to the SDN controller in a Packet_in message; after collecting the link information within its management domain, the SDN controller constructs and updates the topology of the network in its domain. The SDN controllers are peers and synchronize the global topology through east-west interfaces (for example, the AMQP protocol), ensuring consistency of the underlying network and of the service processing logic;
step 202, the data plane is divided into a plurality of domains, different SDN controllers manage and control different domains, and the domains are divided according to specified network key elements (such as IP address prefixes in an IP network), and meanwhile, the switches are also divided into edge switches at boundaries between domains and core switches in the domains;
step 203, the domain pre-routing is a process of pre-routing and issuing a flow table before the network data flow arrives, and is performed domain by domain, and the SDN controller obtains optimal paths from all switches in the domain to the boundary switches by adopting a routing algorithm (such as a Floyd-Warshall routing algorithm) and adds a flow table to each switch in the domain;
step 204, the data flow reaches the source exchanger from the source host;
step 205, judging the routing condition: whether the data route belongs to intra-domain routing or inter-domain routing is judged according to the source and destination network key elements; if the network key elements are the same, it is intra-domain routing, otherwise it is inter-domain routing;
step 206, if the routing is intra-domain routing, routing is performed according to a traditional Openflow network routing mode;
step 207, if it is inter-domain routing, the inter-domain optimal route is calculated: the SDN controller determines the source and destination domains of the data flow according to its source and destination network key elements, and obtains the route between the boundary switches of the source domain and the destination domain by using the global topology and a routing algorithm (such as the Floyd-Warshall routing algorithm);
step 208, the SDN controller issues a flow table to the source switch according to the optimal inter-domain route obtained in the previous step, the flow table modifies a destination network key element of the flow into a network key element of a first border switch in the path, the flow is routed from the source switch to the border switch of the source domain by means of a previous pre-route, and the SDN controller issues a flow table to each border switch on the optimal inter-domain route, the flow table item modifies the destination network key element of the matched flow into a next border switch network key element on the optimal inter-domain route, so that the flow can be routed to the last border switch, but the last border switch changes the destination network key element of the flow back to the destination network key element of the original data flow;
step 209, routing of the data within the destination domain is performed in the same way as step 206, using the traditional OpenFlow network routing mode.
The flow chart of the fault monitoring and fault recovery method shown in fig. 3 includes the following steps:
step 301, firstly, entering a fault monitoring stage, and capturing a Port-status message sent by a data plane by an SDN controller;
step 302, judging whether the Port-status message is normal; if it is normal, step 304 is performed; if not, step 303 is performed;
step 303, network topology matching is performed to judge whether the abnormal switch port belongs to the local network;
if the abnormal port belongs to the local network, a data plane link fault caused by the switch port fault is considered to have occurred, and step 307 is entered;
if the abnormal port does not belong to the local network, the deletion of the port is considered a normal network topology change, and the Echo message judgment of step 304 is performed;
step 304, judging whether the Echo message is normal, if the Echo message is normal, considering that the data plane normally runs, and entering step 305; if the Echo message is abnormal, step 306 is entered, and the SDN controller immediately sends an Echo-request message again to determine whether the Echo message is normal for the second time;
step 305, after normal operation, through proper time delay of step 315, step 301 is entered again, so as to perform periodic fault monitoring;
step 306, if the Echo message is normal in the second Echo message judgment, the data plane is considered to be normally operated, and the step 305 is entered, otherwise, the step 307 is entered;
step 307, a failure recovery phase, which comprises the following steps:
step 308, active failure recovery is adopted, and the failure of the data plane is solved by switching to a redundant backup path;
step 309, after active fault recovery, whether a link fault caused by a switch port fault still exists is judged through the Port-status message;
step 310 and step 311, judging whether the switch node fault exists through the Echo message;
if the network is judged to have no fault through the steps 309, 310 and 311, the network normally operates, namely, the fault recovery is completed through an active method;
if the network is judged to have a fault through the steps 309, 310 and 311, the step 312 is entered for passive fault recovery;
step 313, the SDN controller reacquires the global topology of the data plane and reroutes the data flow;
step 314, a new forwarding path is formulated and issued to the switches to complete fault recovery and restore normal operation.
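The monitoring and recovery loop of fig. 3 can be summarized by the following Python sketch; the check and recovery callables are illustrative placeholders for the Port-status judgment, the Echo judgment, and the active and passive recovery described above.

```python
# Minimal sketch of the monitoring/recovery loop of fig. 3: periodically check
# Port-status and Echo health, run active recovery on a fault, re-verify the
# network, and fall back to passive recovery if the fault persists. The check
# and recovery callables are illustrative placeholders.
import time

def monitor_loop(check_port_status, check_echo, active_recover, passive_recover,
                 period_s=1.0, rounds=3):
    for _ in range(rounds):                       # bounded here only for the demo
        if check_port_status() and check_echo():  # steps 301-306: no fault found
            time.sleep(period_s)                  # step 315: delay, then re-monitor
            continue
        active_recover()                          # step 308: switch to the backup path
        if check_port_status() and check_echo():  # steps 309-311: re-verify the network
            continue                              # active recovery succeeded
        passive_recover()                         # steps 312-314: reacquire topology, reroute

if __name__ == "__main__":
    state = {"fault": True}
    monitor_loop(
        check_port_status=lambda: not state["fault"],
        check_echo=lambda: True,
        active_recover=lambda: state.update(fault=False),  # pretend the backup path works
        passive_recover=lambda: print("rerouting"),
        period_s=0.01,
    )
    print("done")
```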
Through the above description of the embodiments, those skilled in the art will clearly understand that the present invention may be implemented by software plus a necessary hardware platform, and certainly may be implemented by hardware, but in many cases, the former is a better embodiment. With this understanding in mind, all or part of the technical solutions of the present invention that contribute to the background can be embodied in the form of a software product, which can be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and includes instructions for causing a computer device (which can be a personal computer, a server, or a network device, etc.) to execute the methods according to the embodiments or some parts of the embodiments of the present invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (4)

1. An SDN data plane fault monitoring and recovery method under a multi-controller architecture is characterized by comprising the following steps:
s1, a plurality of SDN controllers carry out synchronous global topology, and a topological structure of an intra-domain network is constructed and updated;
s2, the SDN controller judges whether SDN network data plane link failure and switch node failure caused by Port abnormality occur or not by monitoring Port-status messages and Echo messages; the SDN controller judges whether a data plane link fault and a switch node fault caused by Port abnormity occur or not by monitoring Port-status messages and Echo messages, and the judging mode specifically comprises the following steps:
1) judging the failure of the data plane link: the SDN controller captures a Port-status message sent by a data plane for fault monitoring; when the SDN controller analyzes the Port-status message to know that a certain Port of the switch is deleted, the SDN controller judges whether the Port of the switch is contained in the network according to the local network topology, and if the Port of the switch is contained in the network, the SDN controller considers that a data plane link fault caused by the switch Port fault occurs and needs to solve the fault; otherwise, the deletion of the port is considered to belong to normal network topology change;
2) judging the failure of the switch node: the SDN controller actively monitors switch nodes of a data plane, and judges whether the switch nodes have faults or not through receiving and sending Echo messages; when the SDN controller cannot receive Echo-reply messages of a certain switch for the first time, the SDN controller sends the Echo-request messages to the switch again, and if the SDN controller cannot receive the Echo-reply messages of the switch, the switch node is considered to be in fault, and the fault is solved; if the SDN controller receives an Echo-reply message of the switch, the data plane normally operates;
s3, the SDN controller solves the fault, when the SDN controller detects the fault, active fault recovery is adopted, and when the active fault recovery fails, the fault is solved by passive fault recovery;
the fault resolution of the SDN controller specifically comprises the following steps: (1) active failure recovery: the method comprises the steps that an SDN controller firstly sends a corresponding group entry to an SDN switch, wherein the group entry comprises a port number of a packet flow forwarding main path and a port number of a backup forwarding path, when the SDN controller detects that the main path fails, an OFPGC_MODIFY instruction is sent to the switch on the path with the failure, the switch executes an action instruction corresponding to the group entry, the group entry action instruction of the next priority is selected according to the priority, meanwhile, the SDN controller detects whether the backup forwarding path fails according to existing failure information, and if the backup forwarding path fails, the backup forwarding path is switched to the backup forwarding path to recover the failure;
(2) passive failure recovery: when the backup forwarding path in the active fault recovery cannot solve the Link fault of the data plane, the SDN controller acquires the global topology structure of the data plane again through an LLDP (Link Layer Discovery Protocol) Protocol, performs rerouting, and issues a new forwarding path to the switch to complete fault recovery and complete normal operation of the fault recovery;
the SDN controller performs rerouting, including domain division, a rerouting process and forwarding path determination for the data flow;
the domain division is to divide the SDN network data plane into a plurality of domains according to the network key elements artificially designated in advance; the network key elements comprise IP address prefixes appointed in the IP network;
different SDN controllers manage and control different domains, and simultaneously divide the switches into boundary switches at the boundary between the domains and switches in the domains;
the pre-routing is a process of pre-routing and issuing a flow table before the network data flow arrives;
in the pre-routing process, hop count of a path is used as routing cost, so that the optimal path is the path with the minimum cost, the pre-routing is carried out domain by domain, for each domain, the SDN controller adopts a Floyd-Warshall algorithm to obtain the optimal paths from all switches in the domain to boundary switches, and then the SDN controller adds a flow table to each switch in the domain by using the optimal paths;
the determining a forwarding path for the data stream specifically includes: if the two network key elements are the same, the transmitting and receiving end is in the same domain and belongs to the intra-domain route, and the data forwarding path is determined according to the Openflow network route mode;
if the key elements of the network are different, performing cross-domain routing, wherein the cross-domain routing is divided into three steps, namely calculating inter-domain optimal routing, and routing data to a target domain and routing data in the target domain;
s4, the SDN controllers cooperate to determine a routing path of the arriving data flow, and then a flow table is issued to the switches on the path to complete the routing of the data flow.
2. The SDN data plane fault monitoring and recovery method under a multi-controller architecture of claim 1, wherein step S1 specifically includes: a plurality of SDN controllers send LLDP (Link Layer Discovery Protocol) data packets to all switches connected with the SDN controllers through Packet_out messages, so that the topological structures of networks in the SDN controller domain are built and updated;
the SDN controllers realize synchronous global topology through east-west interface communication, consistency of an underlying network and service processing logic is guaranteed, only updated information is transmitted in the process of synchronizing the global topology, network bandwidth is saved while the global topology is maintained, and network load is reduced.
3. The SDN data plane fault monitoring and recovery method under a multi-controller architecture of claim 2, wherein the computing inter-domain optimal routing specifically comprises: the SDN controller determines source and destination domains of the data flow according to the source and destination network key elements of the data flow, and obtains an inter-domain optimal route between the source domain and the destination domain based on a boundary switch by using a global topology and a routing algorithm.
4. The SDN data plane fault monitoring and recovery method under a multi-controller architecture of claim 3, wherein the data is routed to a destination domain by a process comprising: the SDN controller sends a flow table to a source switch according to the obtained inter-domain optimal route, the flow table modifies a target network key element of the flow into a network key element of a first boundary switch on a path, then the flow is routed from the source switch to the boundary switch of a source domain by virtue of pre-routing, meanwhile, the SDN controller sends a flow table to each boundary switch on the optimal inter-domain route, and the action of the flow table item modifies the target network key element of the matched flow into a next boundary switch network key element on the optimal inter-domain route, so that the flow can be routed to the last boundary switch;
changing the target network key element of the data flow back to the target network key element of the original data flow at the last boundary switch;
the data routing process in the destination domain comprises: and calculating the optimal path in the domain according to the routing mode in the OpenFlow network, and issuing the corresponding flow table item to each switch on the path, so that the data flow completes routing.
CN201910933770.4A 2019-09-29 2019-09-29 SDN data plane fault monitoring and recovery method under multi-controller architecture Active CN110708245B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910933770.4A CN110708245B (en) 2019-09-29 2019-09-29 SDN data plane fault monitoring and recovery method under multi-controller architecture

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910933770.4A CN110708245B (en) 2019-09-29 2019-09-29 SDN data plane fault monitoring and recovery method under multi-controller architecture

Publications (2)

Publication Number Publication Date
CN110708245A CN110708245A (en) 2020-01-17
CN110708245B true CN110708245B (en) 2021-10-22

Family

ID=69196551

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910933770.4A Active CN110708245B (en) 2019-09-29 2019-09-29 SDN data plane fault monitoring and recovery method under multi-controller architecture

Country Status (1)

Country Link
CN (1) CN110708245B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114915602B (en) * 2021-01-29 2024-01-26 中移(苏州)软件技术有限公司 Processing method, processing device and terminal for flow table in virtual switch
CN112887202B (en) * 2021-02-02 2022-05-27 浙江工商大学 SDN link fault network convergence method based on sub-topology network
CN115086978B (en) * 2021-03-11 2024-05-07 中国移动通信集团四川有限公司 Network function virtualization SDN network system
CN113660140B (en) * 2021-08-17 2023-04-07 北京交通大学 Service function chain fault detection method based on data control plane hybrid sensing
CN113992569B (en) * 2021-09-29 2023-12-26 新华三大数据技术有限公司 Multipath service convergence method, device and storage medium in SDN network
CN114039833B (en) * 2021-11-09 2024-04-12 江苏大学 SRv 6-based industrial Internet multi-domain integrated architecture
CN115277424B (en) * 2022-06-23 2023-10-03 中国联合网络通信集团有限公司 Decision issuing method, device and storage medium in software defined network
CN115150322B (en) * 2022-09-06 2022-11-25 中勍科技股份有限公司 Multichannel RapidIO distribution system and fault self-isolation method thereof

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105871718A (en) * 2016-03-21 2016-08-17 东南大学 SDN (Software-Defined Networking) inter-domain routing implementation method
CN106506353A (en) * 2016-10-27 2017-03-15 吉林大学 Virtual network single link failure restoration methods and system based on SDN
CN106888163A (en) * 2017-03-31 2017-06-23 中国科学技术大学苏州研究院 The method for routing divided based on network domains in software defined network

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10356011B2 (en) * 2014-05-12 2019-07-16 Futurewei Technologies, Inc. Partial software defined network switch replacement in IP networks

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105871718A (en) * 2016-03-21 2016-08-17 东南大学 SDN (Software-Defined Networking) inter-domain routing implementation method
CN106506353A (en) * 2016-10-27 2017-03-15 吉林大学 Virtual network single link failure restoration methods and system based on SDN
CN106888163A (en) * 2017-03-31 2017-06-23 中国科学技术大学苏州研究院 The method for routing divided based on network domains in software defined network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research and Implementation of SDN Fault Monitoring and Recovery Technology; 卞宇翔; Master's thesis, Nanjing University of Posts and Telecommunications; 2018-02-28; sections 3.3-3.5 and 4.1-4.3 *

Also Published As

Publication number Publication date
CN110708245A (en) 2020-01-17

Similar Documents

Publication Publication Date Title
CN110708245B (en) SDN data plane fault monitoring and recovery method under multi-controller architecture
US5016243A (en) Automatic fault recovery in a packet network
US8441941B2 (en) Automating identification and isolation of loop-free protocol network problems
EP0452487B1 (en) Automatic fault recovery in a packet network
EP1511238B1 (en) Distributed and disjoint forwarding and routing system and method
US6983294B2 (en) Redundancy systems and methods in communications systems
EP0452466B1 (en) Automatic fault recovery in a packet network
US7155632B2 (en) Method and system for implementing IS-IS protocol redundancy
JP5941404B2 (en) Communication system, path switching method, and communication apparatus
JP2017508401A (en) Switch replacement of partial software defined network in IP network
JP2004173136A (en) Network management device
JP2009239359A (en) Communication network system, communication device, route design device, and failure recovery method
JP2002033767A (en) Network-managing system
JP2009303092A (en) Network equipment and line switching method
EP1940091B1 (en) Autonomous network, node device, network redundancy method and recording medium
KR102157711B1 (en) Methods for recovering failure in communication networks
CN111404734B (en) Cross-layer network fault recovery system and method based on configuration migration
WO2011120423A1 (en) System and method for communications system routing component level high availability
WO2023015897A1 (en) Intelligent control method, apparatus and system for optical network
JP4717796B2 (en) Node device and path setting method
CN114039833B (en) SRv 6-based industrial Internet multi-domain integrated architecture
Hraska et al. Enhanced Derived Fast Reroute Techniques in SDN
Hainana et al. Design of a NFV Traffic Engineering Middlebox for Efficient Link Failure Detection and Recovery in SDN Core Networks
Valcarenghi et al. Which resilience for the optical internet? an e-Photon/ONe+ outlook

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
OL01 Intention to license declared