CN115473828A - Fault detection method and system based on simulation network - Google Patents

Fault detection method and system based on simulation network


Publication number
CN115473828A
Authority
CN
China
Prior art keywords
network
fault
simulation
parameter
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210995053.6A
Other languages
Chinese (zh)
Other versions
CN115473828B (en
Inventor
吴功伟
林涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba China Co Ltd
Original Assignee
Alibaba China Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba China Co Ltd filed Critical Alibaba China Co Ltd
Priority to CN202210995053.6A priority Critical patent/CN115473828B/en
Publication of CN115473828A publication Critical patent/CN115473828A/en
Application granted granted Critical
Publication of CN115473828B publication Critical patent/CN115473828B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/50 Testing arrangements
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/14 Network analysis or design
    • H04L41/145 Network analysis or design involving simulating, designing, planning or modelling of a network

Abstract

The embodiments of this specification provide a fault detection method and a fault detection system based on a simulation network. The fault detection method based on the simulation network comprises the following steps: receiving a network fault parameter for a simulation network, and adjusting the currently running simulation network based on the network fault parameter, wherein the simulation network is generated based on network configuration parameters of a physical network; and monitoring the current running state of the simulation network, and, when the current running state is determined to be inconsistent with a preset fault state corresponding to the network fault parameter, determining from the simulation network a fault parameter for adjusting the physical network. In this way, faults latent in the physical network can be detected in advance, before a network fault actually occurs, which avoids the losses that such a fault would cause.

Description

Fault detection method and system based on simulation network
Technical Field
The embodiment of the specification relates to the technical field of computers, in particular to a fault detection method based on a simulation network.
Background
A network is a huge distributed system consisting of network equipment, network automation, network risk control and other components. Although this huge system has mature processing mechanisms, such as monitoring alarms and emergency fault handling, to detect failures occurring in the network, these mechanisms share a disadvantage: they are after-the-fact, passive responses, so a network failure can only be detected after it has occurred.
With the continuous enlargement of network scale and the rapid increase of service types, the frequency of network failures is increased, and in the process of detecting network failures by adopting the existing processing mechanism, the loss caused by network failures cannot be avoided due to the defects of post response and the like of the existing processing mechanism; therefore, how to detect the network failure in advance before the network failure occurs is a problem that needs to be solved urgently.
Disclosure of Invention
In view of this, the embodiments of the present specification provide a fault detection method based on a simulation network. One or more embodiments of the present specification also relate to a fault detection system based on a simulation network, a fault detection apparatus based on a simulation network, a computing device, a computer-readable storage medium, and a computer program, so as to solve the technical defects in the prior art.
According to a first aspect of embodiments of the present specification, there is provided a fault detection method based on a simulation network, including:
receiving a network fault parameter aiming at a simulation network, and adjusting the currently running simulation network based on the network fault parameter, wherein the simulation network is generated based on a network configuration parameter of a physical network;
and monitoring the current operation state of the simulation network, and determining a fault parameter for adjusting the physical network from the simulation network under the condition that the current operation state is determined to be inconsistent with a preset fault state corresponding to the network fault parameter.
According to a second aspect of embodiments herein, there is provided an emulated network-based fault detection system, the system comprising a fault handling node, an emulated network node, and a fault detection node, wherein,
the fault handling node is configured to receive a network fault parameter for a simulated network, and adjust the simulated network currently running in the simulated network node based on the network fault parameter, wherein the simulated network is generated based on a network configuration parameter of a physical network;
the fault detection node is configured to monitor a current operation state of the simulation network, and determine a fault parameter for adjusting the physical network from the simulation network when the current operation state is determined to be inconsistent with a preset fault state corresponding to the network fault parameter.
According to a third aspect of the embodiments of the present specification, there is provided a fault detection method based on a simulation network, applied to a detection management node, including:
receiving a fault detection request which is sent by a user based on a detection management interface and carries a network fault parameter and a preset fault state, wherein the detection management interface is an interface which is provided by a detection management node to the user and edits the network fault parameter and the preset fault state;
instructing, based on the network fault parameter and the preset fault state, a fault processing node, a simulation network node and a fault detection node included in the fault detection system based on the simulation network to execute a network fault detection step;
and receiving the fault parameters sent by the fault detection node, and displaying the fault parameters to the user through the detection management interface.
According to a fourth aspect of embodiments of the present specification, there is provided a fault detection apparatus based on a simulation network, including:
an adjustment module configured to receive a network fault parameter for a simulation network, and adjust the currently running simulation network based on the network fault parameter, wherein the simulation network is generated based on a network configuration parameter of a physical network;
a determining module configured to monitor a current operating state of the simulation network, and determine a fault parameter for adjusting the physical network from the simulation network when the current operating state is determined to be inconsistent with a preset fault state corresponding to the network fault parameter.
According to a fifth aspect of the embodiments of the present specification, there is provided a fault detection apparatus based on a simulation network, which is applied to a detection management node, and includes:
a receiving module configured to receive a fault detection request which is sent by a user based on a detection management interface and carries a network fault parameter and a preset fault state, wherein the detection management interface is an interface which is provided by the detection management node to the user and is used for editing the network fault parameter and the preset fault state;
a detection module configured to instruct, based on the network fault parameter and the preset fault state, a fault processing node, a simulation network node and a fault detection node included in the fault detection system based on the simulation network to execute a network fault detection step;
and a display module configured to receive the fault parameters sent by the fault detection node and display the fault parameters to the user through the detection management interface.
According to a sixth aspect of embodiments herein, there is provided a computing device comprising:
a memory and a processor;
the memory is configured to store computer-executable instructions and the processor is configured to execute the computer-executable instructions, which when executed by the processor, implement the steps of the simulated network based fault detection method.
According to a seventh aspect of embodiments herein, there is provided a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the steps of the emulated network-based fault detection method.
According to an eighth aspect of embodiments herein, there is provided a computer program, wherein the computer program, when executed in a computer, causes the computer to perform the steps of the simulation network based fault detection method.
The fault detection method based on the simulation network provided by the specification comprises the following steps: receiving a network fault parameter aiming at a simulation network, and adjusting the currently running simulation network based on the network fault parameter, wherein the simulation network is generated based on a network configuration parameter of a physical network; and monitoring the current operation state of the simulation network, and determining a fault parameter for adjusting the physical network from the simulation network under the condition that the current operation state is determined to be inconsistent with a preset fault state corresponding to the network fault parameter.
Specifically, the method generates a simulation network based on network configuration parameters of the physical network, and then adjusts the currently running simulation network by using network fault parameters, so as to simulate a network fault of the physical network. It then determines the current running state of the simulation network and, when any condition other than the preset fault state occurs, determines from the simulation network the fault parameters that cause the network fault. The physical network can subsequently be adjusted based on these fault parameters, so that faults latent in the physical network are detected in advance, before a network fault actually occurs, and the losses caused by network faults are avoided.
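The two steps summarized above can be illustrated with a minimal sketch. It is not the implementation described in this specification: the `SimulatedNetwork` class, the device names, the dependency model and the `detect` function are all hypothetical, and the "fault parameters" are reduced to the list of simulated devices whose state deviates from the preset fault state.

```python
from dataclasses import dataclass, field


@dataclass
class SimulatedNetwork:
    """Toy stand-in for a simulation network built from physical-network configuration."""
    # device name -> "up" or "down" (hypothetical state model)
    devices: dict
    # device name -> devices that fail when it fails (hypothetical dependency model)
    dependents: dict = field(default_factory=dict)

    def inject(self, fault_device: str) -> None:
        """Apply a network fault parameter: take a device down and propagate the failure."""
        self.devices[fault_device] = "down"
        for dep in self.dependents.get(fault_device, []):
            if self.devices[dep] == "up":
                self.inject(dep)

    def state(self) -> dict:
        """Return a snapshot of the current running state."""
        return dict(self.devices)


def detect(network: SimulatedNetwork, fault_device: str, preset_fault_state: dict) -> list:
    """Inject the fault, then compare the observed state with the preset fault state.

    Returns the devices whose state deviates from the expectation; an empty
    list means the simulation behaved exactly as the preset fault state predicts.
    """
    network.inject(fault_device)
    current = network.state()
    return sorted(
        name for name, status in current.items()
        if status != preset_fault_state.get(name)
    )


# Assumed expectation: taking sw1 down should leave srv1 reachable through redundancy.
net = SimulatedNetwork(
    devices={"sw1": "up", "sw2": "up", "srv1": "up"},
    dependents={"sw1": ["srv1"]},  # in this toy model srv1 actually has no redundancy
)
expected = {"sw1": "down", "sw2": "up", "srv1": "up"}
print(detect(net, "sw1", expected))
```

In this toy run the simulated server fails together with the switch, so the deviation points at a missing redundancy that could then be corrected in the physical network before any real outage.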
Drawings
Fig. 1 is a schematic view of an application scenario of a fault detection method based on a simulation network according to an embodiment of the present specification;
FIG. 2 is a flow chart of a method for fault detection based on a simulation network according to an embodiment of the present disclosure;
fig. 3 is a schematic structural diagram of a network chaotic engineering system in a fault detection method based on a simulation network according to an embodiment of the present disclosure;
fig. 4 is a schematic structural diagram of a simulation network system in a fault detection method based on a simulation network according to an embodiment of the present specification;
FIG. 5 is a flowchart illustrating a processing procedure of a fault detection method based on a simulation network according to an embodiment of the present disclosure;
fig. 6 is a schematic structural diagram of a fault detection system based on a simulation network according to an embodiment of the present specification;
FIG. 7 is a flow diagram of another simulated network based fault detection method provided in an embodiment of the present description;
fig. 8 is a block diagram of a computing device according to an embodiment of the present disclosure.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present description. This description may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein, as those skilled in the art will be able to make and use the present disclosure without departing from the spirit and scope of the present disclosure.
The terminology used in the description of the one or more embodiments is for the purpose of describing the particular embodiments only and is not intended to be limiting of the description of the one or more embodiments. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present specification refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It will be understood that, although the terms first, second, etc. may be used herein in one or more embodiments to describe various information, these information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a first can be termed a second and, similarly, a second can be termed a first without departing from the scope of one or more embodiments of the present description. The word "if" as used herein may be interpreted as "at the time of", "when", or "in response to a determination", depending on the context.
First, the noun terms referred to in one or more embodiments of the present specification are explained.
Chaos engineering: the discipline of experimenting on a distributed system, with the aim of building the system's capability, and confidence in that capability, to withstand out-of-control conditions in a production environment. It is a method for identifying abnormal conditions of a system by actively introducing faults into the system and observing how the system behaves under each fault. Chaos engineering follows five principles: establishing a steady-state hypothesis, using real-world events, experimenting in production, running continuous automated experiments, and minimizing the impact range.
Network simulation: a technology for building a simulated network that mirrors a real network at a ratio close to 1:1.
Three-dimensional radar: a monitoring system for various unexpected alarms, covering IP address liveness detection, internet traffic monitoring, high-risk event alarms, delay alarms, routing anomaly alarms and the like. When an alarm condition arises, the system raises the alarm through the dropped-point indication of the three-dimensional radar.
Linux TC: the traffic control (tc) command used for flow control in the Linux kernel.
KVM: short for Kernel-based Virtual Machine, an open-source system virtualization module that can create a virtualized environment.
Docker: an open-source application container engine.
CICD process: also known as the CI/CD flow, where CI (continuous integration) is the process of building software and completing initial tests, and CD (continuous deployment) is the process of integrating code with the infrastructure.
CV flow: a continuous verification process.
A network comprises network equipment, network automation, network risk control and other components, which together form a huge distributed system. Although this system has mature processing mechanisms, such as monitoring alarms and emergency fault handling, that can detect failures occurring in the network, these mechanisms share a disadvantage: they are after-the-fact, passive responses. With the continuous expansion of network scale and the rapid increase of service types, network failures become more frequent, and relying on these mechanisms carries a high systemic risk. For example, in one real incident in the network of an internet enterprise, the Border Gateway Protocol (BGP) withdrew the IP address prefixes of the Domain Name System (DNS) servers hosted by the enterprise, so that users could no longer resolve or access the relevant domain names. As a result, the application services of that enterprise went down globally and were unavailable for 6 to 7 hours, its stock price fell by approximately 5%, about $47.3 billion of market value evaporated, and at least $60 million of advertising revenue was lost.
This risk trend is becoming more and more evident as network scale keeps expanding and service types multiply. Since the risks are unavoidable, can they be discovered and exposed in advance, instead of being dealt with in a rush after they appear? Chaos engineering was developed to answer this question, and it offers one possible way to guard against systemic risks. However, because network devices carry a large number of upper-layer services with high stability requirements, performing chaos engineering experiments on online devices is very risky: once an accident occurs, the stability of the network device is seriously affected, which in turn affects the upper-layer services it carries. For example, existing chaos engineering systems perform their chaos experiments in the online environment. If an experiment triggers a fault, such a system cannot guarantee a timely fix (for example, if a chaos test on the live network triggers a fault for which no contingency plan exists, like the one in the example above, recovery may take longer than the time window planned for the verification). In addition, these systems are relatively limited when it comes to chaos engineering for the basic network. The main disadvantages are:
1. Linux TC is used only for simple network operations, and no systematic, complete means of fault injection is provided.
2. The operations mainly target servers rather than network devices such as switches and routers.
3. Only a limited scope can be handled; the devices of the whole network cannot all be covered.
Therefore, how to perform the chaotic test in the basic network system and ensure the stability of the network device becomes a problem to be solved.
In the present specification, a fault detection method based on a simulation network is provided, and the present specification relates to a fault detection system based on a simulation network, a fault detection apparatus based on a simulation network, a network fault detection apparatus, a computing device, a computer-readable storage medium, and a computer program, which are described in detail one by one in the following embodiments.
Fig. 1 is a schematic diagram illustrating an application scenario of fault detection based on a simulation network according to an embodiment of the present specification. The fault detection method based on a simulation network provided in the present specification may be applied to a fault detection system based on a simulation network. As can be seen from fig. 1, the fault detection system based on a simulation network includes a fault processing node 102, a simulation network node 106 and a fault detection node 104. Specifically, the simulation network node 106 generates a simulation network that simulates the real physical network based on network configuration parameters of the physical network, and runs the simulation network. The fault processing node 102 then adjusts the currently running simulation network through network fault parameters, so as to simulate network faults occurring in the physical network. Next, the current operation state of the simulation network is determined and, when any condition other than the preset fault state occurs, the fault detection node 104 determines from the simulation network the fault parameters that cause the network fault. The physical network can then be adjusted based on these fault parameters, so that faults latent in the physical network are detected in advance, before a network fault actually occurs, and the losses caused by network faults are avoided.
Fig. 2 is a flowchart illustrating a method for fault detection based on a simulation network according to an embodiment of the present disclosure, which includes the following steps.
Step 202: receiving a network fault parameter aiming at a simulation network, and adjusting the currently running simulation network based on the network fault parameter, wherein the simulation network is generated based on a network configuration parameter of a physical network.
It should be noted that the fault detection method based on the emulated network provided in this specification may be applied to a fault detection system based on the emulated network, where the fault detection system based on the emulated network includes a fault handling node, an emulated network node, and a fault detection node.
The fault detection system based on the simulation network can be understood as a chaos engineering system for chaos engineering experiments. To avoid affecting the stability of network equipment, this system uses network virtualization technology to construct a simulation network that simulates the real physical network.
The simulation network node may be understood as a node capable of generating a simulation network based on network configuration parameters of a physical network. It should be noted that the simulation network is a network, built through network virtualization technology, that corresponds to the real physical network.
The network configuration parameters may be understood as the various configuration parameters of the real physical network, including but not limited to the server model, the server's disk model and disk capacity, the server CPU model, server port information, the number and configuration of routers, IP addresses, and the like.
The fault processing node may be understood as a node capable of adjusting the simulation network based on the network fault parameter, for example, a fault injection platform in a network chaotic engineering system.
The network failure parameter may be understood as a network failure event that may occur or may have occurred in a real physical network, for example, the network failure parameter may be a server, switch or router down, a network outage, an abnormal underlying resource, and so on.
Adjusting the simulation network based on the network fault parameter can be understood as the fault processing node adjusting a simulated network device in the simulation network, or simulating an abnormal notification in the simulation network, and so on. For example, adjusting the simulation network based on the network fault parameter includes but is not limited to:
1. Constructing dropped-point anomalies of the three-dimensional radar, to verify that the emergency handling of a change plan meets the required standard.
2. Constructing various software and hardware faults, such as simulated board card anomalies, device restarts, ports going down (failing), device isolation, port isolation, and the like.
3. Constructing basic resource anomalies, such as a fully loaded CPU or exhausted memory.
4. Modifying simulated links through API (application programming interface) calls to construct various data forwarding anomalies, such as network congestion, controlled packet delay, proportional packet loss, modified packet structure, and the like.
5. Constructing anomalies in the automation management and control systems that interface with the simulation environment, such as automation channel anomalies, three-dimensional radar system anomalies, and the like.
6. Constructing network application system call anomalies, such as gRPC (a remote procedure call (RPC) technology) call anomalies.
7. Constructing various code exception injections, such as code exceptions in C, C++, Go, Python, Java, and the like.
It should be noted that, the above manner of adjusting the simulation network can be understood as the capability of the fault injection platform (i.e., the fault processing node) in the network chaotic engineering system.
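As an illustration of the data forwarding anomalies listed above (controlled packet delay and proportional packet loss on a simulated link), the following is a minimal in-memory sketch. The `SimulatedLink` class and the `inject_link_fault` function are hypothetical names introduced only for this example; they do not represent the API of the fault injection platform described in this specification.

```python
import random
from dataclasses import dataclass


@dataclass
class SimulatedLink:
    """Hypothetical model of one link in the simulation network."""
    delay_ms: float = 1.0     # one-way delay added to every delivered packet
    loss_ratio: float = 0.0   # fraction of packets dropped, between 0.0 and 1.0

    def send(self, packets: int, rng: random.Random) -> tuple:
        """Send packets over the link; return (delivered count, per-packet delay in ms)."""
        delivered = sum(1 for _ in range(packets) if rng.random() >= self.loss_ratio)
        return delivered, self.delay_ms


def inject_link_fault(link: SimulatedLink, *, extra_delay_ms: float = 0.0,
                      loss_ratio: float = 0.0) -> None:
    """Apply a data forwarding fault to a simulated link (assumed fault model)."""
    if not 0.0 <= loss_ratio <= 1.0:
        raise ValueError("loss_ratio must be between 0.0 and 1.0")
    if extra_delay_ms < 0.0:
        raise ValueError("extra_delay_ms must not be negative")
    link.delay_ms += extra_delay_ms
    link.loss_ratio = loss_ratio


link = SimulatedLink()
inject_link_fault(link, extra_delay_ms=50.0, loss_ratio=0.2)
delivered, delay = link.send(1000, random.Random(42))
print(f"delivered {delivered} of 1000 packets, each delayed by {delay} ms")
```

Because the fault only changes the state of a simulated link object, it can be reverted or repeated freely without touching any device in the physical network.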
The fault detection node can be understood as a node which is used for monitoring the current operation state of the simulation network and can automatically and quickly locate a fault node (namely, a simulation network device which causes a network fault) so as to obtain fault information. In one embodiment provided by the present specification, the fault detection node includes a status monitoring sub-node and a fault determination sub-node.
It should be noted that, in the case that the network fault detection system based on the network virtual technology is a network chaotic engineering system, the simulation network node may be a network simulation system, the fault processing node may be a fault injection platform, the state monitoring sub-node may be a steady-state system, and the fault determination sub-node may be a risk rapid positioning and root cause analysis system. Specifically, referring to fig. 3, fig. 3 is a schematic structural diagram of a network chaotic engineering system in a fault detection method based on a simulation network according to an embodiment of the present disclosure.
Based on fig. 3, the network chaotic engineering system provided in the present specification includes: the system comprises a network simulation system, a fault injection platform, a steady state system, a network chaos automation platform and a root cause analysis system.
The network simulation system is a system capable of simulating a simulation network corresponding to a real physical network based on a virtual network technology, and specifically, refer to fig. 4, where fig. 4 is a schematic structural diagram of a simulation network system in a fault detection method based on a simulation network provided in an embodiment of the present specification.
Specifically, one principle of chaos engineering is "experiment in the production environment". The fault detection method based on the simulation network provided by this specification, however, takes the view that experiments need not be executed in production: what matters is that the closer the environment is to the online environment, the higher the value of the chaos engineering experiment. In addition, chaos engineering in a real production environment affects network equipment. For a basic network, the risk of experimenting in production is very high, because the blast radius is very difficult to control, and a local fault often causes problems in seemingly unrelated, distant parts of the network.
Therefore, by combining a simulation environment with chaos engineering, the fault detection method based on the simulation network provided by this specification addresses these problems and helps guard against systemic network risks. With a simulation environment, the blast radius is effectively confined to the simulation and the production environment is not affected, which is consistent with the chaos engineering principle of minimizing the impact range, that is, minimizing the blast radius. At the same time, the simulation environment can be kept highly consistent with the online environment.
Specifically, referring to fig. 4, the simulation environment (i.e., the network simulation system in fig. 4) provided in the present specification uses a simulation interconnection technology to network simulation images running on different hosts according to the connections and configuration of the existing network. The types of simulation images include, but are not limited to, a simulation switch, a simulation router, a simulation server, a simulation streaming tester, and the like. If some simulation images are missing, existing images can be substituted for them by using a multi-protocol logic equivalent replacement technology, which preserves the integrity of the control plane simulation. After the simulation environment is built, actual routing information of the existing network is sampled and injected into the system, and the reachability of the routed addresses is checked through an address detection algorithm specific to the simulation environment. Finally, the reliability of the simulation environment is ensured through a health check mechanism, a layered monitoring mechanism and centralized log output for fault location. This completes the construction of the simulation network.
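Two of the construction steps above, substituting an equivalent image for a missing one and checking reachability after the topology is built, can be sketched as follows. This is only an illustration under stated assumptions: the specification does not disclose its multi-protocol logic equivalent replacement technology or its address detection algorithm, so the sketch swaps in a simple lookup table and a plain breadth-first search, and every name in it is hypothetical.

```python
from collections import deque


def substitute_missing_images(required, available, equivalents):
    """Pick an image for every required device type.

    Falls back to the first available entry in the (assumed) equivalence
    table when the exact image is missing.
    """
    chosen = {}
    for device_type in required:
        if device_type in available:
            chosen[device_type] = device_type
            continue
        substitute = next(
            (alt for alt in equivalents.get(device_type, []) if alt in available),
            None,
        )
        if substitute is None:
            raise LookupError(f"no image or equivalent available for {device_type}")
        chosen[device_type] = substitute
    return chosen


def reachable(links, source):
    """Return every node reachable from source over undirected simulated links."""
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


images = substitute_missing_images(
    required=["vendor_a_switch", "router"],
    available={"router", "generic_switch"},
    equivalents={"vendor_a_switch": ["generic_switch"]},
)
print(images)
print(sorted(reachable([("r1", "sw1"), ("sw1", "srv1"), ("r2", "sw2")], "r1")))
```

A node that is expected to be reachable but is absent from the result would indicate that the simulated topology or the injected routing information does not yet match the existing network.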
The fault injection platform is a platform that can inject fault events from the real physical network into the simulation network, so that the simulation network can reproduce network faults occurring in the real physical network. It should be noted that one principle of chaos engineering is "real-world events": verification should use real-world events of many kinds, that is, it should simulate faults that actually exist in the production environment or have a theoretical basis. For this reason, a fault injection platform is built into the network chaotic engineering system.
The steady-state system is a node capable of establishing a stable state hypothesis. The principle of the chaotic engineering is to establish a stable state hypothesis, define monitoring indexes capable of directly reflecting services and expected changes, and build a steady state system in the network chaotic engineering system, wherein the steady state system has the following functions but is not limited to the following functions:
1. Monitoring network service indexes such as end-to-end traffic, delay, jitter, and the like.
2. Providing an escape system with redundant escape channels for abnormal conditions.
Specifically, this is the capability to bring the simulation network under control promptly when a major anomaly occurs in it, so that the explosion radius does not expand further. For example, in the case of a serious fault in the simulation network, the simulation network running in the network simulation system is suspended.
3. Verifying that emergency handling is effective as expected, namely that the anomaly is discovered within a specified time and rolled back within a specified time.
4. Ensuring that 100 percent of key alarm information is reported.
5. Ensuring that the traffic hash of the network devices is uniform under all conditions.
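As an illustrative sketch only, the monitoring functions listed above could be expressed as a steady-state check over sampled indexes. The threshold names, the sample field names, and the use of a busiest-link-to-mean ratio as the measure of hash uniformity are all assumptions introduced here for illustration; the specification does not prescribe them.

```python
from dataclasses import dataclass


@dataclass
class SteadyStateThresholds:
    """Hypothetical limits that define the steady-state hypothesis."""
    max_delay_ms: float
    max_jitter_ms: float
    min_traffic_mbps: float
    max_hash_imbalance: float  # allowed ratio of busiest link load to mean link load


def hash_imbalance(link_loads):
    """Ratio of the busiest link load to the mean load (1.0 means perfectly uniform)."""
    mean = sum(link_loads) / len(link_loads)
    return max(link_loads) / mean if mean > 0 else 1.0


def check_steady_state(sample, thresholds):
    """Return the names of the indexes that violate the steady-state hypothesis."""
    violations = []
    if sample["delay_ms"] > thresholds.max_delay_ms:
        violations.append("delay")
    if sample["jitter_ms"] > thresholds.max_jitter_ms:
        violations.append("jitter")
    if sample["traffic_mbps"] < thresholds.min_traffic_mbps:
        violations.append("traffic")
    if hash_imbalance(sample["link_loads"]) > thresholds.max_hash_imbalance:
        violations.append("hash")
    return violations
```

An empty result means the sampled state is consistent with the hypothesis; any returned name would be a candidate for alarm reporting.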
The risk rapid positioning and root cause analysis system automatically judges the result of the chaos experiment in combination with the output of the steady-state system (namely the network operation condition), and automatically and quickly locates the faulty node through root cause analysis. Finally, each discovered risk is fed back to the network chaos automation platform as a new fault, and the network chaos automation platform sends the new fault to the fault injection platform, completing an automatic closed loop.
The network chaos automation platform is the user-facing platform of the network chaos engineering system. Through it, a user can edit and manage fault injection capabilities (namely network fault parameters) and the preset fault state corresponding to each fault injection capability. It should be noted that one principle of chaos engineering is to run experiments continuously and automatically. The chaos engineering automation platform is therefore built into the network chaos engineering system, so that fault injection capabilities can be effectively managed, orchestrated, and automatically issued, and an automated CV process based on chaos engineering tests can be constructed in combination with the CI/CD process.
Based on the above, the fault detection method based on the simulation network in this specification provides a low-cost, highly available network simulation chaos engineering system. Compared with existing chaos engineering systems, the system provided by this specification combines network simulation with chaos engineering and can carry out simulated chaos engineering experiments systematically. Although the simulation environment cannot be completely consistent with the production environment, it is a low-cost, high-fidelity environment that is closest to the online environment. Its greatest advantage is that the explosion radius can be effectively controlled while the consistency of the network control plane is ensured, so that neither the production environment nor real-time services are affected. Even if unexpected abnormal conditions occur during a chaos engineering experiment, the time needed for repair and verification in the simulation environment is controllable, whereas such conditions cannot be effectively controlled in the production environment. The systematic risks of the basic network are thus addressed at low cost.
Based on the foregoing, in an embodiment provided in this specification, the adjusting the currently operating simulation network based on the network fault parameter includes:
determining the currently running simulation network, and determining equipment to be adjusted corresponding to the network fault parameter from simulation network equipment of the simulation network;
and adjusting the equipment to be adjusted based on the network fault parameters.
Wherein the emulated network device may be understood as an emulated device for building the emulated network, the emulated network device including, but not limited to, an emulated server, an emulated router, and the like.
Specifically, the fault processing node determines the currently running simulation network in the simulation network node, and determines, from the simulation network devices of that simulation network, the device to be adjusted that corresponds to the network fault parameter. The device to be adjusted is then adjusted based on the network fault parameter, so that a network fault of the physical network is simulated. This avoids the problems caused by running chaos engineering experiments in the physical network, such as an uncontrollable explosion radius, paralysis of the real physical network, and impact on the services it carries. It also avoids the excessive cost of constructing a real physical network solely for chaos engineering.
In the above example, the network fault parameter is a network fault event. On this basis, when the network fault event is "CPU full", the fault injection platform determines the CPUs of all simulation servers in the simulation network based on the network fault event, and adjusts the current load of those CPUs to the fully loaded state.
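The two sub-steps above (selecting the devices to be adjusted, then adjusting them) can be sketched as follows for the "CPU full" example. This is a minimal illustration under stated assumptions: the `SimDevice` class, the `FAULT_ACTIONS` table, and the event name `cpu_full` are hypothetical and do not come from the specification.

```python
from dataclasses import dataclass, field


@dataclass
class SimDevice:
    """Hypothetical stand-in for one simulation network device."""
    name: str
    kind: str  # e.g. "server" or "router"
    state: dict = field(default_factory=dict)


# Assumed mapping: fault event -> (device kind it targets, state changes to apply).
FAULT_ACTIONS = {
    "cpu_full": ("server", {"cpu_load": 1.0}),
}


def inject_fault(devices, fault_event):
    """Pick the devices that the fault event targets and adjust them in place."""
    kind, changes = FAULT_ACTIONS[fault_event]
    to_adjust = [d for d in devices if d.kind == kind]
    for device in to_adjust:
        device.state.update(changes)
    return to_adjust
```

Only the simulation devices are modified, so nothing in this sketch touches a physical device.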
Furthermore, in an embodiment provided by the present specification, the fault detection method based on the simulation network provided by the present specification considers the problem of continuous automatic operation experiment, so that a chaos engineering automation platform is built in the network chaos engineering system, and therefore, a user can effectively manage and arrange fault injection capability based on the chaos engineering automation platform and automatically issue the fault injection capability; specifically, the receiving a network fault parameter for the simulation network includes:
receiving a fault detection request sent by a user, wherein the fault detection request carries a network fault parameter aiming at a simulation network and a preset fault state corresponding to the network fault parameter.
The fault detection request may be understood as a request that instructs the fault processing node to adjust the simulation network; for example, the fault detection request may be issued as part of an automation process, a CV process, a CI/CD process, or the like.
Specifically, a fault detection request generated by the user through the chaos engineering automation platform is received, where the fault detection request includes a network fault parameter produced by managing and orchestrating fault injection capabilities and a preset fault state corresponding to the network fault parameter.
Further, in an embodiment provided in this specification, in order to avoid the high cost and the instability of physical network devices that result from carrying out chaos engineering in a physical network, simulation network technology is introduced into the network chaos engineering system: a simulation network corresponding to the physical network is built with this technology, and the chaos engineering test is then performed on the simulation network. Specifically, before adjusting the currently running simulation network based on the network fault parameter, the method further includes:
acquiring the network configuration parameters of the physical network, and constructing the simulation network from simulation network devices generated based on the network configuration parameters.
According to the above example, the network simulation system receives a network construction request sent by the user, wherein the network construction request carries network configuration parameters; and responding to the network construction request, generating simulated network equipment based on the network configuration parameters, and constructing a simulated network through the simulated network equipment.
Further, in order to ensure the fidelity of the simulation network, constructing it requires not only generating the simulation network devices but also configuring them based on configuration parameters of the network devices in the real physical network, such as routing information, connections, IP addresses, port addresses, and communication protocols. This further ensures consistency between the simulation network and the real physical network and avoids fault detection errors caused by inconsistency between the two. Specifically, constructing the simulation network from simulation network devices generated based on the network configuration parameters includes:
determining network equipment parameters and corresponding network equipment configuration parameters from the network configuration parameters;
and generating simulation network equipment based on the network equipment parameters, and configuring the simulation network equipment based on the network equipment configuration parameters to obtain a simulation network.
The network device parameters can be understood as hardware parameters of the physical devices in the real physical network, such as the CPU model and the port number; the network device configuration parameters may be understood as routing information, connections, IP addresses, communication protocols, etc. in the real physical network.
According to the above example, the network simulation system determines network device parameters and corresponding network device configuration parameters from the network configuration parameters, generates simulation network devices such as a simulation server and a simulation router based on the network device parameters, and configures the routing information, connections, IP addresses, and communication protocols of the simulation network devices based on those in the real physical network, thereby obtaining a simulation network that is highly consistent with the real physical network.
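The construction described above, first generating devices from the network device parameters and then configuring them from the network device configuration parameters, can be sketched as below. The layout of `network_config` (the keys `devices`, `device_configs`, and `links`) is an assumption made for this illustration, not a format defined by the specification.

```python
def build_simulated_network(network_config):
    """Generate simulation devices, then apply per-device configuration and links."""
    devices = {}
    # Step 1: generate one simulation device per set of network device parameters.
    for params in network_config["devices"]:
        devices[params["name"]] = {
            "kind": params["kind"],
            "cpu_model": params["cpu_model"],
            "ip": None,
            "routes": [],
            "protocols": [],
        }
    # Step 2: configure each device from the network device configuration parameters.
    for name, cfg in network_config["device_configs"].items():
        devices[name]["ip"] = cfg["ip"]
        devices[name]["routes"] = list(cfg["routes"])
        devices[name]["protocols"] = list(cfg["protocols"])
    # Step 3: reproduce the connections between devices.
    links = [tuple(link) for link in network_config["links"]]
    return {"devices": devices, "links": links}
```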
In an embodiment provided in this specification, in order to improve the generation efficiency of the simulated network device, parameters for generating different types of simulated network devices, for example, parameters for generating a simulation server and parameters for generating a simulation router, are stored in advance in the simulated network node; therefore, when the corresponding simulated network device needs to be generated based on the network device parameter, the parameter which is corresponding to the network device parameter and used for generating the simulated network device can be directly determined, and then the simulated network device is generated based on the parameter, so that the problem that a large amount of time is consumed for analyzing the network device parameter in the process of generating the simulated network device through the network device parameter is solved.
Alternatively, in an embodiment provided in this specification, the parameters pre-stored in the simulation network node for generating different types of simulation network devices may lag behind the physical network, because the physical network contains a large number of physical network devices that are updated quickly, or because device manufacturers keep details of their physical network devices confidential, and so on. Specifically, the generating the simulation network device based on the network device parameter includes:
generating the corresponding simulation network device according to the network device parameters in the case that no simulation network device parameters corresponding to the network device parameters exist; or
generating the simulation network device based on the simulation network device parameters in the case that simulation network device parameters corresponding to the network device parameters exist.
The emulated network device parameters may be understood as parameters for generating the emulated network device, such as a device image.
Continuing the above example, the port numbers and port names of a simulation switch/router in the network simulation system may be inconsistent with those of the switch/router in the real physical network, for example, Gigabit1/0/0/2 on the real physical device and Ethernet0/0/1 on the simulation device. Therefore, when constructing the simulation network, the network simulation system uses a simulation technology to derive the actually required simulation device from the existing simulation device and the port name Gigabit1/0/0/2 of the real physical device, and automatically converts between physical ports and simulation ports when device commands are issued, which solves the port inconsistency that affects configuration, fault injection, and steady-state data acquisition.
In the case that the device parameters of the simulation device in the network simulation system are consistent with those of the real physical device, the simulation device corresponding to the real physical device is generated directly.
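The automatic conversion between physical and simulation port names can be illustrated with a small translator that rewrites port names in command text. This is a sketch only: the `PortTranslator` class and the sample command strings are hypothetical, and the specification does not state how the conversion is implemented. Whole-name matching is used so that Gigabit1/0/0/2 is not rewritten inside Gigabit1/0/0/20.

```python
import re


class PortTranslator:
    """Hypothetical two-way mapping between physical and simulation port names."""

    def __init__(self, physical_to_sim):
        self._to_sim = dict(physical_to_sim)
        self._to_physical = {sim: phys for phys, sim in self._to_sim.items()}

    @staticmethod
    def _translate(text, mapping):
        if not mapping:
            return text
        # Longest names first, and only match complete port names.
        names = sorted(mapping, key=len, reverse=True)
        pattern = re.compile(
            r"(?<![\w/])(" + "|".join(re.escape(n) for n in names) + r")(?![\w/])"
        )
        return pattern.sub(lambda m: mapping[m.group(1)], text)

    def to_sim(self, command):
        """Rewrite a command written for the physical device for the simulation device."""
        return self._translate(command, self._to_sim)

    def to_physical(self, output):
        """Rewrite simulation device output back to physical port names."""
        return self._translate(output, self._to_physical)
```

Translating in both directions lets fault injection commands use physical names while collected steady-state data is reported in physical names as well.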
Step 204: and monitoring the current operation state of the simulation network, and determining a fault parameter for adjusting the physical network from the simulation network under the condition that the current operation state is determined to be inconsistent with a preset fault state corresponding to the network fault parameter.
The fault parameter for adjusting the physical network may be understood as a parameter that characterizes a network fault occurring in the simulation network and can be used for adjusting the physical network, for example, an identifier of a fault simulation network device that causes the network fault, or fault information of the fault simulation network device (for example, a device port name is wrong, a network port cannot communicate, and the like).
Specifically, after the simulation network is adjusted according to the network fault parameter, the fault detection method based on the simulation network provided in this specification monitors the current operating state of the simulation network in real time. When the current operating state shows a condition other than the preset fault state corresponding to the network fault parameter, the fault parameter that causes the network fault is determined from the simulation network, and the physical network can subsequently be adjusted based on that fault parameter.
Wherein the current operating state may be understood as an operating state of a network in the simulation network, for example, the simulation network includes a simulation router a and a simulation router B communicatively connected to the simulation router a; when the communication between the simulation router a and the simulation router B is interrupted by the fault injection platform, the current operating state of the simulation network is that the communication between the simulation router a and the simulation router B cannot be performed.
Specifically, the fault detection method based on the simulation network provided in this specification can monitor the current operating state of the simulation network in real time after the simulation network is adjusted by the network fault parameter, and determine that an unexpected fault condition occurs in the simulation network when it is determined that the current operating state is inconsistent with the preset fault state corresponding to the network fault parameter, so that the current operating state of the simulation network can be sent to the fault determination sub-node for fault detection, and the fault parameter for adjusting the physical network is obtained from the simulation network.
Further, in practical applications, the explosion radius needs to be limited during a chaos engineering experiment, so that adjustments made by the fault injection platform do not paralyze the constructed simulation network on a large scale and the stability of the simulation network is ensured. Specifically, the fault detection method based on the simulation network further includes:
monitoring the current operation state of the simulation network, suspending the operation of the simulation network under the condition that the current operation state is determined to meet the preset abnormal condition, and determining the fault parameter for adjusting the physical network from the simulation network.
The preset abnormal condition may be set according to an actual application scenario, for example, the preset abnormal condition is that the downtime rate of the simulated network device in the simulated network is greater than 70%; the explosion radius of the simulation network is larger than a preset radius threshold value and the like.
Continuing the above example, in order to avoid a large unexpected fault in the simulation network during a chaos engineering experiment, some abnormal conditions are preset, for example, that the downtime rate of the simulation network devices in the simulation network is greater than 70%. If the current operating state of the simulation network meets an abnormal condition during the experiment, it is determined that a large fault has occurred in the simulation network; the operation of the simulation network is suspended to keep the simulation network stable, fault analysis is performed on the current operating state, and the fault parameters for adjusting the physical network are obtained from the simulation network.
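The 70% downtime example above can be sketched as a guard that suspends the simulation network once the preset abnormal condition is met. The `SimulatedNetwork` class and the function names are hypothetical; only the 70% threshold comes from the example in the text.

```python
class SimulatedNetwork:
    """Hypothetical handle on a running simulation network."""

    def __init__(self):
        self.running = True

    def suspend(self):
        self.running = False


def downtime_rate(device_up):
    """Fraction of simulation devices that are currently down."""
    down = sum(1 for up in device_up.values() if not up)
    return down / len(device_up)


def enforce_abnormal_condition(network, device_up, max_downtime_rate=0.70):
    """Suspend the network if the downtime rate exceeds the preset threshold."""
    if downtime_rate(device_up) > max_downtime_rate:
        network.suspend()
        return True
    return False
```

A return value of True would be the point at which fault analysis of the current operating state begins.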
Further, the determining the fault parameter for adjusting the physical network from the simulated network includes:
determining a preset fault state corresponding to the current operation state based on the network fault parameter corresponding to the current operation state;
determining fault network equipment and fault information of the fault network equipment from simulation network equipment of the simulation network based on the current operation state, the network fault parameter and the preset fault state;
and determining the fault network equipment and the corresponding fault information as fault parameters for adjusting the physical network.
In the above example, when an unexpected fault condition occurs in the simulation network, the network fault event corresponding to the current operating state of the simulation network is determined, where the network fault event is the event used to adjust the simulation network in the current round of the chaos engineering experiment, that is, the network fault event that caused the simulation network to enter the current operating state;
and then, determining a preset fault state corresponding to the network fault event, analyzing the simulation network based on the network fault event, the preset fault state and the current operation state information, and determining simulation network equipment causing the current operation state to be inconsistent with the preset fault state from a plurality of simulation network equipment of the simulation network.
For example, the simulation network contains a simulation router A and a simulation router B that communicates with simulation router A, and each of them has a corresponding backup simulation router. If the fault injection platform applies the network fault event "router A is down", then after simulation router A goes down, its backup simulation router should in theory actively take over its work and communicate with simulation router B. In that case, the current operating state of the simulation network matches the preset fault state corresponding to the network fault event "router A is down".
However, when the steady-state system monitors that the two are not matched, that is, the backup simulation router of the simulation router a does not establish communication with the simulation router B, the steady-state system sends the current operating state of the simulation network to the analysis system;
the analysis system analyzes the simulation network based on the current operating state, the network fault event "router A is down", and the preset fault state corresponding to that event, and determines that the cause of the abnormal problem is that the backup simulation router of simulation router A cannot be started normally. The faulty device in the simulation network is thus determined to be the backup simulation router of simulation router A, together with the corresponding fault information of that backup simulation router.
Then, the analysis system sends the backup simulation router and the corresponding fault information to a network chaotic automation platform, so that simulation equipment and reasons causing the abnormal problems are displayed for a user, and the corresponding physical equipment in a real physical network can be adjusted subsequently based on the simulation equipment and the reasons.
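The router failover example can be sketched as a comparison between the preset fault state and the observed operating state, with every mismatching device reported as a faulty device plus fault information. This is a deliberately simplified stand-in for root cause analysis: the per-device state dictionaries, the state labels, and the `locate_fault` function are assumptions made for illustration.

```python
def locate_fault(fault_event, preset_fault_state, current_state):
    """Report devices whose observed state differs from the preset fault state."""
    faulty = []
    for device, expected in preset_fault_state.items():
        observed = current_state.get(device)
        if observed != expected:
            faulty.append(
                {"device": device, "expected": expected, "observed": observed}
            )
    # The result plays the role of the fault parameters for adjusting the physical network.
    return {"fault_event": fault_event, "faulty_devices": faulty}


# Router A is taken down on purpose; its backup is expected to become active.
preset = {"router_a": "down", "backup_router_a": "active"}
observed = {"router_a": "down", "backup_router_a": "failed_to_start"}
report = locate_fault("router_a_down", preset, observed)
```

Here `report["faulty_devices"]` contains only the backup router, because router A being down is the expected effect of the injected event.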
In the fault detection method based on the simulation network provided by this specification, a simulation network is generated based on network configuration parameters of a physical network, and the currently running simulation network is then adjusted using network fault parameters, so that network faults that may occur in the physical network are simulated. The current operating state of the simulation network is then determined; when a condition other than the preset fault state occurs, the fault parameters causing the network fault are determined from the simulation network, and the physical network can subsequently be adjusted based on them. Faults latent in the network can thus be detected before they occur in the physical network, and the loss they would cause is avoided.
The fault detection method based on the simulation network is further described below with reference to fig. 5, taking its application in a chaos engineering scenario based on network simulation as an example. Fig. 5 is a flowchart illustrating the processing procedure of a fault detection method based on a simulation network according to an embodiment of the present specification. The method is applied to a network chaos engineering system, which includes a network simulation system, a fault injection platform, a steady-state system, a network chaos automation platform, and a risk rapid positioning and root cause analysis system (hereinafter referred to as the analysis system). The method specifically includes the following steps.
Step 502: the network chaos automation platform displays a chaos experiment arrangement interface on a user terminal.
Step 504: The user arranges network fault events, recovery events, and the corresponding preset stable states through the chaos experiment arrangement interface, and selects a fault injection object.
The recovery event is used for restoring the simulation network to its initial state after the chaos engineering experiment is finished, where the initial state refers to the state of the simulation network before any fault injection is carried out.
Step 506: After the arrangement is finished, the user sends a chaos engineering experiment execution request to the network chaos automation platform through the experiment arrangement interface.
Step 508: In response to the chaos engineering experiment execution request, the network chaos automation platform sends the network fault event to the fault injection platform and instructs the fault injection platform to perform fault injection on the simulation network.
Step 510: In response to the indication of the network chaos automation platform, the fault injection platform performs fault injection on the simulation network running in the network simulation system.
The simulation network is constructed based on network configuration, connection and routing information of a real physical network.
Step 512: the steady-state system monitors the current running state of the simulation network in real time based on the indication of the network chaos automation platform in the chaos engineering experiment process; and under the condition that the current running state is determined to be inconsistent with the preset state, sending the current running state of the simulation network to an analysis system.
For example, the simulation network contains a simulation router A and a simulation router B that communicates with simulation router A, and each of them has a corresponding backup simulation router. If the fault injection platform applies the network fault event "simulation router A is down", then after simulation router A goes down, its backup simulation router should in theory actively take over its work and communicate with simulation router B. In that case, the current operating state of the simulation network matches the preset state corresponding to the network fault event "simulation router A is down".
However, when the steady-state system detects that the two do not match, that is, when the backup simulation router of simulation router A does not establish communication with simulation router B, the steady-state system determines that an unexpected abnormal condition has occurred and therefore sends the current operating state of the simulation network to the analysis system.
Step 514: Based on the indication of the network chaos automation platform, the analysis system analyzes, during the chaos engineering experiment, the faulty simulation network device in the simulation network and the corresponding fault cause according to the network fault event edited by the user, the preset state, and the current operating state of the simulation network.
For example, the analysis system analyzes the simulation network based on the current operating state, the network fault event "simulation router A is down", and the preset fault state corresponding to that event, and determines that the cause of the abnormal problem is that the backup simulation router of simulation router A cannot be started normally. The faulty device in the simulation network is thus determined to be the backup simulation router of simulation router A, together with the corresponding fault information of that backup simulation router.
Step 516: The analysis system sends the faulty simulation network device and the corresponding fault cause to the network chaos automation platform.
Step 518: The network chaos automation platform displays the faulty simulation network device and the corresponding fault cause to the user through the chaos experiment arrangement interface on the user terminal, so that the user can subsequently adjust the real physical network. Finally, the network chaos automation platform restores the simulation network to its state before the fault injection, so that other experiments can be carried out afterwards.
The analysis system sends the backup simulation router and the corresponding fault information to a network chaos automation platform, so that simulation equipment and reasons causing the abnormal problems are displayed for a user, and the corresponding physical equipment in a real physical network can be adjusted subsequently based on the simulation equipment and the reasons.
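The flow of steps 508 to 518 can be summarized in a short orchestration sketch. Each subsystem is represented by a callable passed in by the caller; these callables, their signatures, and the function name `run_chaos_experiment` are hypothetical and only illustrate the order of operations, including recovery of the simulation network at the end.

```python
def run_chaos_experiment(network, fault_event, preset_state, *,
                         inject, recover, observe, analyze, report):
    """Inject a fault, compare the observed state with the preset state, then recover."""
    inject(network, fault_event)                      # steps 508-510: fault injection
    try:
        current_state = observe(network)              # step 512: steady-state monitoring
        if current_state == preset_state:
            return []                                 # behaved as expected
        findings = analyze(fault_event, preset_state, current_state)  # step 514
        report(findings)                              # steps 516-518: show to the user
        return findings
    finally:
        recover(network)                              # restore the pre-injection state
```

Placing the recovery in a `finally` clause is a design choice of this sketch: the simulation network is restored even if monitoring or analysis raises an error, so that later experiments start from the initial state.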
The fault detection method based on the simulation network provided by this specification offers a solution that carries out chaos engineering experiments in a network simulation environment. By constructing a 1:1 simulation environment of the control plane of the existing network equipment and connecting it to the monitoring and control systems, the method ensures comprehensive coverage, effectively controls the explosion radius, and effectively prevents the risks that chaos engineering might bring to the online network.
Moreover, a richer network anomaly injection capability is provided. It includes various fault injections performed on network devices (switches, routers), servers, network controllers, and the like; various data forwarding anomalies constructed by calling an API (application programming interface) to modify simulation links; and anomalies constructed in the various automation systems connected to the simulation environment. The systematic risks of the underlying network are thereby addressed.
Fig. 6 is a schematic structural diagram of a fault detection system based on a simulation network according to an embodiment of the present disclosure, where the system includes a fault processing node 602, a simulation network node 604, and a fault detection node 606, where,
the fault handling node 602 configured to receive a network fault parameter for a simulated network, and adjust the simulated network currently running in the simulated network node 604 based on the network fault parameter, wherein the simulated network is generated based on a network configuration parameter of a physical network;
the fault detection node 606 is configured to monitor a current operating state of the simulation network, and determine a fault parameter for adjusting the physical network from the simulation network when it is determined that the current operating state is inconsistent with a preset fault state corresponding to the network fault parameter.
For the explanation of the fault processing node 602, the simulation network node 604, and the fault detection node 606 included in the fault detection system based on the simulation network, reference may be made to the identical or corresponding contents in the fault detection method based on the simulation network, which are not described in detail again here.
Specifically, the fault handling node 602 in the fault detection system based on a simulation network provided in the present specification can receive a network fault parameter for the simulation network, and adjust a currently running simulation network in the simulation network nodes 604 based on the network fault parameter.
After the fault processing node 602 adjusts the simulation network according to the network fault parameter, the fault detection node 606 monitors the current operating state of the simulation network in real time. When the current operating state shows a condition other than the preset fault state corresponding to the network fault parameter, the fault parameter causing the network fault is determined from the simulation network, and the physical network can then be adjusted based on that fault parameter.
In an embodiment provided in this specification, the fault detection system based on the simulation network addresses the need to run experiments continuously and automatically. To that end, a chaos engineering automation platform is set up in the network chaos engineering system, so that a user can manage and orchestrate fault injection capabilities on the platform and have them issued automatically. Specifically, the system further comprises a detection management node;
the detection management node is configured to receive a fault detection request sent by a user, wherein the fault detection request carries a network fault parameter for the simulation network and a preset fault state corresponding to the network fault parameter;
in response to the fault detection request, send a fault processing instruction generated based on the network fault parameter to the fault processing node 602; and
send a fault detection instruction generated based on the network fault parameter and the preset fault state to the fault detection node 606.
The fault processing instruction may be understood as an instruction that instructs the fault processing node 602 to adjust the simulation network; the fault detection instruction may be understood as an instruction that instructs the fault detection node 606 to detect the simulation network and determine fault parameters.
Specifically, the detection management node can provide a detection management interface (for example, a chaos experiment orchestration interface) and display it to the user on a user terminal. The detection management interface includes various detection management units, such as controls, buttons, and options used for editing and orchestrating network fault parameters and preset fault states. For example, when the fault detection method based on the simulation network is applied to a chaos engineering experiment scenario, the detection management node can display a chaos experiment orchestration interface on the user terminal.
The user can edit and orchestrate the network fault parameters by clicking, typing into, or selecting the detection management units in the detection management interface, and thereby send a fault detection request to the detection management node. For example, based on the experiment orchestration interface displayed on the user terminal, the user orchestrates network fault and recovery events together with the corresponding preset stable states, and selects the object for fault injection (that is, a simulation network). It should be noted that the fault detection method based on the simulation network provided in this specification may construct a plurality of simulation networks; therefore, when performing a chaos engineering experiment, the user needs to select the simulation network into which faults are injected.
Then, the detection management node responds to the fault detection request, generates a fault processing instruction based on the network fault parameter, and sends the fault processing instruction to the fault processing node 602;
the fault processing node 602 can receive the fault processing instruction sent by the detection management node and obtain the network fault parameter from the fault processing instruction; in response to the fault processing instruction, it can then adjust the simulation network using the network fault parameter. The fault detection node 606 can receive the fault detection instruction sent by the detection management node and obtain the network fault parameter and the corresponding preset fault state from the fault detection instruction; in response to the fault detection instruction, it can then detect the simulation network and determine fault parameters.
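As a hedged illustration of this dispatch step, the following Python sketch splits one fault detection request into the two instructions. The `FaultDetectionRequest` type, the `build_instructions` function, and the dictionary keys are assumptions made for the example; the specification does not define a message format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FaultDetectionRequest:
    """Hypothetical request sent by the user to the detection management node."""
    network_fault_parameter: dict
    preset_fault_state: dict


def build_instructions(request):
    """Split one user request into the two instructions described above."""
    # The fault processing node only needs the network fault parameter.
    fault_processing_instruction = {
        "target": "fault_processing_node",
        "network_fault_parameter": request.network_fault_parameter,
    }
    # The fault detection node needs the parameter and the preset fault state.
    fault_detection_instruction = {
        "target": "fault_detection_node",
        "network_fault_parameter": request.network_fault_parameter,
        "preset_fault_state": request.preset_fault_state,
    }
    return fault_processing_instruction, fault_detection_instruction


request = FaultDetectionRequest(
    network_fault_parameter={"device": "sw1", "state": "down"},
    preset_fault_state={"sw1": "down", "sw2": "up"},
)
processing, detection = build_instructions(request)
print(processing["target"], detection["target"])
```

Keeping the preset fault state out of the fault processing instruction mirrors the text: only the fault detection node compares the observed state against it.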
In one embodiment provided herein, the fault detection node 606 includes a status monitoring sub-node and a fault determination sub-node; the state monitoring sub-node is configured to monitor a current operation state of the simulation network, and send the current operation state to the fault determination sub-node when the current operation state is determined to be inconsistent with a preset fault state corresponding to the network fault parameter.
Specifically, after the simulation network is adjusted by the network fault parameter, the current operating state of the simulation network can be monitored by the state monitoring sub-node in real time, and when it is determined that the current operating state is inconsistent with the preset fault state corresponding to the network fault parameter, it is determined that an unexpected fault condition occurs in the simulation network, so that the state monitoring sub-node can send the current operating state of the simulation network to the fault determining sub-node for fault detection.
Further, in practical applications, the blast radius needs to be limited to a certain extent while a chaos engineering experiment is performed, so that adjustments made by the fault injection platform do not paralyze the constructed simulation network on a large scale, thereby ensuring the stability of the simulation network. Specifically, the fault detection method based on the simulation network further includes:
the state monitoring sub-node monitors the current operating state of the simulation network, suspends the operation of the simulation network in the simulation network node 604 when it is determined that the current operating state meets a preset abnormal condition, and sends the current operating state of the simulation network to the fault determination sub-node. For an explanation of the state monitoring sub-node, reference may be made to the same or corresponding explanation in the above fault detection method based on the simulation network, which is not repeated in detail here.
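The blast-radius guard described above can be sketched as follows. The specification does not say what the preset abnormal condition is, so this example assumes one: more than half of the simulated devices are down. `SimulationNetworkNode`, `meets_abnormal_condition`, and `monitor_once` are hypothetical names.

```python
class SimulationNetworkNode:
    """Hypothetical stand-in for simulation network node 604."""

    def __init__(self):
        self.running = True

    def suspend(self):
        # Pause the simulation network so the fault cannot spread further.
        self.running = False


def meets_abnormal_condition(device_states, max_down_fraction=0.5):
    """Assumed abnormal condition: too many devices are down at once."""
    if not device_states:
        return False
    down = sum(1 for state in device_states.values() if state == "down")
    return down / len(device_states) > max_down_fraction


def monitor_once(node, device_states, send_to_fault_determination):
    """One monitoring pass of the state monitoring sub-node."""
    if meets_abnormal_condition(device_states):
        node.suspend()
        send_to_fault_determination(device_states)


node = SimulationNetworkNode()
forwarded = []
states = {"sw1": "down", "sw2": "down", "sw3": "down", "sw4": "up"}
monitor_once(node, states, forwarded.append)
print(node.running, len(forwarded))  # False 1
```

Suspending before forwarding the state keeps the simulation network from degrading further while the fault determination sub-node analyses the snapshot.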
Wherein the fault determining sub-node is configured to determine a fault parameter for adjusting the physical network from the simulated network based on the current operating state.
Specifically, the fault determination sub-node determines the preset fault state corresponding to the current operating state based on the network fault parameter corresponding to the current operating state; determines the faulty network device and the fault information of the faulty network device from the simulation network devices of the simulation network based on the current operating state, the network fault parameter, and the preset fault state; and determines the faulty network device and the corresponding fault information as the fault parameters for adjusting the physical network. For an explanation of the fault determination sub-node, reference may be made to the same or corresponding explanation in the above fault detection method based on the simulation network, which is not repeated in detail here.
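The three steps performed by the fault determination sub-node can be sketched in Python as below. This is an illustration under stated assumptions: preset fault states are looked up by a hypothetical `id` field on the network fault parameter, and the fault information is reduced to the expected and observed state of each deviating device.

```python
def determine_fault_parameters(current_state, fault_parameter, presets):
    """Return the faulty devices and their fault information."""
    # Step 1: look up the preset fault state for this network fault parameter.
    preset = presets[fault_parameter["id"]]

    # Step 2: find simulated devices whose observed state deviates from it.
    faulty = []
    for device, observed in sorted(current_state.items()):
        expected = preset.get(device)
        if observed != expected:
            faulty.append({
                "device": device,
                "fault_info": {
                    "expected": expected,
                    "observed": observed,
                    "injected_fault": fault_parameter["id"],
                },
            })

    # Step 3: the faulty devices and their fault information together form
    # the fault parameters used to adjust the physical network.
    return faulty


presets = {"link-down-sw1": {"sw1": "down", "sw2": "up"}}
current = {"sw1": "down", "sw2": "down"}
result = determine_fault_parameters(current, {"id": "link-down-sw1"}, presets)
print(result[0]["device"])  # sw2
```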
In the fault detection system based on the simulation network provided in this specification, a simulation network is generated in the simulation network node 604 based on the network configuration parameters of the physical network, and the fault processing node 602 then adjusts the currently running simulation network using the network fault parameters, so as to simulate a network fault occurring in the physical network. Next, when the fault detection node 606 determines that the current operating state of the simulation network shows a condition other than the preset fault state, a fault parameter causing the network fault is determined from the simulation network, and the physical network can subsequently be adjusted based on that fault parameter. In this way, faults latent in the network can be detected before a network fault occurs in the physical network, and the loss caused by the network fault is avoided.
Fig. 7 is a flowchart illustrating another fault detection method based on a simulation network, applied to a detection management node, according to an embodiment of the present disclosure. The method specifically includes the following steps.
Step 702: receiving a fault detection request that is sent by a user based on the detection management interface and carries the network fault parameters and the preset fault state.
The detection management interface is an interface which is provided by the detection management node for the user and edits the network fault parameters and the preset fault state.
Step 704: based on the network fault parameter and the preset fault state, instructing the fault processing node, the simulation network node, and the fault detection node included in the fault detection system based on the simulation network to execute a network fault detection step.
The fault detection system based on the simulation network is the system described in the above embodiment; instructing its fault processing node, simulation network node, and fault detection node to execute the network fault detection step therefore means having them carry out the fault detection method based on the simulation network described above.
Step 706: receiving the fault parameters sent by the fault detection node, and displaying the fault parameters to the user through the detection management interface.
In another fault detection method based on a simulation network provided in this specification, the detection management node can provide a detection management interface which is displayed on a user terminal and enables a user to edit a network fault parameter and a preset fault state. And the user can edit the network fault parameters and the preset fault state based on the detection management interface and send a fault detection request carrying the network fault parameters and the preset fault state to the detection management node.
The detection management node can receive the fault detection request sent by the user and, in response to the fault detection request, instruct the fault processing node, the simulation network node, and the fault detection node included in the network fault detection system based on a network virtual technology, based on the network fault parameter and the preset fault state, to execute the network fault detection step in the fault detection method based on the simulation network.
And finally, after the network fault detection is finished, the detection management node can receive the fault parameters sent by the fault detection node and display the fault parameters to a user through the detection management interface.
It should be noted that, for an explanation of this other fault detection method based on the simulation network, reference may be made to the same or corresponding content in the above fault detection method based on the simulation network, and this specification does not specifically limit this.
In this other fault detection method based on a simulation network provided in this specification, in response to a fault detection request sent by a user based on a detection management interface, the fault processing node, the simulation network node, and the fault detection node included in the network fault detection system based on a network virtual technology are instructed, based on the network fault parameter and the preset fault state carried in the fault detection request, to execute a network fault detection step. A network fault can therefore be detected before the physical network fails. After the network fault detection is finished, the fault parameters sent by the fault detection node are displayed to the user through the detection management interface, so that the user can subsequently adjust the physical network based on the fault parameters, and the loss caused by the network fault is avoided.
The above is an illustrative scheme of another fault detection method based on a simulation network according to this embodiment. It should be noted that the technical solution of this other fault detection method based on the simulation network belongs to the same concept as the above technical solution of network fault detection; for details not described here, reference may be made to the description of the above technical solution of network fault detection.
Corresponding to the above method embodiments, the present specification further provides an embodiment of a fault detection apparatus based on a simulation network, where the apparatus includes:
an adjustment module configured to receive a network fault parameter for a simulation network, and adjust the currently running simulation network based on the network fault parameter, wherein the simulation network is generated based on a network configuration parameter of a physical network;
a determining module configured to monitor a current operating state of the simulation network, and determine a fault parameter for adjusting the physical network from the simulation network when it is determined that the current operating state is inconsistent with a preset fault state corresponding to the network fault parameter.
Optionally, the fault detection apparatus based on a simulation network further includes a network construction module configured to:
acquire the network configuration parameters of the physical network, and construct a simulation network based on simulation network devices generated from the network configuration parameters.
Optionally, the network construction module is further configured to:
determining network equipment parameters and corresponding network equipment configuration parameters from the network configuration parameters;
and generating simulation network equipment based on the network equipment parameters, and configuring the simulation network equipment based on the network equipment configuration parameters to obtain a simulation network.
Optionally, the network construction module is further configured to:
generating corresponding simulation network equipment according to the network equipment parameters in the case that no corresponding simulation network equipment parameters exist in the network equipment parameters; or
generating simulation network equipment based on the simulation network equipment parameters in the case that corresponding simulation network equipment parameters exist in the network equipment parameters.
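The behaviour of the network construction module can be sketched as follows. The layout of the network configuration parameters is an assumption made for the example: each entry carries hypothetical `device` (network equipment parameters) and `config` (network equipment configuration parameters) fields, and an optional `simulated_device` field stands for simulation network equipment parameters. Deriving a generic image name from the device model is likewise only one possible fallback.

```python
def build_simulation_network(network_config):
    """Build a simulation network from physical network configuration."""
    simulation_network = {}
    for entry in network_config["devices"]:
        device_params = entry["device"]   # network equipment parameters
        device_config = entry["config"]   # equipment configuration parameters

        simulated = device_params.get("simulated_device")
        if simulated is None:
            # No simulation equipment parameters were supplied: derive them
            # from the network equipment parameters (here, the device model).
            simulated = {"image": "generic-" + device_params["model"]}

        # Configure the generated simulation equipment.
        simulation_network[device_params["name"]] = {
            "simulated_device": simulated,
            "config": device_config,
        }
    return simulation_network


physical_config = {
    "devices": [
        {"device": {"name": "sw1", "model": "switch"}, "config": {"vlan": 10}},
        {
            "device": {
                "name": "r1",
                "model": "router",
                "simulated_device": {"image": "vendor-router-sim"},
            },
            "config": {"asn": 65001},
        },
    ]
}
simulation_network = build_simulation_network(physical_config)
print(simulation_network["sw1"]["simulated_device"]["image"])  # generic-switch
print(simulation_network["r1"]["simulated_device"]["image"])   # vendor-router-sim
```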
Optionally, the adjustment module is further configured to:
determining the currently running simulation network, and determining equipment to be adjusted corresponding to the network fault parameter from simulation network equipment of the simulation network;
and adjusting the equipment to be adjusted based on the network fault parameters.
Optionally, the adjustment module is further configured to:
receiving a fault detection request sent by a user, wherein the fault detection request carries a network fault parameter aiming at a simulation network and a preset fault state corresponding to the network fault parameter.
Optionally, the fault detection apparatus based on a simulation network further includes a suspend operation module configured to:
monitoring the current operation state of the simulation network, suspending the operation of the simulation network under the condition that the current operation state is determined to meet the preset abnormal condition, and determining the fault parameter for adjusting the physical network from the simulation network.
Optionally, the determining module is further configured to:
determining a preset fault state corresponding to the current operation state based on the network fault parameter corresponding to the current operation state;
determining fault network equipment and fault information of the fault network equipment from simulation network equipment of the simulation network based on the current operation state, the network fault parameter and the preset fault state;
and determining the fault network equipment and the corresponding fault information as fault parameters for adjusting the physical network.
In the fault detection device based on the simulation network provided in this specification, a simulation network is generated based on the network configuration parameters of the physical network, and the currently running simulation network is then adjusted using the network fault parameters, so as to simulate a network fault occurring in the physical network. When it is determined that the current operating state of the simulation network shows a condition other than the preset fault state, a fault parameter causing the network fault is determined from the simulation network, and the physical network can subsequently be adjusted based on that fault parameter. In this way, faults latent in the network can be detected before a network fault occurs in the physical network, and the loss caused by the network fault is avoided.
The above is a fault detection device based on the simulation network in this embodiment. It should be noted that the technical solution of the fault detection apparatus based on the simulation network and the above technical solution of the fault detection method based on the simulation network belong to the same concept, and details of the technical solution of the fault detection apparatus based on the simulation network, which are not described in detail, can be referred to the description of the technical solution of the fault detection method based on the simulation network.
Corresponding to the above method embodiment, the present specification further provides another embodiment of a fault detection device based on a simulation network, where the device is applied to a detection management node and includes:
a receiving module configured to receive a fault detection request that is sent by a user based on a detection management interface and carries network fault parameters and a preset fault state, wherein the detection management interface is an interface that the detection management node provides to the user for editing the network fault parameters and the preset fault state;
a detection module configured to instruct, based on the network fault parameter and the preset fault state, the fault processing node, the simulation network node, and the fault detection node included in the fault detection system based on the simulation network to execute a network fault detection step;
and a display module configured to receive the fault parameters sent by the fault detection node and display the fault parameters to the user through the detection management interface.
This other fault detection device based on a simulation network provided in this specification, in response to a fault detection request sent by a user based on a detection management interface, instructs the fault processing module, the simulation network module, and the fault detection module included in the network fault detection system based on a network virtual technology, based on the network fault parameter and the preset fault state carried in the fault detection request, to execute the network fault detection step in the above fault detection method based on a simulation network. A network fault can therefore be detected before the physical network fails. After the network fault detection is finished, the fault parameters sent by the fault detection module are displayed to the user through the detection management interface, so that the user can subsequently adjust the physical network based on the fault parameters, and the loss caused by the network fault is avoided.
The above is another fault detection apparatus based on a simulation network according to this embodiment. It should be noted that the technical solution of this other fault detection apparatus belongs to the same concept as the above technical solution of the other fault detection method based on a simulation network; for details not described here, reference may be made to the description of that method.
FIG. 8 illustrates a block diagram of a computing device 800, according to one embodiment of the present description. The components of the computing device 800 include, but are not limited to, memory 810 and a processor 820. The processor 820 is coupled to the memory 810 via a bus 830, and the database 850 is used to store data.
Computing device 800 also includes access device 840, which enables computing device 800 to communicate via one or more networks 860. Examples of such networks include the Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the Internet. Access device 840 may include one or more of any type of network interface, wired or wireless (e.g., a Network Interface Card (NIC)), such as an IEEE 802.11 Wireless Local Area Network (WLAN) wireless interface, a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, a Near Field Communication (NFC) interface, and so forth.
In one embodiment of the present description, the above-described components of computing device 800, as well as other components not shown in FIG. 8, may also be connected to each other, such as by a bus. It should be understood that the block diagram of the computing device structure shown in FIG. 8 is for purposes of example only and is not limiting as to the scope of the description. Other components may be added or replaced as desired by those skilled in the art.
Computing device 800 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), a mobile phone (e.g., smartphone), a wearable computing device (e.g., smartwatch, smartglasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or PC. Computing device 800 may also be a mobile or stationary server.
Wherein the processor 820 is configured to execute computer-executable instructions, which when executed by the processor 820, implement the steps of the above-described simulated network based fault detection method.
The above is an illustrative scheme of a computing device of the present embodiment. It should be noted that the technical solution of the computing device and the technical solution of the fault detection method based on the simulation network belong to the same concept, and details that are not described in detail in the technical solution of the computing device can be referred to the description of the technical solution of the fault detection method based on the simulation network.
An embodiment of the present specification further provides a computer-readable storage medium storing computer-executable instructions, which when executed by a processor, implement the steps of the above-mentioned simulation network-based fault detection method.
The above is an illustrative scheme of a computer-readable storage medium of the present embodiment. It should be noted that the technical solution of the storage medium belongs to the same concept as the technical solution of the above fault detection method based on the simulation network; for details not described in the technical solution of the storage medium, reference may be made to the description of the technical solution of the above fault detection method based on the simulation network.
An embodiment of the present specification further provides a computer program, wherein when the computer program is executed in a computer, the computer program causes the computer to execute the steps of the above fault detection method based on the simulation network.
The above is an illustrative scheme of a computer program of the present embodiment. It should be noted that the technical solution of the computer program belongs to the same concept as the technical solution of the fault detection method based on the simulation network; for details not described in the technical solution of the computer program, reference may be made to the description of the technical solution of the fault detection method based on the simulation network.
The foregoing description of specific embodiments has been presented for purposes of illustration and description. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The computer instructions comprise computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in a jurisdiction; for example, in some jurisdictions, in accordance with legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunications signals.
It should be noted that, for the sake of simplicity, the foregoing method embodiments are described as a series of combinations of acts, but it should be understood by those skilled in the art that the embodiments are not limited by the described order of acts, as some steps may be performed in other orders or simultaneously according to the embodiments. Further, those skilled in the art should also appreciate that the embodiments described in this specification are preferred embodiments and that acts and modules referred to are not necessarily required for an embodiment of the specification.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to the related descriptions of other embodiments.
The preferred embodiments of the present specification disclosed above are intended only to aid in the description of the specification. The optional embodiments do not describe all details exhaustively, nor do they limit the invention to the precise embodiments described. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the embodiments and their practical application, and thereby to enable others skilled in the art to best understand and utilize the embodiments. The specification is limited only by the claims and their full scope and equivalents.

Claims (14)

1. A fault detection method based on a simulation network comprises the following steps:
receiving a network fault parameter aiming at a simulation network, and adjusting the currently running simulation network based on the network fault parameter, wherein the simulation network is generated based on a network configuration parameter of a physical network;
and monitoring the current operation state of the simulation network, and determining a fault parameter for adjusting the physical network from the simulation network under the condition that the current operation state is determined to be inconsistent with a preset fault state corresponding to the network fault parameter.
2. The simulated network based fault detection method of claim 1, prior to adjusting the currently operating simulated network based on the network fault parameter, further comprising:
acquiring the network configuration parameters of the physical network, and constructing a simulation network based on simulation network equipment generated from the network configuration parameters.
3. The method for fault detection based on simulation network according to claim 2, wherein the step of building the simulation network based on the simulation network device generated by the network configuration parameters comprises:
determining network equipment parameters and corresponding network equipment configuration parameters from the network configuration parameters;
and generating simulation network equipment based on the network equipment parameters, and configuring the simulation network equipment based on the network equipment configuration parameters to obtain a simulation network.
4. The simulated network based fault detection method of claim 3, said generating simulated network devices based on said network device parameters comprising:
generating corresponding simulated network equipment according to the network equipment parameters in the case that no corresponding simulated network equipment parameters exist in the network equipment parameters; or
generating simulated network equipment based on the simulated network equipment parameters in the case that corresponding simulated network equipment parameters exist in the network equipment parameters.
5. The simulated network based fault detection method of claim 1, said adjusting said currently operating simulated network based on said network fault parameter, comprising:
determining the currently running simulation network, and determining equipment to be adjusted corresponding to the network fault parameter from simulation network equipment of the simulation network;
and adjusting the equipment to be adjusted based on the network fault parameters.
6. The simulated network based fault detection method of claim 1, said receiving network fault parameters for a simulated network, comprising:
receiving a fault detection request sent by a user, wherein the fault detection request carries a network fault parameter aiming at a simulation network and a preset fault state corresponding to the network fault parameter.
7. The simulated network based fault detection method of claim 1, further comprising:
monitoring the current operation state of the simulation network, suspending the operation of the simulation network under the condition that the current operation state is determined to meet the preset abnormal condition, and determining the fault parameter for adjusting the physical network from the simulation network.
8. The simulated network based fault detection method of claim 1, said determining fault parameters from the simulated network that adjust the physical network comprising:
determining a preset fault state corresponding to the current operation state based on the network fault parameter corresponding to the current operation state;
determining fault network equipment and fault information of the fault network equipment from simulation network equipment of the simulation network based on the current operation state, the network fault parameter and the preset fault state;
and determining the fault network equipment and the corresponding fault information as fault parameters for adjusting the physical network.
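As a non-limiting illustration, the determination in claim 8 might be sketched in Python as a comparison between the observed state and the preset fault state. Representing both as per-device dictionaries, and treating any device whose observed state differs from its expected state as the faulty network equipment, are assumptions of this example rather than requirements of the claim.

```python
def determine_fault_parameters(current_state, preset_fault_state):
    """Return {device: fault_info} for devices that deviate from the preset fault state."""
    faults = {}
    for device, expected in preset_fault_state.items():
        observed = current_state.get(device)
        if observed != expected:
            # The deviating device and its fault information together form
            # the fault parameter used to adjust the physical network.
            faults[device] = {"expected": expected, "observed": observed}
    return faults


# After a simulated link failure, router-1 was expected to reroute traffic.
preset = {"switch-1": "down", "router-1": "rerouted"}
current = {"switch-1": "down", "router-1": "unreachable"}
faults = determine_fault_parameters(current, preset)
```

Here `switch-1` behaves as expected under the injected fault, so only `router-1` is reported.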
9. A fault detection system based on a simulation network, the system comprising a fault processing node, a simulation network node, and a fault detection node, wherein:
the fault processing node is configured to receive a network fault parameter for a simulation network, and adjust the simulation network currently running in the simulation network node based on the network fault parameter, wherein the simulation network is generated based on a network configuration parameter of a physical network;
the fault detection node is configured to monitor a current operation state of the simulation network, and determine a fault parameter for adjusting the physical network from the simulation network when the current operation state is determined to be inconsistent with a preset fault state corresponding to the network fault parameter.
10. The simulation-network-based fault detection system of claim 9, the system further comprising a detection management node;
the detection management node is configured to receive a fault detection request sent by a user, wherein the fault detection request carries a network fault parameter for the simulation network and a preset fault state corresponding to the network fault parameter;
in response to the fault detection request, send a fault processing instruction generated based on the network fault parameter to the fault processing node; and
send a fault detection instruction generated based on the network fault parameter and the preset fault state to the fault detection node.
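As a non-limiting illustration, the dispatch performed by the detection management node might be sketched in Python as follows. The `FaultDetectionRequest` and `RecordingNode` types, the dictionary layout of the instructions, and the function name are assumptions for this example; the claim only states which inputs each instruction is generated from.

```python
from dataclasses import dataclass, field


@dataclass
class FaultDetectionRequest:
    """Hypothetical request carrying the two items named in the claim."""
    network_fault_parameter: dict
    preset_fault_state: dict


@dataclass
class RecordingNode:
    """Stand-in for a node; it only records the instructions it receives."""
    inbox: list = field(default_factory=list)

    def receive(self, instruction: dict) -> None:
        self.inbox.append(instruction)


def handle_fault_detection_request(request, fault_processing_node, fault_detection_node):
    # Fault processing instruction: generated from the network fault parameter only.
    fault_processing_node.receive({
        "type": "fault_processing",
        "network_fault_parameter": request.network_fault_parameter,
    })
    # Fault detection instruction: generated from the network fault parameter
    # and the preset fault state.
    fault_detection_node.receive({
        "type": "fault_detection",
        "network_fault_parameter": request.network_fault_parameter,
        "preset_fault_state": request.preset_fault_state,
    })


request = FaultDetectionRequest(
    network_fault_parameter={"device": "switch-1", "port1": "down"},
    preset_fault_state={"router-1": "rerouted"},
)
processing_node, detection_node = RecordingNode(), RecordingNode()
handle_fault_detection_request(request, processing_node, detection_node)
```

The split mirrors the claim: the fault processing node needs only what to inject, while the fault detection node also needs the expected outcome to compare against.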
11. The simulation-network-based fault detection system of claim 9, the fault detection node comprising a state monitoring sub-node and a fault determination sub-node;
the state monitoring sub-node is configured to monitor a current operating state of the simulation network, and send the current operating state to the fault determining sub-node under the condition that the current operating state is determined to be inconsistent with a preset fault state corresponding to the network fault parameter;
the fault determination sub-node is configured to determine a fault parameter for adjusting the physical network from the simulation network based on the current operating state.
12. A fault detection method based on a simulation network, applied to a detection management node, the method comprising:
receiving a fault detection request that is sent by a user through a detection management interface and carries a network fault parameter and a preset fault state, wherein the detection management interface is an interface that the detection management node provides to the user for editing the network fault parameter and the preset fault state;
instructing, based on the network fault parameter and the preset fault state, a fault processing node, a simulation network node, and a fault detection node included in the simulation-network-based fault detection system to execute a network fault detection step;
and receiving the fault parameters sent by the fault detection node, and displaying the fault parameters to the user through the detection management interface.
13. A computing device, comprising:
a memory and a processor;
the memory is configured to store computer-executable instructions, and the processor is configured to execute the computer-executable instructions which, when executed by the processor, implement the steps of the simulation-network-based fault detection method of any of claims 1 to 8, or of the simulation-network-based fault detection method of claim 12.
14. A computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the steps of the simulation-network-based fault detection method of any of claims 1 to 8, or of the simulation-network-based fault detection method of claim 12.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210995053.6A CN115473828B (en) 2022-08-18 2022-08-18 Fault detection method and system based on simulation network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210995053.6A CN115473828B (en) 2022-08-18 2022-08-18 Fault detection method and system based on simulation network

Publications (2)

Publication Number Publication Date
CN115473828A (en) 2022-12-13
CN115473828B (en) 2024-01-05

Family

ID=84366403

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210995053.6A Active CN115473828B (en) 2022-08-18 2022-08-18 Fault detection method and system based on simulation network

Country Status (1)

Country Link
CN (1) CN115473828B (en)

Patent Citations (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001017169A2 (en) * 1999-08-31 2001-03-08 Accenture Llp A system, method and article of manufacture for a network-based predictive fault management system
US20050169185A1 (en) * 2004-01-30 2005-08-04 Microsoft Corporation Fault detection and diagnosis
CN1665205A (en) * 2004-01-30 2005-09-07 微软公司 Fault detection and diagnosis
US20060072707A1 (en) * 2004-09-30 2006-04-06 International Business Machines Corporation Method and apparatus for determining impact of faults on network service
WO2007147936A1 (en) * 2006-06-21 2007-12-27 Teliasonera Ab A method, a system and a computer program product for troubleshooting
CN101136801A (en) * 2007-03-06 2008-03-05 中兴通讯股份有限公司 Network fault detecting method
CN102546243A (en) * 2011-12-23 2012-07-04 广东电网公司电力科学研究院 Fault simulation analysis method for SP Guru-based electric power dispatching data network
CN102724064A (en) * 2012-05-17 2012-10-10 清华大学 Method for building network application simulation system
WO2015091785A1 (en) * 2013-12-19 2015-06-25 Bae Systems Plc Method and apparatus for detecting fault conditions in a network
US20180227167A1 (en) * 2017-02-08 2018-08-09 Macau University Of Science And Technology System, method, computer program and data signal for fault detection and recovery of a network
US20180337828A1 (en) * 2017-05-19 2018-11-22 Microsoft Technology Licensing, Llc System and method for mapping a connectivity state of a network
CN107947988A (en) * 2017-11-28 2018-04-20 华信塞姆(成都)科技有限公司 A kind of Real Time Communication Network analogue system
US10771316B1 (en) * 2017-11-30 2020-09-08 Amazon Technologies, Inc. Debugging of a network device through emulation
CN108449210A (en) * 2018-03-21 2018-08-24 中国人民解放军陆军炮兵防空兵学院郑州校区 A kind of EIGRP routing networks fault monitoring system
WO2021017364A1 (en) * 2019-07-26 2021-02-04 京信通信系统(中国)有限公司 Network failure diagnosis method and apparatus, network device, and storage medium
CN113300871A (en) * 2020-09-14 2021-08-24 阿里巴巴集团控股有限公司 Networking method and device of simulation network
CN114070710A (en) * 2020-09-22 2022-02-18 北京市天元网络技术股份有限公司 Communication network fault analysis method and device based on digital twin
AU2020103179A4 (en) * 2020-11-02 2021-01-14 China Southern Power Grid Research Institute A Fault Locating Method of Power Grid Based on Network Theory
CN112887148A (en) * 2021-01-29 2021-06-01 烽火通信科技股份有限公司 Method and device for simulating and predicting network flow
CN114285732A (en) * 2021-12-23 2022-04-05 中国建设银行股份有限公司 Network fault positioning method, system, storage medium and electronic equipment
CN114785666A (en) * 2022-06-22 2022-07-22 北京必示科技有限公司 Network fault troubleshooting method and system

Also Published As

Publication number Publication date
CN115473828B (en) 2024-01-05

Similar Documents

Publication Publication Date Title
US20200267069A1 (en) Link switching method, link switching device, network communication system, and computer-readable storage medium
CN111831569A (en) Test method and device based on fault injection, computer equipment and storage medium
CN106911648B (en) Environment isolation method and equipment
US11003516B2 (en) Geographical redundancy and dynamic scaling for virtual network functions
US20220052923A1 (en) Data processing method and device, storage medium and electronic device
GB2536750A (en) Methods and apparatus to provide redundancy in a process control system
Fonseca et al. Resilience of sdns based on active and passive replication mechanisms
US9241007B1 (en) System, method, and computer program for providing a vulnerability assessment of a network of industrial automation devices
CN112291075B (en) Network fault positioning method and device, computer equipment and storage medium
CN102291262B (en) The method, apparatus and system of a kind of disaster tolerance
Ramesh et al. The smart network management automation algorithm for administration of reliable 5G communication networks
CN114116912A (en) Method for realizing high availability of database based on Keepalived
CN111371592B (en) Node switching method, device, equipment and storage medium
Seliuchenko et al. Automated recovery of server applications for SDN-based internet of things
US6931357B2 (en) Computer network monitoring with test data analysis
CN116132519A (en) Device management method, device and readable storage medium
US20240097979A1 (en) Fabric availability and synchronization
CN111405004B (en) Switch management method and device, equipment and storage medium
JP6555721B2 (en) Disaster recovery system and method
CN110351122B (en) Disaster recovery method, device, system and electronic equipment
CN115473828B (en) Fault detection method and system based on simulation network
CN109104333B (en) GIT-based distributed cluster synchronization method and device
CN111756826A (en) DLM lock information transmission method and related device
CN105550065A (en) Database server communication management method and device
Kulkarni et al. REARM: Renewable energy based resilient deployment of Virtual Network Functions

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant