CN114113984A - Fault drilling method, device, terminal equipment and medium based on chaotic engineering - Google Patents

Fault drilling method, device, terminal equipment and medium based on chaotic engineering

Publication number
CN114113984A
Authority
CN
China
Prior art keywords
fault
response result
target
fault response
expected
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111437555.9A
Other languages
Chinese (zh)
Inventor
刘俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Yizhangtong Cloud Technology Shenzhen Co ltd
Original Assignee
Ping An Yizhangtong Cloud Technology Shenzhen Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Yizhangtong Cloud Technology Shenzhen Co ltd filed Critical Ping An Yizhangtong Cloud Technology Shenzhen Co ltd
Priority to CN202111437555.9A priority Critical patent/CN114113984A/en
Publication of CN114113984A publication Critical patent/CN114113984A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01R MEASURING ELECTRIC VARIABLES; MEASURING MAGNETIC VARIABLES
    • G01R31/00 Arrangements for testing electric properties; Arrangements for locating electric faults; Arrangements for electrical testing characterised by what is being tested not provided for elsewhere
    • G01R31/28 Testing of electronic circuits, e.g. by signal tracer
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01R MEASURING ELECTRIC VARIABLES; MEASURING MAGNETIC VARIABLES
    • G01R1/00 Details of instruments or arrangements of the types included in groups G01R5/00 - G01R13/00 and G01R31/00
    • G01R1/28 Provision in measuring instruments for reference values, e.g. standard voltage, standard waveform

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Testing And Monitoring For Control Systems (AREA)

Abstract

The embodiment of the invention discloses a fault drilling method, a fault drilling device, terminal equipment and a medium based on chaotic engineering. The method comprises: acquiring a chaotic engineering test plan matched with a target system; injecting a target fault into the target system according to a fault response influence factor, and monitoring in real time the actual fault response result of the target system for the target fault; when the difference between the expected fault response result and the actual fault response result meets a dynamic adjustment condition, performing matched adjustment on the fault response influence factor; and returning to the operation of injecting the target fault into the target system according to the fault response influence factor until the actual fault response result matches the expected fault response result or a fault drilling ending condition is met, and recording the mapping relation between the current fault response influence factor and the current actual fault response result. This avoids fault injection pushing the system beyond its controllable range, and improves the degree of automation, the test efficiency, the test range and the robustness of chaotic engineering.

Description

Fault drilling method, device, terminal equipment and medium based on chaotic engineering
Technical Field
The embodiment of the invention relates to a computer data processing technology, in particular to a fault drilling method, a fault drilling device, terminal equipment and a medium based on chaotic engineering.
Background
Chaos engineering is a complex technical means for improving the resilience of a technical architecture. It is mainly implemented by randomly introducing faults at each layer of the whole system under a certain level of background service traffic and observing the behavior of the system through comprehensive monitoring, so as to discover hidden dangers in the system, solve the related problems in a targeted manner, and build both the capability of the system to withstand out-of-control conditions in the production environment and confidence in that capability.
Conventional chaotic engineering systems mainly focus on multi-dimensional fault injection and on building a monitoring system for the whole system. That is, the prior art mainly concerns which faults are injected into the system under test, over what scope, and what fault response the system under test then makes. However, once an injected fault exceeds what the system under test can bear, the whole system may enter an out-of-control state and fail to respond to actual service requests. Existing chaotic engineering systems therefore cannot meet the requirements on system resilience and system robustness.
Disclosure of Invention
The embodiment of the invention provides a fault drilling method, a fault drilling device, terminal equipment and a medium based on chaotic engineering, which are used for avoiding the problem that the injection of a fault causes the system to exceed a controllable range, and improving the automation degree of the chaotic engineering and the robustness of the system.
In a first aspect, an embodiment of the present invention provides a method for performing fault drilling based on chaotic engineering, where the method includes:
acquiring a chaotic engineering test plan matched with a target system, wherein the chaotic engineering test plan comprises: a fault response influence factor and an expected fault response result corresponding to a target fault to be injected;
injecting the target fault into the target system according to the fault response influence factor, and monitoring the actual fault response result of the target system aiming at the target fault in real time;
when the difference between the expected fault response result and the actual fault response result is determined to meet the dynamic adjustment condition, performing matched adjustment on the fault response influence factor;
and returning to execute the operation of injecting the target fault into the target system according to the fault response influence factor until the actual fault result is matched with the expected fault response result or the fault drilling ending condition is met, and recording the mapping relation between the current fault response influence factor and the current actual fault response result.
In a second aspect, an embodiment of the present invention further provides a fault drilling device based on chaotic engineering, where the fault drilling device based on chaotic engineering includes:
the chaotic engineering test plan acquisition module is used for acquiring a chaotic engineering test plan matched with a target system, wherein the chaotic engineering test plan comprises: a fault response influence factor and an expected fault response result corresponding to a target fault to be injected;
the actual fault response result monitoring module is used for injecting the target fault into the target system according to the fault response influence factor and monitoring the actual fault response result of the target system aiming at the target fault in real time;
the fault response influence factor adjusting module is used for carrying out matching adjustment on the fault response influence factor when the difference between the expected fault response result and the actual fault response result is determined to meet the dynamic adjusting condition;
and the mapping relation recording module is used for returning and executing the operation of injecting the target fault into the target system according to the fault response influence factor until the actual fault result is matched with the expected fault response result or the fault drilling ending condition is met, and recording the mapping relation between the current fault response influence factor and the current actual fault response result.
In a third aspect, an embodiment of the present invention further provides a terminal device, including a memory, a processor, and a computer program stored in the memory and operable on the processor, where the processor implements the chaotic engineering-based fault drilling method according to any embodiment of the present invention when executing the computer program.
In a fourth aspect, an embodiment of the present invention further provides a storage medium readable by a computer, and having a computer program stored thereon, where the program is executed by a processor to implement the chaotic engineering based fault drilling method according to any embodiment of the present invention.
According to the technical scheme provided by the embodiment of the invention, a chaotic engineering test plan matched with a target system is obtained; injecting the target fault into the target system according to the fault response influence factor, and monitoring the actual fault response result of the target system aiming at the target fault in real time; when the difference between the expected fault response result and the actual fault response result is determined to meet the dynamic adjustment condition, performing matched adjustment on the fault response influence factor; and returning to execute the operation of injecting the target fault into the target system according to the fault response influence factor until the actual fault result is matched with the expected fault response result or the fault drilling ending condition is met, and recording the mapping relation between the current fault response influence factor and the current actual fault response result. The chaotic engineering system can avoid the influence of fault injection on the system beyond a controllable range, can further improve the automation degree of the chaotic engineering, triggers more random faults in a full-automatic mode, further improves the testing efficiency and the testing range, effectively covers various scenes, and improves the robustness of the chaotic engineering system.
Drawings
Fig. 1 is a flowchart of a fault drilling method based on chaotic engineering according to an embodiment of the present invention;
fig. 2 is a flowchart of another fault drilling method based on chaotic engineering according to an embodiment of the present invention;
fig. 3 is a flowchart of another fault drilling method based on chaotic engineering according to an embodiment of the present invention;
fig. 4 is a flowchart of another fault drilling method based on chaotic engineering according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a fault drilling device based on chaotic engineering according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a terminal device according to an embodiment of the present invention.
Detailed Description
Embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present invention are shown in the drawings, it should be understood that the present invention may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present invention. It should be understood that the drawings and the embodiments of the present invention are illustrative only and are not intended to limit the scope of the present invention.
It should be understood that the various steps recited in the method embodiments of the present invention may be performed in a different order and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the invention is not limited in this respect.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.
It should be noted that the terms "first", "second", and the like in the present invention are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It is noted that the modifiers "a", "an", and "the" in the present invention are illustrative rather than limiting, and those skilled in the art will understand that they should be read as "one or more" unless the context clearly dictates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present invention are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
The embodiment of the invention can be used for testing the chaotic engineering of the computer system realized based on the artificial intelligence technology. Among them, Artificial Intelligence (AI) is a theory, method, technique and application system that simulates, extends and expands human Intelligence using a digital computer or a machine controlled by a digital computer, senses the environment, acquires knowledge and uses the knowledge to obtain the best result.
The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.
Fig. 1 is a flowchart of a fault drilling method based on chaotic engineering according to an embodiment of the present invention. The method and the device are suitable for the situation of fault injection in the production process or the test process of the chaotic engineering system. The method of the embodiment may be executed by a fault drilling device based on chaotic engineering, which may be implemented by software and/or hardware, and the device may be configured in a server or a terminal device. As shown in fig. 1, the method specifically includes the following steps:
S110, acquiring a chaotic engineering test plan matched with the target system.
The chaotic engineering test plan comprises: a fault response influence factor corresponding to a target fault to be injected, and an expected fault response result.
Chaotic engineering is a complex technical means for improving the resilience of a technical architecture. It is mainly implemented by randomly introducing faults at each layer of the whole system under a certain level of background service traffic and observing the behavior of the system through comprehensive monitoring, so as to discover hidden dangers in the system, solve the related problems in a targeted manner, and build both the capability of the system to withstand out-of-control conditions in the production environment and confidence in that capability. The chaotic engineering test plan may specifically refer to the description information of the chaotic engineering applied to the target system. It describes what kind of fault is injected into the target system and what kind of response the target system is expected to achieve for the injected fault.
In this embodiment, the target system is the service execution system to which chaotic engineering is applied. The service execution system may include system devices that perform service processing functions such as data measurement, data calculation or data transmission. The system devices may include a plurality of terminal devices together with a server or management node that manages them in a unified manner, or may include only a plurality of servers, and so on; this embodiment does not limit this.
Optionally, the system device may be an actual physical machine, or may also be a virtual machine installed on the physical machine.
The target fault to be injected may be any of the abnormal problems that can occur while the target system is running, for example a network disconnection of system equipment, equipment downtime, failure of the load balancing service, service timeout, database connection timeout, unreadable storage space, or memory disorder. The target fault is typically associated with an actual operating scenario of the target system.
The fault response influence factor may be the degree of influence of the target fault in the chaotic engineering test, and may specifically include the fault action degree of the target fault and an expected traffic load degree. The expected fault response result refers to the response that the target system is expected to achieve for the injected fault, for example that the service continues to execute normally, that switching between the primary and standby servers completes within a set time, or that service processing recovers after a set fault response time. The expected fault response result generally reflects how much of a given fault the user is willing to tolerate in the target system.
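As an illustration only, the contents of a test plan described above could be held in a small data structure. The following Python sketch is not taken from the patent: the class names, the field names, and the choice to express the expected fault response result as a recovery time in seconds are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FaultResponseInfluenceFactor:
    # Hypothetical fields: the text only says the factor may include a
    # fault action degree and an expected traffic load degree.
    fault_action_degree: int        # e.g. number of devices the fault is injected into
    expected_traffic_load: float    # e.g. fraction of peak service traffic, 0.0 to 1.0


@dataclass(frozen=True)
class ChaosTestPlan:
    target_fault: str                              # e.g. "device_downtime"
    influence_factor: FaultResponseInfluenceFactor
    expected_recovery_seconds: float               # assumed form of the expected result


# Example plan: take down 3 devices under 50% load, expect recovery within 5 minutes.
plan = ChaosTestPlan(
    target_fault="device_downtime",
    influence_factor=FaultResponseInfluenceFactor(
        fault_action_degree=3, expected_traffic_load=0.5
    ),
    expected_recovery_seconds=300.0,
)
print(plan)
```

Making the dataclasses frozen is a design choice for the sketch: an adjusted influence factor is then a new value, which keeps each recorded factor and result pair unambiguous.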
S120, injecting the target fault into the target system according to the fault response influence factor, and monitoring in real time the actual fault response result of the target system for the target fault.
The actual fault response result refers to the response that the target system actually produces for the injected target fault, for example that the service continues to execute normally, that switching between the primary and standby servers completes within a certain time, or that service processing recovers after a certain fault response time.
After a certain fault A is injected into the target system, if the actual fault response result of the target system meets the expected fault response result, the target system can normally respond to the occurrence of the fault A; if the actual fault response result of the target system does not meet the expected fault response result, it indicates that the target system cannot normally cope with the occurrence of the fault a, and therefore, performance optimization or upgrading needs to be performed on the target system.
The condition that the actual fault response result of the target system meets the expected fault response result means that the actual fault response result falls within the control range of the expected fault response result, and the condition that the actual fault response result of the target system does not meet the expected fault response result means that the actual fault response result does not fall within the control range of the expected fault response result.
For example, suppose that for a fault A injected into the target system the expected fault response result is to recover service processing within 5 minutes. When the service recovery time in the actual fault response result is less than or equal to 5 minutes, the actual fault response result falls within the control range of the expected fault response result, that is, the target system can normally cope with the occurrence of the fault A. When the service recovery time in the actual fault response result is greater than 5 minutes, the actual fault response result does not fall within the control range of the expected fault response result, that is, the target system cannot normally cope with the occurrence of the fault A.
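The 5-minute example above amounts to a single comparison. A minimal Python sketch follows, assuming the fault response result is expressed as a service recovery time; the function name is hypothetical.

```python
from datetime import timedelta


def falls_within_control_range(actual_recovery: timedelta,
                               expected_recovery: timedelta) -> bool:
    """Return True when the actual recovery time does not exceed the expected one."""
    return actual_recovery <= expected_recovery


expected = timedelta(minutes=5)
print(falls_within_control_range(timedelta(minutes=4, seconds=30), expected))
print(falls_within_control_range(timedelta(minutes=5), expected))
print(falls_within_control_range(timedelta(minutes=5, seconds=1), expected))
```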
Optionally, the fault response impact factor may include: degree of fault effect of the target fault; correspondingly, injecting the target fault into the target system according to the fault response impact factor may include:
forming a fault parameter pointing to at least one target system device in the target system according to the target fault and the fault action degree of the target fault; and issuing the fault parameters to each target system device so as to inject target faults matched with the fault action degree into each target system device.
The fault action degree may be used to describe a degree of influence of a fault to be injected into the target system on the target system or system equipment in the target system. Specifically, the fault effect degree may include the number of system devices injecting the fault, and/or the size of the fault injected in a single system device. The fault parameters may include a target fault and a magnitude of a fault effect level, etc. The target system device may be an actual physical machine, or may be a virtual machine installed on the physical machine.
In one specific example, if the fault to be injected into the target system is that 5 system devices in the target system are powered off, the target fault can be set to power-off and the number of system devices into which the fault is injected can be set to 5. In another specific example, if the fault to be injected into the target system is that database requests on a target server in the target system time out for 5 minutes, the target fault can be set to database request timeout and the size of the fault injected in the single system device can be set to a 5-minute timeout.
In this embodiment, a fault parameter directed to at least one target system device in the target system is first formed according to the target fault and the fault action degree of the target fault. Further, the fault parameters are issued to each target system device, so as to inject target faults matching the fault action degree into each target system device. Correspondingly, the actual fault response result of the target system aiming at the target fault is monitored in real time.
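The formation of fault parameters from a target fault and its fault action degree can be sketched as follows. This is a hedged illustration, not the patent's implementation: `FaultParameter`, `build_fault_parameters`, the device identifiers, and the rule of selecting the first devices in the list are assumptions, and the actual delivery of parameters to devices is left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class FaultParameter:
    device_id: str                      # target system device the parameter points to
    target_fault: str                   # e.g. "power_off", "db_request_timeout"
    magnitude_seconds: Optional[float]  # size of the fault on one device, if it has one


def build_fault_parameters(devices: List[str],
                           target_fault: str,
                           device_count: int,
                           magnitude_seconds: Optional[float] = None
                           ) -> List[FaultParameter]:
    """Form one fault parameter per selected device.

    The fault action degree is represented by `device_count` (how many devices
    receive the fault) and `magnitude_seconds` (how large the fault is on each).
    """
    if not 0 <= device_count <= len(devices):
        raise ValueError("device_count must be between 0 and the number of devices")
    # Assumption: simply take the first `device_count` devices.
    return [FaultParameter(d, target_fault, magnitude_seconds)
            for d in devices[:device_count]]


all_devices = [f"dev-{i}" for i in range(10)]

# First example from the text: power off 5 system devices.
power_off = build_fault_parameters(all_devices, "power_off", 5)

# Second example: database requests on one target server time out for 5 minutes.
db_timeout = build_fault_parameters(["db-server-1"], "db_request_timeout", 1, 300.0)

print(len(power_off), db_timeout[0])
```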
The advantages of such an arrangement are as follows: the fault parameters are issued to each target system device and the target fault matched with the fault action degree is injected, so that the actual fault response result for the target fault can be monitored in real time. The fault response result for the current target fault at the current fault action degree can therefore be obtained in real time, and the obtained actual fault response result is more accurate and timely, which improves the working efficiency and the degree of automation of the chaotic engineering.
S130, determining the type of difference between the expected fault response result and the actual fault response result: if the difference between the expected fault response result and the actual fault response result satisfies the dynamic adjustment condition, performing S140; if the actual fault response result matches the expected fault response result or the fault drilling ending condition is satisfied, performing S150.
In this embodiment, a technical solution for dynamically adjusting the fault action degree of a target fault by monitoring an actual fault response result of a target system for the target fault is creatively provided to solve the problems that the prior art is not concerned about the correlation between fault injection and system monitoring and the fault injection cannot be adjusted according to the affected degree of the system.
The above arrangement is introduced with two aspects in mind. On the one hand, the target fault may be injected into the target system in a production environment. If the actual fault response result after the target fault is injected is far worse than the expected fault response result, normal service requests from users cannot be satisfied, which greatly affects both the user experience of the target system and its service processing performance. Therefore, when monitoring of the actual fault response result shows that the target system cannot currently cope with the target fault, the actual fault response result can be relieved by reducing the fault response influence factor. This in turn reveals at what fault action degree of the target fault the expected fault response result can be achieved, so that new test scenarios and test data are obtained dynamically.
On the other hand, if the actual fault response result after the target fault is injected is better than the expected fault response result, for example the service recovery time is shorter, the target system can cope with the target fault normally and still has a certain margin. The response limit of the target system for the target fault can then be found by increasing the fault response influence factor, which again achieves the technical effect of dynamically obtaining new test scenarios and test data.
The dynamic adjustment condition is the condition under which the fault response influence factor needs to be adjusted because the actual fault response result deviates from the expected one. For example, when the actual fault response result is determined to be much better than the expected fault response result, or much worse than the expected fault response result, the difference between the expected fault response result and the actual fault response result is determined to satisfy the dynamic adjustment condition.
Specifically, the difference between the expected fault response result and the actual fault response result of the target system satisfies the dynamic adjustment condition when the absolute value of the difference between the two is greater than or equal to a preset difference threshold.
Correspondingly, the difference between the expected fault response result and the actual fault response result of the target system does not satisfy the dynamic adjustment condition in either of the following cases: the absolute value of the difference between the expected fault response result and the actual fault response result is smaller than the preset difference threshold, that is, the expected fault response result matches the actual fault response result; or, after a preset number of adjustments or after adjustment in a preset adjustment mode, the absolute value of the difference is still greater than or equal to the preset difference threshold, that is, the fault drilling ending condition is met.
In a specific example, the expected failure response result is set to be that the recovery service processing time is less than or equal to 5 minutes, the preset difference threshold value is 1 minute, and when the recovery service processing time in the actual failure response result is 2 minutes, it can be determined that the dynamic adjustment condition is satisfied between the expected failure response result and the actual failure response result of the target system; when the restoration traffic processing time in the actual fault response result is 4 minutes 55 seconds, it may be determined that the expected fault response result of the target system matches the actual fault response result.
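The threshold test in this example can be written as a short function. The sketch below is an illustration under stated assumptions: results are expressed as recovery times in seconds, the names `DrillDecision` and `classify_difference` are hypothetical, and the fault drilling ending condition (a limit on the number of adjustments) is deliberately not modeled here.

```python
from enum import Enum


class DrillDecision(Enum):
    ADJUST = "adjust"     # dynamic adjustment condition satisfied
    MATCHED = "matched"   # expected and actual results match


def classify_difference(expected_s: float, actual_s: float,
                        threshold_s: float) -> DrillDecision:
    """Compare the absolute difference with the preset difference threshold."""
    if abs(expected_s - actual_s) >= threshold_s:
        return DrillDecision.ADJUST
    return DrillDecision.MATCHED


# Expected recovery within 5 minutes, preset difference threshold of 1 minute.
print(classify_difference(300, 120, 60))   # actual recovery time of 2 minutes
print(classify_difference(300, 295, 60))   # actual recovery time of 4 minutes 55 seconds
```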
S140, performing matched adjustment on the fault response influence factor, and returning to execute S120.
Optionally, when it is determined that the difference between the expected fault response result and the actual fault response result satisfies the dynamic adjustment condition, performing matched adjustment on the fault response influence factor may include: when the actual fault response result is determined to fall within the control range of the expected fault response result, increasing the fault action degree of the target fault according to a preset action degree increase proportion; and when the actual fault response result is determined not to fall within the control range of the expected fault response result, reducing the fault action degree of the target fault according to a preset action degree reduction proportion.
As described above, when it is determined that the actual fault response result is far better than the expected fault response result, it may be determined that the actual fault response result falls within the control range of the expected fault response result; when it is determined that the actual fault response result is far worse than the expected fault response result, it is determined that the actual fault response result does not fall within the control range of the expected fault response result.
Illustratively, suppose the target fault A injected into the target system is a downtime fault, the fault action degree is that 3 of 10 system devices go down, the corresponding expected fault response result is to recover service processing within 5 minutes, and the preset difference threshold is 1 minute. When the service recovery time in the actual fault response result is less than or equal to 4 minutes, the actual fault response result falls within the control range of the expected fault response result, that is, the target system can normally cope with the occurrence of the target fault A and still has a certain margin, so the fault action degree of the target fault can be correspondingly increased. When the service recovery time in the actual fault response result is greater than 6 minutes, the actual fault response result does not fall within the control range of the expected fault response result, that is, the target system cannot normally cope with the occurrence of the target fault A and the whole target system is in an out-of-control state, so the fault action degree of the target fault needs to be correspondingly reduced.
For example, when the target system can normally cope with the occurrence of the target fault A, that is, the service recovery time in the actual fault response result is less than or equal to 4 minutes, the fault action degree of the target fault may be increased. Specifically, the fault action degree can be increased from the downtime of 3 of the 10 system devices to the downtime of 4 of the 10 system devices. If the service recovery time in the actual fault response result is then still less than or equal to 4 minutes, the fault action degree can be increased further, from the downtime of 4 of the 10 system devices to the downtime of 5 of the 10 system devices. Correspondingly, if the service recovery time is greater than 6 minutes, the fault action degree of the target fault can be reduced, for example from the downtime of 3 of the 10 system devices to the downtime of 2 of the 10 system devices, and so on until the actual fault response result matches the expected fault response result or the fault drilling ending condition is met.
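One adjustment step from this example can be sketched in Python. The sketch assumes that the fault action degree is the number of devices taken down and that each adjustment changes it by a fixed step of one device, whereas the text speaks more generally of a preset increase or reduction proportion. The function name and the lower bound of zero devices are likewise assumptions.

```python
def adjust_fault_action_degree(current_devices: int, total_devices: int,
                               expected_s: float, actual_s: float,
                               threshold_s: float, step: int = 1) -> int:
    """Return the fault action degree to use for the next injection.

    Mirrors the worked example: recovery at or below (expected - threshold)
    leaves margin, so more devices are taken down; recovery above
    (expected + threshold) means the system cannot cope, so fewer are.
    """
    if actual_s <= expected_s - threshold_s:
        return min(total_devices, current_devices + step)
    if actual_s > expected_s + threshold_s:
        return max(0, current_devices - step)
    return current_devices  # no adjustment under the example's bounds


# 3 of 10 devices down, expected recovery 5 minutes, threshold 1 minute.
print(adjust_fault_action_degree(3, 10, 300, 240, 60))  # recovered in 4 minutes
print(adjust_fault_action_degree(3, 10, 300, 361, 60))  # recovered in just over 6 minutes
print(adjust_fault_action_degree(3, 10, 300, 295, 60))  # recovered in 4 minutes 55 seconds
```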
Specifically, the target fault may be injected into the target system in the production environment. If the actual fault response result after the target fault is injected is far worse than the expected fault response result, the normal service requests of users cannot be satisfied, which may greatly affect the user experience and the service processing performance of the target system. Therefore, when monitoring of the actual fault response result shows that the target system cannot currently respond normally to the target fault, the fault action degree of the target fault may be reduced to improve the actual fault response result, and it may further be determined at what fault action degree the expected fault response result is reached, so that new test scenarios and test data are obtained dynamically.
Further, if the actual fault response result after the target fault is injected is better than the expected fault response result, for example the service recovery time is shorter, the target system can respond normally to the target fault and still has a certain margin. The response limit of the target system for the target fault can then be probed by increasing the fault action degree of the target fault, which likewise achieves the technical effect of dynamically obtaining new test scenarios and test data.
In this embodiment, the fault response impact factors are adjusted in a matching manner by determining whether the difference between the expected fault response result and the actual fault response result satisfies the dynamic adjustment condition, until the actual fault result matches the expected fault response result or the end fault drilling condition is satisfied (for example, the adjustment of the fault response impact factors for a set number of times has been completed).
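The adjust-and-reinject loop described here can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: `run_drill` and its parameters are hypothetical names, `measure` stands in for "inject the fault at this degree and monitor the restoration time", and the termination condition is modelled as a maximum number of adjustments.

```python
from typing import Callable, Tuple


def run_drill(
    measure: Callable[[int], float],
    down: int,
    total: int,
    expected_min: float,
    threshold_min: float,
    max_adjustments: int,
) -> Tuple[int, float]:
    """Adjust the number of downed devices until the result matches or the drill ends."""
    adjustments = 0
    while True:
        actual = measure(down)  # inject at this degree, then observe restoration time
        if actual <= expected_min - threshold_min:
            step = 1            # margin left: raise the fault action degree
        elif actual > expected_min + threshold_min:
            step = -1           # out of control: lower the fault action degree
        else:
            return down, actual  # actual result matches the expected result
        candidate = max(0, min(total, down + step))
        if adjustments >= max_adjustments or candidate == down:
            return down, actual  # termination condition reached
        down = candidate
        adjustments += 1


# Toy stand-in for a real system: each downed device adds one minute of restoration time.
result = run_drill(lambda d: 1.0 + d, down=3, total=10,
                   expected_min=5.0, threshold_min=1.0, max_adjustments=5)
print(result)  # prints (4, 5.0)
```

In the toy model, 3 downed devices give a 4-minute restoration, so the degree is raised once; 4 downed devices give 5 minutes, which matches the expected result and ends the loop.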
The advantages of such an arrangement are: the response of the target system to the target fault can be calculated more accurately and reasonably, so that the influence of fault injection on the system beyond a controllable range can be avoided, more random faults are triggered in a full-automatic mode, the test efficiency and the test range are further improved, various scenes are effectively covered, and the robustness of the chaotic engineering system is improved.
And S150, recording the mapping relation between the current fault response influence factor and the current actual fault response result.
The mapping relationship may refer to a corresponding relationship between the current fault response impact factor and the matched current actual fault response result.
Illustratively, if target fault A is injected into the target system and the corresponding actual fault response result is that service processing is recovered within 5 minutes, a mapping relationship can be established accordingly, such as: target fault A, service processing recovered within 5 minutes.
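One way to hold such a mapping is a keyed record store. The sketch below is only an illustration: `ImpactFactor` and `DrillLog` are hypothetical names, and describing the impact factor by a fault name plus a device count is an assumption based on the downtime example.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ImpactFactor:
    """Hypothetical key describing how a fault was injected."""
    fault: str
    down_devices: int
    total_devices: int


class DrillLog:
    """Stores the mapping from an impact factor to the observed fault response result."""

    def __init__(self) -> None:
        self._records: Dict[ImpactFactor, str] = {}

    def record(self, factor: ImpactFactor, actual_result: str) -> None:
        self._records[factor] = actual_result

    def lookup(self, factor: ImpactFactor) -> str:
        return self._records[factor]


log = DrillLog()
log.record(ImpactFactor("A", 3, 10), "service processing recovered within 5 minutes")
```

Because the key is a frozen dataclass, an equal impact factor built later retrieves the same record.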
Optionally, the monitoring of the target system, the injection of the target fault, and the injection of the simulated service traffic are all implemented by probes arranged on system devices of the target system.
The probe can be a process running on the system device or a plug-in, etc. In particular, the probes can be distinguished according to the functions they perform: the monitoring probe, the fault probe and the flow probe are used for respectively realizing the operations of monitoring a target system, injecting a target fault, injecting simulated service flow and the like.
The advantages of such an arrangement are: the monitoring of the target system, the injection of the target fault and the injection of the simulated service traffic are realized through the probes arranged on each system device of the target system. These operations thereby become more concrete and programmatic, more random faults are triggered in a fully automatic manner, the testing efficiency and testing range are further improved, various scenarios are effectively covered, and the robustness of the chaotic engineering system is improved.
According to the technical scheme provided by the embodiment of the invention, a chaotic engineering test plan matched with a target system is obtained; injecting the target fault into the target system according to the fault response influence factor, and monitoring the actual fault response result of the target system aiming at the target fault in real time; when the difference between the expected fault response result and the actual fault response result is determined to meet the dynamic adjustment condition, performing matched adjustment on the fault response influence factor; and returning to execute the operation of injecting the target fault into the target system according to the fault response influence factor until the actual fault result is matched with the expected fault response result or the fault drilling ending condition is met, and recording the mapping relation between the current fault response influence factor and the current actual fault response result. The chaotic engineering system can avoid the influence of fault injection on the system beyond a controllable range, can further improve the automation degree of the chaotic engineering, triggers more random faults in a full-automatic mode, further improves the testing efficiency and the testing range, effectively covers various scenes, and improves the robustness of the chaotic engineering system.
Fig. 2 is a flowchart of another fault drilling method based on chaotic engineering according to an embodiment of the present invention. In this embodiment, the fault response impact factor further includes: the expected traffic load level.
Correspondingly, the method specifically comprises the following steps:
and S210, acquiring a chaotic engineering test plan matched with the target system.
The chaotic engineering test plan includes: a fault response impact factor corresponding to a target fault to be injected, and an expected fault response result. The fault response impact factor may include a fault action degree of the target fault, an expected traffic load degree, and the like.
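As an illustration of what such a plan might contain, the following Python sketch models it as nested data classes. The class and field names (`ChaosTestPlan`, `FaultResponseImpactFactor`, `expected_recovery_min` and so on) are hypothetical, and expressing the fault action degree as downed devices out of a total is an assumption drawn from the downtime example.

```python
from dataclasses import dataclass


@dataclass
class FaultResponseImpactFactor:
    down_devices: int          # e.g. 3 devices taken down
    total_devices: int         # e.g. out of 10 system devices
    expected_load_pct: float   # expected traffic load degree, e.g. 80.0

    @property
    def fault_action_degree(self) -> float:
        """Share of system devices affected by the target fault."""
        return self.down_devices / self.total_devices


@dataclass
class ChaosTestPlan:
    target_fault: str
    impact_factor: FaultResponseImpactFactor
    expected_recovery_min: float   # expected fault response result, in minutes


plan = ChaosTestPlan(
    target_fault="downtime",
    impact_factor=FaultResponseImpactFactor(down_devices=3, total_devices=10,
                                            expected_load_pct=80.0),
    expected_recovery_min=5.0,
)
```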
And S220, injecting the target fault into the target system according to the fault action degree of the target fault and the expected traffic load degree.
In this embodiment, injecting the target fault into the target system according to the fault response impact factor may specifically include: and injecting the target fault into the target system according to the fault action degree of the target fault and the expected traffic load degree.
The target fault is injected into the target system according to the fault action degree of the target fault and the expected traffic load degree, and specifically may include:
forming a fault parameter pointing to at least one target system device in the target system according to the target fault and the fault action degree of the target fault;
issuing the fault parameters to each target system device so as to inject target faults matched with fault action degrees into each target system device;
if the target system is determined to be in the production environment currently, acquiring the actual service flow load degree in the target system;
if the actual service flow load degree is determined to be smaller than the expected service flow load degree, calculating a flow value to be supplemented;
generating a simulated service flow matched with the flow value to be supplemented, and injecting the simulated service flow into the target system to control the target system to form an actual fault response result aiming at the target fault under the expected service flow load degree.
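The sub-steps above can be sketched in Python as follows. This is a hedged illustration: `build_fault_params` and `simulated_flow` are hypothetical names, selecting the first N devices as targets is an assumption, the capacity of 300 (in the same unit as the traffic, for example M/s) is an invented figure, and the branch for a non-production environment assumes there is no live traffic there so the whole expected load is simulated.

```python
from typing import Dict, List


def build_fault_params(fault: str, devices: List[str], down_count: int) -> List[Dict[str, str]]:
    """Form one fault parameter per target device (first N devices: an assumption)."""
    return [{"device": d, "fault": fault} for d in devices[:down_count]]


def simulated_flow(in_production: bool, actual_pct: float, expected_pct: float,
                   capacity: float) -> float:
    """Return the simulated traffic to inject, in the same unit as `capacity`."""
    if not in_production:
        # Assumption: no live traffic outside production, so simulate the full expected load.
        return capacity * expected_pct / 100
    if actual_pct >= expected_pct:
        return 0.0  # live traffic already meets the expected load; nothing to supplement
    return capacity * (expected_pct - actual_pct) / 100


devices = [f"dev-{i}" for i in range(10)]
params = build_fault_params("downtime", devices, down_count=3)
flow = simulated_flow(True, actual_pct=60, expected_pct=80, capacity=300.0)
```

With the assumed capacity of 300, raising a 60% live load to an 80% expected load requires 60 units of simulated traffic.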
In this embodiment, the target system may be in a production environment or a test environment. The actual traffic load degree may be a load degree of actual traffic in the production environment of the current target system.
Illustratively, the actual traffic load level in the target system may be 80% or 100%. Specifically, when the actual traffic load degree is 100%, the actual traffic load degree in the target system is full. Accordingly, there is typically a minimum load (e.g., 20%) and a maximum load (i.e., full load) for the actual traffic load level in each target system.
The expected traffic load degree may refer to the load degree that the service traffic of the target system is expected to reach.
Illustratively, the expected traffic load degree for the target system is 80%. If the actual traffic load degree in the target system is 60%, the chaotic engineering test that the user actually requires cannot be carried out, so a matched traffic value can be supplemented to bring the actual traffic load degree to 80%.
It should be noted that, if it is determined that the actual service traffic load degree is greater than or equal to the expected service traffic load degree, the current actual production environment already meets (or exceeds) the expected service traffic load degree required by the user. In this case, the current actual service traffic load degree is used directly and no simulated service traffic needs to be injected into the target system; correspondingly, when the fault response impact factor is subsequently adjusted in a matching manner, only the fault action degree of the target fault needs to be adjusted.
In addition, if it is determined that the target system is currently in a test environment, a simulated service traffic matched with the expected traffic load degree may be directly generated and injected into the target system, so as to control the target system to form an actual fault response result for the target fault under the expected traffic load degree.
And S230, monitoring the actual fault response result of the target system aiming at the target fault in real time.
S240, acquiring the difference type between the expected fault response result and the actual fault response result: if it is determined that the difference between the expected fault response result and the actual fault response result satisfies the dynamic adjustment condition, performing S250; if it is determined that the actual fault result matches the expected fault response result or the end fault drill condition is satisfied, S260 is performed.
And S250, adjusting the matching of the fault response influence factors, and returning to execute S220.
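Optionally, when it is determined that the difference between the expected fault response result and the actual fault response result satisfies the dynamic adjustment condition, the matching adjustment of the fault response impact factor includes one of the following:
only the fault action degree is adjusted in a matching manner;
only the expected traffic load degree is adjusted in a matching manner;
the matching adjustment is carried out in the order of first adjusting the fault action degree and then adjusting the expected traffic load degree;
the matching adjustment is carried out in the order of first adjusting the expected traffic load degree and then adjusting the fault action degree.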
Illustratively, suppose that for target fault A injected into the target system the corresponding expected fault response result is that service processing is recovered within 5 minutes, that the actual service traffic load degree of the target system in the production environment is 60%, and that the expected service traffic load degree is 80%. By injecting the corresponding simulated service traffic of 60 M/s into the target system, the current service traffic load degree in the target system reaches 80%. Target fault A is then injected into the target system, and the corresponding actual fault response result is measured to be recovery of service processing within 3 minutes. In this situation, the matching adjustment can be carried out in any of the following four ways.
In the first aspect, only the fault action degree is adjusted in a matching manner. After target fault A is injected into the target system, the corresponding actual fault response result is measured to be recovery of service processing within 3 minutes, whereas the expected fault response result is recovery of service processing within 5 minutes. The target system therefore still has margin, so the fault action degree is increased and the actual fault response result of the target system for the target fault is monitored in real time, until the actual fault response result matches the expected fault response result or the condition for ending the fault drilling is met.
In the second aspect, only the expected traffic load degree is adjusted in a matching manner. When target fault A is injected into the target system, the expected traffic load degree reached by adding the simulated service traffic matched with the traffic value to be supplemented is 80%, which is below full load, so the expected traffic load degree can be correspondingly increased, for example to 90%. A new traffic value to be supplemented is then calculated, and a new simulated service traffic matched with it is generated and injected into the target system. The actual fault response result of the target system for the target fault is monitored in real time until the actual fault response result matches the expected fault response result or the condition for ending the fault drilling is met.
In the third aspect, the matching adjustment is carried out in the order of first adjusting the fault action degree and then adjusting the expected traffic load degree. Assume that after target fault A is injected into the target system, the corresponding actual fault response result is measured to be recovery of service processing within 3 minutes, whereas the expected fault response result is recovery of service processing within 5 minutes; the fault action degree therefore needs to be increased.
In one case, when the fault action degree reaches the preset maximum fault action degree, the actual fault response result of the target system still falls within the control range of the expected fault response result, for example, 6 of 10 system devices are down, and it is determined that the actual fault response result is the service processing is recovered within 4 minutes, at this time, the fault processing limit of the target system can be detected by further deteriorating the injection environment of the target fault by correspondingly increasing the expected traffic load degree;
in another case, when the actual fault response result of the target system is adjusted from the condition that the actual fault response result falls within the control range of the expected fault response result to the condition that the actual fault response result does not fall within the control range of the expected fault response result by increasing the fault action degree, for example, when 4 of 10 system devices are crashed, the actual fault response result is determined to be the condition that the business process is recovered within 4 minutes, and when the target fault is adjusted to 5 of 10 system devices are crashed by increasing the fault action degree, the business process is recovered within 6 minutes, at this time, the expected traffic load degree can be decreased on the basis that 5 of 10 system devices are crashed, or the expected traffic load degree can be increased on the basis that 4 of 10 system devices are crashed, until the actual fault result matches the expected fault response result or the end fault drill condition is met.
In the fourth aspect, the matching adjustment is carried out in the order of first adjusting the expected traffic load degree and then adjusting the fault action degree. Assume that after target fault A is injected into the target system, the corresponding actual fault response result is measured to be recovery of service processing within 3 minutes, whereas the expected fault response result is recovery of service processing within 5 minutes. Meanwhile, the expected traffic load degree currently reached by adding the simulated service traffic matched with the traffic value to be supplemented, for example 80%, is below full load, so the expected traffic load degree can be correspondingly increased, for example to 90%. A new traffic value to be supplemented can then be calculated, and a new simulated service traffic matched with it is generated and injected into the target system.
In one case, after the expected traffic load degree is adjusted to the maximum load of the target system, the actual fault response result of the target system still falls within the control range of the expected fault response result, for example, the current expected traffic load degree is adjusted to 100%, at this time, the fault action degree can be correspondingly increased to further deteriorate the injection environment of the target fault, so as to detect the fault handling limit of the target system;
in another case, by increasing the expected traffic load degree, the actual fault response result of the target system changes from falling within the control range of the expected fault response result to not falling within it. For example, when the expected traffic load degree is 90%, the actual fault response result is determined to be recovery of service processing within 4 minutes, and when the expected traffic load degree is increased to 95%, the actual fault response result is determined to be recovery of service processing within 6 minutes. At this time, the fault action degree can be reduced on the basis of the expected traffic load degree of 95%, or the fault action degree can be increased on the basis of the expected traffic load degree of 90%, until the actual fault response result matches the expected fault response result or the condition for ending the fault drilling is met.
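The two ordered strategies (third and fourth aspects) can be sketched as a single Python function that computes the next pair of factors for one adjustment round. The function name `adjust_in_order`, the limits (a maximum of 6 downed devices, a load range of 20% to 100%) and the 5-point load step are assumptions taken from or inspired by the examples. Where the text offers two alternatives once the result leaves the control range, the sketch arbitrarily implements the first one.

```python
from typing import Tuple


def adjust_in_order(
    order: str,              # "fault_first" (third aspect) or "load_first" (fourth aspect)
    down: int,
    load_pct: int,
    within_range: bool,
    max_down: int = 6,
    max_load_pct: int = 100,
    min_load_pct: int = 20,
    load_step_pct: int = 5,
) -> Tuple[int, int]:
    """Return the next (downed devices, expected load %) for one adjustment round."""
    if order not in ("fault_first", "load_first"):
        raise ValueError(f"unknown order: {order}")
    if order == "fault_first":
        if within_range:
            if down < max_down:
                return down + 1, load_pct  # keep raising the fault action degree
            # Fault action degree exhausted: worsen the environment via the load.
            return down, min(load_pct + load_step_pct, max_load_pct)
        # Out of control: keep the degree and lower the expected load.
        return down, max(load_pct - load_step_pct, min_load_pct)
    if within_range:
        if load_pct < max_load_pct:
            return down, min(load_pct + load_step_pct, max_load_pct)
        # Load already at maximum: raise the fault action degree instead.
        return min(down + 1, max_down), load_pct
    # Out of control: keep the load and lower the fault action degree.
    return max(down - 1, 0), load_pct
```

For instance, with the fault-first order and 6 of 10 devices already down at 80% load, a result still inside the control range moves the expected load to 85% rather than taking down a seventh device.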
Optionally, when it is determined that the difference between the expected fault response result and the actual fault response result satisfies the dynamic adjustment condition, the adjusting of matching the expected traffic load degree may include:
when the actual fault response result is determined to fall into the control range of the expected fault response result, the load degree of the expected service flow is increased according to a preset load degree increasing proportion; and when the actual fault response result is determined not to fall into the control range of the expected fault response result, reducing the load degree of the expected service flow according to a preset load degree reduction proportion.
Illustratively, if the expected traffic load degree for target fault A injected into the target system is 80%, which is below full load, and the actual fault response result falls within the control range of the expected fault response result, that is, the target system can respond normally to target fault A, the expected traffic load degree can be correspondingly increased, for example to 90%. A new traffic value to be supplemented can then be calculated, and a new simulated service traffic matched with it is generated and injected into the target system.
Correspondingly, when the corresponding expected traffic load degree is 60%, the lowest load is not reached, and the actual fault response result does not fall within the control range of the expected fault response result, that is, the target system cannot normally cope with the occurrence of the target fault a, so that the expected traffic load degree can be correspondingly reduced.
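The proportional adjustment of the expected traffic load degree can be sketched as below. The interpretation of "proportion" as a multiplicative ratio is an assumption, as are the function name `adjust_expected_load`, the default ratios (0.125 reproduces the 80% to 90% example) and the clamping to a 20% minimum and 100% maximum load.

```python
def adjust_expected_load(
    load_pct: float,
    within_range: bool,
    increase_ratio: float = 0.125,
    decrease_ratio: float = 0.25,
    min_load_pct: float = 20.0,
    max_load_pct: float = 100.0,
) -> float:
    """Scale the expected load up or down by a preset ratio, clamped to the load limits."""
    if within_range:
        # Result inside the control range: push the system harder, up to full load.
        return min(load_pct * (1 + increase_ratio), max_load_pct)
    # Result outside the control range: ease off, but not below the minimum load.
    return max(load_pct * (1 - decrease_ratio), min_load_pct)
```

Under these assumed defaults, an 80% load inside the control range becomes 90%, and a 60% load outside the control range becomes 45%.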
In this embodiment, by determining whether the difference between the expected fault response result and the actual fault response result satisfies the dynamic adjustment condition, the expected traffic load degree or the fault action degree in the fault response impact factor is adjusted in a matching manner, or both are adjusted at the same time, until the actual fault response result matches the expected fault response result or the condition for ending the fault drilling is met.
And S260, recording the mapping relation between the current fault response influence factor and the current actual fault response result.
The advantages of such an arrangement are: by adding the matching adjustment of the expected service flow load degree in the fault response influence factor, the tolerance degree of the target system to the target fault can be calculated more accurately and reasonably, so that the influence of fault injection on the system beyond a controllable range can be avoided, more random faults are triggered in a full-automatic mode, the test efficiency and the test range are further improved, various scenes are effectively covered, and the robustness of the chaotic engineering system is improved.
According to the technical scheme provided by the embodiment of the invention, a chaotic engineering test plan matched with a target system is obtained; injecting the target fault into the target system according to the expected service flow load degree in the fault response influence factor, and monitoring the actual fault response result of the target system aiming at the target fault in real time; when the difference between the expected fault response result and the actual fault response result is determined to meet the dynamic adjustment condition, performing matched adjustment on the fault response influence factor; and returning to execute the operation of injecting the target fault into the target system according to the fault response influence factor until the actual fault result is matched with the expected fault response result or the fault drilling ending condition is met, and recording the mapping relation between the current fault response influence factor and the current actual fault response result. The method can avoid the influence of fault injection on the system beyond a controllable range, trigger more random faults in a full-automatic mode, further improve the testing efficiency and the testing range, effectively cover various scenes and improve the robustness of the chaotic engineering system.
Fig. 3 is a flowchart of another fault drilling method based on chaotic engineering according to an embodiment of the present invention. In this embodiment, when it is determined that the difference between the expected fault response result and the actual fault response result satisfies the dynamic adjustment condition, the matching adjustment is performed according to the sequence of adjusting the fault action degree first and then adjusting the expected traffic load degree.
Correspondingly, the method specifically comprises the following steps:
and S310, acquiring a chaotic engineering test plan matched with the target system.
The chaotic engineering test plan includes: a fault response impact factor corresponding to a target fault to be injected, and an expected fault response result. The fault response impact factor may include a fault action degree of the target fault, an expected traffic load degree, and the like.
S320, injecting the target fault into the target system according to the fault action degree of the target fault and the expected traffic load degree, and monitoring the actual fault response result of the target system aiming at the target fault in real time.
S330, acquiring the difference type between the expected fault response result and the actual fault response result: if it is determined that the difference between the expected fault response result and the actual fault response result satisfies the dynamic adjustment condition, performing S340; if it is determined that the actual fault result matches the expected fault response result or the end fault drill condition is satisfied, S350 is performed.
And S340, carrying out matching adjustment on the fault response influence factors according to the sequence of firstly adjusting the fault action degree and then adjusting the expected service flow load degree, and returning to execute the S320.
Illustratively, if the corresponding expected fault response result is to recover the service processing within 5 minutes for the target system injected target fault a, and for the target system, the actual service traffic load degree in the production environment is 60%, the expected service traffic load degree is 80%, and by injecting the corresponding simulated service traffic 60M/s into the target system, the current service traffic load degree in the target system reaches 80%. Correspondingly, the target fault A is injected into the target system, and the corresponding actual fault response result can be measured to be the recovery service processing within 3 minutes. For the above-described case, there can be four cases for which matching is adjusted.
Firstly, if the corresponding expected fault response result is to recover service processing within 5 minutes for a target fault a injected by a target system, when the recovery service processing time in the actual fault response result is less than or equal to 5 minutes, it indicates that the actual fault response result falls within the control range of the expected fault response result, that is, the target system can normally cope with the occurrence of the target fault a; and when the processing time of the recovery service in the actual fault response result is greater than 5 minutes, it indicates that the actual fault response result does not fall within the control range of the expected fault response result, that is, the target system cannot normally cope with the occurrence of the fault a.
Specifically, the target fault a is injected into the target system, and it can be measured that the corresponding actual fault response result is that service processing is recovered within 3 minutes, that is, when the processing time of the recovered service in the actual fault response result is less than or equal to 5 minutes, the fault action degree of the target fault can be increased.
In one case, when the fault action degree reaches the preset maximum fault action degree, the actual fault response result of the target system still falls within the control range of the expected fault response result, for example, 6 of 10 system devices are down, and the actual fault response result is determined to recover the service processing within 4 minutes, at this time, the injection environment of the target fault can be further deteriorated by correspondingly increasing the expected traffic load degree, so as to detect the fault processing limit of the target system.
In another case, when the actual fault response result of the target system is adjusted from the condition that the actual fault response result falls within the control range of the expected fault response result to the condition that the actual fault response result does not fall within the control range of the expected fault response result by increasing the fault action degree, for example, when 4 of 10 system devices are crashed, the actual fault response result is determined to be the condition that the business process is recovered within 4 minutes, and when the target fault is adjusted to 5 of 10 system devices are crashed by increasing the fault action degree, the business process is recovered within 6 minutes, at this time, the expected traffic load degree can be decreased on the basis that 5 of 10 system devices are crashed, or the expected traffic load degree can be increased on the basis that 4 of 10 system devices are crashed, until the actual fault result matches the expected fault response result or the end fault drill condition is met.
Specifically, the expected traffic load degree is adjusted as follows. When target fault A is injected into the target system, the expected traffic load degree reached by adding the simulated service traffic matched with the traffic value to be supplemented is 80%, which is below full load, so the expected traffic load degree can be correspondingly increased, for example to 90%. A new traffic value to be supplemented is then calculated, and a new simulated service traffic matched with it is generated and injected into the target system. The actual fault response result of the target system for the target fault is monitored in real time until the actual fault response result matches the expected fault response result or the condition for ending the fault drilling is met.
And S350, recording the mapping relation between the current fault response influence factor and the current actual fault response result.
According to the technical scheme provided by the embodiment of the invention, a chaotic engineering test plan matched with a target system is obtained; injecting the target fault into the target system according to the expected service flow load degree in the fault response influence factor, and monitoring the actual fault response result of the target system aiming at the target fault in real time; when the difference between the expected fault response result and the actual fault response result is determined to meet the dynamic adjustment condition, performing matching adjustment according to the sequence of firstly adjusting the fault action degree and then adjusting the expected service flow load degree; and returning to execute the operation of injecting the target fault into the target system according to the fault response influence factor until the actual fault result is matched with the expected fault response result or the fault drilling ending condition is met, and recording the mapping relation between the current fault response influence factor and the current actual fault response result. The method can avoid the influence of fault injection on the system beyond a controllable range, trigger more random faults in a full-automatic mode, further improve the testing efficiency and the testing range, effectively cover various scenes and improve the robustness of the chaotic engineering system.
Fig. 4 is a flowchart of another fault drilling method based on chaotic engineering according to an embodiment of the present invention. In this embodiment, when it is determined that the difference between the expected fault response result and the actual fault response result satisfies the dynamic adjustment condition, the matching adjustment is performed according to the sequence of adjusting the expected traffic load degree first and then adjusting the fault action degree.
Correspondingly, the method specifically comprises the following steps:
and S410, acquiring a chaotic engineering test plan matched with the target system.
The chaotic engineering test plan includes: a fault response impact factor corresponding to a target fault to be injected, and an expected fault response result. The fault response impact factor may include a fault action degree of the target fault, an expected traffic load degree, and the like.
And S420, injecting the target fault into the target system according to the fault action degree of the target fault and the expected traffic load degree, and monitoring the actual fault response result of the target system aiming at the target fault in real time.
S430, acquiring the difference type between the expected fault response result and the actual fault response result: if it is determined that the difference between the expected fault response result and the actual fault response result satisfies the dynamic adjustment condition, performing S440; if it is determined that the actual fault result matches the expected fault response result or the end fault drill condition is satisfied, S450 is performed.
And S440, carrying out matching adjustment on the fault response influence factors according to the sequence of firstly adjusting the load degree of the expected service flow and then adjusting the fault action degree, and returning to execute the S420.
Illustratively, suppose that for target fault A injected into the target system, the corresponding expected fault response result is that service processing recovers within 5 minutes. Suppose further that the actual traffic load degree of the target system in the production environment is 60% and the expected traffic load degree is 80%, so that injecting simulated service traffic of 60 M/s into the target system brings its current traffic load degree to 80%. Target fault A is then injected into the target system, and the corresponding actual fault response result is measured as recovery of service processing within 3 minutes. For this situation, the matching adjustment can proceed in four cases.
First, after target fault A is injected into the target system, the actual fault response result is measured as recovery of service processing within 3 minutes, whereas the expected fault response result is recovery of service processing within 5 minutes. If, at the same time, the expected traffic load degree reached by adding the simulated service traffic matched with the current flow value to be supplemented (for example, 80%) is still below full load, the expected traffic load degree can be raised accordingly, for example to 90%. A new flow value to be supplemented is then calculated, and new simulated service traffic matched with that value is generated and injected into the target system.
In one case, after the expected traffic load degree has been raised to the maximum load of the target system (for example, to 100%), the actual fault response result of the target system still falls within the control range of the expected fault response result. The fault action degree can then be increased to further worsen the environment into which the target fault is injected, so as to probe the fault handling limit of the target system.
In another case, raising the expected traffic load degree causes the actual fault response result of the target system to move from within the control range of the expected fault response result to outside it. For example, at an expected traffic load degree of 90% the actual fault response result is recovery of service processing within 4 minutes, and at an expected traffic load degree of 95% it is recovery of service processing within 6 minutes. The fault action degree can then be reduced while the expected traffic load degree is held at 95%, or increased while the expected traffic load degree is held at 90%, until the actual fault response result matches the expected fault response result or the fault drilling ending condition is met.
S450, recording the mapping relation between the current fault response influence factor and the current actual fault response result.
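The loop formed by S410 to S450 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the names `ImpactFactor`, `DrillPlan`, `adjust_load_first`, `run_drill` and `toy_recovery_minutes`, the 10-point step, the matching tolerance, and the use of a round limit as the fault drilling ending condition are all introduced here for illustration. The toy model stands in for real fault injection and monitoring.

```python
from dataclasses import dataclass, replace
from typing import Callable, Tuple


@dataclass(frozen=True)
class ImpactFactor:
    """Fault response impact factor; both fields are percentages (0-100)."""
    fault_degree_pct: int  # fault action degree of the target fault
    load_pct: int          # expected traffic load degree


@dataclass(frozen=True)
class DrillPlan:
    """Hypothetical container for the test plan acquired in S410."""
    target_fault: str
    factor: ImpactFactor
    expected_recovery_min: float  # expected fault response result


def adjust_load_first(factor: ImpactFactor, within_range: bool,
                      step: int = 10) -> ImpactFactor:
    """S440 (assumed reading): raise the load first, then the fault degree."""
    if within_range:
        if factor.load_pct < 100:
            return replace(factor, load_pct=min(100, factor.load_pct + step))
        return replace(factor,
                       fault_degree_pct=min(100, factor.fault_degree_pct + step))
    # Outside the control range: ease the fault degree first, then the load.
    if factor.fault_degree_pct > 0:
        return replace(factor,
                       fault_degree_pct=max(0, factor.fault_degree_pct - step))
    return replace(factor, load_pct=max(0, factor.load_pct - step))


def run_drill(plan: DrillPlan,
              measure: Callable[[ImpactFactor], float],
              tolerance: float = 0.25,
              max_rounds: int = 20) -> Tuple[ImpactFactor, float]:
    """Repeat S420-S440 until a match or the round limit, then return the
    (impact factor, actual result) pair that S450 would record."""
    factor = plan.factor
    actual = measure(factor)                          # S420: inject and monitor
    for _ in range(max_rounds - 1):
        if abs(actual - plan.expected_recovery_min) <= tolerance:
            break                                     # S430: results match
        within = actual <= plan.expected_recovery_min
        factor = adjust_load_first(factor, within)    # S440: adjust the factor
        actual = measure(factor)                      # back to S420
    return factor, actual                             # S450: mapping to record


def toy_recovery_minutes(factor: ImpactFactor) -> float:
    """Toy stand-in for a real system: recovery slows with load and fault degree."""
    return factor.load_pct / 25 + factor.fault_degree_pct / 20
```

With the toy model, a plan starting at a fault action degree of 20% and an expected traffic load degree of 80% with a 5-minute target steps the load to 90% and then 100% before the measured recovery time reaches the target.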
According to the technical solution provided by this embodiment of the invention, a chaotic engineering test plan matched with a target system is obtained; the target fault is injected into the target system according to the expected traffic load degree in the fault response impact factor, and the actual fault response result of the target system for the target fault is monitored in real time; when the difference between the expected fault response result and the actual fault response result is determined to satisfy the dynamic adjustment condition, matching adjustment is performed in the order of first adjusting the expected traffic load degree and then adjusting the fault action degree; and the operation of injecting the target fault into the target system according to the fault response impact factor is executed again until the actual fault response result matches the expected fault response result or the fault drilling ending condition is met, whereupon the mapping relation between the current fault response impact factor and the current actual fault response result is recorded. This keeps the impact of fault injection on the system within a controllable range, triggers more random faults fully automatically, improves test efficiency and test coverage, effectively covers a variety of scenarios, and improves the robustness of the chaotic engineering system.
Fig. 5 is a schematic structural diagram of a chaos engineering-based fault drilling device according to another embodiment of the present invention, which may be implemented by software and/or hardware and may be configured in a server or a terminal device. As shown in fig. 5, the apparatus may specifically include: the system comprises a chaotic engineering test plan acquisition module 510, an actual fault response result monitoring module 520, a fault response impact factor adjustment module 530 and a mapping relation recording module 540.
The chaotic engineering test plan acquisition module 510 is configured to acquire a chaotic engineering test plan matched with a target system; the actual fault response result monitoring module 520 is configured to inject the target fault into the target system according to the fault response impact factor, and to monitor the actual fault response result of the target system for the target fault in real time; the fault response impact factor adjustment module 530 is configured to perform matching adjustment on the fault response impact factor when the difference between the expected fault response result and the actual fault response result is determined to satisfy the dynamic adjustment condition; and the mapping relation recording module 540 is configured to return to the operation of injecting the target fault into the target system according to the fault response impact factor until the actual fault response result matches the expected fault response result or the fault drilling ending condition is met, and to record the mapping relation between the current fault response impact factor and the current actual fault response result.
According to the technical solution provided by this embodiment of the invention, a chaotic engineering test plan matched with a target system is obtained; the target fault is injected into the target system according to the fault response impact factor, and the actual fault response result of the target system for the target fault is monitored in real time; when the difference between the expected fault response result and the actual fault response result is determined to satisfy the dynamic adjustment condition, matching adjustment is performed on the fault response impact factor; and the operation of injecting the target fault into the target system according to the fault response impact factor is executed again until the actual fault response result matches the expected fault response result or the fault drilling ending condition is met, whereupon the mapping relation between the current fault response impact factor and the current actual fault response result is recorded. This keeps the impact of fault injection on the system within a controllable range, further raises the degree of automation of chaotic engineering, triggers more random faults fully automatically, improves test efficiency and test coverage, effectively covers a variety of scenarios, and improves the robustness of the chaotic engineering system.
On the basis of the foregoing embodiments, the fault response impact factor includes: the fault action degree of the target fault; the actual fault response result monitoring module 520 may be specifically configured to: form a fault parameter directed at at least one target system device in the target system according to the target fault and the fault action degree of the target fault; and issue the fault parameter to each target system device, so as to inject a target fault matched with the fault action degree into each target system device.
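Forming the fault parameters and issuing them to each target system device could look like the sketch below. The `FaultParameter` fields, the function names, and the `send` callback (a stand-in for whatever transport reaches the devices) are hypothetical; the source does not specify the parameter format.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class FaultParameter:
    """Hypothetical fault parameter directed at one target system device."""
    device_id: str
    fault_type: str
    action_degree_pct: int  # fault action degree, 0-100


def build_fault_parameters(fault_type: str, action_degree_pct: int,
                           device_ids: Iterable[str]) -> List[FaultParameter]:
    """Form one fault parameter per target system device."""
    if not 0 <= action_degree_pct <= 100:
        raise ValueError("action degree must be between 0 and 100 percent")
    params = [FaultParameter(d, fault_type, action_degree_pct)
              for d in device_ids]
    if not params:
        raise ValueError("at least one target system device is required")
    return params


def issue_fault_parameters(params: Iterable[FaultParameter],
                           send: Callable[[str, FaultParameter], None]) -> int:
    """Issue each parameter to its device; returns how many were sent."""
    count = 0
    for param in params:
        send(param.device_id, param)
        count += 1
    return count
```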
On the basis of the foregoing embodiments, the fault response impact factor adjustment module 530 may be specifically configured to: when the actual fault response result is determined to fall within the control range of the expected fault response result, increase the fault action degree of the target fault according to a preset action degree increase proportion; and when the actual fault response result is determined not to fall within the control range of the expected fault response result, reduce the fault action degree of the target fault according to a preset action degree reduction proportion.
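One way to read the preset proportions is as multiplicative factors applied to the current fault action degree. That reading, the default ratios, and the clamping to 0-100% are assumptions made for this sketch; the source does not say whether the proportion is multiplicative or additive.

```python
def adjust_fault_degree(current_pct: float, within_control_range: bool,
                        increase_ratio: float = 0.25,
                        decrease_ratio: float = 0.5) -> float:
    """Scale the fault action degree up or down by a preset proportion."""
    if within_control_range:
        # The system coped with the fault, so make the fault harsher.
        adjusted = current_pct * (1 + increase_ratio)
    else:
        # The system missed the expected result, so ease the fault.
        adjusted = current_pct * (1 - decrease_ratio)
    # Keep the result inside the valid percentage range.
    return max(0.0, min(100.0, adjusted))
```

Under these assumed ratios, a fault action degree of 40% becomes 50% after a round that stayed within the control range and 20% after a round that did not.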
On the basis of the foregoing embodiments, the fault response impact factor further includes: the expected traffic load degree; the actual fault response result monitoring module 520 may be specifically configured to: if the target system is determined to be in the production environment currently, acquiring the actual service flow load degree in the target system; if the actual service flow load degree is determined to be smaller than the expected service flow load degree, calculating a flow value to be supplemented; and generating a simulation service flow matched with the flow value to be supplemented, and injecting the simulation service flow into the target system.
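The flow value to be supplemented can be illustrated with the figures from the earlier example (60% actual load, 80% expected load, 60 M/s of simulated traffic). The sketch assumes that load degree scales linearly with traffic, which implies a full-load capacity of 300 M/s for that example; the capacity figure and the function name `traffic_to_supplement` are inferred or hypothetical and are not stated in the source.

```python
def traffic_to_supplement(actual_load_pct: float, expected_load_pct: float,
                          capacity_m_per_s: float) -> float:
    """Simulated traffic (M/s) needed to lift the load to the expected degree."""
    for value in (actual_load_pct, expected_load_pct):
        if not 0 <= value <= 100:
            raise ValueError("load degrees must be between 0 and 100 percent")
    if actual_load_pct >= expected_load_pct:
        return 0.0  # production traffic already meets the expected degree
    # Linear assumption: each percentage point is capacity / 100 of traffic.
    return (expected_load_pct - actual_load_pct) * capacity_m_per_s / 100
```

Raising the expected traffic load degree from 80% to 90% at the same 60% actual load would, under this assumption, raise the supplement from 60 M/s to 90 M/s.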
On the basis of the foregoing embodiments, the fault response impact factor adjustment module 530 may be specifically configured to, when the difference between the expected fault response result and the actual fault response result is determined to satisfy the dynamic adjustment condition, perform one of the following: performing matching adjustment on the fault action degree only; performing matching adjustment on the expected traffic load degree only; performing matching adjustment in the order of first adjusting the fault action degree and then adjusting the expected traffic load degree; or performing matching adjustment in the order of first adjusting the expected traffic load degree and then adjusting the fault action degree.
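The four alternatives can be modelled as an ordered list of adjustable quantities. In the sketch below, the enum `AdjustmentStrategy`, the function `adjust_factor`, and the fixed 10-point step are hypothetical. It also assumes one possible reading of "first ... then ...": quantities are raised in the listed order until each reaches its limit, and lowered in the reverse order.

```python
from enum import Enum
from typing import Tuple


class AdjustmentStrategy(Enum):
    """The four alternatives, each listing its quantities in adjustment order."""
    FAULT_ONLY = ("fault",)
    LOAD_ONLY = ("load",)
    FAULT_THEN_LOAD = ("fault", "load")
    LOAD_THEN_FAULT = ("load", "fault")


def adjust_factor(strategy: AdjustmentStrategy, fault_pct: int, load_pct: int,
                  within_range: bool, step: int = 10) -> Tuple[int, int]:
    """Return the next (fault action degree, expected traffic load degree)."""
    values = {"fault": fault_pct, "load": load_pct}
    # Raise in the listed order; lower in the reverse order (assumption).
    order = strategy.value if within_range else tuple(reversed(strategy.value))
    delta = step if within_range else -step
    for name in order:
        moved = max(0, min(100, values[name] + delta))
        if moved != values[name]:  # this quantity still has room to move
            values[name] = moved
            break
    return values["fault"], values["load"]
```

A strategy that lists only one quantity leaves the factor unchanged once that quantity reaches its limit, which a caller could treat as a fault drilling ending condition.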
On the basis of the foregoing embodiments, performing matching adjustment on the expected traffic load degree when the difference between the expected fault response result and the actual fault response result is determined to satisfy the dynamic adjustment condition may specifically include: when the actual fault response result is determined to fall within the control range of the expected fault response result, increasing the expected traffic load degree according to a preset load degree increase proportion; and when the actual fault response result is determined not to fall within the control range of the expected fault response result, reducing the expected traffic load degree according to a preset load degree reduction proportion.
On the basis of the above embodiments, the monitoring of the target system, the injection of the target fault, and the injection of the simulation service traffic are all realized by probes arranged on system devices of the target system.
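A probe that combines the three responsibilities named above might expose an interface like the following. The `Probe` protocol, its method names, and the in-memory stand-in are hypothetical and serve only to show how monitoring, fault injection and traffic injection could share one per-device component.

```python
from typing import Dict, Iterable, List, Optional, Protocol


class Probe(Protocol):
    """Hypothetical per-device probe interface."""
    def monitor(self) -> Dict[str, object]: ...
    def inject_fault(self, fault_type: str, action_degree_pct: int) -> None: ...
    def inject_traffic(self, rate_m_per_s: float) -> None: ...


class InMemoryProbe:
    """Stand-in probe that records what it was asked to do."""

    def __init__(self, device_id: str) -> None:
        self.device_id = device_id
        self.active_fault: Optional[str] = None
        self.fault_degree_pct = 0
        self.simulated_traffic_m_per_s = 0.0

    def monitor(self) -> Dict[str, object]:
        # A real probe would report measured metrics; this one echoes its state.
        return {
            "device_id": self.device_id,
            "active_fault": self.active_fault,
            "fault_degree_pct": self.fault_degree_pct,
            "simulated_traffic_m_per_s": self.simulated_traffic_m_per_s,
        }

    def inject_fault(self, fault_type: str, action_degree_pct: int) -> None:
        self.active_fault = fault_type
        self.fault_degree_pct = action_degree_pct

    def inject_traffic(self, rate_m_per_s: float) -> None:
        self.simulated_traffic_m_per_s = rate_m_per_s


def apply_round(probes: Iterable[Probe], fault_type: str,
                action_degree_pct: int,
                traffic_m_per_s: float) -> List[Dict[str, object]]:
    """Inject traffic and the fault through every probe, then collect readings."""
    probe_list = list(probes)
    for probe in probe_list:
        probe.inject_traffic(traffic_m_per_s)
        probe.inject_fault(fault_type, action_degree_pct)
    return [probe.monitor() for probe in probe_list]
```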
The fault drilling device based on the chaotic engineering can execute the fault drilling method based on the chaotic engineering provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.
Fig. 6 is a schematic structural diagram of a terminal device according to another embodiment of the present invention. As shown in fig. 6, the apparatus includes a processor 610, a memory 620, an input device 630, and an output device 640; the number of processors 610 in the device may be one or more, and one processor 610 is taken as an example in fig. 6; the processor 610, the memory 620, the input device 630 and the output device 640 in the apparatus may be connected by a bus or other means, and fig. 6 illustrates an example of a connection by a bus.
The memory 620, as a computer-readable storage medium, can be used to store software programs, computer-executable programs, and modules, such as program instructions/modules corresponding to the chaos engineering-based fault drilling method in the embodiment of the present invention (for example, the chaos engineering test plan obtaining module 510, the actual fault response result monitoring module 520, the fault response impact factor adjusting module 530, and the mapping relation recording module 540). The processor 610 executes various functional applications and data processing of the device by running software programs, instructions and modules stored in the memory 620, so as to implement the chaos engineering-based fault drilling method described above, and the method includes: acquiring a chaotic engineering test plan matched with a target system; injecting the target fault into the target system according to the fault response influence factor, and monitoring the actual fault response result of the target system aiming at the target fault in real time; when the difference between the expected fault response result and the actual fault response result is determined to meet the dynamic adjustment condition, performing matched adjustment on the fault response influence factor; and returning to execute the operation of injecting the target fault into the target system according to the fault response influence factor until the actual fault result is matched with the expected fault response result, and recording the mapping relation between the current fault response influence factor and the expected fault response result.
The memory 620 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the terminal, and the like. Further, the memory 620 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, the memory 620 can further include memory located remotely from the processor 610, which can be connected to the device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input means 630 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function controls of the device. The output device 640 may include a display device such as a display screen.
Embodiments of the present invention also provide a computer-readable storage medium containing computer-readable instructions, which when executed by a computer processor, perform a chaos engineering-based fault drilling method, the method including: acquiring a chaotic engineering test plan matched with a target system; injecting the target fault into the target system according to the fault response influence factor, and monitoring the actual fault response result of the target system aiming at the target fault in real time; when the difference between the expected fault response result and the actual fault response result is determined to meet the dynamic adjustment condition, performing matched adjustment on the fault response influence factor; and returning to execute the operation of injecting the target fault into the target system according to the fault response influence factor until the actual fault result is matched with the expected fault response result, and recording the mapping relation between the current fault response influence factor and the expected fault response result.
Of course, the computer-readable storage medium provided by the embodiments of the present invention includes computer-readable instructions, which are not limited to the operations of the method described above, and can also perform related operations in the chaos engineering-based fault drilling method provided by any embodiments of the present invention.
From the above description of the embodiments, it is obvious for those skilled in the art that the present invention can be implemented by software and necessary general hardware, and certainly, can also be implemented by hardware, but the former is a better embodiment in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a FLASH Memory (FLASH), a hard disk or an optical disk of a computer, and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute the methods according to the embodiments of the present invention.
It should be noted that, in the embodiment of the fault drilling device based on the chaotic engineering, each unit and each module included in the embodiment are only divided according to functional logic, but are not limited to the above division, as long as the corresponding function can be realized; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (10)

1. A fault drilling method based on chaotic engineering is characterized by comprising the following steps:
acquiring a chaotic engineering test plan matched with a target system, wherein the chaotic engineering test plan comprises: a fault response impact factor and an expected fault response result corresponding to a target fault to be injected;
injecting the target fault into the target system according to the fault response influence factor, and monitoring the actual fault response result of the target system aiming at the target fault in real time;
when the difference between the expected fault response result and the actual fault response result is determined to meet the dynamic adjustment condition, performing matched adjustment on the fault response influence factor;
and returning to execute the operation of injecting the target fault into the target system according to the fault response influence factor until the actual fault result is matched with the expected fault response result or the fault drilling ending condition is met, and recording the mapping relation between the current fault response influence factor and the current actual fault response result.
2. The method of claim 1, wherein the fault response impact factor comprises: a fault action degree of the target fault;
according to the fault response influence factor, injecting the target fault into the target system, including:
forming a fault parameter pointing to at least one target system device in the target system according to the target fault and the fault action degree of the target fault;
and issuing the fault parameters to each target system device so as to inject target faults matched with the fault action degree into each target system device.
3. The method of claim 2, wherein the matching adjustment of the fault response impact factor upon determining that a difference between an expected fault response result and an actual fault response result satisfies a dynamic adjustment condition comprises:
when the actual fault response result is determined to fall within the control range of the expected fault response result, increasing the fault action degree of the target fault according to a preset action degree increase proportion;
and when the actual fault response result is determined not to fall into the control range of the expected fault response result, reducing the fault action degree of the target fault according to a preset action degree reduction proportion.
4. The method of claim 2, wherein the fault response impact factor further comprises: the expected traffic load degree;
according to the fault response impact factor, injecting the target fault into the target system, further comprising:
if the target system is determined to be in the production environment currently, acquiring the actual service flow load degree in the target system;
if the actual service flow load degree is determined to be smaller than the expected service flow load degree, calculating a flow value to be supplemented;
and generating a simulation service flow matched with the flow value to be supplemented, and injecting the simulation service flow into the target system.
5. The method of claim 4, wherein the matching adjustment of the fault response impact factor upon determining that a difference between an expected fault response result and an actual fault response result satisfies a dynamic adjustment condition comprises one of:
when the difference between the expected fault response result and the actual fault response result is determined to meet the dynamic adjustment condition, only performing matched adjustment on the fault action degree;
when the difference between the expected fault response result and the actual fault response result is determined to meet the dynamic adjustment condition, only carrying out matched adjustment on the expected service flow load degree;
when the difference between the expected fault response result and the actual fault response result is determined to meet the dynamic adjustment condition, performing matching adjustment according to the sequence of firstly adjusting the fault action degree and then adjusting the expected traffic load degree; and
when the difference between the expected fault response result and the actual fault response result is determined to meet the dynamic adjustment condition, performing matching adjustment according to the sequence of firstly adjusting the expected traffic load degree and then adjusting the fault action degree.
6. The method of claim 5, wherein the adjusting the matching of the expected traffic load degree when it is determined that the difference between the expected fault response result and the actual fault response result satisfies the dynamic adjustment condition comprises:
when the actual fault response result is determined to fall into the control range of the expected fault response result, the load degree of the expected service flow is increased according to a preset load degree increasing proportion;
and when the actual fault response result is determined not to fall into the control range of the expected fault response result, reducing the load degree of the expected service flow according to a preset load degree reduction proportion.
7. The method according to any one of claims 1 to 6, wherein the monitoring of the target system, the injection of the target fault and the injection of the simulated traffic are all performed by probes provided on system devices of the target system.
8. A chaos engineering based fault drilling device is characterized by comprising:
the chaotic engineering test plan acquisition module is used for acquiring a chaotic engineering test plan matched with a target system, wherein the chaotic engineering test plan comprises: a fault response impact factor and an expected fault response result corresponding to a target fault to be injected;
the actual fault response result monitoring module is used for injecting the target fault into the target system according to the fault response influence factor and monitoring the actual fault response result of the target system aiming at the target fault in real time;
the fault response influence factor adjusting module is used for carrying out matching adjustment on the fault response influence factor when the difference between the expected fault response result and the actual fault response result is determined to meet the dynamic adjusting condition;
and the mapping relation recording module is used for returning and executing the operation of injecting the target fault into the target system according to the fault response influence factor until the actual fault result is matched with the expected fault response result or the fault drilling ending condition is met, and recording the mapping relation between the current fault response influence factor and the current actual fault response result.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the chaotic engineering based fault drilling method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the chaotic engineering-based fault drilling method according to any one of claims 1 to 7.
CN202111437555.9A 2021-11-29 2021-11-29 Fault drilling method, device, terminal equipment and medium based on chaotic engineering Pending CN114113984A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111437555.9A CN114113984A (en) 2021-11-29 2021-11-29 Fault drilling method, device, terminal equipment and medium based on chaotic engineering

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111437555.9A CN114113984A (en) 2021-11-29 2021-11-29 Fault drilling method, device, terminal equipment and medium based on chaotic engineering

Publications (1)

Publication Number Publication Date
CN114113984A true CN114113984A (en) 2022-03-01

Family

ID=80367758

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111437555.9A Pending CN114113984A (en) 2021-11-29 2021-11-29 Fault drilling method, device, terminal equipment and medium based on chaotic engineering

Country Status (1)

Country Link
CN (1) CN114113984A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114609995A (en) * 2022-03-04 2022-06-10 亚信科技(南京)有限公司 Fault control method, device, system, equipment, medium and product
CN115033415A (en) * 2022-06-21 2022-09-09 北京同创永益科技发展有限公司 Chaotic engineering fault evaluation method based on FMEA

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190205233A1 (en) * 2017-12-28 2019-07-04 Hyundai Motor Company Fault injection testing apparatus and method
CN108663202A (en) * 2018-05-03 2018-10-16 国家电网公司 GIS mechanical failure diagnostic methods based on chaos cuckoo algorithm and system
US10795793B1 (en) * 2018-11-19 2020-10-06 Intuit Inc. Method and system for simulating system failures using domain-specific language constructs
CN110308969A (en) * 2019-06-26 2019-10-08 深圳前海微众银行股份有限公司 Failure drilling method, device, equipment and computer storage medium
CN112540887A (en) * 2020-12-16 2021-03-23 北京奇艺世纪科技有限公司 Fault drilling method and device, electronic equipment and storage medium
CN112631846A (en) * 2020-12-25 2021-04-09 广州品唯软件有限公司 Fault drilling method and device, computer equipment and storage medium
CN113010393A (en) * 2021-02-25 2021-06-22 北京四达时代软件技术股份有限公司 Fault drilling method and device based on chaotic engineering
CN112988494A (en) * 2021-03-15 2021-06-18 北京字跳网络技术有限公司 Abnormity testing method and device and electronic device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
卫津逸: "基于混沌方法的信息系统韧性能力测试技术研究", 《社会科学Ⅰ辑》, no. 3, 15 March 2021 (2021-03-15) *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114609995A (en) * 2022-03-04 2022-06-10 亚信科技(南京)有限公司 Fault control method, device, system, equipment, medium and product
CN114609995B (en) * 2022-03-04 2024-07-23 亚信科技(南京)有限公司 Fault control method, device, system, equipment, medium and product
CN115033415A (en) * 2022-06-21 2022-09-09 北京同创永益科技发展有限公司 Chaotic engineering fault evaluation method based on FMEA
CN115033415B (en) * 2022-06-21 2023-04-18 北京同创永益科技发展有限公司 Chaotic engineering fault evaluation method based on FMEA

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination