CN112540887A - Fault drilling method and device, electronic equipment and storage medium - Google Patents

Fault drilling method and device, electronic equipment and storage medium

Info

Publication number
CN112540887A
Authority
CN
China
Prior art keywords
fault
drilling
target object
service
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011488116.6A
Other languages
Chinese (zh)
Inventor
耿瑞
刘小如
钮麟
方天宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd
Priority to CN202011488116.6A
Publication of CN112540887A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/22 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
    • G06F11/2273 Test methods

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

An embodiment of the invention provides a fault drilling method, a fault drilling device, electronic equipment and a storage medium. The method comprises: when a fault drilling instruction is triggered, acquiring a preset fault drilling scheme for a service; analyzing the target object identifier and the fault information included in the fault drilling scheme, and determining a fault injection engine corresponding to the fault information; using the fault injection engine to inject the fault corresponding to the fault information into the target object corresponding to the target object identifier, wherein the target object is any object included in the service; and, after the fault corresponding to the fault information is injected, acquiring index data of the target object and determining a fault drilling result for the service based on the index data. Because the fault injection engine injects the faults specified by the fault information in the fault drilling scheme into the target object of the service, fault drilling is automated, so that the time consumed by fault drilling is effectively reduced and fault drilling efficiency is improved.

Description

Fault drilling method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method and an apparatus for fault drilling, an electronic device, and a storage medium.
Background
Nowadays, with the development of computer technology, many traditional industries are gradually transforming toward "Internet Plus", and microservices, distributed services and the like are rapidly becoming widespread. While people enjoy the benefits of "Internet Plus", they also see its drawbacks: in a complex distributed service in particular, the failure of a single machine may cause the entire service to fail. To reduce the probability and the scope of impact of faults, fault drilling generally needs to be performed on microservices, distributed services and the like, so that their defects are found in advance.
In the related art, testers mostly design fault drilling schemes from their own experience and perform fault drilling on microservices, distributed services and the like manually based on those schemes, so fault drilling is time-consuming and inefficient.
Disclosure of Invention
Embodiments of the present invention provide a fault drilling method, device, electronic device, and storage medium, so as to reduce the time consumed by fault drilling and improve fault drilling efficiency.
In a first aspect of the embodiments of the present invention, a method for fault drilling is provided, where the method includes:
under the condition of triggering a fault drilling instruction, acquiring a preset fault drilling scheme aiming at a service, wherein the fault drilling scheme comprises at least one piece of fault information;
analyzing the target object identification and the fault information included in the fault drilling scheme, and determining a fault injection engine corresponding to the fault information;
injecting a fault corresponding to the fault information for a target object corresponding to the target object identification by using the fault injection engine, wherein the target object is any object included in the service;
after the fault corresponding to the fault information is injected, index data of the target object is obtained, and a fault drilling result for the service is determined based on the index data.
In an optional embodiment, before acquiring the preset fault drilling scheme for the service in the case of triggering the fault drilling instruction, the method further includes:
acquiring a service level identifier corresponding to a service input by a user and a target object identifier corresponding to a target object included in the service;
determining a fault information set corresponding to a service level identifier according to a mapping relation between the preset service level identifier and a preset fault information set;
extracting at least one piece of fault information from the fault information set based on a preset fault information extraction rule;
generating a fault drilling scheme comprising the target object identification and at least one piece of fault information, and storing the fault drilling scheme into a fault drilling scheme library;
the acquiring of the preset fault drilling scheme for the service comprises the following steps: and acquiring the fault drilling scheme from the fault drilling scheme library.
In an optional embodiment, in the case that the fault drilling instruction is triggered, acquiring a preset fault drilling scheme for the service includes:
triggering a fault drilling instruction based on a preset instruction triggering mode, wherein the instruction triggering mode at least comprises one of the following modes: a random triggering mode, a periodic triggering mode and a user triggering mode;
and under the condition of triggering the fault drilling instruction, acquiring a preset fault drilling scheme aiming at the service.
In an optional embodiment, the determining a fault injection engine corresponding to the fault information includes:
determining a fault classification corresponding to the fault information;
and determining a fault injection engine corresponding to the fault classification according to a preset mapping relation between the fault classification and the fault injection engine.
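The two-step lookup described in this embodiment can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation of the embodiment: the classification names (`host`, `database`), the dictionary contents and the function name `resolve_engine` are hypothetical.

```python
# Hypothetical mapping from fault information to a fault classification.
FAULT_CLASSIFICATION = {
    "cpu_surge": "host",
    "network_delay": "host",
    "primary_db_down": "database",
}

# Hypothetical preset mapping from fault classification to fault injection engine.
ENGINE_BY_CLASSIFICATION = {
    "host": "ChaosBlade",
    "database": "StackStorm",
}


def resolve_engine(fault_info: str) -> str:
    """Resolve an engine in two steps: fault info -> classification -> engine."""
    classification = FAULT_CLASSIFICATION[fault_info]
    return ENGINE_BY_CLASSIFICATION[classification]


print(resolve_engine("cpu_surge"))
```

Keeping the classification as an intermediate key means that a new piece of fault information only needs a classification entry, not its own engine entry.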
In an optional embodiment, the injecting, by the fault injection engine, the fault corresponding to the fault information for the target object corresponding to the target object identification includes:
determining a fault generation instruction corresponding to the fault information in the fault injection engine according to a corresponding relation between preset fault information and the fault generation instruction in the fault injection engine;
and sending the fault generation instruction to a target object corresponding to the target object identification by using the fault injection engine so that the target object executes the fault generation instruction to generate a fault corresponding to the fault information.
In an optional embodiment, the method further comprises:
after determining that the target object generates a fault according to the index data of the target object, determining a fault recovery instruction corresponding to the fault information in the fault injection engine according to a preset corresponding relationship between the fault information and the fault recovery instruction in the fault injection engine;
and sending the fault recovery instruction to the target object by utilizing the fault injection engine so that the target object executes the fault recovery instruction to recover to be normal.
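The recovery step of this embodiment might look like the sketch below. The instruction strings are placeholders rather than real commands of any fault injection engine, and `recover_if_faulty` and the `send` callable (standing in for the engine's transport to the target object) are hypothetical names.

```python
# Hypothetical preset correspondence between fault information and the
# recovery instruction in the engine; the instruction strings are placeholders.
RECOVERY_INSTRUCTIONS = {
    "cpu_surge": "recover-cpu",
    "network_delay": "recover-network-delay",
}


def recover_if_faulty(fault_info, fault_observed, send):
    """Send the recovery instruction only when the index data confirmed the fault.

    Returns the instruction that was sent, or None if nothing was sent.
    """
    if not fault_observed:
        return None
    instruction = RECOVERY_INSTRUCTIONS[fault_info]
    send(instruction)
    return instruction


sent = []
recover_if_faulty("cpu_surge", True, sent.append)
print(sent)
```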
In an optional embodiment, the method further comprises:
acquiring operation data generated by the service in a fault drilling process, wherein the operation data at least comprises log data;
and generating a fault drilling report based on the operation data, and evaluating the effect of the fault drilling result.
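As a rough illustration of generating a report from log data, the sketch below counts log levels. The log line format (`<LEVEL> <message>`), the report fields and the function name `build_drill_report` are assumptions made for this example; the embodiment does not specify them.

```python
from collections import Counter


def build_drill_report(log_lines):
    """Summarise log data from a drill into a small report dictionary."""
    # Assumed log format: "<LEVEL> <message>", e.g. "ERROR request timed out".
    levels = Counter(line.split(" ", 1)[0] for line in log_lines if line.strip())
    return {
        "total_lines": sum(levels.values()),
        "errors": levels.get("ERROR", 0),
        "warnings": levels.get("WARN", 0),
    }


logs = ["INFO drill started", "WARN latency rising", "ERROR request timed out"]
print(build_drill_report(logs))
```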
In a second aspect of the embodiments of the present invention, there is also provided a fault drilling apparatus, including:
a scheme acquisition module, configured to acquire a preset fault drilling scheme for a service when a fault drilling instruction is triggered, wherein the fault drilling scheme comprises at least one piece of fault information;
a scheme analysis module, configured to analyze the target object identifier and the fault information included in the fault drilling scheme;
an engine determination module, configured to determine a fault injection engine corresponding to the fault information;
a fault injection module, configured to inject, by using the fault injection engine, a fault corresponding to the fault information for a target object corresponding to the target object identifier, where the target object is any object included in the service;
a data acquisition module, configured to acquire index data of the target object after the fault corresponding to the fault information is injected;
a result determination module, configured to determine a fault drilling result for the service based on the index data.
In a third aspect of the embodiments of the present invention, there is further provided an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete communication with each other through the communication bus;
a memory for storing a computer program;
a processor configured to implement the fault drilling method according to any one of the first aspect described above when executing a program stored in a memory.
In a fourth aspect of the embodiments of the present invention, there is also provided a storage medium having instructions stored therein, which when run on a computer, cause the computer to perform the fault drilling method according to any one of the first aspects.
In a fifth aspect of the embodiments of the present invention, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform the fault drilling method described in any one of the above first aspects.
According to the technical scheme provided by the embodiment of the invention, when a fault drilling instruction is triggered, a preset fault drilling scheme for the service is acquired; the target object identifier and the fault information included in the fault drilling scheme are analyzed, and the fault injection engine corresponding to the fault information is determined; the fault injection engine is used to inject the fault corresponding to the fault information into the target object corresponding to the target object identifier; and, after the fault is injected, index data of the target object is acquired and the fault drilling result for the service is determined based on the index data. Because the fault injection engine injects the faults specified by the fault information in the fault drilling scheme into the target object of the service, fault drilling is automated, so that the time consumed by fault drilling is effectively reduced and fault drilling efficiency is improved.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without inventive exercise.
Fig. 1 is a schematic structural diagram of a fault drilling system according to an embodiment of the present invention;
Fig. 2 is a schematic flow chart of an implementation of a fault drilling method according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of fault injection according to an embodiment of the present invention;
Fig. 4 is a schematic flow chart of an implementation of another fault drilling method according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a fault drilling device according to an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Fig. 1 is a schematic structural diagram of a fault drilling system according to an embodiment of the present invention. The system includes an effect evaluation system 11, an intelligent scheme orchestration system 12, an intelligent monitoring and alarm system 13, a fault drilling scheme library 14, a scheduling execution engine 15, a historical fault drilling scheme library 16, a fault injection engine 17, and a service 18. The intelligent scheme orchestration system 12 provides fault drilling scheme orchestration strategies such as manual orchestration and intelligent orchestration; the fault injection engine 17 includes fault injection engines such as the chaos engineering tool ChaosBlade, the automation platform StackStorm and user-defined fault injection tools; and the service 18 includes objects such as servers, containers, databases, middleware and network equipment. The specific functions of these components are described in detail below.
As shown in fig. 2, an implementation flow diagram of a fault drilling method provided in an embodiment of the present invention is shown, where the method specifically includes the following steps:
S201, under the condition that a fault drilling instruction is triggered, a preset fault drilling scheme aiming at a service is obtained, wherein the fault drilling scheme comprises at least one piece of fault information.
In the embodiment of the invention, a user can arrange and design the whole process of performing fault drilling aiming at services (such as website safety monitoring service, information safety monitoring service, payment service, video playing service and the like) in advance in a self-service manner to generate a corresponding fault drilling scheme, so that the fault drilling scheme aiming at the services can be obtained under the condition of triggering a fault drilling instruction.
The fault drilling scheme may take the form of a workflow and includes at least one piece of fault information. For example, using manual orchestration (one of the fault drilling scheme orchestration strategies) in the intelligent scheme orchestration system 12 shown in fig. 1, a user designs in advance, in a self-service manner, the whole process of fault drilling for service A: first, a CPU surge in the server; second, network delay in the server; third, network packet loss in the server; and so on. A fault drilling scheme for service A in workflow form is thereby generated, in which the CPU surge in the server, the network delay in the server, the network packet loss in the server and so on are the fault information of the scheme, that is, the specific fault drilling content.
The fault drilling scheme may be stored in a fault drilling scheme library, from which the fault drilling scheme for the service is acquired when the fault drilling instruction is triggered. For example, the fault drilling scheme library 14 shown in fig. 1 stores the fault drilling scheme for service A. When the user clicks to start fault drilling, the fault drilling instruction is triggered and the service identifier A input by the user is acquired. According to the preset correspondence between service identifiers (a service identifier may be, for example, a service ID or a service name) and fault drilling schemes, the fault drilling scheme for service A corresponding to the service identifier A is determined in the fault drilling scheme library and acquired from it.
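The lookup of a scheme by service identifier can be sketched as below. The in-memory dictionary stands in for the fault drilling scheme library; the field names `target_object_id` and `fault_info` and the function name `get_scheme` are assumptions for illustration only.

```python
# Hypothetical in-memory fault drilling scheme library keyed by service identifier.
SCHEME_LIBRARY = {
    "A": {
        "target_object_id": "server",
        "fault_info": ["CPU surge", "network delay", "network packet loss"],
    },
}


def get_scheme(service_id):
    """Return the preset scheme for a service, or raise if none is stored."""
    try:
        return SCHEME_LIBRARY[service_id]
    except KeyError:
        raise LookupError(
            f"no fault drilling scheme stored for service {service_id!r}"
        ) from None


print(get_scheme("A")["fault_info"])
```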
In addition, in the embodiment of the present invention, an instruction triggering manner may be preset, where the instruction triggering manner includes at least one of the following: the fault drilling method comprises a random triggering mode, a periodic triggering mode and a user triggering mode, so that a fault drilling instruction can be triggered based on a preset instruction triggering mode, and a preset fault drilling scheme aiming at service is obtained under the condition that the fault drilling instruction is triggered, so that fault injection is closer to reality.
In the random triggering mode, the fault drilling instruction is triggered at random times, for example at 12:00 on the first day, at 10:00 on the second day, and so on, so that fault drilling is performed randomly. In the periodic triggering mode, the fault drilling instruction is triggered periodically, for example at intervals of 1 hour, so that fault drilling is performed periodically. In the user triggering mode, the user actively triggers the fault drilling instruction, for example by clicking a button, so that the user decides when fault drilling is performed.
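One possible way to model the three triggering modes is to compute the time of the next trigger, as in the sketch below. The mode names, the function name `next_trigger_time` and the choice of a 24-hour window for the random mode are assumptions; the embodiment only gives the examples quoted above.

```python
import random
from datetime import datetime, timedelta


def next_trigger_time(mode, now, period=timedelta(hours=1), rng=None):
    """Return when the next drill instruction fires, or None for user triggering."""
    if mode == "periodic":
        return now + period                      # fixed interval, e.g. every hour
    if mode == "random":
        rng = rng or random.Random()
        # Assumption: pick a random moment within the next 24 hours.
        return now + timedelta(seconds=rng.randrange(24 * 3600))
    if mode == "user":
        return None                              # fires only when the user acts
    raise ValueError(f"unknown trigger mode: {mode}")


now = datetime(2021, 1, 1, 12, 0)
print(next_trigger_time("periodic", now))
```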
S202, analyzing the target object identification and the fault information included in the fault drilling scheme, and determining a fault injection engine corresponding to the fault information.
For the fault drilling scheme, the fault drilling scheme further comprises a target object identifier besides at least one piece of fault information, and the embodiment of the invention can analyze the target object identifier and at least one piece of fault information in the fault drilling scheme and determine a fault injection engine corresponding to at least one piece of fault information.
For example, as shown in table 1 below, the fault drilling scheme includes 3 pieces of fault information, namely CPU surge in the server, network delay in the server, and network packet loss in the server, and further includes the target object identifier: server. The embodiment of the invention can parse the target object identifier server and these 3 pieces of fault information from the fault drilling scheme.
[Table 1 is provided as an image (BDA0002839931280000071) in the original publication]
TABLE 1
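Parsing the scheme into its target object identifier and fault information could look like the following sketch. The JSON serialisation and the field names are assumptions, since the embodiment does not state how a scheme is stored; only the values (server and the three faults) come from the example above.

```python
import json

# Assumed JSON serialisation of a scheme; the field names are hypothetical.
SCHEME_JSON = """
{
  "target_object_id": "server",
  "fault_info": ["CPU surge", "network delay", "network packet loss"]
}
"""


def parse_scheme(text):
    """Extract the target object identifier and the ordered fault information."""
    scheme = json.loads(text)
    target = scheme["target_object_id"]
    faults = list(scheme["fault_info"])
    if not faults:
        raise ValueError("a scheme needs at least one piece of fault information")
    return target, faults


target, faults = parse_scheme(SCHEME_JSON)
print(target, faults)
```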
Each piece of fault information has a corresponding fault injection engine, and the embodiment of the present invention may determine the fault injection engine corresponding to the fault information. The fault injection engine may be the chaos engineering tool ChaosBlade or the automation platform StackStorm, which is not limited in the embodiment of the present invention.
For example, the fault injection engine 17 shown in fig. 1 includes fault injection engines such as the chaos engineering tool ChaosBlade and the automation platform StackStorm. As shown in table 2 below, the fault information CPU surge in the server, network delay in the server and network packet loss in the server all correspond to the chaos engineering tool ChaosBlade, so the embodiment of the present invention determines ChaosBlade as the fault injection engine for these pieces of fault information.
Fault information                    Fault injection engine
CPU surge in the server              Chaos engineering tool ChaosBlade
Network delay in the server          Chaos engineering tool ChaosBlade
Network packet loss in the server    Chaos engineering tool ChaosBlade
MySQL primary database downtime      Automation platform StackStorm
Replica (slave) database downtime    Automation platform StackStorm
TABLE 2
Specifically, in the embodiment of the present invention, the scheduling execution engine may be used to analyze the target object identifier and the at least one piece of fault information in the fault drilling scheme, and determine the fault injection engine corresponding to the at least one piece of fault information by using the scheduling execution engine, which is not limited in this embodiment of the present invention.
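The correspondence in table 2 can be expressed as a simple lookup table, as sketched below. The dictionary keys paraphrase the fault information in the table, and the function name `engines_for_scheme` is hypothetical; this is an illustration of the lookup, not the scheduling execution engine itself.

```python
# Table 2 expressed as a lookup table (keys paraphrase the fault information).
ENGINE_FOR_FAULT = {
    "CPU surge in the server": "ChaosBlade",
    "Network delay in the server": "ChaosBlade",
    "Network packet loss in the server": "ChaosBlade",
    "MySQL primary database downtime": "StackStorm",
    "Replica database downtime": "StackStorm",
}


def engines_for_scheme(fault_info_list):
    """Map each piece of fault information to its engine, preserving order."""
    return [(fault, ENGINE_FOR_FAULT[fault]) for fault in fault_info_list]


pairs = engines_for_scheme(
    ["CPU surge in the server", "MySQL primary database downtime"]
)
print(pairs)
```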
S203, injecting a fault corresponding to the fault information for a target object corresponding to the target object identification by using the fault injection engine, wherein the target object is any object included in the service.
For a target object corresponding to the target object identifier, the target object may be any object included in the service, for example, the target object may be a server included in the service 18 shown in fig. 1, may be a container included in the service 18 shown in fig. 1, may be a database included in the service 18 shown in fig. 1, may be middleware included in the service 18 shown in fig. 1, and may be a network device included in the service 18 shown in fig. 1, which is not limited in this embodiment of the present invention. Therefore, the fault injection engine is used for injecting the fault corresponding to the fault information aiming at the target object, and the fault corresponding to the fault information acts on the target object so that the target object generates the corresponding fault.
Specifically, according to a preset corresponding relation between fault information and a fault generation instruction in a fault injection engine, a fault generation instruction corresponding to the fault information in the fault injection engine is determined, and the fault injection engine is used for sending the fault generation instruction to a target object corresponding to a target object identifier, so that the target object executes the fault generation instruction and generates a fault corresponding to the fault information.
For example, as shown in fig. 3: in the first step, the chaos engineering tool ChaosBlade is used to inject a CPU surge fault into the server, so that the server generates a CPU surge fault; in the second step, ChaosBlade is used to inject a network delay fault into the server, so that the server generates a network delay fault; in the third step, ChaosBlade is used to inject a network packet loss fault into the server, so that the server generates a network packet loss fault. Because the fault drilling scheme is in workflow form, the injection operations (the first, second and third steps) are executed one after another, and no fault result needs to be generated during this process.
Specifically, in the first step, the fault generation instruction in ChaosBlade that corresponds to the fault information "CPU surge" is determined, and ChaosBlade is used to send this instruction to the target object corresponding to the target object identifier, namely the server, so that the server executes the instruction and generates a CPU surge fault.
In the second step, the fault generation instruction in ChaosBlade that corresponds to the fault information "network delay" is determined, and ChaosBlade is used to send this instruction to the target object corresponding to the target object identifier, namely the server, so that the server executes the instruction and generates a network delay fault.
In the third step, the fault generation instruction in ChaosBlade that corresponds to the fault information "network packet loss" is determined, and ChaosBlade is used to send this instruction to the target object corresponding to the target object identifier, namely the server, so that the server executes the instruction and generates a network packet loss fault.
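The three injection steps can be sketched as a loop over the workflow. The instruction strings below are placeholders and are not real ChaosBlade commands; `RecordingEngine` is a hypothetical stand-in that only records what would be sent to the target object, and `run_workflow` is an assumed name.

```python
# Hypothetical correspondence between fault information and the engine's
# fault generation instruction; the strings are placeholders, not real commands.
GENERATION_INSTRUCTIONS = {
    "CPU surge": "generate-cpu-surge",
    "network delay": "generate-network-delay",
    "network packet loss": "generate-network-packet-loss",
}


class RecordingEngine:
    """Stand-in for a fault injection engine that records what it would send."""

    def __init__(self):
        self.sent = []

    def send(self, target_object_id, instruction):
        self.sent.append((target_object_id, instruction))


def run_workflow(engine, target_object_id, fault_info_list):
    """Inject each fault in workflow order without waiting for drill results."""
    for fault in fault_info_list:
        instruction = GENERATION_INSTRUCTIONS[fault]
        engine.send(target_object_id, instruction)


engine = RecordingEngine()
run_workflow(engine, "server", ["CPU surge", "network delay", "network packet loss"])
print(engine.sent)
```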
S204, after the fault corresponding to the fault information is injected, index data of the target object is obtained, and a fault drilling result for the service is determined based on the index data.
After the fault corresponding to the fault information is injected into the target object corresponding to the target object identifier by using the fault injection engine, the fault injection effect needs to be monitored, and index data of the target object and the running state of the service can be acquired and displayed in real time.
The index data of the target object is associated with the fault information, and the fault drilling result for the service can be determined based on the index data; in addition, the running state of the service can be taken into account when determining the result. The fault drilling schemes can also be stored in the historical fault drilling scheme library 16 shown in fig. 1, so that it can be checked which fault drills have been performed.
For example, the index data of the server associated with the fault information "CPU surge" may be the CPU usage rate, and the running state of the service may be normal or down. As shown in fig. 1, when the CPU usage rate rises to 90% and the running state of the service is normal, it can be determined, based on the index data of the server and the running state of the service, that the fault drilling for the service succeeded. This indicates that the CPU surge fault of the server has almost no influence on the service, and no alarm needs to be triggered.
As another example, the index data of the server associated with the fault information "network packet loss" may be the data packet loss rate, and the running state of the service may be normal or down. As shown in fig. 1, when the data packet loss rate rises to a certain threshold (for example, 4%) and the running state of the service is down, it can be determined, based on the index data of the server and the running state of the service, that the fault drilling for the service failed. This indicates that the network packet loss fault of the server affects the service; an alarm can be triggered and the relevant personnel notified in time.
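The two examples above suggest a simple decision rule, sketched below. The function name `evaluate_drill`, the return values and the extra "fault not produced" branch (for the case where the index never reaches the threshold) are assumptions added for completeness; the 90% and 4% figures are the ones given in the examples.

```python
def evaluate_drill(index_percent, threshold_percent, service_state):
    """Classify one drill from the target's index data and the service state.

    Returns (result, alarm). The drill succeeds when the fault was produced
    (index at or above the threshold) and the service stayed normal.
    """
    if index_percent < threshold_percent:
        return "fault not produced", False   # assumption: not covered by the text
    if service_state == "normal":
        return "success", False              # service tolerated the fault
    return "failure", True                   # service went down, raise an alarm


print(evaluate_drill(90, 90, "normal"))   # CPU usage example
print(evaluate_drill(4, 4, "down"))       # packet loss example
```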
As described above for the technical solution provided by the embodiment of the present invention, when a fault drilling instruction is triggered, a preset fault drilling scheme for a service is acquired; the target object identifier and the fault information included in the fault drilling scheme are analyzed, and the fault injection engine corresponding to the fault information is determined; the fault injection engine is used to inject the fault corresponding to the fault information into the target object corresponding to the target object identifier; and, after the fault is injected, index data of the target object is acquired and the fault drilling result for the service is determined based on the index data. Because the fault injection engine injects the faults specified by the fault information in the fault drilling scheme into the target object of the service, fault drilling is automated, so that the time consumed by fault drilling is effectively reduced and fault drilling efficiency is improved.
Fig. 4 is a schematic flowchart of another fault drilling method provided in the embodiment of the present invention. The method specifically includes the following steps:
S401, when a fault drilling instruction is triggered, acquiring a preset fault drilling scheme for a service, wherein the fault drilling scheme comprises at least one piece of fault information.
In the embodiment of the invention, in order to reduce the user's burden of orchestrating and designing the fault drilling scheme and to avoid leaving some faults untested, the fault drilling scheme may be orchestrated automatically. Specifically, a service level identifier corresponding to the service and a target object identifier of a target object included in the service, both input by the user, are acquired; a fault information set corresponding to the service level identifier is determined according to a preset mapping relationship between service level identifiers and fault information sets; at least one piece of fault information is extracted from the fault information set based on a preset fault information extraction rule; and a fault drilling scheme including the target object identifier and the at least one piece of fault information is generated and stored in a fault drilling scheme library. When a fault drilling instruction is triggered, the fault drilling scheme for the service is acquired from the fault drilling scheme library.
In the process of generating the fault drilling scheme including the target object identifier and the at least one piece of fault information, the at least one piece of fault information may be sorted based on a preset fault drilling order, and then the fault drilling scheme including the target object identifier and the at least one piece of fault information is generated.
For example, four service levels may be preset: S, A, B and C, where the S level is the most important and the remaining levels follow in decreasing order of importance. Based on the intelligent orchestration (i.e., the fault drilling scheme orchestration policy) in the intelligent scheme orchestration system 12 shown in fig. 1, the service level identifier input by the user (S) and the identifier of the server included in the service (server) are acquired, and the fault information set 1 corresponding to the service level identifier S is determined according to the preset mapping relationship between service level identifiers and fault information sets shown in table 3 below.
Service level identifier    Fault information set
S                           Fault information set 1
A                           Fault information set 2
B                           Fault information set 3
C                           Fault information set 4
TABLE 3
At least one piece of fault information may be extracted from the fault information set based on a preset fault information extraction rule. For example, at least one piece of fault information may be randomly extracted from the fault information set; the embodiment of the present invention does not limit the extraction rule. Each extracted piece of fault information is associated with the target object corresponding to the target object identifier.
For example, for fault information set 1 and the target object corresponding to the target object identifier (the server), three pieces of fault information are randomly extracted from fault information set 1 based on the intelligent orchestration (i.e., the fault drilling scheme orchestration policy) in the intelligent scheme orchestration system 12 shown in fig. 1: CPU surge in the server, network delay in the server, and network packet loss in the server. These three pieces of fault information all belong to server faults and are associated with the target object, i.e., the server.
The three pieces of fault information may then be sorted according to a preset fault drilling order: first, CPU surge in the server; second, network delay in the server; third, network packet loss in the server. A fault drilling scheme including these three pieces of fault information and the target object identifier "server" can thus be generated and stored in a fault drilling scheme library, such as the fault drilling scheme library 14 shown in fig. 1. When a fault drilling instruction is triggered, the fault drilling scheme for the service is acquired from the fault drilling scheme library. In this way, automatic orchestration of the fault drilling scheme is realized, which corresponds to the intelligent orchestration (i.e., the fault drilling scheme orchestration policy) in the intelligent scheme orchestration system 12 shown in fig. 1.
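The automatic orchestration described above can be sketched as follows. This is an illustrative sketch only: the contents of the fault information sets, the fault names, the drilling order and the function names (generate_scheme, store_scheme, acquire_scheme) are assumptions made for the example, and random extraction is used simply because the embodiment mentions it as one possible extraction rule.

```python
import random

# Hypothetical mapping from service level identifier to fault information set.
LEVEL_TO_FAULT_SET = {
    "S": ["network_packet_loss", "cpu_surge", "disk_io_anomaly", "network_delay"],
    "A": ["cpu_surge", "network_delay", "network_packet_loss"],
    "B": ["cpu_surge", "network_delay"],
    "C": ["cpu_surge"],
}

# Hypothetical preset fault drilling order.
DRILL_ORDER = ["cpu_surge", "network_delay", "network_packet_loss", "disk_io_anomaly"]

# In-memory stand-in for the fault drilling scheme library.
scheme_library = {}


def generate_scheme(level, target_object_id, count, rng=None):
    """Extract `count` faults for the service level and sort them by drill order."""
    rng = rng or random.Random()
    fault_set = LEVEL_TO_FAULT_SET[level]
    picked = rng.sample(fault_set, min(count, len(fault_set)))
    picked.sort(key=DRILL_ORDER.index)
    return {"target_object_id": target_object_id, "faults": picked}


def store_scheme(service, scheme):
    """Store the generated scheme under the service name."""
    scheme_library[service] = scheme


def acquire_scheme(service):
    """Acquire the stored scheme when a fault drilling instruction is triggered."""
    return scheme_library[service]
```

A caller would generate a scheme once, for example generate_scheme("S", "server", 3), store it, and acquire it again each time a drill is triggered.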
S402, analyzing the target object identification and the fault information included in the fault drilling scheme.
In the embodiment of the present invention, this step is similar to step S202 above and is not described again here.
S403, determining a fault classification corresponding to the fault information, and determining a fault injection engine corresponding to the fault classification according to a preset mapping relation between the fault classification and the fault injection engine.
In the embodiment of the invention, fault information is classified, and different pieces of fault information may belong to different fault classifications. For example, as shown in table 4 below, Class A faults include server faults (CPU surge, network delay, network packet loss, disk I/O anomaly, etc.), container faults (e.g., faults of Google's Kubernetes, such as Node CPU load) and cluster faults (e.g., downtime of the master database or of a slave database of MySQL), and Class C faults are user-defined faults.
TABLE 4 (provided as image BDA0002839931280000121 in the original publication)
For example, the three pieces of fault information CPU surge, network delay and network packet loss belong to server faults, and server faults belong to Class A faults, so the fault classification corresponding to these three pieces of fault information is determined to be Class A. Then, according to the preset mapping relationship between fault classifications and fault injection engines shown in table 5 below (the fault injection engine 17 shown in fig. 1 includes engines such as the chaos engineering tool ChaosBlade, the automation platform StackStorm, and user-defined fault injection tools), the fault injection engine corresponding to Class A faults is determined to be the chaos engineering tool ChaosBlade.
Fault classification    Fault injection engine
Class A fault           Chaos engineering tool ChaosBlade
Class B fault           Automation platform StackStorm
Class C fault           User-defined fault injection tool
TABLE 5
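The two-step lookup of step S403 (fault information to fault classification, then fault classification to fault injection engine) can be sketched as two mappings. The engine names follow table 5; the fault names used as keys and the function name determine_engine are hypothetical and serve only to illustrate the lookup.

```python
# Hypothetical fault information -> fault classification mapping
# (the three server faults of the example belong to Class A).
FAULT_TO_CLASS = {
    "cpu_surge": "A",
    "network_delay": "A",
    "network_packet_loss": "A",
}

# Fault classification -> fault injection engine, following table 5.
CLASS_TO_ENGINE = {
    "A": "ChaosBlade",
    "B": "StackStorm",
    "C": "user-defined fault injection tool",
}


def determine_engine(fault):
    """Return the name of the fault injection engine for one piece of fault information."""
    try:
        fault_class = FAULT_TO_CLASS[fault]
    except KeyError:
        raise ValueError(f"no fault classification preset for {fault!r}") from None
    return CLASS_TO_ENGINE[fault_class]
```

Keeping the two mappings separate means that a new piece of fault information only needs a classification entry, and a change of engine for a whole classification is made in one place.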
S404, injecting a fault corresponding to the fault information for a target object corresponding to the target object identification by using the fault injection engine, wherein the target object is any object included in the service.
In the embodiment of the present invention, a fault injection engine may be used to inject a fault corresponding to the fault information for a target object corresponding to a target object identifier, where the target object is any object included in a service.
Specifically, the fault generation instruction corresponding to the fault information in the fault injection engine may be determined according to a preset correspondence between the fault information and the fault generation instruction in the fault injection engine, and the fault injection engine is used to send the fault generation instruction to the target object corresponding to the target object identifier, so that the target object executes the fault generation instruction to generate the fault corresponding to the fault information.
For example, as shown in fig. 1, the scheduling execution engine 15 determines the fault generation instruction that corresponds, in the chaos engineering tool ChaosBlade, to the fault information "CPU surge", and uses ChaosBlade to send this fault generation instruction to the target object corresponding to the target object identifier, i.e., the server, so that the server executes the fault generation instruction and a CPU surge fault is generated.
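The lookup and dispatch of the fault generation instruction can be sketched as follows. The classes FakeTarget and InjectionEngine and the instruction strings are hypothetical placeholders introduced for illustration; in particular, the strings are not actual ChaosBlade commands, and a real engine would send the instruction to a remote machine rather than call a local object.

```python
class FakeTarget:
    """Stand-in for a target object (e.g. a server) that executes received instructions."""

    def __init__(self, target_object_id):
        self.target_object_id = target_object_id
        self.executed = []

    def execute(self, instruction):
        # A real target would run the instruction; here it is only recorded.
        self.executed.append(instruction)


class InjectionEngine:
    """Holds the preset correspondence between fault information and generation instructions."""

    def __init__(self, name, generation_instructions):
        self.name = name
        self._generation_instructions = dict(generation_instructions)

    def inject(self, fault, target):
        """Look up the fault generation instruction and send it to the target."""
        instruction = self._generation_instructions[fault]
        target.execute(instruction)
        return instruction
```

For example, an engine constructed with {"cpu_surge": "start-cpu-surge"} sends the placeholder instruction "start-cpu-surge" to the server when inject("cpu_surge", server) is called.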
S405, after the fault corresponding to the fault information is injected, index data of the target object is obtained, and a fault drilling result for the service is determined based on the index data.
In the embodiment of the present invention, this step is similar to step S204 above and is not described again here.
In addition, once it is determined from the index data of the target object that the target object has generated the fault, the service may be affected, for example the service may go down. Therefore, the fault recovery instruction corresponding to the fault information in the fault injection engine can be determined according to a preset correspondence between fault information and fault recovery instructions in the fault injection engine, and the fault injection engine is used to send the fault recovery instruction to the target object, so that the target object executes the fault recovery instruction and returns to normal, and the service can in turn return to normal.
For example, as shown in fig. 1, after determining from the CPU usage rate of the server that the server has generated a CPU surge fault, the scheduling execution engine 15 may determine the fault recovery instruction that corresponds, in the chaos engineering tool ChaosBlade, to the fault information "CPU surge", and use ChaosBlade to send this fault recovery instruction to the server, so that the server executes the fault recovery instruction and returns to normal, and the service can in turn return to normal.
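The recovery step can be sketched in the same spirit. The sketch assumes that a fault counts as generated once a given index value reaches a threshold; the dictionaries RECOVERY_INSTRUCTIONS and FAULT_INDICATORS, the placeholder instruction strings and the function names are hypothetical and are not commands of any real tool.

```python
# Hypothetical correspondence between fault information and recovery instructions.
RECOVERY_INSTRUCTIONS = {
    "cpu_surge": "stop-cpu-surge",
    "network_packet_loss": "stop-network-packet-loss",
}

# Hypothetical index and threshold used to decide that the fault was generated.
FAULT_INDICATORS = {
    "cpu_surge": ("cpu_usage", 0.90),
    "network_packet_loss": ("packet_loss_rate", 0.04),
}


def fault_generated(fault, index_data):
    """Return True if the index data shows that the injected fault took effect."""
    metric, threshold = FAULT_INDICATORS[fault]
    return index_data.get(metric, 0.0) >= threshold


def recover(fault, index_data, send):
    """Send the recovery instruction once the fault is confirmed; return it, or None."""
    if not fault_generated(fault, index_data):
        return None
    instruction = RECOVERY_INSTRUCTIONS[fault]
    send(instruction)  # `send` stands in for the fault injection engine's channel
    return instruction
```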
After the fault drilling is completed, the whole drill needs to be reviewed, and the problems found during the drill need to be repaired and optimized. Therefore, the embodiment of the invention can acquire the operation data generated by the service during the fault drilling process, where the operation data includes at least log data and may also include alarm data, monitoring data and the like, and generate a fault drilling report based on the operation data, which makes it easier for the user to review the drill and to evaluate the effect of the fault drilling result. In addition, services such as AI may be integrated to provide preliminary optimization suggestions for the defects and problems found during the drill, for the user's reference.
For example, as shown in fig. 1, the effect evaluation system 11 acquires the log data generated by the service during the fault drilling process and checks, in the log data, the operating state of the service during the drill. If the service runs normally, the fault drilling is successful and the drilling effect is excellent, which indicates that the CPU surge fault in the server has almost no influence on the service. If the service is delayed but the delay does not exceed 2 seconds, the fault drilling is successful and the drilling effect is good, which indicates that the CPU surge fault in the server has some influence on the service. If the service is delayed by more than 2 seconds, the fault drilling fails and the drilling effect is poor, which indicates that the CPU surge fault in the server has a large influence on the service; it can then be determined that the defect in the service is related to the CPU in the server, and the relevant optimization can be carried out for the CPU in the server.
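The three-way evaluation in the example above can be sketched as a small function. It assumes that the service delays, in seconds, have already been extracted from the log data; the function name evaluate_effect and the rating labels are hypothetical.

```python
def evaluate_effect(delays_seconds):
    """Return (drill_succeeded, rating) from the delays observed during the drill.

    An empty list means the service ran normally without any delay.
    """
    if any(delay < 0 for delay in delays_seconds):
        raise ValueError("delays must be non-negative")
    worst = max(delays_seconds, default=0.0)
    if worst == 0:
        return (True, "excellent")  # service ran normally
    if worst <= 2.0:
        return (True, "good")       # delayed, but by no more than 2 seconds
    return (False, "poor")          # delayed by more than 2 seconds
```

Using the worst observed delay is a design choice of this sketch; an average or a percentile could be used instead if occasional spikes should be tolerated.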
Corresponding to the above method embodiment, an embodiment of the present invention further provides a fault drilling apparatus. As shown in fig. 5, the apparatus may include: a scheme acquisition module 510, a scheme analysis module 520, an engine determination module 530, a fault injection module 540, a data acquisition module 550 and a result determination module 560.
A scheme acquisition module 510, configured to acquire a preset fault drilling scheme for a service when a fault drilling instruction is triggered, where the fault drilling scheme includes at least one piece of fault information;
a scheme analysis module 520, configured to analyze the target object identifier and the fault information included in the fault drilling scheme;
an engine determination module 530, configured to determine a fault injection engine corresponding to the fault information;
a fault injection module 540, configured to inject, by using the fault injection engine, a fault corresponding to the fault information for a target object corresponding to the target object identifier, where the target object is any object included in the service;
a data acquisition module 550, configured to acquire index data of the target object after the fault corresponding to the fault information is injected;
a result determination module 560, configured to determine a fault drilling result for the service based on the index data.
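How the six modules might cooperate can be sketched as follows. This is a simplified, hypothetical composition in which each module is reduced to a dictionary lookup or a callable supplied by the caller; the class name FaultDrillApparatus and all parameter names are assumptions for illustration and do not describe the actual apparatus.

```python
class FaultDrillApparatus:
    """Chains the six modules of the apparatus for one fault drill."""

    def __init__(self, scheme_library, fault_to_engine, engines,
                 read_index_data, determine_result):
        self.scheme_library = scheme_library      # service -> fault drilling scheme
        self.fault_to_engine = fault_to_engine    # fault information -> engine name
        self.engines = engines                    # engine name -> inject(fault, target_id)
        self.read_index_data = read_index_data    # target_id -> index data
        self.determine_result = determine_result  # (fault, index data) -> result

    def run(self, service):
        """Run the drill for a service and return one result per fault."""
        scheme = self.scheme_library[service]                    # scheme acquisition
        target_id = scheme["target_object_id"]                   # scheme analysis
        faults = scheme["faults"]
        results = {}
        for fault in faults:
            inject = self.engines[self.fault_to_engine[fault]]   # engine determination
            inject(fault, target_id)                             # fault injection
            index_data = self.read_index_data(target_id)         # data acquisition
            results[fault] = self.determine_result(fault, index_data)  # result determination
        return results
```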
An embodiment of the present invention further provides an electronic device, as shown in fig. 6, including a processor 61, a communication interface 62, a memory 63, and a communication bus 64, where the processor 61, the communication interface 62, and the memory 63 complete mutual communication through the communication bus 64,
a memory 63 for storing a computer program;
the processor 61 is configured to implement the following steps when executing the program stored in the memory 63:
under the condition of triggering a fault drilling instruction, acquiring a preset fault drilling scheme aiming at a service, wherein the fault drilling scheme comprises at least one piece of fault information; analyzing the target object identification and the fault information included in the fault drilling scheme, and determining a fault injection engine corresponding to the fault information; injecting a fault corresponding to the fault information for a target object corresponding to the target object identification by using the fault injection engine, wherein the target object is any object included in the service; after the fault corresponding to the fault information is injected, index data of the target object is obtained, and a fault drilling result for the service is determined based on the index data.
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the processor.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In yet another embodiment of the present invention, a storage medium is further provided, which stores instructions that, when executed on a computer, cause the computer to perform the fault drilling method described in any one of the above embodiments.
In a further embodiment provided by the present invention, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform the fault drilling method as described in any of the above embodiments.
The above embodiments may be implemented wholly or partially in software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the invention are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or another programmable device. The computer instructions may be stored on a storage medium or transmitted from one storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wirelessly (e.g., infrared, radio, microwave). The storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
It is noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A method of fault drilling, the method comprising:
under the condition of triggering a fault drilling instruction, acquiring a preset fault drilling scheme aiming at a service, wherein the fault drilling scheme comprises at least one piece of fault information;
analyzing the target object identification and the fault information included in the fault drilling scheme, and determining a fault injection engine corresponding to the fault information;
injecting a fault corresponding to the fault information for a target object corresponding to the target object identification by using the fault injection engine, wherein the target object is any object included in the service;
after the fault corresponding to the fault information is injected, index data of the target object is obtained, and a fault drilling result for the service is determined based on the index data.
2. The method according to claim 1, wherein before acquiring the preset fault drilling scheme for the service in case of triggering the fault drilling instruction, the method further comprises:
acquiring a service level identifier corresponding to a service input by a user and a target object identifier corresponding to a target object included in the service;
determining a fault information set corresponding to a service level identifier according to a mapping relation between the preset service level identifier and a preset fault information set;
extracting at least one piece of fault information from the fault information set based on a preset fault information extraction rule;
generating a fault drilling scheme comprising the target object identification and at least one piece of fault information, and storing the fault drilling scheme into a fault drilling scheme library;
the acquiring of the preset fault drilling scheme for the service comprises the following steps: and acquiring the fault drilling scheme from the fault drilling scheme library.
3. The method according to claim 1, wherein the obtaining of the preset fault drilling scheme for the service in case of triggering the fault drilling instruction comprises:
triggering a fault drilling instruction based on a preset instruction triggering mode, wherein the instruction triggering mode at least comprises one of the following modes: a random triggering mode, a periodic triggering mode and a user triggering mode;
and under the condition of triggering the fault drilling instruction, acquiring a preset fault drilling scheme aiming at the service.
4. The method of claim 1, wherein determining a fault injection engine corresponding to the fault information comprises:
determining a fault classification corresponding to the fault information;
and determining a fault injection engine corresponding to the fault classification according to a preset mapping relation between the fault classification and the fault injection engine.
5. The method of claim 1, wherein said injecting, with the fault injection engine, a fault corresponding to the fault information for the target object corresponding to the target object identification comprises:
determining a fault generation instruction corresponding to the fault information in the fault injection engine according to a corresponding relation between preset fault information and the fault generation instruction in the fault injection engine;
and sending the fault generation instruction to a target object corresponding to the target object identification by using the fault injection engine so that the target object executes the fault generation instruction to generate a fault corresponding to the fault information.
6. The method of claim 5, further comprising:
after determining that the target object generates a fault according to the index data of the target object, determining a fault recovery instruction corresponding to the fault information in the fault injection engine according to a preset corresponding relationship between the fault information and the fault recovery instruction in the fault injection engine;
and sending the fault recovery instruction to the target object by utilizing the fault injection engine so that the target object executes the fault recovery instruction to recover to be normal.
7. The method according to any one of claims 1 to 6, further comprising:
acquiring operation data generated by the service in a fault drilling process, wherein the operation data at least comprises log data;
and generating a fault drilling report based on the operation data, and evaluating the effect of the fault drilling result.
8. A fault drilling apparatus, characterized in that the apparatus comprises:
a scheme acquisition module, configured to acquire a preset fault drilling scheme for a service under the condition of triggering a fault drilling instruction, wherein the fault drilling scheme comprises at least one piece of fault information;
a scheme analysis module, configured to analyze the target object identification and the fault information included in the fault drilling scheme;
an engine determination module, configured to determine a fault injection engine corresponding to the fault information;
a fault injection module, configured to inject, by using the fault injection engine, a fault corresponding to the fault information for a target object corresponding to the target object identification, wherein the target object is any object included in the service;
a data acquisition module, configured to acquire index data of the target object after the fault corresponding to the fault information is injected;
a result determination module, configured to determine a fault drilling result for the service based on the index data.
9. An electronic device, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus;
a memory for storing a computer program;
a processor for implementing the method steps of any one of claims 1 to 7 when executing a program stored in the memory.
10. A storage medium on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 7.
CN202011488116.6A 2020-12-16 2020-12-16 Fault drilling method and device, electronic equipment and storage medium Pending CN112540887A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011488116.6A CN112540887A (en) 2020-12-16 2020-12-16 Fault drilling method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112540887A true CN112540887A (en) 2021-03-23

Family

ID=75018262

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011488116.6A Pending CN112540887A (en) 2020-12-16 2020-12-16 Fault drilling method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112540887A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113221096A (en) * 2021-06-04 2021-08-06 北银金融科技有限责任公司 Method and system for analyzing correlation of random events in chaotic engineering
CN113487186A (en) * 2021-07-07 2021-10-08 中国工商银行股份有限公司 Client fault drilling method, device, computer system and readable storage medium
CN113935178A (en) * 2021-10-21 2022-01-14 北京同创永益科技发展有限公司 Explosion radius control system and method for cloud-originated chaos engineering experiment
CN114113984A (en) * 2021-11-29 2022-03-01 平安壹账通云科技(深圳)有限公司 Fault drilling method, device, terminal equipment and medium based on chaotic engineering
CN114978923A (en) * 2022-04-21 2022-08-30 京东科技信息技术有限公司 Fault drilling method, device and system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6477666B1 (en) * 1999-11-22 2002-11-05 International Business Machines Corporation Automatic fault injection into a JAVA virtual machine (JVM)
US20170024299A1 (en) * 2015-07-21 2017-01-26 International Business Machines Corporation Providing Fault Injection to Cloud-Provisioned Machines
CN110308969A (en) * 2019-06-26 2019-10-08 深圳前海微众银行股份有限公司 Failure drilling method, device, equipment and computer storage medium
CN111400182A (en) * 2020-03-16 2020-07-10 腾讯科技(深圳)有限公司 Fault injection method, device, server and computer readable storage medium
CN111651353A (en) * 2020-05-29 2020-09-11 北京百度网讯科技有限公司 Fault injection method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination