CN115438518B - Fault simulation application system based on chaos concept - Google Patents
- Publication number
- CN115438518B; CN202211388032.4A
- Authority
- CN
- China
- Prior art keywords
- drilling
- event
- management module
- management
- module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/20—Design optimisation, verification or simulation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computing arrangements based on specific mathematical models
- G06N7/08—Computing arrangements based on specific mathematical models using chaos models or non-linear system models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0639—Performance analysis of employees; Performance analysis of enterprise or organisation operations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/20—Administration of product repair or maintenance
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Physics & Mathematics (AREA)
- Human Resources & Organizations (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Strategic Management (AREA)
- Entrepreneurship & Innovation (AREA)
- Economics (AREA)
- General Engineering & Computer Science (AREA)
- Tourism & Hospitality (AREA)
- Quality & Reliability (AREA)
- Operations Research (AREA)
- Marketing (AREA)
- Evolutionary Computation (AREA)
- General Business, Economics & Management (AREA)
- Educational Administration (AREA)
- Development Economics (AREA)
- Nonlinear Science (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Computing Systems (AREA)
- Pure & Applied Mathematics (AREA)
- Mathematical Optimization (AREA)
- Mathematical Analysis (AREA)
- Game Theory and Decision Science (AREA)
- Data Mining & Analysis (AREA)
- Computational Mathematics (AREA)
- Artificial Intelligence (AREA)
- Algebra (AREA)
- Geometry (AREA)
- Computer Hardware Design (AREA)
- Debugging And Monitoring (AREA)
Abstract
The invention belongs to the technical field of fault simulation and provides a fault simulation application system based on the chaos concept, comprising an environment management module, an event management module, a drill management module, a system management module and an assessment and evaluation module. The environment management module manages the user's development, test and production environments; the event management module manages system state events and fault injection events that occur while the system is running; the drill management module simulates production events by combining different fault types; the system management module manages drill members and assigns permissions; and the assessment and evaluation module evaluates and gives feedback on the resolution process and outcome of each simulated production event, across different drill processes and modes. By rapidly injecting single-point or full-link faults into an application system, the invention addresses the difficulty of simulating and reproducing events.
Description
Technical Field
The invention relates to the technical field of fault simulation, and in particular to a fault simulation application system based on the chaos concept.
Background
Operation and maintenance of technology systems is typically reactive: events are handled ad hoc as they occur. Such events are sudden and unpredictable, and they are hard to locate and handle, so in practice handling them takes a long time and carries considerable risk. The main causes are as follows: a. Differences in the skill of the handling personnel. Operation and maintenance personnel differ in experience and ability, and the handling of an event is often affected by factors such as experience and operating proficiency, so event handling carries a degree of personnel risk; b. Some emergency manuals lack effective validation. A system emergency manual is the operating guide to follow when an online event occurs, but it has often not been verified in a realistic environment before use, so relying on it carries operational risk and may affect the handling of the event in unknown ways; c. Events are difficult to simulate and reproduce.
Sporadic online events are hard to reproduce, which hinders problem analysis, localization and handling. Events are often not caused by a single factor, and reproducing, locating and handling a problem becomes even more important when a risk event arises from several factors, yet flexible means of reproducing events are currently lacking.
Disclosure of Invention
The invention aims to overcome the passive nature of traditional operation and maintenance work, and provides a fault simulation application system based on the chaos concept for locating and analyzing problems and reproducing faults.
The technical solution of the invention provides a fault simulation application system based on the chaos concept, comprising an environment management module, an event management module, a drill management module and an assessment and evaluation module;
the environment management module collects environment information about the server through a probe program, obtains the server's state information in real time through a periodic heartbeat detection mechanism and judges the server's health state; the probe program establishes a connection with the server side to deploy the system's environment;
the event management module has atomic-level fault simulation capability and, relying on the probe program deployed on the server through the environment management module, manages the faults that can be simulated on the server and generates events from them;
the drill management module orchestrates the events generated by the event management module along multiple dimensions, produces faults through the probe program in the environment management module, triggers the drill, and receives operation screenshots taken during the drill;
and the assessment and evaluation module, based on the drill operation screenshots received by the drill management module, assesses the result according to drill duration, result judgment, operation steps and operational compliance, and outputs an evaluation result.
As a further limitation of the technical solution of the invention, the environment management module comprises a machine management submodule, an application management submodule and a deployment unit management submodule;
the machine management submodule manages the machines on which user applications run, specifically: editing the application system, deployment unit and remark information associated with a machine; disabling or enabling the machine's probe program; and querying machines along multiple dimensions by application name, deployment unit, IP address and state;
the application management submodule manages a user's different applications, specifically viewing, modifying and deleting the application name, application code, application description and remarks;
the deployment unit management submodule manages the different deployment units of an application, specifically viewing, modifying and deleting the deployment unit name, code, description and remarks.
As a further limitation of the technical solution of the invention, the event management module also supports multi-dimensional event queries by event type, event state, event name and event code, and editing of the event type, event name, event code and event state.
As a further limitation of the technical solution of the invention, the drill management module comprises a drill library submodule, an in-group drill submodule and a case library submodule;
the drill library submodule is used for creating drills; specifically, after the start of a drill is clicked on the server side, it is used to select the drill content, the drill personnel and the drill system, and to edit the drill event;
the in-group drill submodule is used, after the start of a drill is clicked on the server side, to select the drill content, select drill personnel from among the group members, select the drill system, and edit the drill event;
and the case library submodule stores the events generated in the event management module, and archives and edits faults to generate drill cases that form a case library.
As a further limitation of the technical solution of the invention, the attribute information of a drill includes its serial number, the group it belongs to, the drill type, the drill duration, the creator, the creation time, the most recent experiment state, the release state, the drill description and the drill operation information.
As a further limitation of the technical solution of the invention, the in-group drill submodule comprises a random drill unit, an emergency drill unit and an assessment and evaluation unit;
the random drill unit randomly draws from all drill cases in the case library, generates a drill task, triggers the drill and receives screenshots of the drill operation process;
the emergency drill unit generates a drill task for a specified scenario, triggers the drill and receives screenshots of the drill operation process;
and the assessment and evaluation unit assesses the screenshots of the drill process and outputs an evaluation result.
As a further limitation of the technical solution of the invention, the system further comprises a system management module for managing drill members, controlling permissions, managing groups and controlling access to system menus.
As a further limitation of the technical solution of the invention, the system management module comprises a user management submodule, a permission management submodule and a task group management submodule;
the user management submodule displays the list of all system users and supports querying and filtering by user name, real name, role and state, as well as editing, disabling or enabling, resetting the password of, adding and deleting users;
the permission management submodule displays the list of all user roles and all menus for which permissions can be set, supports assigning menu permissions by role type, and supports editing, adding and deleting entries in the menu list;
the task group management submodule displays the list of all task groups created in the system and supports querying and filtering by task group name and state, as well as viewing, editing and disabling or enabling task groups.
As a further limitation of the technical solution of the invention, the assessment and evaluation module judges and scores the assessment result according to the configured scoring rules, based on the handling steps for the simulated production event and the handling screenshots.
As a further limitation of the technical solution of the invention, the application process of the system comprises the following steps:
The server side confirms through the environment management module that the drill environment is available, and selects through the event management module the events to be drilled this time in order to create a drill scenario; the drill event is created in the case library of the drill management module. The server sends an instruction to the probe program through the HTTP service, the probe program manages the life cycle of the fault according to the drill event, and creation of the drill scenario is complete. The drill personnel log in to the drill system through SSH and carry out the drill, uploading screenshots of the drill process to the drill management module on the server side; once the drill is finished, the assessment and evaluation module on the server side assesses the uploaded screenshots and the time taken, and outputs an evaluation result.
According to the above technical solution, the invention has the following advantages: by rapidly injecting single-point or full-link faults into an application system, it addresses the difficulty of simulating and reproducing events. It is mainly intended for enterprise-level banking application systems, where it simulates fault scenarios and improves system availability.
In addition, the invention has reliable design principle, simple structure and very wide application prospect.
Therefore, compared with the prior art, the invention has prominent substantive features and remarkable progress, and the beneficial effects of the implementation are also obvious.
Drawings
In order to more clearly illustrate the embodiments or technical solutions in the prior art of the present invention, the drawings used in the description of the embodiments or prior art will be briefly described below, and it is obvious for those skilled in the art that other drawings can be obtained based on these drawings without creative efforts.
FIG. 1 is a schematic block diagram of a system of one embodiment of the present invention.
FIG. 2 is a schematic block diagram of the environment management module of the system of one embodiment of the present invention.
FIG. 3 is a schematic block diagram of a drill management module of the system of one embodiment of the present invention.
FIG. 4 is a schematic block diagram of a system management module of one embodiment of the present invention.
Detailed Description
The invention aims to overcome the passive working method of traditional operation and maintenance work by providing an application system that can flexibly reproduce faults and effectively verify events. In order that those skilled in the art may better understand the technical solution of the invention, the technical solution in the embodiments is described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the invention. All other embodiments obtained by a person skilled in the art without creative effort on the basis of these embodiments fall within the protection scope of the invention.
As shown in FIG. 1, an embodiment of the invention provides a fault simulation application system based on the chaos concept, comprising an environment management module, an event management module, a drill management module and an assessment and evaluation module;
the environment management module collects environment information about the server through a probe program, obtains the server's state information in real time through a periodic heartbeat detection mechanism and judges the server's health state; the probe program establishes a connection with the server side to deploy the system's environment;
the environment management module manages the system environment information relevant to the platform, including system names, deployment units, deployment instances and so on. The probe program collects environment information about the target server, such as IP address, server name, CPU, memory configuration and disk space. Server state information is obtained in real time through a periodic heartbeat detection mechanism and used to judge the server's health. The probe program establishes a management relationship with the server side, which realizes the system's environment management function and supports the functions of the event management module and the drill management module.
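The heartbeat-based health check can be illustrated with a minimal sketch. It assumes the five-minute availability window mentioned in the drill procedure later in this description; the names `ProbeRecord` and `is_available` and the sample addresses are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumption: a server is healthy if a heartbeat arrived within 5 minutes.
HEARTBEAT_WINDOW = timedelta(minutes=5)


@dataclass
class ProbeRecord:
    """Hypothetical record kept by the server side for one probe."""
    ip: str
    server_name: str
    last_heartbeat: datetime


def is_available(record: ProbeRecord, now: datetime) -> bool:
    """Return True if the last heartbeat falls inside the window."""
    return now - record.last_heartbeat <= HEARTBEAT_WINDOW


now = datetime(2022, 11, 8, 12, 0, 0)
fresh = ProbeRecord("10.0.0.1", "app-01", now - timedelta(minutes=2))
stale = ProbeRecord("10.0.0.2", "app-02", now - timedelta(minutes=9))
print(is_available(fresh, now), is_available(stale, now))
```

Passing `now` explicitly keeps the check deterministic and easy to test; a real server would presumably use the current time.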
The event management module has atomic-level fault simulation capability and, relying on the probe program deployed on the server through the environment management module, manages the faults that can be simulated on the server and generates events from them;
the event management module provides more than 200 common atomic-level fault simulation capabilities. Relying on the probe program deployed on the target server through the environment management module, it manages the faults that can be simulated on that server and makes them available for subsequent drills. Fault simulation itself is performed by the probe program, which is deployed in the drill system and stays connected to the server side by sending heartbeats over HTTP. When installed, the probe program obtains the highest privileges on the drill system, so it can modify the system kernel and change hardware drivers, produce faults in network devices, CPU, memory and disks, and thereby simulate realistic scenarios. Take a network card outage as an example: the server sends the probe program an instruction describing the network card fault event (including the fault start time, fault end time and fault type); on receiving it, the probe program isolates the network card driver of the drill system, so that the drill system's network card goes offline and stays isolated from other systems; when the fault end time is reached, the probe program lifts the driver isolation, which removes the fault.
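The fault life cycle in the network card example (start time, end time, fault type) can be sketched as a small state function. This is an illustration only: the class names, state names and fault type string are hypothetical, and the actual driver isolation performed by the probe program is not modeled.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class FaultState(Enum):
    PENDING = "pending"      # before the fault start time
    ACTIVE = "active"        # fault is injected (e.g. network card isolated)
    RECOVERED = "recovered"  # fault end time reached, isolation lifted


@dataclass(frozen=True)
class FaultEvent:
    """Hypothetical instruction payload: fault type plus start and end times."""
    fault_type: str
    start: datetime
    end: datetime


def state_at(event: FaultEvent, now: datetime) -> FaultState:
    """Decide which life cycle state the fault should be in at time `now`."""
    if now < event.start:
        return FaultState.PENDING
    if now < event.end:
        return FaultState.ACTIVE
    return FaultState.RECOVERED


nic_down = FaultEvent(
    fault_type="network_card_down",
    start=datetime(2022, 11, 8, 10, 0),
    end=datetime(2022, 11, 8, 10, 30),
)
print(state_at(nic_down, datetime(2022, 11, 8, 10, 15)).value)
```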
The drill management module orchestrates the events generated by the event management module along multiple dimensions, produces faults through the probe program in the environment management module, triggers the drill, and receives operation screenshots taken during the drill;
and the assessment and evaluation module, based on the drill operation screenshots received by the drill management module, assesses the result according to drill duration, result judgment, operation steps and operational compliance, and outputs an evaluation result.
The system mainly simulates environment faults, supports the analysis and localization of system problems and the selection and implementation of solutions, and helps improve the skills of operation and maintenance personnel. Based on the system information in the environment management module, the drill management module orchestrates the event scenarios generated by the event management module and carries out the simulation through drills, and the assessment and evaluation module produces the assessment results.
As shown in FIG. 2, the environment management module comprises a machine management submodule, an application management submodule and a deployment unit management submodule;
the machine management submodule manages the machines on which user applications run, specifically: editing the application system, deployment unit and remark information associated with a machine; disabling or enabling the machine's probe program; and querying machines along multiple dimensions by application name, deployment unit, IP address and state;
the application management submodule manages a user's different applications, specifically viewing, modifying and deleting the application name, application code, application description and remarks;
the deployment unit management submodule manages the different deployment units of an application, specifically viewing, modifying and deleting the deployment unit name, code, description and remarks.
The machine management submodule provides machine query, add, view, edit, enable and disable functions and is responsible for managing the machines on which user applications run. The application management submodule provides application query, add, view, edit and delete functions and is responsible for managing a user's different applications. An application is often divided into deployment units according to the services it provides, such as an online deployment unit, a batch deployment unit and a database deployment unit; the deployment unit management submodule is mainly responsible for managing the different deployment units of an application.
The event management module also supports multi-dimensional event queries by event type, event state, event name and event code, and editing of the event type, event name, event code and event state.
The adoption of distributed technology has made infrastructure more complex than in traditional settings. The financial industry in particular handles a large volume of fund transactions and relies on complex infrastructure such as multiple data centers, multi-active deployment, disaster recovery, containers and virtual machines. Reproducing and handling production events often requires coordinating specialists from different departments, which costs time and resources. The system implements 218 basic faults covering tens of specialties, such as K8S, CPU, disk, network, DNS, shared storage, memory, IO, JVM, message queue, cache and database. Each fault type includes typical faults such as disk read/write faults, disk filling, process killing, CPU saturation, network delay and network packet loss. The system interacts with the probe program installed on the target machine and, by sending different instructions, injects a specified fault into the target machine with one click. For example, a kill-process fault sends a KILL command to the target machine; a network delay fault sends a TC command to the target machine to introduce network delay; and a JVM fault injects a user-defined Java exception type into the target machine to reproduce memory exceptions and the like. These common faults can be injected with one click, without building infrastructure or coordinating several departments to reproduce them together. Events in complex or full-link transaction scenarios can additionally be simulated by injecting custom exceptions and delayed execution into program methods and by orchestrating the basic fault capabilities.
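As an illustration of how a fault type might map to a command on the target machine, the sketch below builds (but does not run) a `kill` command and a `tc` command. The text only states that KILL and TC commands are sent; the specific arguments chosen here (`-9`, the `netem delay` queueing discipline), the fault type names and the function `build_command` are assumptions.

```python
from typing import List


def build_command(fault_type: str, **params) -> List[str]:
    """Return the argument list a probe could run for a basic fault.

    Assumption: fault type names and parameters are hypothetical.
    """
    if fault_type == "kill_process":
        # Forcefully terminate the process with the given PID.
        return ["kill", "-9", str(params["pid"])]
    if fault_type == "network_delay":
        # Add a fixed delay to all traffic leaving the given device.
        return [
            "tc", "qdisc", "add", "dev", params["device"],
            "root", "netem", "delay", f"{params['delay_ms']}ms",
        ]
    raise ValueError(f"unsupported fault type: {fault_type}")


print(" ".join(build_command("kill_process", pid=4321)))
print(" ".join(build_command("network_delay", device="eth0", delay_ms=100)))
```

Returning an argument list rather than a shell string avoids quoting problems if the command is later executed with a process API.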
A timing anomaly in a bank transaction is a classic business scenario: when a financial transaction has been sent but does not fully succeed because of a network timeout or another cause, the original transaction must be corrected, so that the user does not see a successful transfer while the counterparty never receives the funds. Reproducing such a scenario requires the ability to orchestrate multiple faults. The steps in the system are: (1) inject a JVM delay exception into the initiating transfer transaction; (2) inject a JVM custom exception into the correction (reversal) transaction; (3) recover the fault from step (2); (4) inject a JVM custom exception into the transfer transaction; (5) recover the fault from step (2). With this method and other orchestrations, faults can be injected at the finer granularity of the code level to reproduce complex business scenarios.
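Orchestrating inject and recover steps can be sketched with a small bookkeeping class. The example encodes only steps (1) to (3) of the scenario above; the class `Orchestrator`, the target names and the fault names are hypothetical, and no real JVM fault is injected.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Orchestrator:
    """Tracks which faults are active, keyed by the step that injected them."""
    active: Dict[int, Tuple[str, str]] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)

    def inject(self, step: int, target: str, fault: str) -> None:
        self.active[step] = (target, fault)
        self.log.append(f"step {step}: inject {fault} into {target}")

    def recover(self, step: int) -> None:
        # Raises KeyError if the step has no active fault to recover.
        target, fault = self.active.pop(step)
        self.log.append(f"recover step {step}: {fault} on {target}")


orch = Orchestrator()
orch.inject(1, "initiate_transfer", "jvm_delay")            # step (1)
orch.inject(2, "correction_transaction", "jvm_custom_exc")  # step (2)
orch.recover(2)                                             # step (3)
for line in orch.log:
    print(line)
```

Keying active faults by step number makes "recover the fault from step (2)" a direct lookup, which is one simple way to express such a plan.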
As shown in FIG. 3, the drill management module comprises a drill library submodule, an in-group drill submodule and a case library submodule;
the drill library submodule is used for creating drills; specifically, after the start of a drill is clicked on the server side, it is used to select the drill content, the drill personnel and the drill system, and to edit the drill event;
the in-group drill submodule is used, after the start of a drill is clicked on the server side, to select the drill content, select drill personnel from among the group members, select the drill system, and edit the drill event;
and the case library submodule stores the events generated in the event management module, and archives and edits faults to generate drill cases that form a case library. The case library can be viewed, edited and deleted, and drills can be generated quickly from its cases. The case library submodule is generally used to summarize after large-scale drills: common, frequently occurring faults are archived and edited to form the case library. Events generated in the event management module are then stored in the case library for later viewing and quick drills. A quick drill only requires selecting the personnel and the drill system environment.
The drill library submodule can create quick drills, create full-link drills, reset in-group drills and reset random drills, and displays each drill's serial number, group, drill type, drill duration, creator, creation time, most recent experiment state, release state, drill description and operation information. The list can be filtered by different conditions, and a drill can be viewed, edited, deleted, released, started and have its results inspected. The in-group drill submodule supports simulated drills for particular applications and groups of people, and mainly comprises three sub-functions: random drill, emergency drill, and assessment and evaluation. The case library submodule can quickly compose a fault close to a real scenario by combining basic faults; it includes functions for adding, modifying and deleting event cases, a quick drill function and a full-link drill function.
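The random drill, which draws a case from the case library and turns it into a drill task, might look like the following sketch. The function names, the task fields and the sample case names are hypothetical; a seeded `random.Random` is used only to make the example reproducible.

```python
import random
from typing import Dict, List, Optional


def pick_random_case(case_base: List[str],
                     rng: Optional[random.Random] = None) -> str:
    """Draw one drill case uniformly at random from the case library."""
    if not case_base:
        raise ValueError("case library is empty")
    rng = rng or random.Random()
    return rng.choice(case_base)


def make_drill_task(case: str, personnel: List[str],
                    system: str) -> Dict[str, object]:
    """A quick drill only needs the case, the personnel and the drill system."""
    return {"case": case, "personnel": list(personnel), "system": system}


cases = ["cpu_full", "disk_fill", "network_packet_loss", "kill_process"]
picked = pick_random_case(cases, random.Random(7))
task = make_drill_task(picked, ["alice", "bob"], "drill-env-01")
print(task["case"] in cases)
```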
As shown in FIG. 4, the system management module comprises a user management submodule, a permission management submodule and a task group management submodule;
the user management submodule displays the list of all system users and supports querying and filtering by user name, real name, role and state;
the permission management submodule displays the list of all user roles and all menus for which permissions can be set, and supports assigning menu permissions by role type;
and the task group management submodule displays the list of all task groups created in the system and supports querying and filtering by task group name and state.
The system management module mainly manages drill members, permission assignment and the like. The user management submodule displays the list of all system users and supports querying and filtering by user name, real name, role and state; users can be added, edited, assigned roles, enabled, disabled, deleted and have their passwords reset. The permission management submodule displays the list of all user roles and all menus for which permissions can be set, and supports assigning menu permissions by role type. The task group management submodule displays the list of all task groups created in the system and supports querying and filtering by task group name and state; task groups can be added, edited, viewed, disabled and so on.
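Assigning menu permissions by role can be pictured as a mapping from role to the set of menus it may open. The role names and menu names below are hypothetical examples, not taken from the patent.

```python
from typing import Dict, Set

# Assumption: illustrative roles and menus only.
ROLE_MENUS: Dict[str, Set[str]] = {
    "admin": {"environment", "event", "drill", "system"},
    "drill_member": {"drill"},
}


def can_access(role: str, menu: str) -> bool:
    """Return True if the role has been granted the given menu."""
    return menu in ROLE_MENUS.get(role, set())


def grant(role: str, menu: str) -> None:
    """Grant a menu to a role, creating the role entry if needed."""
    ROLE_MENUS.setdefault(role, set()).add(menu)


grant("assessor", "assessment")
print(can_access("assessor", "assessment"), can_access("drill_member", "system"))
```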
The assessment and evaluation module judges and scores the assessment result according to the relevant scoring rules, based on the handling steps for the simulated production event and the handling screenshots, and evaluates the related handling cases.
The assessment and evaluation module gives fair evaluation and feedback on the resolution process and outcome of simulated production events across different drill processes and modes. Based on the handling steps and handling screenshots, it judges and scores the assessment result according to the relevant scoring rules, and it can evaluate the related handling cases and link them to the relevant knowledge base. The assessment team can enter items into the knowledge base, summarizing and consolidating common production events that arise in operation and maintenance and associating them with the relevant entries. After a drill, the handling personnel can open the linked knowledge base entries to review the standard emergency handling methods and cases. This closes the loop of simulated production event, emergency handling drill, drill result evaluation and review, and provides positive reinforcement for daily operation and maintenance work.
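A scoring rule over the four criteria named in this description (drill duration, result judgment, operation steps and compliance) could be sketched as follows. The equal 25-point weighting, the field names and the rule itself are assumptions made for illustration; the patent does not specify the scoring rules.

```python
from dataclasses import dataclass


@dataclass
class DrillResult:
    minutes_used: float
    minutes_allowed: float
    fault_resolved: bool
    steps_evidenced: int  # key steps backed by an uploaded screenshot
    steps_required: int
    compliant: bool


def score(r: DrillResult) -> float:
    """Hypothetical rule: four criteria worth 25 points each."""
    if r.steps_required <= 0:
        raise ValueError("steps_required must be positive")
    time_part = 25.0 if r.minutes_used <= r.minutes_allowed else 0.0
    result_part = 25.0 if r.fault_resolved else 0.0
    covered = min(r.steps_evidenced, r.steps_required)
    steps_part = 25.0 * covered / r.steps_required
    compliance_part = 25.0 if r.compliant else 0.0
    return time_part + result_part + steps_part + compliance_part


example = DrillResult(20, 30, True, 3, 4, True)
print(score(example))
```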
The drill management module further comprises a compliance management submodule for judging whether the operations performed in the drill management module are compliant. The environment management module likewise comprises a compliance management submodule for managing the compliance of application deployment. The compliance management submodule mainly provides compliance test questions to check whether the procedure by which a user handles a production problem is compliant. Compliance test questions covering supervision requirements, rules and regulations, event handling, and the like are entered into the system in advance, and the user must answer these questions before handling a specific simulated event.
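The compliance gate described above can be sketched, purely for illustration, as a check that runs before a simulated event is released to the user. The disclosure only requires that the questions be answered first; the `pass_ratio` threshold and the `ComplianceQuestion` structure are assumptions added for this sketch.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ComplianceQuestion:
    qid: str
    prompt: str
    correct_option: str


def unanswered(questions: List[ComplianceQuestion],
               answers: Dict[str, str]) -> List[str]:
    """Return the ids of questions the user has not answered yet."""
    return [q.qid for q in questions if q.qid not in answers]


def may_handle_event(questions: List[ComplianceQuestion],
                     answers: Dict[str, str],
                     pass_ratio: float = 1.0) -> bool:
    """Release the event only after all questions are answered.

    pass_ratio is an assumed threshold on the share of correct answers.
    """
    if unanswered(questions, answers):
        return False
    if not questions:
        return True
    correct = sum(1 for q in questions if answers[q.qid] == q.correct_option)
    return correct / len(questions) >= pass_ratio
```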
The drill implementation process comprises three parts: drill scene creation, drill task implementation, and assessment.
Drill scene creation: the server side first confirms through the environment management module that the drill environment is available (the probe program sends heartbeat packets periodically, and the environment is judged to be available if a heartbeat packet has been received within 5 minutes). The event management module of the server side then selects the event to be drilled, such as a CPU abnormality or a network card abnormality; for a random drill no selection is needed, and the event is chosen at random. Drill events are created in the case base of the drill management module, where the case name, duration, fault times, faulty system, IP address, and specific fault type must be edited; the fault types are generated with reference to the event management module of claim 3. The server side then sends an instruction to the probe program through an HTTP service, and the probe program manages the life cycle of the fault according to the drill event. For example, to produce network packet loss on a drill machine at a specific time, the probe program runs a network packet loss command on that machine. This completes the creation of the drill scene.
Drill task implementation: the drill participant logs in to the drill system through SSH and carries out the drill, which mainly consists of troubleshooting and repairing the fault. Troubleshooting relies on a third-party monitoring system; for example, a machine restart fault appears as a loss of monitoring information for that machine. The participant logs in to the affected drill machine through SSH, executes the commands needed to start the relevant programs, and, after technical verification has passed, uploads screenshots of the key steps of the drill to the server side.
Assessment: after the drill ends, the server side performs the assessment according to the uploaded screenshots and the time consumed during the drill.
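The server-side part of drill scene creation may be illustrated by the following minimal sketch. Only the 5-minute heartbeat window and the use of an HTTP instruction come from the description; the function names, the JSON field names, and the event identifiers are hypothetical, and the sketch only builds the request body rather than sending it.

```python
import json
import random
from datetime import datetime, timedelta
from typing import List, Optional

# The 5-minute window is taken from the description; all names are assumed.
HEARTBEAT_WINDOW = timedelta(minutes=5)


def environment_available(last_heartbeat: Optional[datetime],
                          now: datetime) -> bool:
    """The environment is available if a heartbeat arrived within the window."""
    return last_heartbeat is not None and now - last_heartbeat <= HEARTBEAT_WINDOW


def choose_event(events: List[str], selected: Optional[str] = None,
                 rng: Optional[random.Random] = None) -> str:
    """Return the selected event, or pick one at random for a random drill."""
    if not events:
        raise ValueError("no events available")
    if selected is not None:
        if selected not in events:
            raise ValueError(f"unknown event: {selected}")
        return selected
    return (rng or random).choice(events)


def build_probe_instruction(case_name: str, fault_type: str,
                            target_ip: str, duration_seconds: int) -> str:
    """Serialize a drill case into a JSON body for the HTTP request to the probe.

    The field names are hypothetical; the disclosure does not define a format.
    """
    return json.dumps({
        "case_name": case_name,
        "fault_type": fault_type,
        "target_ip": target_ip,
        "duration_seconds": duration_seconds,
    }, sort_keys=True)
```

Carrying the duration in the instruction would let the probe program both start and end the fault on its own, which is one possible reading of the fault life cycle management described above.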
Although the present invention has been described in detail with reference to the drawings and the preferred embodiments, the present invention is not limited thereto. Those skilled in the art can make various equivalent modifications or substitutions to the embodiments of the present invention without departing from its spirit and scope, and such modifications or substitutions, as well as any change or substitution that a person skilled in the art can readily conceive within the technical scope disclosed herein, shall fall within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (9)
1. A fault simulation application system based on a chaos concept is characterized by comprising an environment management module, an event management module, a drill management module and an examination and evaluation module;
the environment management module is used for collecting environment information of the server side through the probe program, acquiring state information of the server side in real time through a timing heartbeat detection mechanism and judging the health state of the server side, and the probe program is used for realizing environment deployment of the system through establishing connection with the server side;
the event management module has an atomic-level fault simulation capability, and manages the fault which can be simulated by the server to generate an event through the environment deployment of the server probe program based on the environment management module;
the drilling management module is used for performing multi-dimensional arrangement based on the event generated by the event management module, generating faults through a probe program in the environment management module, triggering drilling actions and receiving an operation screenshot in the drilling process;
the assessment evaluation module is used for assessing and evaluating the result according to the drilling time length, result judgment, operation steps and compliance operation and outputting an evaluation result based on the operation screenshot of the drilling process received by the drilling management module;
confirming that the drilling environment is in an available state through an environment management module at a server, and selecting an event needing drilling at this time to create a drilling scene through an event management module at the server, wherein the drilling event is created in a drilling management case library; the server sends an instruction to the probe program through http service, the probe program performs fault life cycle management according to the drilling event, and the drilling scene is created; and after the drilling personnel log in the drilling system through ssh, drilling implementation is carried out, the screenshots of the drilling process are uploaded to the drilling management module of the server, and after drilling is finished, the server-side assessment and evaluation module carries out assessment and evaluation according to the uploaded screenshots of the drilling process and the consumed time to output an evaluation result.
2. The chaos concept-based fault simulation application system of claim 1, wherein the environment management module comprises a machine management submodule, an application management submodule, and a deployment unit management submodule;
the machine management submodule is used for managing the machine where the user application is located, and specifically editing the application system, the deployment unit and the remark information where the machine is located; disabling/enabling the machine probe program; performing multi-dimensional query management through the application name, the deployment unit, the ip address and the state;
the application management submodule is used for managing different applications of a user, and specifically comprises the steps of checking, modifying and deleting an application name, an application code, an application description and a remark;
the deployment unit management submodule is used for managing different deployment units of a certain application; and viewing, modifying and deleting the name, the code, the description and the remark of the deployment unit.
3. The chaos concept-based fault simulation application system of claim 2, wherein the event management module is further configured to perform a multi-dimensional event query by using an event type, an event state, an event name, and an event code; and editing the event type, the event name, the event code and the event state.
4. The chaos concept-based fault simulation application system of claim 3, wherein the drilling management module comprises a drilling library sub-module, an intra-group drilling sub-module, and a case library sub-module;
the drilling library submodule is used for creating drilling, and is specifically used for selecting drilling content, selecting drilling personnel and selecting a drilling system to edit a drilling event after the server side clicks to start drilling;
the in-group drilling sub-module is used for selecting drilling content, selecting drilling personnel from among the group members and selecting a drilling system to edit drilling events after the server side clicks to start drilling;
and the case base submodule is used for storing the events generated in the event management module, filing and editing the problem faults to generate a drilling case and form a case base.
5. The chaos concept-based fault simulation application system according to claim 4, wherein the attribute information of the drills includes serial numbers of all drills, belonging groups, drill types, drill durations, creators, creation times, recent experiment states, release states, drill descriptions, and information of drill operations.
6. The chaos concept-based fault simulation application system of claim 5, wherein the intra-group drilling submodule comprises a stochastic drilling unit, an emergency drilling unit and an assessment evaluation unit;
the random drilling unit is used for randomly extracting based on all drilling cases in the case base, generating drilling tasks, triggering to implement drilling and receiving screenshots of the drilling operation process;
the emergency drilling unit is used for generating a drilling task aiming at a set specific scene, triggering to perform drilling and receiving a screenshot of a drilling operation process;
and the examination evaluation unit is used for performing examination evaluation on the screenshot in the drilling process and outputting an evaluation result.
7. The chaos concept-based fault simulation application system of claim 6, further comprising a system management module for performing member management, permission control, group management, and permission control of system menus.
8. The chaos concept-based fault simulation application system of claim 7, wherein the system management module comprises a user management submodule, a permission management submodule and a task group management submodule;
the user management submodule is used for displaying all system user information lists and supporting query and screening according to user names, real names, affiliated roles and states; editing, forbidding/enabling, resetting passwords, adding and deleting the user;
the authority management submodule is used for displaying a list of all user roles, displaying all menus capable of setting the authority and supporting the distribution of corresponding menu authority according to the role types; editing, adding and deleting the menu list;
the task group management submodule is used for displaying a task group information list which is created by all the systems and supporting the inquiry and screening according to the task group name and the task group state; the tasks are viewed, edited, disabled/enabled.
9. The chaos concept-based fault simulation application system of claim 8, wherein the assessment evaluation module is specifically configured to determine and score the assessment result based on the handling steps and the handling screenshot photos of the simulated production events and the set scoring rules.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211388032.4A CN115438518B (en) | 2022-11-08 | 2022-11-08 | Fault simulation application system based on chaos concept |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211388032.4A CN115438518B (en) | 2022-11-08 | 2022-11-08 | Fault simulation application system based on chaos concept |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115438518A (en) | 2022-12-06 |
CN115438518B (en) | 2023-04-07 |
Family
ID=84253170
Family Applications (1)
Application Number | Status | Priority Date | Filing Date | Title |
---|---|---|---|---|
CN202211388032.4A (CN115438518B) | Active | 2022-11-08 | 2022-11-08 | Fault simulation application system based on chaos concept |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115438518B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6018732A (en) * | 1998-12-22 | 2000-01-25 | Ac Properties B.V. | System, method and article of manufacture for a runtime program regression analysis tool for a simulation engine |
CN112464497A (en) * | 2020-12-16 | 2021-03-09 | 江苏满运物流信息有限公司 | Fault drilling method, device, equipment and medium based on distributed system |
CN113535532A (en) * | 2020-04-14 | 2021-10-22 | 中国移动通信集团浙江有限公司 | Fault injection system, method and device |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104143130B (en) * | 2014-07-28 | 2017-07-14 | 中国安全生产科学研究院 | Accident emergency drilling system and drilling method |
US10672289B2 (en) * | 2015-09-24 | 2020-06-02 | Circadence Corporation | System for dynamically provisioning cyber training environments |
CN111786823A (en) * | 2020-06-19 | 2020-10-16 | 中国工商银行股份有限公司 | Fault simulation method and device based on distributed service |
CN112714013B (en) * | 2020-12-22 | 2023-02-03 | 浪潮云信息技术股份公司 | Application fault positioning method in cloud environment |
CN113010393A (en) * | 2021-02-25 | 2021-06-22 | 北京四达时代软件技术股份有限公司 | Fault drilling method and device based on chaotic engineering |
CN113935178B (en) * | 2021-10-21 | 2022-09-16 | 北京同创永益科技发展有限公司 | Explosion radius control system and method for cloud-originated chaos engineering experiment |
CN114647489A (en) * | 2022-04-02 | 2022-06-21 | 阿里巴巴(中国)有限公司 | Drill method and system applied to chaotic engineering |
CN114791846B (en) * | 2022-05-23 | 2022-10-04 | 北京同创永益科技发展有限公司 | Method for realizing observability aiming at cloud-originated chaos engineering experiment |
Non-Patent Citations (2)
Title |
---|
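A Fully Decentralized Multi-Agent Fault Location and Isolation for Distribution Networks With DGs; Wenguo Li et al.; IEEE Access; 2021-02-09; full text * |
Software fault injection for distributed real-time systems (分布式实时系统的软件故障注入); Xu Guangxia (徐光侠) et al.; Journal of Chongqing University (重庆大学学报); 2010-02-15 (No. 02); full text * |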
Also Published As
Publication number | Publication date |
---|---|
CN115438518A (en) | 2022-12-06 |
Similar Documents
Publication | Title |
---|---|
US20210075821A1 (en) | Cyber Security Posture Validation Platform |
Kumar et al. | Practical machine learning for cloud intrusion detection: Challenges and the way forward |
CN113067728A (en) | Network security attack and defense test platform |
CN112187585B (en) | Network protocol testing method and device |
US8234633B2 (en) | Incident simulation support environment and business objects associated with the incident |
US20100192220A1 (en) | Apparatuses, methods and systems for providing a virtual development and deployment environment including real and synthetic data |
Hogganvik et al. | A graphical approach to risk identification, motivated by empirical investigations |
WO2018216000A1 (en) | A system and method for on-premise cyber training |
CN107168844B (en) | Performance monitoring method and device |
Jiménez‐Ramírez et al. | Automated testing in robotic process automation projects |
US20100223190A1 (en) | Methods and systems for operating a virtual network operations center |
CN107423090B (en) | Flash player abnormal log management method and system |
CN111597104A (en) | Multi-protocol adaptive interface regression testing method, system, equipment and medium |
Bernardi et al. | Using process mining and model-driven engineering to enhance security of web information systems |
CN115396182A (en) | Industrial control safety automatic arrangement and response method and system |
Alghamdi | Effective penetration testing report writing |
CN115438518B (en) | Fault simulation application system based on chaos concept |
Alrimawi et al. | Incidents are meant for learning, not repeating: sharing knowledge about security incidents in cyber-physical systems |
KR102254693B1 (en) | Cyber security training system having network writing function |
CN113301040B (en) | Firewall strategy optimization method, device, equipment and storage medium |
CN109583192A (en) | A kind of fixed safety system of mobile terminal application and method based on emulation |
Allison et al. | Digital Twin-Enhanced Incident Response for Cyber-Physical Systems |
JP2011034274A (en) | Automatic test execution system |
CN114706738A (en) | Method and device for automatically burying point at client |
Arantes et al. | Tool support for generating model-based test cases via web |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||