CN108763039B

CN108763039B - Service fault simulation method, device and equipment

Info

Publication number: CN108763039B
Application number: CN201810285000.9A
Authority: CN
Inventors: 王少华
Original assignee: Advanced New Technologies Co Ltd
Current assignee: Advanced New Technologies Co Ltd; Advantageous New Technologies Co Ltd
Priority date: 2018-04-02
Filing date: 2018-04-02
Publication date: 2021-09-21
Anticipated expiration: 2038-04-02
Also published as: CN108763039A

Abstract

The embodiments of this specification disclose a service fault simulation method, apparatus, and device. Fault scenarios are mapped one-to-one to simulation tasks, and the basic data sets to be used are determined, so that the corresponding simulation task is generated by arranging those basic data sets. This simulates real fault scenarios, reduces the cost of constructing them, allows more complex fault scenarios to be combined in a simpler and more flexible way, and does not affect the related service systems.

Description

Service fault simulation method, device and equipment

Technical Field

The present disclosure relates to the field of computer technologies, and in particular, to a method, an apparatus, and a device for simulating a service fault.

Background

With the rapid development of services, their diversity and complexity make service systems prone to various faults during operation, so corresponding management and control is required.

In the prior art, a management and control system is tested and verified mainly by injecting faults into a service system to simulate real faults. The management and control system then collects data from the running service system for analysis and problem detection, which verifies its detection capability.

Based on this, a more convenient service failure simulation scheme is needed.

Disclosure of Invention

The embodiments of this specification provide a service fault simulation method, apparatus, and device to address the following problem: providing a more convenient service fault simulation scheme.

Based on this, an embodiment of the present specification provides a service fault simulation method, including:

determining a basic data set required to be adopted according to a fault scene to be simulated;

arranging the basic data set to generate a simulation task corresponding to the fault scene to be simulated;

sending the simulation task to a management and control system so that the management and control system can perform fault test according to fault data provided by the simulation task;

wherein the basic data set is obtained by pre-collection, and comprises a fault data set and a normal data set.

Meanwhile, an embodiment of the present specification further provides a service fault simulation apparatus, including:

a determining module, configured to determine a basic data set required to be adopted according to a fault scene to be simulated, wherein the basic data set is obtained by pre-collection and comprises a fault data set and a normal data set;

the arrangement module is used for arranging the basic data set and generating a simulation task corresponding to the fault scene to be simulated;

and the sending module is used for sending the simulation task to the management and control system so that the management and control system can carry out fault test according to the fault data provided by the simulation task.

Correspondingly, an embodiment of the present specification further provides a service fault simulation apparatus, including:

a memory storing a service failure simulation program;

and the processor calls the service fault simulation program in the memory and executes the following steps:

Correspondingly, embodiments of the present specification also provide a non-volatile computer storage medium storing computer-executable instructions configured to:

The embodiment of the specification adopts at least one technical scheme which can achieve the following beneficial effects:

in the scheme provided by the embodiment of the description, the fault scenes and the simulation tasks are mapped one by one, and the basic data sets required to be adopted are determined, so that the corresponding simulation tasks are generated in a manner of arranging the basic data sets, the simulation of the fault scenes in reality is realized, the cost for constructing the fault scenes is reduced, more complex fault scenes are combined in a simpler and more flexible manner, and related service systems are not influenced. In addition, the occurrence of each fault scene is managed in a task management mode, the fault scenes can be edited for multiple times and can be triggered to stop at any time, and a more convenient business fault scene construction mode is provided for the test and verification of a subsequent management and control system.

Drawings

FIG. 1 is a schematic overall flow chart of a fault drilling process in the prior art;

FIG. 2 is a schematic diagram of a system architecture involved in service fault simulation provided in an embodiment of the present disclosure;

FIG. 3 is a schematic flow chart of a service fault simulation method provided in an embodiment of the present specification;

FIG. 4 is a schematic diagram of constructing a fault scenario based on a basic data set provided by an embodiment of the present specification;

FIG. 5 is a schematic diagram of the result after a simulation task is performed according to an embodiment of the present disclosure;

FIG. 6 is a schematic structural diagram of a service fault simulation apparatus provided in an embodiment of the present disclosure.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be described in detail and completely with reference to the following specific embodiments of the present application and the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application, and not all of the embodiments. All other embodiments obtained by a person skilled in the art based on the embodiments in the present specification without any inventive step are within the scope of the present application.

In the prior art, the traditional scheme for testing and verifying a management and control system is an attack and defense drill. That is, the attacking system injects faults into the service system to simulate real faults, and the management and control system (the defending system) collects data from the service system in real time and analyzes whether the service system is abnormal, so as to quickly discover and locate the fault.

As shown in fig. 1, fig. 1 is a schematic overall flow chart of performing fault drilling in the prior art. In this mode, each injection of a real system fault by the attacker is very complicated, and faults cannot be injected into an online system frequently (the retransmitted traffic affects other downstream service systems). Meanwhile, the management and control system needs a large amount of abnormal data to train its algorithms for detecting service system anomalies and to test the accuracy of those algorithms. As a result, data for each test or verification of the management and control system's discrimination algorithms is difficult and costly to acquire and requires many people to participate. The whole data flow is long, and every step must be carried out in the correct order, otherwise the whole drill fails. In addition, in service systems with a large number of users, frequent fault injection significantly degrades the user experience.

Based on this, the embodiments of the present specification provide a method for constructing a service fault based on a data set, so as to combine more complex fault scenarios in a simpler and more flexible manner, reduce the cost of constructing the fault scenarios, and simultaneously, not affect related service systems.

Fig. 2 is a schematic diagram of a system architecture involved in service fault simulation provided in an embodiment of the present disclosure. In this architecture, the device that collects and stores the basic data set may be the service system itself. In practical applications, however, for reasons such as convenience, security, and avoiding any impact on the service system, a device or entity other than the service system is generally used as the execution subject, for example a database, a server, or a management and control system that is independent of the service system.

A service fault simulation process provided in an embodiment of the present specification is described in detail below based on the architecture shown in fig. 2. As shown in fig. 3, which is a schematic flow diagram of the service fault simulation method provided in an embodiment of the present specification, the process includes:

S301, determining a basic data set required to be adopted according to a fault scene to be simulated, wherein the basic data set is obtained by pre-collection and comprises a fault data set and a normal data set.

Specifically, the service system collects and accumulates various fault data (including fault data produced during drills and fault data that occurs while the system runs in normal operation) to form a basic data set containing various kinds of data. Data collected without fault injection also forms one of these data sets, namely the normal data set.

For example, a service fault may appear as a large drop in service request traffic within a short time, and its underlying cause may be a line interruption, line packet loss, excessive delay, a machine room shutdown, and so on. Each underlying cause of a service fault has a corresponding fault data set, which can be collected and stored during daily service processing or during routine attack and defense drills. Likewise, normal service processing has a corresponding normal data set.

In other words, to test the detection capability of the management and control system for a given fault cause, the corresponding fault data set is collected in advance and supplied to the management and control system, and one observes whether it correctly detects the fault cause corresponding to that data set. This tests and verifies the fault detection capability of the management and control system.

S303, arranging the basic data set and generating a simulation task corresponding to the fault scene to be simulated.

Arranging refers to creating a simulation task in a specified order and format. For example, to simulate a short line interruption, a simulation task is arranged in the order of 5 continuous minutes of the normal data set, 3 continuous minutes of the line interruption data set, and 3 continuous minutes of the normal data set. The arranged simulation task is a series of data sets, each assigned a sending mode. By arranging the various data sets in the basic data set, various fault causes can be simulated flexibly and test and verification data for various fault scenarios can be provided to the management and control system, which makes it more convenient to test and verify its fault detection capability.
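The short line interruption example above can be sketched as an ordered list of segments. This is only an illustrative sketch: the `Segment` type, the `arrange` function, and the data set names are assumptions made for the example and do not appear in the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    data_set: str  # hypothetical name of a data set in the basic data set
    minutes: int   # how long this data set is sent continuously


def arrange(segments):
    """Return (start_minute, end_minute, data_set) tuples in sending order."""
    timeline, start = [], 0
    for seg in segments:
        timeline.append((start, start + seg.minutes, seg.data_set))
        start += seg.minutes
    return timeline


# 5 minutes normal, 3 minutes line interruption, 3 minutes normal.
short_interruption = arrange([
    Segment("normal", 5),
    Segment("line_interruption", 3),
    Segment("normal", 3),
])

for start, end, name in short_interruption:
    print(f"minute {start:2d}-{end:2d}: {name}")
```

Keeping the task as plain ordered data means the same structure can be edited and re-sent without touching the service system.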

S305, sending the simulation task to a management and control system so that the management and control system can perform fault test according to fault data provided by the simulation task.

As a specific embodiment, for the basic data set, it can be collected in advance as follows:

receiving the injected fault code, acquiring fault data triggered by the fault code, generating a fault data set and storing the fault data set; or acquiring fault data generated when a service fault occurs in actual service processing, generating a fault data set and storing the fault data set; and acquiring normal data generated when no service fault occurs in actual service processing, generating the normal data set, and storing the normal data set.

In other words, the sources of the basic data set include fault data sets generated by fault injection during attack and defense drills, fault data sets generated by faults that occur in daily service processing, and normal data sets. Because basic data sets are provided in these different ways, different simulation tasks can be arranged for testing according to actual test requirements.
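One possible way to hold the pre-collected data is a small in-memory store keyed by label. The sketch below is an assumption for illustration only: the class `BaseDataSet`, its methods, the source tags, and the sample records are hypothetical and are not defined by the specification.

```python
from collections import defaultdict


class BaseDataSet:
    """Pre-collected records grouped by label ("normal" or a fault cause)."""

    def __init__(self):
        self._records = defaultdict(list)

    def collect(self, label, records, source):
        # source is a free-form tag chosen for this sketch:
        # "drill" (injected fault), "production" (fault in daily
        # service processing), or "normal" (no fault occurred).
        for record in records:
            self._records[label].append({"source": source, "payload": record})

    def labels(self):
        return sorted(self._records)

    def get(self, label):
        # Return a copy so callers cannot mutate the stored data set.
        return list(self._records.get(label, []))


base = BaseDataSet()
base.collect("line_interruption", ["req-1", "req-2"], source="drill")
base.collect("line_interruption", ["req-3"], source="production")
base.collect("normal", ["req-4"], source="normal")
print(base.labels())
```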

In practical applications, arranging the basic data set in S303 to generate a simulation task corresponding to the fault scene to be simulated includes: determining the sending duration, sending frequency, and sending order of each fault data set or normal data set, and generating the simulation task corresponding to the fault scene to be simulated.

In other words, when a fault scenario is arranged, the data sets corresponding to that scenario are selected from the routinely collected and stored basic data sets and combined, according to a certain sending order, sending frequency, and sending duration, into a simulation task that retransmits the data to simulate the actual fault scenario.
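As a rough sketch of how sending order, frequency, and duration could combine into a concrete send plan, the code below expands each segment into evenly spaced send times. The function names, the tuple layout, and the even spacing are assumptions for illustration; the specification does not prescribe how sends are spaced.

```python
def send_offsets(qps, seconds, start=0.0):
    """Evenly spaced send times (in seconds) for one data set segment."""
    interval = 1.0 / qps
    return [start + i * interval for i in range(qps * seconds)]


def schedule(task):
    """task: ordered list of (data_set, qps, seconds) tuples.

    Returns (send_time, data_set) pairs in sending order.
    """
    plan, start = [], 0.0
    for data_set, qps, seconds in task:
        plan.extend((t, data_set) for t in send_offsets(qps, seconds, start))
        start += seconds
    return plan


# Small numbers keep the plan readable: 4 QPS for 2 s, then 2 QPS for 1 s.
plan = schedule([("normal", 4, 2), ("data_delay", 2, 1)])
print(len(plan), plan[0], plan[-1])
```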

The same fault scenario may correspond to several different simulation tasks. That is, the same service fault phenomenon may have different underlying causes, and it is necessary to observe whether the management and control system correctly detects the actual cause. For example, a drop in service traffic may be caused by a line interruption, a data delay, or the like. When the fault scenario is arranged, a simulation task can therefore be generated by arranging the corresponding line interruption fault data set and the corresponding data delay fault data set, and the simulation task is then sent. In general, a normal data set should also be added when arranging a simulation task to make the simulation more realistic.
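One reading of this paragraph is that a single observed phenomenon yields one simulation task per candidate cause, each surrounded by normal data. The sketch below follows that reading; the mapping `CAUSES_BY_PHENOMENON`, the function `tasks_for`, and the 3-minute defaults are hypothetical choices for the example.

```python
# Hypothetical mapping from an observed phenomenon to candidate causes.
CAUSES_BY_PHENOMENON = {
    "traffic_drop": ["line_interruption", "data_delay"],
}


def tasks_for(phenomenon, normal_minutes=3, fault_minutes=3):
    """Build one simulation task per candidate cause.

    Each task is an ordered list of (data_set, minutes) and wraps the
    fault data set in normal data for a more realistic simulation.
    """
    tasks = {}
    for cause in CAUSES_BY_PHENOMENON[phenomenon]:
        tasks[cause] = [
            ("normal", normal_minutes),
            (cause, fault_minutes),
            ("normal", normal_minutes),
        ]
    return tasks


tasks = tasks_for("traffic_drop")
for cause, task in tasks.items():
    print(cause, task)
```

Running each task separately lets a tester check whether the management and control system attributes the same traffic drop to the right cause.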

Further, after the simulation task has been arranged, the way it is sent can also be controlled. That is, sending the simulation task to the management and control system in S305 includes:

determining a sending mode of the simulation task, wherein the sending mode comprises at least one of sequential sending, cyclic sending or stopping sending; and sending the plurality of simulation tasks to a management and control system according to the determined sending mode.

Each simulation task corresponds to one fault scenario, and the fault scenario is controlled by controlling the simulation task (sequential execution, cyclic execution, stopping execution, and so on), so that fault occurrence is simulated in a flexibly controlled way. Sequential execution sends the data in the simulation task in the arranged order; cyclic execution runs the simulation task again after it has completely finished; stopping execution stops the data transmission.
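The three controls can be sketched with a loop and a stop flag. This is a minimal sketch under stated assumptions: `Mode`, `run`, and the `max_cycles` guard are hypothetical names introduced here, and `send` stands in for whatever actually delivers data to the management and control system.

```python
import threading
from enum import Enum


class Mode(Enum):
    SEQUENTIAL = "sequential"
    CYCLIC = "cyclic"


def run(task, send, mode, stop_event, max_cycles=None):
    """Send each segment in arranged order; repeat if cyclic.

    Stops as soon as stop_event is set. max_cycles is a demo guard so
    that cyclic mode terminates without an external stop.
    Returns the number of completed passes over the task.
    """
    cycles = 0
    while not stop_event.is_set():
        for segment in task:
            if stop_event.is_set():
                return cycles
            send(segment)
        cycles += 1
        if mode is Mode.SEQUENTIAL or cycles == max_cycles:
            break
    return cycles


task = ["normal", "line_interruption", "normal"]

sent_once = []
run(task, sent_once.append, Mode.SEQUENTIAL, threading.Event())

sent_twice = []
run(task, sent_twice.append, Mode.CYCLIC, threading.Event(), max_cycles=2)

stopped = threading.Event()
stopped.set()  # stop requested before the task starts
sent_none = []
run(task, sent_none.append, Mode.CYCLIC, stopped)

print(len(sent_once), len(sent_twice), len(sent_none))
```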

Fig. 4 is a schematic diagram of constructing a fault scenario based on a basic data set according to an embodiment of the present disclosure. As shown in fig. 4, constructing a fault scenario from the basic data set is divided into three layers:

Basic data set collection: the various fault data sets and the normal data set are collected in turn from the scenes constructed by the various fault injection modes (including no fault injection), stored in the corresponding storage device, and aggregated into the basic data set used for constructing all fault scenarios.

Simulation task construction: according to the fault scenario to be constructed, the corresponding data sets are selected from the basic data set and combined in a certain order, and the retransmission frequency in queries per second (QPS) and the retransmission duration of each data set are set. The result is a retransmission task, that is, a simulation task corresponding to an actual fault scenario.

Simulation task control: each simulation task corresponds to one fault scenario, and the fault scenario is controlled by controlling the simulation task (sequential execution, cyclic execution, stopping execution, and so on), so that fault occurrence is simulated in a flexibly controlled way.

Taking a drop in service traffic, a typical fault scenario, as an example, the simulation task is determined as follows when it is arranged: { [ normal data set, 300 QPS, 3 min ], [ line interruption data set, 30 QPS, 3 min ], [ normal data set, 300 QPS, 3 min ] }, and the task is sent sequentially. This yields the traffic drop of a typical fault scenario shown in fig. 5, which is a schematic diagram of the result after the simulation task is performed (the traffic drops for 3 minutes) provided in the embodiment of the present specification. The management and control system receives and analyzes the data at this time, so that a worker can observe whether it correctly detects the cause of the traffic drop.
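The traffic-drop task above can be expanded minute by minute to see the 3-minute dip that the management and control system is expected to detect. The tuple layout and the function name `qps_per_minute` are assumptions for this sketch; the QPS values and durations are the ones given in the example.

```python
# (data set, QPS, minutes), in sending order, as in the example above.
task = [
    ("normal", 300, 3),
    ("line_interruption", 30, 3),
    ("normal", 300, 3),
]


def qps_per_minute(task):
    """Expand a sequentially sent task into the QPS seen in each minute."""
    series = []
    for _data_set, qps, minutes in task:
        series.extend([qps] * minutes)
    return series


series = qps_per_minute(task)
drop_minutes = [minute for minute, qps in enumerate(series) if qps < 300]

print(series)        # [300, 300, 300, 30, 30, 30, 300, 300, 300]
print(drop_minutes)  # [3, 4, 5]
```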

Based on the same idea, an embodiment of the present specification further provides a service fault simulation apparatus. As shown in fig. 6, which is a schematic structural diagram of the service fault simulation apparatus provided in the embodiment of the present specification, the apparatus includes:

the determining module 601 is configured to determine a basic data set to be adopted according to a fault scene to be simulated, where the basic data set is obtained by pre-collection and includes a fault data set and a normal data set;

the arranging module 603 is used for arranging the basic data set and generating a simulation task corresponding to the fault scene to be simulated;

the sending module 605 sends the simulation task to the management and control system, so that the management and control system performs fault testing according to the fault data provided by the simulation task.

Further, the apparatus further includes a basic data collection module 607, which receives the injected fault code, obtains fault data triggered by the fault code, generates the fault data set, and stores the fault data set; or acquiring fault data generated when a service fault occurs in actual service processing, generating a fault data set and storing the fault data set; and acquiring normal data generated when no service fault occurs in actual service processing, generating the normal data set, and storing the normal data set.

Further, the arranging module 603 determines the sending duration, sending frequency, and sending order of each fault data set or normal data set, and generates the simulation task corresponding to the fault scenario to be simulated.

Further, the sending module 605 determines a sending mode of the simulation task, where the sending mode includes at least one of sequential sending, cyclic sending, or stop sending; and sending the plurality of simulation tasks to a management and control system according to the determined sending mode.
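The division into determining, arranging, and sending modules might look like the following in software. This is a hedged sketch, not the apparatus itself: the class names mirror modules 601, 603, and 605, but their constructors, method signatures, and the callable standing in for the management and control system are assumptions.

```python
class DeterminingModule:
    """Maps a fault scenario to the data sets it needs (module 601)."""

    def __init__(self, scenario_to_data_sets):
        self._mapping = scenario_to_data_sets

    def determine(self, scenario):
        return self._mapping[scenario]


class ArrangingModule:
    """Attaches a sending frequency and duration to each data set (module 603)."""

    def arrange(self, data_sets, qps, minutes):
        return [(name, qps[name], minutes) for name in data_sets]


class SendingModule:
    """Delivers the arranged task segment by segment (module 605)."""

    def __init__(self, control_system):
        # control_system is any callable that accepts one segment.
        self._control_system = control_system

    def send(self, task):
        for segment in task:
            self._control_system(segment)


received = []  # stands in for the management and control system's input
determining = DeterminingModule(
    {"traffic_drop": ["normal", "line_interruption", "normal"]}
)
arranging = ArrangingModule()
sending = SendingModule(received.append)

data_sets = determining.determine("traffic_drop")
task = arranging.arrange(
    data_sets, {"normal": 300, "line_interruption": 30}, minutes=3
)
sending.send(task)
print(received)
```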

a memory storing a service failure simulation program;

Based on the same inventive concept, embodiments of the present application further provide a corresponding non-volatile computer storage medium, in which computer-executable instructions are stored, where the computer-executable instructions are configured to:

The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. Especially, as for the device, apparatus and medium type embodiments, since they are basically similar to the method embodiments, the description is simple, and the related points may refer to part of the description of the method embodiments, which is not repeated here.

The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps or modules recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.

In the 1990s, an improvement in a technology could be clearly distinguished as either a hardware improvement (e.g., an improvement in circuit structures such as diodes, transistors, and switches) or a software improvement (an improvement in a method flow). However, as technology has advanced, many of today's method flow improvements can be regarded as direct improvements in hardware circuit structure. Designers almost always obtain the corresponding hardware circuit structure by programming an improved method flow into a hardware circuit. Thus, it cannot be said that an improvement in a method flow cannot be realized by hardware physical modules. For example, a Programmable Logic Device (PLD), such as a Field Programmable Gate Array (FPGA), is an integrated circuit whose logic functions are determined by the user programming the device. A designer programs a digital system onto a single PLD without asking a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Furthermore, instead of manually making integrated circuit chips, such programming is nowadays mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development. The source code to be compiled must be written in a specific programming language called a Hardware Description Language (HDL), and there is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are the most commonly used at present. It will also be apparent to those skilled in the art that a hardware circuit implementing a logical method flow can be readily obtained merely by lightly programming the method flow into an integrated circuit using one of the above hardware description languages.

The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor and a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an Application Specific Integrated Circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of the memory. Those skilled in the art will also appreciate that, in addition to implementing the controller as pure computer-readable program code, the same functionality can be implemented by logically programming the method steps so that the controller takes the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may thus be considered a hardware component, and the means included therein for performing the various functions may also be considered structures within the hardware component. Or the means for performing the functions may even be regarded as both software modules for performing the method and structures within the hardware component.

The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.

For convenience of description, the above devices are described as being divided into various units by function, and are described separately. Of course, the functions of the units may be implemented in the same software and/or hardware or in one or more pieces of software and/or hardware when implementing the embodiments of the present description.

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include forms of volatile memory in a computer readable medium, Random Access Memory (RAM) and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media) such as modulated data signals and carrier waves.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

As will be appreciated by one skilled in the art, one or more embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, embodiments of the present description may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present description may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and so forth) having computer-usable program code embodied therein.

Embodiments of the present description may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Embodiments of the present description may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.

Claims

1. A service fault simulation method comprises the following steps:

arranging the basic data set to generate a simulation task corresponding to the fault scene to be simulated; the organized simulation task is a series of various data sets in the basic data set assigned with sending sequence and format;

2. The method of claim 1, wherein the basic data set is obtained by pre-collection, comprising:

receiving the injected fault code, acquiring fault data triggered by the fault code, generating a fault data set and storing the fault data set; or

acquiring fault data generated when a service fault occurs in actual service processing, generating a fault data set and storing the fault data set; or

and acquiring normal data generated when no service fault occurs in actual service processing, generating the normal data set, and storing the normal data set.

3. The method of claim 1, wherein arranging the base data set to generate a simulation task corresponding to the fault scenario to be simulated comprises:

and determining the sending time length, the sending frequency and the sending sequence of any fault data set or normal data set, and generating a simulation task corresponding to the fault scene to be simulated.

4. The method of claim 1, wherein sending the simulation task to a management and control system comprises:

determining a sending mode of the simulation task, wherein the sending mode comprises at least one of sequential sending, cyclic sending or stopping sending;

and sending a plurality of simulation tasks to the management and control system according to the determined sending mode.

5. A service fault simulation apparatus comprising:

the arrangement module is used for arranging the basic data set and generating a simulation task corresponding to the fault scene to be simulated; the organized simulation task is a series of various data sets in the basic data set assigned with sending sequence and format;

6. The apparatus of claim 5, further comprising a basic data collection module that receives the injected fault codes, obtains fault data triggered by the fault codes, generates the fault data set, and saves; or acquiring fault data generated when a service fault occurs in actual service processing, generating a fault data set and storing the fault data set; or, normal data generated when no service fault occurs in actual service processing is acquired, and the normal data set is generated and stored.

7. The apparatus according to claim 5, wherein the arrangement module determines a sending duration, a sending frequency and a sending sequence of any fault data set or normal data set, and generates a simulation task corresponding to the fault scenario to be simulated.

8. The apparatus of claim 5, wherein the sending module determines a sending mode of the simulation task, the sending mode including at least one of sequential sending, cyclic sending, or stopping sending, and sends a plurality of simulation tasks to the management and control system according to the determined sending mode.

9. A service fault simulation device, comprising:

a memory storing a service failure simulation program;