CN113342650B - Chaotic engineering method and device for distributed system - Google Patents


Publication number
CN113342650B
Authority
CN
China
Prior art keywords
fault
test
data
test case
distributed system
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110603040.5A
Other languages
Chinese (zh)
Other versions
CN113342650A (en)
Inventor
张晓娜
暨光耀
傅媛媛
黄琼
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202110603040.5A
Publication of CN113342650A
Application granted
Publication of CN113342650B

Abstract

The invention discloses a chaos engineering method and device for a distributed system, relating to the technical fields of distributed systems and chaos engineering. The method comprises: collecting test data and server device data of the distributed system through code instrumentation (buried) points; replacing the test data with abnormal data to form abnormal data test cases; generating, according to the server device data and a pre-established fault expert database, the fault types related to the corresponding fault points to form fault test cases, where the fault expert database records the relationship among server device type, server device fault type, and fault occurrence probability; and executing the abnormal data test cases and the fault test cases to obtain a test result of the distributed system. The invention can comprehensively and efficiently improve the robustness and high availability of the distributed system.

Description

Chaos engineering method and device for distributed system
Technical Field
The invention relates to the technical fields of distributed systems and chaos engineering, and in particular to a chaos engineering method and device for a distributed system.
Background
This section is intended to provide a background or context to the embodiments of the invention that are recited in the claims. The description herein is not admitted to be prior art by inclusion in this section.
In recent years, system architectures have evolved from monolithic applications to distributed systems. Development efficiency and scalability have improved, but system complexity has also increased, and traditional service testing methods cannot cover all possible behaviors of a system. With the continued development of microservices, systems keep growing in scale, and the dependencies among services introduce considerable uncertainty: in such a complex call network, an abnormality in any one link may affect other services. As the number of service nodes increases, faults become more likely and more random, so improving the robustness and high availability of distributed systems has become an urgent problem.
Currently, most distributed systems verify their robustness and high availability through chaos engineering, mainly by simulating faults.
Existing chaos engineering verifies only how the system behaves under unexpected faults, or only how it behaves when parameters are abnormal; it does not verify the two in combination. Test coverage is therefore incomplete, and the robustness and high availability of the system cannot be effectively improved. In addition, most current chaos engineering methods cannot be fully automated: test cases must be designed manually and then executed, which makes testing inefficient and labor-intensive.
Disclosure of Invention
An embodiment of the invention provides a chaos engineering method for a distributed system, used to comprehensively and efficiently improve the robustness and high availability of the distributed system. The method comprises:
collecting test data and server device data of the distributed system through code instrumentation (buried) points;
replacing the test data with abnormal data to form abnormal data test cases;
generating, according to the server device data and a pre-established fault expert database, the fault types related to the corresponding fault points to form fault test cases, where the fault expert database records the relationship among server device type, server device fault type, and fault occurrence probability; and
executing the abnormal data test cases and the fault test cases to obtain a test result of the distributed system.
An embodiment of the invention also provides a chaos engineering device for a distributed system, used to comprehensively and efficiently improve the robustness and high availability of the distributed system. The device comprises:
an acquisition unit, configured to collect test data and server device data of the distributed system through code instrumentation (buried) points;
an abnormal data test case generation unit, configured to replace the test data with abnormal data to form abnormal data test cases;
a fault test case generation unit, configured to generate, according to the server device data and a pre-established fault expert database, the fault types related to the corresponding fault points to form fault test cases, where the fault expert database records the relationship among server device type, server device fault type, and fault occurrence probability; and
a test unit, configured to execute the abnormal data test cases and the fault test cases to obtain a test result of the distributed system.
An embodiment of the invention also provides a computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the above chaos engineering method for a distributed system when executing the computer program.
An embodiment of the invention also provides a computer-readable storage medium storing a computer program for executing the above chaos engineering method for a distributed system.
In the prior art, test coverage is incomplete, system robustness and high availability cannot be effectively improved, and test cases must be designed manually before being executed, which is inefficient. By contrast, the chaos engineering scheme for a distributed system in the embodiments of the invention: collects test data and server device data of the distributed system through code instrumentation (buried) points; replaces the test data with abnormal data to form abnormal data test cases; generates, according to the server device data and a pre-established fault expert database, the fault types related to the corresponding fault points to form fault test cases, where the fault expert database records the relationship among server device type, server device fault type, and fault occurrence probability; and executes the abnormal data test cases and the fault test cases to obtain a test result of the distributed system. In this way, the robustness and high availability of the distributed system can be improved comprehensively and efficiently.
Drawings
To illustrate the embodiments of the invention or the technical solutions in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the invention; a person skilled in the art can obtain other drawings from them without inventive effort. In the drawings:
FIG. 1 is a schematic flow chart of a chaos engineering method for a distributed system in an embodiment of the invention;
FIG. 2 is a schematic flow chart of a chaos engineering method for a distributed system according to another embodiment of the invention;
FIG. 3 is a schematic diagram of a transaction link and the related servers in a service call relationship in an embodiment of the invention;
FIG. 4 is a flow chart of abnormal data test case execution in an embodiment of the invention;
FIG. 5 is a flow chart of fault test case execution in an embodiment of the invention;
FIG. 6 is a schematic structural diagram of a chaos engineering device for a distributed system in an embodiment of the invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the embodiments of the present invention will be described in further detail with reference to the accompanying drawings. The exemplary embodiments of the present invention and their descriptions herein are for the purpose of explaining the present invention, but are not to be construed as limiting the invention.
Before describing the embodiments of the present invention, the terms related to the embodiments of the present invention will be described first.
1. Chaos engineering: the discipline of experimenting on a distributed system in order to build confidence in the system's ability to withstand turbulent conditions in production. It can be regarded as experimentation performed to reveal system weaknesses: the harder it is to disrupt the steady state, the greater the confidence in the system's behavior.
2. JMeter is a Java-based testing tool developed by the Apache Software Foundation; it can be used for load (stress) testing, functional testing, and regression testing of software.
3. Postman is a powerful tool for testing HTTP interfaces, developed by Postdot Technologies.
4. ChaosBlade is a fault injection tool open-sourced by Alibaba in 2018. It can inject CPU, memory, network, disk, and other faults, and supports secondary development and optimization as required.
5. LoadRunner is software developed by Hewlett-Packard, used mainly for automated load testing of various architectures; it can predict system behavior and evaluate system performance.
To comprehensively and efficiently improve the robustness and high availability of a distributed system, an embodiment of the invention provides a chaos engineering scheme for a distributed system. The scheme first adds code instrumentation (buried) points to collect the relevant data and links, and then replaces the collected test data with abnormal data, by data replacement and similar means, to form abnormal data test cases. In addition, for the collected server device data, it generates fault test cases in combination with a fault expert database. The generated test cases are then executed automatically, the relevant monitoring data are collected, and the test results are sent to the testers. The invention can automatically generate a more comprehensive set of chaos engineering test cases (abnormal data test cases and fault test cases) and execute them automatically, without additional manual effort. This reduces labor cost and increases test coverage, and thereby improves the robustness and high availability of the distributed system without adding staff. The chaos engineering scheme for a distributed system is described in detail below.
FIG. 1 is a flow chart of a chaos engineering method for a distributed system according to an embodiment of the invention. As shown in FIG. 1, the method includes the following steps:
Step 101: collecting test data and server device data of the distributed system through code instrumentation (buried) points;
Step 102: replacing the test data with abnormal data to form abnormal data test cases;
Step 103: generating, according to the server device data and a pre-established fault expert database, the fault types related to the corresponding fault points to form fault test cases, where the fault expert database records the relationship among server device type, server device fault type, and fault occurrence probability;
Step 104: executing the abnormal data test cases and the fault test cases to obtain a test result of the distributed system.
The chaos engineering method for a distributed system provided by the embodiment of the invention can comprehensively and efficiently improve the robustness and high availability of the distributed system. The steps of the method are described in detail below with reference to FIGS. 2 to 5.
1. First, step 101 above, i.e., step 1 in FIG. 2, is described.
In a specific implementation, instrumentation (buried) points are added to the code. Relevant data, including transaction links, service names, parameters, container IDs, database IPs, and so on, are collected through these points, and the collected data undergo preliminary processing and are stored in a database.
Specifically, "buried point" is a term from the field of data collection; its formal name is event tracking, which refers to the technologies and implementation processes for capturing, processing, and transmitting specific user behaviors or events.
Accordingly, in one embodiment, the chaos engineering method for a distributed system may further include: performing preliminary processing on the test data to obtain preliminarily processed test data.
In one embodiment, performing preliminary processing on the test data to obtain preliminarily processed test data may include: splitting the message string of the test data to obtain test data in a preset message string format.
In a specific implementation, processing the data mainly means processing the collected service parameters. Most service parameters take the form of a message string. To make it easier to replace the test data in subsequent steps, and thereby improve the efficiency of testing the distributed system, the parameter message string is split into the following format: parameter name|parameter type|parameter value.
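As a minimal sketch of the split format above, the following Python fragment serializes and parses `parameter name|parameter type|parameter value` records. The source does not specify the raw message string format, so the sketch assumes the name, type, and value have already been extracted; the `Param` type and both helper functions are hypothetical names introduced for illustration. The sample values are those of the service A to service B call used as the example in this description.

```python
from typing import NamedTuple


class Param(NamedTuple):
    """One collected service parameter (hypothetical helper type)."""
    name: str
    type: str
    value: str


def to_record(param: Param) -> str:
    """Serialize one parameter as 'name|type|value'."""
    return "|".join(param)


def from_record(record: str) -> Param:
    """Parse a 'name|type|value' record back into a Param.

    maxsplit=2 keeps any '|' that appears inside the value intact.
    """
    name, type_, value = record.split("|", 2)
    return Param(name, type_, value)


# Parameters passed when service A calls service B.
params = [
    Param("B1", "int(3)", "10"),
    Param("B2", "str(12)", "BBBBB"),
    Param("B3", "bigDecimal", "1234567890123456"),
]
records = [to_record(p) for p in params]
```

Keeping the three fields separate in this way is what lets a later step swap only the value while leaving the name and declared type untouched.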
To make it easier to understand how the invention may be implemented, the collected test data and server device data are described below by way of example with reference to FIG. 3.
FIG. 3 shows the transaction link and the related servers in a service call relationship according to an embodiment of the invention: service A calls service B, service B calls service C, and service C calls service D. When service A calls service B, three parameters are passed: (B1, int(3)), (B2, str(12)), and (B3, bigDecimal). In this call, the value passed for B1 is 10, the value passed for B2 is BBBBB, and the value passed for B3 is 1234567890123456. The transaction involves four application containers, as well as database IP1, database IP2, database IP3, and cache server NIP1. After the relevant transaction link and servers are collected through the code instrumentation (buried) points of step 101, they are processed and stored in the database in the following forms.
1. Test data mainly refers to the parameter types and parameter values passed when one service calls another. The test data is stored in the database in the following format:
(unique index ID, source service name, called service name 1, parameter name 1, parameter type 1, parameter name 2, parameter type 2, parameter name 3, parameter type 3, ……, parameter name N, parameter type N);
(unique index ID, called service name 1, called service name 2, parameter name 1, parameter type 1, parameter name 2, parameter type 2, parameter name 3, parameter type 3, ……, parameter name N, parameter type N);
(unique index ID, called service name 2, called service name 3, parameter name 1, parameter type 1, parameter name 2, parameter type 2, parameter name 3, parameter type 3, ……, parameter name N, parameter type N).
Because the same service may be called by several different services, the services involved in the same transaction are tied together by a unique index ID. From the information stored in the database as above, the link of the transaction can be determined: "source service name" calls "called service name 1", "called service name 1" calls "called service name 2", and "called service name 2" calls "called service name 3", together with the parameters and parameter types of each service call.
Preferably, the parameter types here may include, but are not limited to, char, int, str, bigDecimal, and date, and are stored in the database as actually collected.
2. Server device data refers to the server devices involved in a transaction, including MYSQL, ORACLE, Docker containers, Linux servers, F5, SLB, DBLE, NOS cache servers, Redis cache servers, KAFKA, MQ, and other device types related to a distributed system. Server device data is stored in the database in the following format:
(unique index ID, server type, server IP 1);
(unique index ID, server type, server IP 2).
Preferably, the server types here include, but are not limited to, MYSQL, ORACLE, Docker containers, Linux servers, F5, SLB, DBLE, NOS cache servers, Redis cache servers, KAFKA, MQ, and other device types related to a distributed system.
Preferably, because the same service may involve several different servers, the servers involved in the same transaction are tied together by a unique index ID.
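The two record formats above can be sketched as follows, together with one way of rebuilding a transaction link from the call records that share a unique index ID. This is only an illustration: the class names, field names, and the `rebuild_link` and `servers_for` helpers are hypothetical, and the sketch assumes a linear chain in which each service calls at most one other service, as in the A, B, C, D example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallRecord:
    index_id: str      # unique index ID shared by one transaction
    caller: str        # source (calling) service name
    callee: str        # called service name
    params: tuple = () # ((parameter name, parameter type), ...)


@dataclass(frozen=True)
class ServerRecord:
    index_id: str
    server_type: str
    server_ip: str


def rebuild_link(records, index_id):
    """Return the ordered service chain of one transaction."""
    next_hop = {r.caller: r.callee for r in records if r.index_id == index_id}
    callees = set(next_hop.values())
    # The entry service is the only caller that is never called itself.
    roots = [s for s in next_hop if s not in callees]
    if len(roots) != 1:
        raise ValueError("expected exactly one entry service")
    chain = [roots[0]]
    while chain[-1] in next_hop:
        nxt = next_hop[chain[-1]]
        if nxt in chain:
            raise ValueError("call cycle detected")
        chain.append(nxt)
    return chain


def servers_for(server_records, index_id):
    """Return the (type, IP) pairs of the servers involved in one transaction."""
    return [(r.server_type, r.server_ip)
            for r in server_records if r.index_id == index_id]


calls = [
    CallRecord("T1", "B", "C"),
    CallRecord("T1", "A", "B", (("B1", "int(3)"), ("B2", "str(12)"), ("B3", "bigDecimal"))),
    CallRecord("T1", "C", "D"),
    CallRecord("T2", "X", "B"),  # another transaction that also calls B
]
servers = [ServerRecord("T1", "MYSQL", "IP1"), ServerRecord("T2", "MYSQL", "IP2")]
link = rebuild_link(calls, "T1")
```

Filtering on the index ID first is what keeps the second transaction's call to service B from being mixed into the first transaction's link.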
2. Next, a preferred step between steps 101 and 102 above, i.e., step 2 in FIG. 2, is described.
In a specific implementation, the collected test data and server device data undergo secondary processing (preprocessing) and are stored in the database in a defined format (see the mapping relationship format below).
Specifically, because the parameter types or the number of parameters of a service transaction may change from version to version, test data from one version is not necessarily applicable to the next. The test data and server device data in the database are therefore cleaned up once per version.
In one embodiment, the chaos engineering method for a distributed system may further include: preprocessing the test data and the server device data to obtain a mapping relationship between the test data and the server device data.
Specifically, secondary processing (preprocessing) means mapping the collected test data to the collected server device data. Because a complete transaction link involves many kinds of test data and server devices, a one-to-one mapping is needed so that the correct test case is later generated for the correct service type, which improves the accuracy of the test.
In a specific implementation, the mapping relationship format may be as follows: service name|parameter name 1|parameter type 1|parameter value 1|parameter name 2|parameter type 2|parameter value 2|……|parameter name N|parameter type N|parameter value N|server type|server IP.
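A minimal sketch of producing one line in the mapping relationship format is shown below. The function name `mapping_line` is hypothetical, and pairing service B with a MYSQL database at IP1 is purely illustrative; the source does not state which server belongs to which service.

```python
def mapping_line(service, params, server_type, server_ip):
    """Join one service, its parameters, and one server into a mapping record.

    params is a sequence of (name, type, value) triples.
    """
    fields = [service]
    for name, type_, value in params:
        fields += [name, type_, value]
    fields += [server_type, server_ip]
    return "|".join(fields)


line = mapping_line(
    "B",
    [("B1", "int(3)", "10"), ("B2", "str(12)", "BBBBB")],
    "MYSQL",
    "IP1",
)
```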
3. Next, step 102 above, i.e., step 3 in FIG. 2, is described.
In one embodiment, replacing the test data with abnormal data to form abnormal data test cases may include: replacing the test data with abnormal data by means of field type replacement, field length replacement, and special field replacement to form the abnormal data test cases.
In a specific implementation, the collected normal test data is replaced with abnormal data in three ways, namely field type replacement, field length replacement, and special field replacement, to form the abnormal data test cases. These three replacement modes are described in detail below by way of example, to make it easier to understand how the embodiments of the invention may be implemented.
1. Field type replacement replaces the parameter value of a field with a value that does not conform to the field type, to verify whether the system correctly handles a parameter value whose type does not match. Code in the distributed systems concerned is mostly written in Java, so Java types are used as the example:
The data types in Java include: byte, short (short integer), int (integer), long (long integer), char (character), float (single-precision floating point), double (double-precision floating point), and String (character string). When the original parameter type is any one of these, the original parameter value is replaced with values of the other 7 types, which produces the abnormal parameters. For example:
(parameter name a, int(4)), when replaced with an abnormal parameter, can be replaced with:
(parameter name a, char type), for example (parameter name a, B);
(parameter name a, String type), for example (parameter name a, BBBB); and so on.
2. Field length replacement applies to fields with a limited length: a value exceeding the field length is assigned to the field, to verify whether the system handles it correctly. For example, for (parameter name a, int(4)), the abnormal parameter is (parameter name a, 11111). This verifies whether the system behaves normally when the parameter value exceeds the defined field length.
3. Special field replacement applies to special field types whose values must comply with certain logical rules. Such fields are assigned values that violate the rules, to verify whether the system handles them properly.
Examples include ID card numbers, telephone numbers, postal codes, and dates, each of which is replaced with a value that violates its format rules. For example, a date (year, month, day) is usually represented in code as YYYY-MM-DD; replacing the "-" symbol with another character, e.g., YYYYFMMFDD, breaks the rule and yields a date-type abnormal test case, and so on.
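The three replacement modes described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: the sample value chosen for each Java type, the function names, and the filler characters are all hypothetical. Note that some cross-type substitutions (for example a byte value placed in an int field) are still valid input, so a practical generator might filter those out.

```python
import re

# One illustrative sample value per Java type named in the description (assumed values).
SAMPLE_VALUES = {
    "byte": "127",
    "short": "32767",
    "int": "2147483647",
    "long": "9223372036854775807",
    "char": "B",
    "float": "1.5",
    "double": "1.5e300",
    "String": "BBBB",
}
INTEGER_TYPES = {"byte", "short", "int", "long"}


def parse_type(declared):
    """Split 'int(4)' into ('int', 4); the length is None when absent."""
    m = re.fullmatch(r"(\w+)(?:\((\d+)\))?", declared)
    if m is None:
        raise ValueError(f"unrecognized type: {declared}")
    length = int(m.group(2)) if m.group(2) else None
    return m.group(1), length


def type_replacements(name, declared):
    """Field type replacement: one value for every type other than the declared one."""
    base, _ = parse_type(declared)
    return [(name, v) for t, v in SAMPLE_VALUES.items() if t != base]


def length_replacement(name, declared):
    """Field length replacement: a value one character longer than the limit."""
    base, length = parse_type(declared)
    if length is None:
        return None  # no declared length, nothing to exceed
    filler = "1" if base in INTEGER_TYPES else "B"
    return (name, filler * (length + 1))


def date_replacement(name, value, bad_separator="F"):
    """Special field replacement: break the YYYY-MM-DD rule by swapping the separator."""
    return (name, value.replace("-", bad_separator))


type_cases = type_replacements("a", "int(4)")
length_case = length_replacement("a", "int(4)")
date_case = date_replacement("d", "2021-05-31")
```

For the `(parameter name a, int(4))` example, `type_cases` holds seven substitutions including the char value `B` and the String value `BBBB`, and `length_case` is the five-digit value `11111`.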
4. Next, step 103 above, i.e., step 4 in FIG. 2, is described.
In a specific implementation, for the collected server devices, the fault types related to the corresponding fault points are generated in combination with the fault expert database, thereby forming fault test cases.
In one embodiment, generating the fault types related to the corresponding fault points according to the server device data and the pre-established fault expert database to form fault test cases may include:
obtaining the server device type and the server device IP from the server device data;
generating, according to the server device type and the server device IP and in combination with the pre-established fault expert database, the fault types related to the corresponding fault points to form fault test cases.
In a specific implementation, the relevant server device type, server device IP, and so on are obtained from the server device data, and fault test cases are then generated from the server device type and server device IP in combination with the fault expert database. The fault expert database is built from production experience, and its entries take the form: device type|fault type|fault occurrence probability. The device types mainly include MYSQL, ORACLE, Docker containers, Linux servers, F5, SLB, DBLE, NOS cache servers, Redis cache servers, KAFKA, MQ, and other device types related to a distributed system. The fault types mainly include CPU faults, memory faults, unavailable network ports, network delay, network packet loss, network packet corruption, out-of-order network packets, network packet retransmission, full disk space, busy disk IO, low disk IO rate, and so on. The fault occurrence probability indicates how likely a given fault is to occur on a given type of server device; the higher the probability, the greater the importance. The probability is assigned after jointly considering how often the fault occurs in production and in the test environment. When fault test cases are generated, cases of different importance levels can be selected by setting a minimum threshold on the fault occurrence probability. An example follows:
The fault expert database (the relationship among server device type, server device fault type, and fault occurrence probability) may include:
Docker container|CPU full load|75;
Docker container|JVM memory overflow|80;
Docker container|network port unavailable|60;
Docker container|network delay|35;
Docker container|network packet loss|48;
Docker container|network packet corruption|30;
Docker container|network packet out of order|20;
Docker container|network packet retransmission|10;
Docker container|disk space full|20;
Docker container|disk IO busy|55;
Docker container|disk IO rate low|18;
NOS cache server|CPU full load|75;
NOS cache server|JVM memory overflow|80;
NOS cache server|network port unavailable|60;
NOS cache server|network delay|35;
NOS cache server|network packet loss|48;
NOS cache server|network packet corruption|30;
NOS cache server|network packet out of order|20;
NOS cache server|network packet retransmission|10;
NOS cache server|disk space full|20;
NOS cache server|disk IO busy|55;
NOS cache server|disk IO rate low|18.
Suppose the server device data collected through the buried points is (Docker container, 122.18.XX.YY) and the tester sets the minimum threshold of the fault occurrence probability to 40. In combination with the expert database, fault test cases are then generated for every fault whose occurrence probability is greater than or equal to 40%. The generated set of fault test cases is as follows:
(Docker container, 122.18.XX.YY, CPU full load, 75);
(Docker container, 122.18.XX.YY, JVM memory overflow, 80);
(Docker container, 122.18.XX.YY, network port unavailable, 60);
(Docker container, 122.18.XX.YY, network packet loss, 48);
(Docker container, 122.18.XX.YY, disk IO busy, 55).
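The threshold-based selection in this example can be sketched as follows. The dictionary layout and the function name `generate_fault_cases` are assumptions made for illustration; the probabilities are the Docker container entries of the expert database listed above, and the masked IP is kept exactly as it appears in the source.

```python
# Fault expert database: device type -> {fault type: occurrence probability}.
EXPERT_DB = {
    "Docker container": {
        "CPU full load": 75,
        "JVM memory overflow": 80,
        "network port unavailable": 60,
        "network delay": 35,
        "network packet loss": 48,
        "network packet corruption": 30,
        "network packet out of order": 20,
        "network packet retransmission": 10,
        "disk space full": 20,
        "disk IO busy": 55,
        "disk IO rate low": 18,
    },
}


def generate_fault_cases(servers, expert_db, min_probability):
    """Build (type, IP, fault, probability) cases at or above the threshold."""
    cases = []
    for server_type, server_ip in servers:
        # Device types missing from the expert database yield no cases.
        for fault, probability in expert_db.get(server_type, {}).items():
            if probability >= min_probability:
                cases.append((server_type, server_ip, fault, probability))
    return cases


cases = generate_fault_cases([("Docker container", "122.18.XX.YY")], EXPERT_DB, 40)
```

With the threshold at 40 this yields the same five cases as the set listed above; raising the threshold narrows the run to the most likely faults.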
5. Next, step 104 above, i.e., step 5 in FIG. 2, is described.
In a specific implementation, the corresponding test cases are scheduled automatically by a scheduling tool, the relevant monitoring data are collected, the test results are stored in a database, and abnormal test results are sent to the testers.
Preferably, there are two kinds of scheduling: scheduling of abnormal data test cases and scheduling of fault test cases. Abnormal data test cases are initiated mainly through JMeter or Postman. Fault test cases are initiated mainly through the ChaosBlade tool. The detailed steps for executing the abnormal data test cases and the fault test cases are described below.
1. First, the execution of the abnormal data test cases is described.
In one embodiment, executing the abnormal data test cases and the fault test cases to obtain a test result of the distributed system may include executing the abnormal data test cases as follows to obtain the corresponding test results:
determining whether the number of abnormal data test cases to be tested in the abnormal data test case set is greater than 0;
when the number of abnormal data test cases to be tested is greater than 0, executing the abnormal data test cases through the JMeter tool or the Postman tool;
collecting data while the cases are executed to obtain the test results corresponding to the abnormal data test cases, and storing those test results in a database;
querying abnormal test results from the database and sending them to the testers.
In a specific implementation, as shown in FIG. 4, the abnormal data test case execution process provided in this embodiment includes the following steps.
Abnormal data test cases are initiated mainly through JMeter or Postman, with the following processing flow:
S301: Start executing the abnormal data test cases and determine whether the number of abnormal data test cases to be tested is greater than 0. If it is greater than 0, perform step S302. If it is not greater than 0, perform step S304.
S302: Execute the abnormal data test cases with a test tool such as JMeter or Postman.
S303: Collect the test data and test results and store them in a database.
S304: Query abnormal test results from the database and send them to the testers.
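The S301 to S304 flow can be sketched as a simple loop. Everything here is a stand-in: `execute` represents whatever drives JMeter or Postman, the `results` list stands in for the database, `notify` stands in for sending results to the testers, and the `handled` flag used to decide whether a result is abnormal is an assumption, since the source does not define that criterion.

```python
from collections import deque


def run_abnormal_cases(cases, execute, notify):
    """Run every pending abnormal data test case and report abnormal results."""
    pending = deque(cases)
    results = []                                   # stand-in for the database
    while len(pending) > 0:                        # S301: any cases left?
        case = pending.popleft()
        outcome = execute(case)                    # S302: run via the test tool
        results.append({"case": case, **outcome})  # S303: store the result
    # S304: query the abnormal results and send them to the testers.
    abnormal = [r for r in results if not r["handled"]]
    if abnormal:
        notify(abnormal)
    return results


def fake_execute(case):
    """Hypothetical executor: pretend only the over-long value is mishandled."""
    _name, value = case
    return {"handled": value != "11111"}


sent = []
results = run_abnormal_cases([("a", "B"), ("a", "11111")], fake_execute, sent.extend)
```

Passing the executor and notifier in as callables keeps the loop independent of the specific test tool, which matches the description's choice between JMeter and Postman.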
2. Next, the execution of the fault test cases is described.
In one embodiment, executing the abnormal data test cases and the fault test cases to obtain a test result of the distributed system may include executing the fault test cases as follows to obtain the corresponding test results:
installing the ChaosBlade installation media on the server into which faults are to be injected;
determining whether the number of fault test cases to be tested in the fault test case set is greater than 0;
when the number of fault test cases to be tested is greater than 0, initiating a preset transaction at high concurrency through the LoadRunner tool or the JMeter tool;
executing one fault test case from the fault test case set, and injecting the corresponding fault on the corresponding server IP through the ChaosBlade tool according to the fault type of that case;
collecting monitoring data while the case is executed to obtain the test result corresponding to the fault test case, and storing the test result in a database;
querying abnormal test results from the database and sending them to the testers;
removing the ChaosBlade installation media once all fault test cases in the fault test case set have been executed.
In one embodiment, the monitoring data may include: resource monitoring data and system monitoring data.
In a specific implementation, as shown in Fig. 5, the fault test case execution process provided in this embodiment includes the following steps.
The fault test cases inject faults mainly through the ChaosBlade tool, and the processing flow is as follows:
S401: Install the ChaosBlade medium on the server into which the fault is to be injected.
ChaosBlade is a fault-injection tool open-sourced by Alibaba in 2018. It can inject CPU, memory, network, disk and other faults, and it supports secondary development and optimization as required.
S402: Determine whether the number of fault test cases to be tested is greater than 0. If it is greater than 0, step S403 is performed; otherwise, step S406 is performed.
S403: Initiate high concurrency of a transaction through LoadRunner or JMeter.
Preferably, the number of concurrent users of a given transaction at peak time in production is obtained first, and that number is used as the concurrency level of this step when the high-concurrency stress test is initiated.
LoadRunner and JMeter are the preferred tools for initiating the stress test. The high concurrency of transactions is initiated here mainly to simulate in-flight transactions in production.
S404: Execute a fault test case, and inject the corresponding fault through a ChaosBlade command according to the fault type of the fault test case.
That is, the fault test case generated in step 4 of Fig. 2 is executed, and the corresponding fault is injected on the corresponding server IP through a ChaosBlade command according to the fault type in the case.
S405: Collect monitoring data, including resource monitoring data and system monitoring data, and store the monitoring data in a database.
Preferably, the resource monitoring data includes the CPU, memory, network IO, disk IO and the like of the server, and may be collected by the nmon tool or another resource collection tool.
Preferably, the system monitoring data includes the number of transactions per second, the transaction response time, the number of concurrent transactions, the number of successful transactions, the number of failed transactions and the like, and may be collected by LoadRunner or JMeter.
Preferably, the collected data is stored in a database for ease of querying. Because each version may change the service code, each version needs to be tested once, and the monitoring data in the database is cleaned up once per version.
S406: Query the abnormal test results from the database and send them to a tester.
S407: After all fault test cases have been executed, remove the ChaosBlade medium and restore the test environment to its original state.
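The S401 to S407 flow can be sketched as follows. This is a hedged illustration rather than the implementation of the invention: every callable (`install`, `start_load`, `inject`, `collect`, `is_abnormal`, `uninstall`) is a hypothetical placeholder for the real tooling, the `try/finally` cleanup is an added safety measure, and the `BLADE_ARGS` table is an assumed mapping from fault type to ChaosBlade CLI arguments that should be checked against the documentation of the installed ChaosBlade version.

```python
from typing import Callable, Dict, List

# Assumed mapping from fault type to ChaosBlade CLI arguments (illustrative
# values only; verify the flags against the installed ChaosBlade version).
BLADE_ARGS: Dict[str, List[str]] = {
    "cpu": ["create", "cpu", "fullload"],
    "memory": ["create", "mem", "load", "--mem-percent", "80"],
    "network": ["create", "network", "delay", "--time", "3000", "--interface", "eth0"],
    "disk": ["create", "disk", "burn", "--read", "--write"],
}


def blade_command(fault_type: str) -> List[str]:
    """Build the argv for one injection; raises KeyError for unknown types."""
    return ["blade"] + BLADE_ARGS[fault_type]


def run_fault_cases(
    cases: List[Dict],
    install: Callable[[], None],
    start_load: Callable[[], None],
    inject: Callable[[str, List[str]], None],
    collect: Callable[[], Dict],
    is_abnormal: Callable[[Dict], bool],
    uninstall: Callable[[], None],
) -> List[Dict]:
    """Run the fault test cases and return the abnormal results."""
    install()  # S401: install the ChaosBlade medium on the target server.
    results: List[Dict] = []
    pending = list(cases)
    try:
        while len(pending) > 0:  # S402: any cases left to test?
            case = pending.pop(0)
            start_load()  # S403: start the high-concurrency transaction load.
            # S404: inject the fault on the server IP named in the case.
            inject(case["ip"], blade_command(case["fault_type"]))
            # S405: collect the monitoring data for this case.
            results.append({"case": case, "monitoring": collect()})
        # S406: select the abnormal results to send to a tester.
        return [r for r in results if is_abnormal(r["monitoring"])]
    finally:
        uninstall()  # S407: remove the medium and restore the environment.
```

Building the command as an argument list rather than a shell string makes it straightforward to pass to a remote executor or to `subprocess.run` without quoting issues.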
The chaotic engineering method of the distributed system provided by the embodiment of the invention has the following advantages:
1. For the first time, a combination of abnormal data test cases and fault test cases is proposed as the test cases of chaotic engineering. Its test coverage is larger than that of chaotic engineering tests that use only abnormal data test cases or only fault test cases, so the various abnormal scenarios of chaotic engineering can be covered effectively and comprehensively.
2. The transaction links, parameters, server equipment, server types and the like involved in a transaction are obtained by means of code embedded points; the test data is replaced with abnormal data according to certain rules; and fault test cases are generated according to the server equipment type. Testing efficiency and testing effect are improved without additional manpower, which in turn improves the robustness and high availability of the distributed system.
The embodiment of the invention also provides a chaotic engineering device of the distributed system, as described in the following embodiment. Because the principle by which the device solves the problem is similar to that of the chaotic engineering method of the distributed system, the implementation of the device may refer to the implementation of the method, and repeated descriptions are omitted.
Fig. 6 is a schematic structural diagram of a chaotic engineering device of a distributed system according to an embodiment of the present invention. As shown in Fig. 6, the device includes:
an acquisition unit 01, used for collecting test data and server equipment data of the distributed system through code embedded points;
an abnormal data test case generation unit 02, used for replacing the test data with abnormal data to form an abnormal data test case;
a fault test case generation unit 03, used for generating the fault types related to the corresponding fault points according to the server equipment data and a pre-established fault expert database to form a fault test case, where the fault expert database is a relation among the type of the server equipment, the fault type of the server equipment and the fault occurrence probability;
and a test unit 04, used for executing the abnormal data test case and the fault test case to obtain a test result of the distributed system.
In one embodiment, the abnormal data test case generation unit may be specifically configured to replace the test data with abnormal data by means of field type replacement, field length replacement and special field replacement, to form an abnormal data test case.
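The three replacement modes can be illustrated with a short sketch. The text does not define the modes in detail, so the interpretation here is an assumption: field type replacement swaps a value for one of a different type, field length replacement produces a value longer than an assumed maximum length, and special field replacement substitutes empty, null or special-character values. All function names and the `SPECIAL_VALUES` list are hypothetical.

```python
import copy
from typing import Any, Dict, List

# Assumed examples of "special" values; not taken from the original text.
SPECIAL_VALUES: List[Any] = ["", None, "' OR '1'='1", "<script>"]


def field_type_replacement(record: Dict[str, Any], field: str) -> Dict[str, Any]:
    """Replace a numeric value with a string, and any other value with an int."""
    mutated = copy.deepcopy(record)
    is_number = isinstance(mutated[field], (int, float))
    mutated[field] = "not-a-number" if is_number else 12345
    return mutated


def field_length_replacement(
    record: Dict[str, Any], field: str, max_len: int
) -> Dict[str, Any]:
    """Pad the value so that it is at least one character longer than max_len."""
    mutated = copy.deepcopy(record)
    mutated[field] = str(mutated[field]).ljust(max_len + 1, "9")
    return mutated


def special_field_replacement(
    record: Dict[str, Any], field: str
) -> List[Dict[str, Any]]:
    """Produce one mutated record per special value."""
    mutated_records = []
    for special in SPECIAL_VALUES:
        mutated = copy.deepcopy(record)
        mutated[field] = special
        mutated_records.append(mutated)
    return mutated_records
```

Each function copies the record before mutating it, so the test data collected through the embedded points stays intact and can be reused to derive further abnormal cases.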
In one embodiment, the test unit may be specifically configured to execute the fault test case according to the following method to obtain a test result corresponding to the fault test case:
installing the ChaosBlade medium on the server into which the fault is to be injected;
determining whether the number of fault test cases to be tested in the fault test case set is greater than 0;
when the number of fault test cases to be tested is greater than 0, initiating high concurrency of a preset transaction through the LoadRunner tool or the JMeter tool;
executing one fault test case in the fault test case set, and injecting the corresponding fault on the corresponding server IP through the ChaosBlade tool according to the fault type of the fault test case;
collecting monitoring data while the case is executed to obtain a test result corresponding to the fault test case, and storing the test result in a database;
querying the abnormal test results from the database and sending them to a tester;
and when all fault test cases in the fault test case set have been executed, removing the ChaosBlade medium.
In one embodiment, the monitoring data may include: resource monitoring data and system monitoring data.
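Storing the two kinds of monitoring data, and cleaning them up once per version as described for step S405, could look like the following sketch. The schema, the function names and the reading of "cleans up once for each version" as "delete the rows of all other versions when a new version is tested" are assumptions made for illustration.

```python
import sqlite3

# The two kinds of monitoring data named in the text.
KINDS = ("resource", "system")


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the monitoring database and create the (hypothetical) table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS monitoring ("
        "version TEXT, case_id TEXT, kind TEXT, metric TEXT, value REAL)"
    )
    return conn


def record_metric(
    conn: sqlite3.Connection,
    version: str,
    case_id: str,
    kind: str,
    metric: str,
    value: float,
) -> None:
    """Store one sample, e.g. kind='resource' for CPU or kind='system' for TPS."""
    if kind not in KINDS:
        raise ValueError(f"unknown monitoring kind: {kind}")
    conn.execute(
        "INSERT INTO monitoring VALUES (?, ?, ?, ?, ?)",
        (version, case_id, kind, metric, value),
    )
    conn.commit()


def clean_other_versions(conn: sqlite3.Connection, current_version: str) -> int:
    """Delete monitoring rows of every version except the current one."""
    cursor = conn.execute(
        "DELETE FROM monitoring WHERE version != ?", (current_version,)
    )
    conn.commit()
    return cursor.rowcount
```

Keying every row by version keeps the cleanup to a single statement and avoids mixing results from builds whose service code differs.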
In one embodiment, the test unit may be specifically configured to execute the abnormal data test cases according to the following method to obtain test results corresponding to the abnormal data test cases:
determining whether the number of abnormal data test cases to be tested in the abnormal data test case set is greater than 0;
when the number of abnormal data test cases to be tested is greater than 0, executing the abnormal data test cases through the JMeter tool or the Postman tool;
collecting data while the case is executed to obtain a test result corresponding to the abnormal data test case, and storing that test result in a database;
and querying the abnormal test results from the database and sending them to a tester.
In one embodiment, the fault test case generation unit may be specifically configured to:
acquire the type of the server equipment and the IP of the server equipment from the server equipment data;
and generate, according to the type of the server equipment and the IP of the server equipment in combination with the pre-established fault expert database, the fault types related to the corresponding fault points, to form a fault test case.
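A minimal sketch of this generation step is shown below, assuming the fault expert database can be represented as a mapping from server equipment type to pairs of fault type and occurrence probability. The server types, fault types, probabilities, the `min_probability` threshold and the ordering by probability are all hypothetical and serve only to illustrate the lookup.

```python
from typing import Dict, List, Tuple

# Hypothetical expert database: server type -> [(fault type, probability)].
EXPERT_DATABASE: Dict[str, List[Tuple[str, float]]] = {
    "application": [("cpu", 0.4), ("memory", 0.3), ("network", 0.2)],
    "database": [("disk", 0.5), ("network", 0.3), ("cpu", 0.1)],
}


def generate_fault_cases(
    devices: List[Dict[str, str]],
    database: Dict[str, List[Tuple[str, float]]],
    min_probability: float = 0.0,
) -> List[Dict]:
    """Create one fault test case per (server IP, fault type) pair."""
    cases = []
    for device in devices:
        # Look up the fault types recorded for this server equipment type.
        for fault_type, probability in database.get(device["type"], []):
            if probability >= min_probability:
                cases.append(
                    {
                        "ip": device["ip"],
                        "server_type": device["type"],
                        "fault_type": fault_type,
                        "probability": probability,
                    }
                )
    # Assumed ordering: run the most likely faults first.
    cases.sort(key=lambda case: case["probability"], reverse=True)
    return cases
```

Filtering by probability is one possible use of the third column of the expert database; the text itself only states that the probability is part of the recorded relation.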
In one embodiment, the chaotic engineering device of the distributed system may further include a preprocessing unit, used for preprocessing the test data and the server equipment data to obtain a mapping relation between the test data and the server equipment data.
In one embodiment, the chaotic engineering device of the distributed system may further include a preliminary processing unit, used for performing preliminary processing on the test data to obtain the test data after the preliminary processing.
In one embodiment, the preliminary processing unit is specifically configured to split the message string of the test data to obtain test data in a preset message string format.
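The splitting step might be sketched as follows. The text does not specify the message format, so the `field=value|field=value` layout, the separators and the function name `split_message` are purely hypothetical.

```python
from typing import Dict


def split_message(
    message: str, field_sep: str = "|", kv_sep: str = "="
) -> Dict[str, str]:
    """Split a raw message string into named fields (assumed key=value layout)."""
    fields: Dict[str, str] = {}
    for part in message.split(field_sep):
        if not part:
            continue  # Skip empty segments such as a trailing separator.
        key, separator, value = part.partition(kv_sep)
        if not separator:
            raise ValueError(f"segment without '{kv_sep}': {part!r}")
        fields[key.strip()] = value.strip()
    return fields
```

Once the message is held as named fields, each field can be targeted individually when the test data is later replaced with abnormal data.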
The embodiment of the invention also provides computer equipment, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor realizes the chaotic engineering method of the distributed system when executing the computer program.
The embodiment of the invention also provides a computer readable storage medium, which stores a computer program for executing the chaotic engineering method of the distributed system.
In the prior art, test coverage is incomplete, system robustness and high availability cannot be effectively improved, and test cases must be designed manually before they are executed, which is inefficient. By contrast, the chaotic engineering scheme of the distributed system in the embodiment of the invention: collects test data and server equipment data of a distributed system through code embedded points; replaces the test data with abnormal data to form an abnormal data test case; generates the fault types related to the corresponding fault points according to the server equipment data and a pre-established fault expert database to form a fault test case, where the fault expert database is a relation among the type of the server equipment, the fault type of the server equipment and the fault occurrence probability; and executes the abnormal data test case and the fault test case to obtain a test result of the distributed system. The robustness and high availability of the distributed system can thereby be improved comprehensively and efficiently.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing embodiments further describe the objects, technical solutions and advantages of the invention in detail. It should be understood that they are merely specific embodiments of the invention and are not intended to limit its scope of protection; any modification, equivalent replacement, improvement and the like made within the spirit and principles of the invention shall be included within the scope of protection of the invention.

Claims (9)

1. A method of chaotic engineering of a distributed system, comprising:
collecting test data and server equipment data of a distributed system through a code embedded point;
Replacing the test data with the abnormal data to form an abnormal data test case;
Generating fault types related to corresponding fault points according to server equipment data and a pre-established fault expert database, and forming a fault test case; the fault expert database is a relation among the type of the server equipment, the fault type of the server equipment and the fault occurrence probability;
executing the abnormal data test case and the fault test case to obtain a test result of the distributed system;
wherein executing the abnormal data test case and the fault test case to obtain a test result of the distributed system comprises: executing the fault test case according to the following method to obtain a test result corresponding to the fault test case: installing the ChaosBlade medium on the server into which the fault is to be injected; determining whether the number of fault test cases to be tested in the fault test case set is greater than 0; when the number of fault test cases to be tested is greater than 0, initiating high concurrency of a preset transaction through the LoadRunner tool or the JMeter tool; executing one fault test case in the fault test case set, and injecting the corresponding fault on the corresponding server IP through the ChaosBlade tool according to the fault type of the fault test case; collecting monitoring data while the case is executed to obtain a test result corresponding to the fault test case, and storing the test result in a database; querying abnormal test results from the database and sending the abnormal test results to a tester; and when all fault test cases in the fault test case set have been executed, removing the ChaosBlade medium.
2. The method of chaotic engineering for a distributed system according to claim 1, wherein replacing the test data with the abnormal data to form an abnormal data test case comprises: replacing the test data with the abnormal data by means of field type replacement, field length replacement and special field replacement to form an abnormal data test case.
3. The method of chaotic engineering for a distributed system according to claim 1, wherein the monitoring data comprises: resource monitoring data and system monitoring data.
4. The method of chaotic engineering of a distributed system according to claim 1, wherein executing the abnormal data test case and the fault test case to obtain the test result of the distributed system comprises: executing the abnormal data test cases according to the following method to obtain test results corresponding to the abnormal data test cases:
determining whether the number of abnormal data test cases to be tested in the abnormal data test case set is greater than 0;
when the number of abnormal data test cases to be tested is greater than 0, executing the abnormal data test cases through the JMeter tool or the Postman tool;
collecting data while the case is executed to obtain a test result corresponding to the abnormal data test case, and storing the test result corresponding to the abnormal data test case in a database;
and querying abnormal test results from the database and sending the abnormal test results to a tester.
5. The method of chaotic engineering of a distributed system according to claim 1, wherein generating the fault types related to the corresponding fault points according to the server equipment data and a pre-established fault expert database to form a fault test case comprises:
acquiring the type of the server equipment and the IP of the server equipment from the server equipment data;
and generating, according to the type of the server equipment and the IP of the server equipment in combination with the pre-established fault expert database, the fault types related to the corresponding fault points, to form a fault test case.
6. The method of chaotic engineering of a distributed system according to claim 1, further comprising: preprocessing the test data and the server equipment data to obtain a mapping relation between the test data and the server equipment data.
7. A chaotic engineering device of a distributed system, comprising:
The acquisition unit is used for acquiring test data and server equipment data of the distributed system through the code embedded points;
the abnormal data test case generation unit is used for replacing the test data with abnormal data to form an abnormal data test case;
The fault test case generation unit is used for generating fault types related to corresponding fault points according to the server equipment data and a pre-established fault expert database to form a fault test case; the fault expert database is a relation among the type of the server equipment, the fault type of the server equipment and the fault occurrence probability;
the test unit is used for executing the abnormal data test case and the fault test case to obtain a test result of the distributed system;
wherein executing the abnormal data test case and the fault test case to obtain a test result of the distributed system comprises: executing the fault test case according to the following method to obtain a test result corresponding to the fault test case: installing the ChaosBlade medium on the server into which the fault is to be injected; determining whether the number of fault test cases to be tested in the fault test case set is greater than 0; when the number of fault test cases to be tested is greater than 0, initiating high concurrency of a preset transaction through the LoadRunner tool or the JMeter tool; executing one fault test case in the fault test case set, and injecting the corresponding fault on the corresponding server IP through the ChaosBlade tool according to the fault type of the fault test case; collecting monitoring data while the case is executed to obtain a test result corresponding to the fault test case, and storing the test result in a database; querying abnormal test results from the database and sending the abnormal test results to a tester; and when all fault test cases in the fault test case set have been executed, removing the ChaosBlade medium.
8. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of any of claims 1 to 6 when executing the computer program.
9. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program for executing the method of any one of claims 1 to 6.
CN202110603040.5A 2021-05-31 Chaotic engineering method and device for distributed system Active CN113342650B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110603040.5A CN113342650B (en) 2021-05-31 Chaotic engineering method and device for distributed system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110603040.5A CN113342650B (en) 2021-05-31 Chaotic engineering method and device for distributed system

Publications (2)

Publication Number Publication Date
CN113342650A CN113342650A (en) 2021-09-03
CN113342650B true CN113342650B (en) 2024-07-02

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107861867A (en) * 2017-10-24 2018-03-30 阿里巴巴集团控股有限公司 Page fault monitoring method, device, system and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant