CN114124759B - Evaluation method and device for distributed system, electronic equipment and storage medium - Google Patents


Publication number
CN114124759B
Authority
CN
China
Prior art keywords
flow
service
link
test
request
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111349273.3A
Other languages
Chinese (zh)
Other versions
CN114124759A (en)
Inventor
陈肇权
林海
马泽政
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202111349273.3A
Publication of CN114124759A
Application granted
Publication of CN114124759B
Legal status: Active


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00: Arrangements for monitoring or testing data switching networks
    • H04L43/08: Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0876: Network utilisation, e.g. volume of load or congestion level
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/23: Clustering techniques
    • G06F18/232: Non-hierarchical techniques
    • G06F18/2321: Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213: Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering

Abstract

The invention discloses an evaluation method and device for a distributed system, electronic equipment and a storage medium, and relates to the technical field of chaos engineering. The evaluation method comprises the following steps: obtaining a link test script, executing the link test script, and generating a request message, wherein the link test script is obtained by combining a generated test script with a service link; sending the request message to a system to be tested so as to perform a flow benchmark test and a flow comparison test on the system to be tested, and collecting operation data during the tests; receiving a response message returned by the system to be tested after the tests; and evaluating, based on the response message and the operation data, the state of the system to be tested in response to flow changes in different dimensions to obtain an evaluation message. The invention solves the technical problem that, in the related art, the influence of flow anomalies on the system is not considered when implementing chaos engineering, so that the capability of the system to cope with flow changes cannot be evaluated.

Description

Evaluation method and device for distributed system, electronic equipment and storage medium
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to an evaluation method and apparatus for a distributed system, an electronic device, and a storage medium.
Background
Chaos engineering is an emerging area of software research and development. It is the discipline of experimenting on a system in order to understand how the system copes with various chaotic conditions in the production environment, and to build confidence in the system's capability to withstand turbulent conditions in production. In the related art, most chaos engineering implementations focus on simulating resource chaos: faults are randomly injected into various resources (such as the CPU, memory, and queues) to cause high resource usage, resource downtime, and similar conditions, which increases the chaos of the system environment so that the availability and robustness of the system under abnormal resource conditions can be evaluated. However, existing chaos engineering implementations do not consider the influence of conditions such as traffic storms and unreasonable transaction flows on service availability and robustness. The overall evaluation of system capability therefore lacks dimensions, which increases engineering risk and reduces engineering quality.
In view of the above problems, no effective solution has been proposed at present.
Disclosure of Invention
The embodiment of the invention provides an evaluation method and device for a distributed system, electronic equipment and a storage medium, which at least solve the technical problem that, in the related art, the influence of flow anomalies on the system is not considered when implementing chaos engineering, so that the capability of the system to cope with flow changes cannot be evaluated.
According to an aspect of an embodiment of the present invention, there is provided an evaluation method for a distributed system, including: acquiring a link test script, executing the link test script, and generating a request message, wherein the link test script is obtained by combining the generated test script with a service link; sending the request message to a system to be tested so as to perform a flow benchmark test and a flow comparison test on the system to be tested, and collecting operation data during the tests; receiving a response message returned by the system to be tested after the tests; and evaluating, based on the response message and the operation data, the state of the system to be tested in response to flow changes in different dimensions to obtain an evaluation message.
Optionally, before acquiring the link test script, the evaluation method includes: monitoring flow data of a network module on a server by adopting a preset probe strategy, wherein the flow data at least carries a port identifier and a service request; when the port identification indicates a certain service port of the system to be tested, collecting a plurality of service requests in the flow data; analyzing the service request to obtain an analysis result; and traversing the plurality of service requests according to a time sequence by adopting a preset identification algorithm based on the analysis result to obtain the service link.
Optionally, after traversing the service requests according to time sequence based on the analysis result by adopting a preset recognition algorithm to obtain a service link, the evaluation method further includes: and under the condition that the service link appears for the first time, extracting request messages and response messages of all service requests in the service link to obtain a flow sample.
Optionally, after extracting the request message and the response message of all the service requests in the service link to obtain the traffic sample, the evaluation method further includes: carrying out statistical analysis on the service links and the service requests by adopting a preset clustering algorithm and the number of expected categories to obtain link call quantity and request call quantity; and screening the service links corresponding to the link call quantity larger than the first preset threshold value, and screening the service requests corresponding to the request call quantity larger than the second preset threshold value to obtain screening results.
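The counting and screening step described above can be illustrated with a minimal sketch. It assumes each observed service link is represented as a tuple of request names, and it replaces the preset clustering algorithm with plain frequency counting for brevity; the function name, the sample data, and the threshold values are hypothetical.

```python
from collections import Counter


def screen_by_call_volume(links, link_threshold, request_threshold):
    """Count how often each link and each request occurs, then keep the busy ones.

    links: list of tuples of request names, one tuple per observed link occurrence.
    """
    link_calls = Counter(links)                                  # link call volume
    request_calls = Counter(r for link in links for r in link)   # request call volume
    hot_links = {l: n for l, n in link_calls.items() if n > link_threshold}
    hot_requests = {r: n for r, n in request_calls.items() if n > request_threshold}
    return hot_links, hot_requests


# Hypothetical observations: three "login -> query" links and one "login -> transfer".
observed = [("login", "query"), ("login", "query"),
            ("login", "transfer"), ("login", "query")]
hot_links, hot_requests = screen_by_call_volume(
    observed, link_threshold=2, request_threshold=3)
```

With these sample thresholds only the "login -> query" link and the "login" request survive screening.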
Optionally, after obtaining the screening result, the evaluation method further includes: sorting the screening results into time sequence request curves according to the time order in which the requests are executed; obtaining a time sequence total flow variation curve and a flow composition ratio by weighted summation based on a request number proportion and the time sequence request curves, wherein the request number proportion is the ratio of the number of requests of the service link or the service request to the total number of requests; and characterizing the time sequence total flow variation curve and the flow composition ratio as a reference scene simulating flow variation.
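The text does not spell out the weighted summation, so the following is only one possible reading, given as a sketch. It assumes each screened item has a request-count curve over equal time slots, normalises each curve to its shape, and weights it by the item's share of all requests; the function name and the sample curves are hypothetical.

```python
def total_flow_curve(curves):
    """curves: dict mapping a screened link/request name to its per-slot request counts.

    Assumes every curve has the same number of slots and a non-zero total.
    """
    totals = {name: sum(counts) for name, counts in curves.items()}
    grand_total = sum(totals.values())
    # Request number proportion: each item's share of all requests.
    share = {name: totals[name] / grand_total for name in curves}
    slots = len(next(iter(curves.values())))
    total = [0.0] * slots
    for name, counts in curves.items():
        for t, count in enumerate(counts):
            # Normalised curve shape, weighted by the item's share.
            total[t] += share[name] * (count / totals[name])
    return total, share


curve, composition = total_flow_curve({"login": [2, 4, 2], "query": [1, 1, 0]})
```

Under this reading the resulting curve sums to 1 and describes how the total flow is distributed over time, while `composition` is the flow composition ratio.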
Optionally, after characterizing the time-series total flow variation curve and the flow composition ratio as a reference scene simulating flow variation, the evaluation method includes: for the reference scene simulating flow variation, generating an initial test script based on the flow sample, wherein the initial test script at least comprises: testing behavior, testing data, assertions; receiving test data input by external equipment, and adjusting the initial test script based on the test data to obtain a target test script; and combining the service link and the target test script to obtain a link test script.
Optionally, executing the link test script, and generating the request message includes: analyzing the link test script to obtain test behaviors and test data; and adopting a preset communication protocol to package the test behavior and the test data into a request message.
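As an illustration of this parsing and encapsulation step, the sketch below assumes the link test script is a dictionary with a list of steps, each carrying a test behavior and test data, and assumes JSON as the preset communication protocol. The script layout and field names are hypothetical and are not taken from the source.

```python
import json


def build_request_messages(link_test_script):
    """Parse each step of the script and wrap its behavior and data in a message."""
    messages = []
    for step in link_test_script["steps"]:
        behavior, data = step["behavior"], step["data"]
        # JSON is only an assumed stand-in for the preset communication protocol.
        messages.append(json.dumps({"action": behavior, "payload": data},
                                   sort_keys=True))
    return messages


# Hypothetical two-step link: log in, then query a balance.
script = {"steps": [
    {"behavior": "login", "data": {"user": "u1"}, "assertions": {"code": 0}},
    {"behavior": "query", "data": {"account": "a1"}, "assertions": {"code": 0}},
]}
messages = build_request_messages(script)
```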
Optionally, the step of performing a flow benchmark test and a flow comparison test on the system to be tested includes: generating a link request flow based on the link test script; fitting the link request flow to the reference scene simulating flow variation to obtain a benchmark test result and a concurrency time sequence curve, which completes the flow benchmark test; and reinitiating the service requests according to the concurrency time sequence curve while randomly injecting flow anomalies on top of the benchmark flow to obtain a flow comparison result, which completes the flow comparison test.
Optionally, the type of flow anomaly includes at least one of: an anomaly that adds front services, an anomaly that adds service requests, an anomaly that increases request processing time, and a traffic load anomaly, wherein a front service is a service whose execution order is near the front of a service link.
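The random injection used in the flow comparison test can be sketched as follows. The sketch assumes the concurrency time sequence curve is a list of concurrency values per time slot and only produces an injection plan; the anomaly labels, the injection rate, and the doubling factor for load anomalies are hypothetical choices made for illustration.

```python
import random

# Labels for the four anomaly types named in the text (names are illustrative).
ANOMALIES = ("extra_front_service", "extra_service_requests",
             "slow_processing", "traffic_load")


def inject_anomalies(baseline_curve, rate, rng):
    """Return a comparison plan: one (concurrency, anomaly or None) pair per slot."""
    plan = []
    for concurrency in baseline_curve:
        # With probability `rate`, pick one anomaly type at random for this slot.
        anomaly = rng.choice(ANOMALIES) if rng.random() < rate else None
        if anomaly == "traffic_load":
            concurrency *= 2  # hypothetical surge factor
        plan.append((concurrency, anomaly))
    return plan


baseline = [10, 20, 40, 20, 10]
plan = inject_anomalies(baseline, rate=0.3, rng=random.Random(42))
```

Passing a seeded `random.Random` keeps a comparison run repeatable, so its results can be compared against the benchmark run slot by slot.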
Optionally, before evaluating the state of the system under test for flow change in different dimensions based on the response message and the operation data, the evaluation method includes: performing assertion inspection on the response message to obtain an inspection result; and determining that the link test script execution is completed in the case that the checking result indicates that all assertion checks are passed.
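A minimal sketch of the assertion check, assuming a response message has already been decoded into a dictionary and that each assertion is an expected value for one response field. The field names are hypothetical.

```python
def check_assertions(response, assertions):
    """Compare each asserted field with the response; report per-field results."""
    results = [(field, response.get(field) == expected)
               for field, expected in assertions.items()]
    all_passed = all(ok for _, ok in results)
    return all_passed, results


passed, details = check_assertions({"code": 0, "status": "OK"},
                                   {"code": 0, "status": "OK"})
failed, _ = check_assertions({"code": 500}, {"code": 0})
```

Only when every check passes is execution of the link test script treated as complete.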
Optionally, based on the response message and the operation data, the step of evaluating the state of the system to be tested in different dimensions for the flow change includes: determining a capacity grading threshold value corresponding to each dimension; and analyzing the state of the system to be tested for flow change in different dimensions based on the response message, the operation data and the capacity grading threshold corresponding to each dimension.
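Grading against per-dimension thresholds could look like the sketch below. The source does not give concrete dimensions, metrics, or threshold values here, so the error-rate metric, the threshold list, and the grade labels are all assumptions; the sketch also assumes that a lower metric value is better.

```python
import bisect


def grade(value, thresholds, labels):
    """Map a metric to a grade; thresholds are ascending upper bounds (inclusive)."""
    return labels[bisect.bisect_left(thresholds, value)]


# Hypothetical grading thresholds for one dimension, measured by error rate.
ERROR_RATE_THRESHOLDS = [0.01, 0.05]
GRADES = ["good", "degraded", "poor"]

benchmark_grade = grade(0.005, ERROR_RATE_THRESHOLDS, GRADES)
comparison_grade = grade(0.03, ERROR_RATE_THRESHOLDS, GRADES)
```

Comparing the grade from the benchmark run with the grade from the comparison run shows how far an injected anomaly moves the system within that dimension.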
According to another aspect of the embodiment of the present invention, there is also provided an evaluation apparatus for a distributed system, including: an acquisition unit, configured to acquire a link test script, execute the link test script, and generate a request message, wherein the link test script is obtained by combining the generated test script with a service link; a sending unit, configured to send the request message to a system to be tested so as to perform a flow benchmark test and a flow comparison test on the system to be tested, and to collect operation data during the tests; a receiving unit, configured to receive a response message returned by the system to be tested after the tests; and an evaluation unit, configured to evaluate, based on the response message and the operation data, the state of the system to be tested in response to flow changes in different dimensions to obtain an evaluation message.
Optionally, the evaluation device includes: the first monitoring module is used for monitoring flow data of the network module on the server by adopting a preset probe strategy before acquiring the link test script, wherein the flow data at least carries a port identifier and a service request; the first acquisition module is used for acquiring a plurality of service requests in the flow data when the port identification indicates a certain service port of the system to be tested; the first analysis module is used for analyzing the service request to obtain an analysis result; and the first traversing module is used for traversing the plurality of service requests according to the time sequence by adopting a preset recognition algorithm based on the analysis result to obtain the service link.
Optionally, the evaluation device further comprises: the first extraction module is used for extracting request messages and response messages of all service requests in the service link under the condition that the service link appears for the first time after traversing the service requests according to time sequence by adopting a preset identification algorithm based on the analysis result to obtain the service link, so as to obtain a flow sample.
Optionally, the evaluation device further comprises: the first analysis module is used for carrying out statistical analysis on the service link and the service request by adopting a preset clustering algorithm and the number of expected categories after extracting request messages and response messages of all service requests in the service link to obtain a flow sample, so as to obtain a link call quantity and a request call quantity; the first screening module is used for screening the service links corresponding to the link call quantity larger than a first preset threshold value, and screening the service requests corresponding to the request call quantity larger than a second preset threshold value, so as to obtain screening results.
Optionally, the evaluation device further comprises: the first sorting module is used for sorting the screening results into a time sequence request curve according to the time sequence of the execution request; the first calculation module is used for obtaining a time sequence total flow variation curve and a flow composition ratio based on a request number proportion and the time sequence request curve through weighted summation, wherein the request number proportion is a proportion value between the service link and the service request and the total number of requests; and the first characterization module is used for characterizing the time sequence total flow variation curve and the flow composition ratio as a reference scene simulating flow variation.
Optionally, the evaluation device includes: a first generation module, configured to generate, for a reference scene simulating flow variation, an initial test script based on the flow sample after characterizing the time-series total flow variation curve and the flow composition ratio as the reference scene simulating flow variation, where the initial test script includes at least: testing behavior, testing data, assertions; the first adjusting module is used for receiving test data input by external equipment and adjusting the initial test script based on the test data to obtain a target test script; and the first combination module is used for combining the service link and the target test script to obtain a link test script.
Optionally, the acquiring unit includes: the first analysis submodule is used for analyzing the link test script to obtain test behaviors and test data; and the first encapsulation submodule is used for encapsulating the test behaviors and the test data into a request message by adopting a preset communication protocol.
Optionally, the sending unit includes: a first generation submodule, configured to generate a link request flow based on the link test script; a first fitting submodule, configured to fit the link request flow to the reference scene simulating flow variation to obtain a benchmark test result and a concurrency time sequence curve, which completes the flow benchmark test; and a first injection submodule, configured to reinitiate the service requests according to the concurrency time sequence curve while randomly injecting flow anomalies on top of the benchmark flow to obtain a flow comparison result, which completes the flow comparison test.
Optionally, the type of flow anomaly includes at least one of: an anomaly that adds front services, an anomaly that adds service requests, an anomaly that increases request processing time, and a traffic load anomaly, wherein a front service is a service whose execution order is near the front of a service link.
Optionally, the evaluation device includes: the first assertion module is used for carrying out assertion check on the response message before evaluating the state of the system to be tested for flow change in different dimensions based on the response message and the operation data to obtain a check result; and the first determining module is used for determining that the execution of the link test script is completed under the condition that the checking result indicates that all assertion checks are passed.
Optionally, the evaluation unit includes: the first determining submodule is used for determining a capacity grading threshold value corresponding to each dimension; the first analysis submodule is used for analyzing the state of the system to be tested for flow change in different dimensions based on the response message, the operation data and the capacity grading threshold corresponding to each dimension.
According to another aspect of the embodiments of the present invention, there is further provided a computer readable storage medium, where the computer readable storage medium includes a stored computer program, and when the computer program runs, the device on which the computer readable storage medium resides is controlled to execute the evaluation method for a distributed system according to any one of the above.
According to another aspect of the embodiments of the present invention, there is also provided an electronic device including one or more processors and a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the evaluation method for a distributed system as set forth in any one of the above.
In the present invention, a link test script is obtained and executed to generate a request message, wherein the link test script is obtained by combining the generated test script with a service link; the request message is sent to a system to be tested so as to perform a flow benchmark test and a flow comparison test on the system to be tested, and operation data is collected during the tests; a response message returned by the system to be tested after the tests is received; and, based on the response message and the operation data, the state of the system to be tested in response to flow changes in different dimensions is evaluated to obtain an evaluation message. Because both a flow benchmark test and a flow comparison test can be initiated on the system, and flow anomalies are randomly injected during the flow comparison test to create flow anomaly scenarios, the performance of the system in coping with sudden flow anomalies can be effectively observed and evaluated in different dimensions. This reduces engineering risk and improves engineering quality, and thereby solves the technical problem that, in the related art, the influence of flow anomalies on the system is not considered when implementing chaos engineering, so that the capability of the system to cope with flow changes cannot be evaluated.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiments of the invention and together with the description serve to explain the invention and do not constitute a limitation on the invention. In the drawings:
FIG. 1 is a schematic diagram of an alternative chaotic engineering implementing system according to an embodiment of the present invention;
FIG. 2 is a flow chart of an alternative evaluation method for a distributed system according to an embodiment of the invention;
FIG. 3 is a schematic diagram of a multi-way tree constructed in an alternative recognition process according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of an alternative flow-based chaotic engineering implementation method according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of an alternative evaluation apparatus for a distributed system according to an embodiment of the invention;
FIG. 6 is a block diagram of a hardware configuration of an electronic device (or mobile device) for an evaluation method of a distributed system according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art may better understand the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. It is apparent that the described embodiments are only some embodiments of the present invention, not all of them. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
To facilitate an understanding of the invention by those skilled in the art, some terms or nouns involved in the various embodiments of the invention are explained below:
Chaos engineering: a technical means for improving the resilience of a technical architecture, which aims to build confidence in the capability of a system to withstand turbulent conditions in the production environment.
The following embodiments of the present invention can be applied to the field of chaotic engineering implementation, and are particularly suitable for the field of chaotic engineering implementation of the capability of an evaluation system for coping with flow changes. In the present invention, fig. 1 is a schematic diagram of an alternative chaotic engineering implementation system according to an embodiment of the present invention, including: the system comprises a flow acquisition unit 1, a flow analysis unit 2, a script management unit 3, a chaos implementation unit 4, a chaos evaluation unit 5 and an information storage unit 6, wherein the flow acquisition unit 1 comprises: a flow acquisition module 11, a link analysis module 12 and a flow sample extraction module 13; the flow rate analysis unit 2 includes: a cluster statistics module 21 and a reference scene design module 22; the script management unit 3 includes: a script generation module 31, a script management module 32, and a link script combination module 33; the chaos implementing unit 4 includes: a flow generation module 41, an anomaly injection module 42 and an information acquisition module 43; the chaos evaluation unit 5 includes: a dimension and threshold definition module 51, a grading evaluation module 52 and a prompt and early warning module 53.
By collecting, analyzing, and converting flow data, the invention identifies the service requests and service links that contribute most to the flow, forms a reference scene that simulates flow variation in chaos engineering, and converts the reference scene into a link test script. In the evaluation implementation stage, it initiates a flow benchmark test and a flow comparison test, and randomly injects anomalies during the flow comparison test to create flow anomaly scenarios. It also defines a multidimensional information collection and graded evaluation process covering the communication, application, and system layers, so as to evaluate the capability level of the system in coping with chaotic flow changes. In this way, the performance of the system in coping with sudden flow anomalies can be effectively observed and evaluated, graded evaluation can be carried out on multiple dimensions of the distributed system such as reliability, scalability, and fault tolerance, engineering risk is reduced, and engineering quality is improved.
Example 1
According to an embodiment of the present invention, there is provided an evaluation method embodiment for a distributed system, it being noted that the steps shown in the flowcharts of the figures may be performed in a computer system such as a set of computer executable instructions, and that although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in an order different from that herein.
FIG. 2 is a flow chart of an alternative evaluation method for a distributed system according to an embodiment of the invention. As shown in FIG. 2, the method comprises the following steps:
Step S201, obtaining a link test script, executing the link test script, and generating a request message, wherein the link test script is obtained by combining the generated test script with a service link.
Step S202, sending the request message to a system to be tested so as to perform a flow benchmark test and a flow comparison test on the system to be tested, and collecting operation data during the tests.
Step S203, receiving a response message returned after the system to be tested is tested.
Step S204, based on the response message and the operation data, the state of the system to be tested in different dimensions for the flow change is evaluated, and an evaluation message is obtained.
Through the above steps, a link test script can be obtained and executed to generate a request message, wherein the link test script is obtained by combining the generated test script with a service link; the request message is sent to a system to be tested so as to perform a flow benchmark test and a flow comparison test on the system to be tested, and operation data is collected during the tests; a response message returned by the system to be tested after the tests is received; and, based on the response message and the operation data, the state of the system to be tested in response to flow changes in different dimensions is evaluated to obtain an evaluation message. In the embodiment of the invention, both a flow benchmark test and a flow comparison test can be initiated on the system, and anomalies are randomly injected during the flow comparison test to create flow anomaly scenarios, so that the performance of the system in coping with sudden flow anomalies can be effectively observed and evaluated in different dimensions. This reduces engineering risk and improves engineering quality, and thereby solves the technical problem that, in the related art, the influence of flow anomalies on the system is not considered when implementing chaos engineering, so that the capability of the system to cope with flow changes cannot be evaluated.
The embodiments of the present invention are described in detail below with reference to the above steps and FIG. 1.
In the embodiment of the invention, before acquiring the link test script, the evaluation method comprises the following steps: monitoring flow data of a network module on a server by adopting a preset probe strategy, wherein the flow data at least carries a port identifier and a service request; when the port identification indicates a certain service port of the system to be tested, collecting a plurality of service requests in the flow data; analyzing the service request to obtain an analysis result; based on the analysis result, a preset recognition algorithm is adopted to traverse the plurality of service requests according to the time sequence, and a service link is obtained.
Optionally, after traversing the plurality of service requests according to the time sequence by adopting a preset recognition algorithm based on the analysis result to obtain the service link, the evaluation method further comprises: and under the condition that the service link appears for the first time, extracting request messages and response messages of all service requests in the service link to obtain a flow sample.
In the embodiment of the invention, the traffic data of the network module on the monitoring server can be bypassed by the traffic acquisition unit 1 by using a probe program (namely a preset probe strategy), the service request is acquired, the service link is identified by multidimensional analysis, and the traffic sample in the service link is extracted, wherein the specific process is as follows:
The flow collection module 11 may be deployed on an application server of the system to be tested, and the probe program (i.e. a preset probe policy) is used to circularly monitor flow data on a network card (i.e. a network module on the server), where the flow data at least carries a port identifier and a service request, and the flow collection module 11 collects service requests whose target port is a service port of the system to be tested (i.e. when the port identifier indicates a certain service port of the system to be tested, collect multiple service requests in the flow data), and notify the link analysis module 12 to perform subsequent analysis, where the specific process is as follows:
After the plurality of service requests are collected, the link analysis module 12 may parse them. The parsed content includes, but is not limited to, the caller IP, the session information, and the calling and called information of the service program. Once the parsing result is obtained, the module traverses the service requests in time order, sorts them, analyzes them with a preset recognition algorithm, and recognizes the associated service requests as ordered links (i.e. service links).
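The collection and ordering step can be illustrated with a minimal sketch. The record type `CapturedRequest`, its fields, and the port number 8080 are assumptions made for illustration only; the embodiment does not prescribe a data structure or a port.

```python
from dataclasses import dataclass

@dataclass
class CapturedRequest:
    timestamp: float   # capture time in seconds
    caller_ip: str     # caller IP parsed from the request
    dst_port: int      # port identifier carried by the flow data
    session_id: str    # session information parsed from the request
    name: str          # service request name

# Assumed service port of the system to be tested (hypothetical value).
SERVICE_PORTS = {8080}

def collect_service_requests(packets, service_ports=SERVICE_PORTS):
    """Keep the requests aimed at a service port, ordered by capture time."""
    selected = [p for p in packets if p.dst_port in service_ports]
    return sorted(selected, key=lambda p: p.timestamp)

packets = [
    CapturedRequest(3.0, "10.0.0.5", 8080, "s1", "qryUserInfo"),
    CapturedRequest(1.0, "10.0.0.5", 8080, "s1", "Login"),
    CapturedRequest(2.0, "10.0.0.9", 22, "-", "other"),
]
ordered = collect_service_requests(packets)
print([p.name for p in ordered])
```

Only the two requests addressed to the assumed service port remain, in time order, ready for link recognition.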
The preset recognition algorithm includes, but is not limited to, the following various recognition algorithms:
(1) Identification by the same communication connection: the SOCKET connection creation and disconnection operations are located, and the multiple service requests that use the same communication connection between these two operations are identified, in time order, as the same link sequence (i.e. the service link referred to in this application).
(2) Identification by the same session: the message header of each service request is retrieved, and the multiple service requests that use the same session are identified, in time order, as the same link sequence.
(3) Identification by request call relationship: for each service request, the upstream caller and the downstream called services are analyzed. A service request from an end user is regarded as the entry request, and each downstream service called by a service request is regarded as a lower node on the call link. A multi-way tree is defined to store the service call relationships: each node stores one service request, the upstream caller is the parent of that node, and all downstream called services are its children. After the analysis is complete, each multi-way tree is output by preorder traversal, which yields the link sequence in time order.
Take the "query user information" link as an example. A complete business process includes login (Login), querying user information (qryUserInfo), and exit (Logout). Before the user information is queried, the validity of the session information must be checked (checkSession); before exit, whether the user is logged in must be checked (checkLogin). The entry request is Login. Fig. 3 is a schematic diagram of a multi-way tree constructed in an optional identification process according to an embodiment of the present invention. As shown in fig. 3, the tree includes: the root node login, the child node query user information, the child node check session information, the child node exit, and the child node check user login.
(4) Identification by custom field content: the message body (BODY) of each collected service request is deserialized, and the content of a specific field (FIELD) is extracted and compared in a customized manner; the multiple service requests whose field content is identical are identified, in time order, as the same link sequence. For example, the system to be tested allocates a unique service number (TransID) to each service. For a service request over the HTTP protocol, the content of the HTTP message is parsed: for the GET method, the value of the request parameter named TransID is obtained; for a POST method that submits a form, all fields of the submitted form are traversed to obtain the value of the field named TransID.
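As a hedged illustration of algorithm (4), the sketch below extracts TransID from GET query parameters or from a URL-encoded POST form and groups requests with the same value in time order. The tuple layout of a request, the URIs, and the assumption that the form body is URL-encoded are hypothetical. The same grouping pattern would apply to algorithm (2) if the key were a session identifier taken from the message header.

```python
from collections import defaultdict
from urllib.parse import urlsplit, parse_qs

def extract_trans_id(method, url, body=""):
    """Return the TransID of an HTTP request, or None if it is absent."""
    if method == "GET":
        params = parse_qs(urlsplit(url).query)
    else:
        # Assumption: the POST body is a URL-encoded form.
        params = parse_qs(body)
    values = params.get("TransID")
    return values[0] if values else None

def group_into_links(requests):
    """requests: iterable of (timestamp, name, method, url, body) tuples."""
    links = defaultdict(list)
    for _, name, method, url, body in sorted(requests):
        trans_id = extract_trans_id(method, url, body)
        if trans_id is not None:
            links[trans_id].append(name)
    return dict(links)

requests = [
    (2.0, "qryUserInfo", "POST", "/qryUserInfo", "TransID=T1&user=alice"),
    (1.0, "Login", "POST", "/login", "TransID=T1&user=alice"),
    (1.5, "checkSession", "GET", "/checkSession?TransID=T1", ""),
    (1.2, "Login", "POST", "/login", "TransID=T2&user=bob"),
]
links = group_into_links(requests)
print(links)
```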
In the embodiment of the invention, because different service systems have different characteristics, several identification algorithms need to be used together to identify and analyze service links, and the link identification algorithms can be extended according to differences in communication protocol, application layer composition, service data structure, and the like. The identification algorithms above are given only for illustration; the invention is not limited to them, and other identification algorithms are also applicable.
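The multi-way tree of algorithm (3) can be sketched as follows. The node class, the (caller, callee) input format, and the service names `entry`, `svcA`, `svcB` are hypothetical, and the sketch assumes for simplicity that each service name occurs once per tree; it is not meant to reproduce the exact tree of fig. 3.

```python
from dataclasses import dataclass, field

@dataclass
class CallNode:
    name: str
    children: list = field(default_factory=list)

def build_call_tree(calls, entry):
    """calls: (caller, callee) pairs in time order; entry: the entry request."""
    nodes = {entry: CallNode(entry)}
    for caller, callee in calls:
        parent = nodes.setdefault(caller, CallNode(caller))
        child = nodes.setdefault(callee, CallNode(callee))
        parent.children.append(child)   # the upstream caller is the parent
    return nodes[entry]

def preorder(node):
    """Visit the node first, then each child subtree from left to right."""
    yield node.name
    for child in node.children:
        yield from preorder(child)

calls = [("entry", "svcA"), ("svcA", "svcA1"), ("entry", "svcB"), ("svcB", "svcB1")]
root = build_call_tree(calls, "entry")
link = list(preorder(root))
print("-".join(link))
```

Because children are appended in the order the calls were observed, the preorder output follows the call order within each subtree.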
A service link whose identification is complete may be described by the set of service request names on the link and stored in the information storage unit 6. For example, if a service link involves five service requests, namely login (Login), session check (checkSession), query user information (qryUserInfo), login check (checkLogin), and exit (Logout), the service link is described as:
"Login-checkSession-qryUserInfo-checkLogin-Logout" (qryUserInfo link for short).
In the embodiment of the present invention, whether a service link appears for the first time can be determined from the analysis result of the link analysis module 12 and the link description. For a service link that appears for the first time, the flow sample extraction module 13 extracts, from the buffer of the flow acquisition module 11, the detailed request messages and response messages (including header and body) of all service requests in the service link to obtain the flow sample, and stores the flow sample in the information storage unit 6. For a service link that does not appear for the first time, the corresponding buffer of the flow acquisition module 11 is cleared.
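A minimal sketch of the first-appearance check, assuming the link description is the request names joined with hyphens. The class `LinkStore` is a hypothetical stand-in for the information storage unit 6, and the buffer is modelled as a plain list of messages.

```python
def describe_link(request_names):
    """Describe a link by joining its request names in time order."""
    return "-".join(request_names)

class LinkStore:
    """Hypothetical stand-in for the storage unit: description -> flow sample."""

    def __init__(self):
        self.samples = {}

    def record(self, request_names, buffered_messages):
        """Store the sample of a new link; clear the buffer of a known link."""
        description = describe_link(request_names)
        first_time = description not in self.samples
        if first_time:
            self.samples[description] = list(buffered_messages)
        else:
            buffered_messages.clear()
        return first_time

store = LinkStore()
names = ["Login", "checkSession", "qryUserInfo"]
first_buffer = ["Login request", "Login response"]
second_buffer = ["Login request", "Login response"]
first_seen = store.record(names, first_buffer)
seen_again = store.record(names, second_buffer)
print(first_seen, seen_again, second_buffer)
```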
Optionally, after extracting the request message and the response message of all the service requests in the service link to obtain the traffic sample, the evaluation method further includes: carrying out statistical analysis on service links and service requests by adopting a preset clustering algorithm and the number of expected categories to obtain the link call quantity and the request call quantity; and screening the service links corresponding to the link call quantity larger than the first preset threshold value, and screening the service requests corresponding to the request call quantity larger than the second preset threshold value to obtain screening results.
Optionally, after obtaining the screening result, the evaluation method further includes: for the screening result, sorting the screening result into a time sequence request curve according to the time sequence of the execution request; based on the request number proportion and a time sequence request curve, weighting and summing to obtain a time sequence total flow change curve and a flow composition ratio, wherein the request number proportion is a proportion value between a service link and service requests and the total number of requests; and characterizing the time sequence total flow change curve and the flow composition ratio as a reference scene simulating flow change.
In the embodiment of the invention, the flow analysis unit 2 may aggregate the service request and service link call data collected and analyzed by the flow acquisition unit 1, and use a preset clustering algorithm and predefined threshold parameters (namely, the first preset threshold and the second preset threshold) to screen out the service links and service requests that contribute most to the flow. The service request calls and service link calls in the time series are then summarized at a granularity of one second to calculate the total flow variation curve and the flow composition ratio over time (characterized as the reference scene simulating flow variation). In the embodiment of the invention, the reference scene is the per-second reference for link request composition and link request volume during the chaos engineering test. The specific process is as follows:
The cluster statistics module 21 may statistically screen the service requests and service links. Based on the collection and analysis results of the flow collection unit 1, the cluster statistics module 21 applies a preset clustering algorithm to the service links and service requests to obtain several category sets of service links and service requests together with their respective call volumes (i.e. a preset clustering algorithm and the expected number of categories are used to statistically analyze the service links and service requests to obtain the link call volume and the request call volume). It then evaluates the link call volume and the request call volume against the preset thresholds and screens out the service links and service requests that contribute most to the flow (i.e. the service links whose link call volume is greater than the first preset threshold and the service requests whose request call volume is greater than the second preset threshold), so as to obtain the screening result.
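The threshold screening can be sketched in a few lines. The call volumes, the names `linkA` and `reqX`, and both threshold values are made-up illustration data, not values from the embodiment.

```python
def screen_by_volume(call_volumes, threshold):
    """Keep entries whose call volume is strictly greater than the threshold."""
    return {name: volume for name, volume in call_volumes.items() if volume > threshold}

# Hypothetical call volumes and thresholds.
link_volumes = {"linkA": 5000, "linkB": 300, "linkC": 12}
request_volumes = {"reqX": 9000, "reqY": 40}
FIRST_THRESHOLD = 100     # applies to link call volume
SECOND_THRESHOLD = 1000   # applies to request call volume

screening_result = {
    "links": screen_by_volume(link_volumes, FIRST_THRESHOLD),
    "requests": screen_by_volume(request_volumes, SECOND_THRESHOLD),
}
print(screening_result)
```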
The preset clustering algorithm may be a common machine learning algorithm, including but not limited to EM, DBSCAN, and K-Means. Because K-Means is comparatively efficient, it is taken as the example here. The inputs of the algorithm are the vectorized service links and the expected number of categories, and the output is the categories and the number of service links contained in each category. Vectorization refers to mapping textual behaviors into a vector space through a model for use in training the clustering model; common word vectorization methods include, but are not limited to, the bag-of-words model, word2vec, and n-gram.
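To make the inputs and outputs concrete, the following sketch vectorizes link descriptions with a bag-of-words count and clusters them with plain Lloyd-style K-Means iterations. It is a simplified pure-Python illustration under stated assumptions (hyphen-joined link descriptions, deterministic seeding with the first k distinct vectors, made-up link records), not the implementation used by the embodiment; a production system would more likely rely on a machine learning library.

```python
from collections import Counter

def bag_of_words(links):
    """Map each link description to a count vector over request names."""
    vocab = sorted({name for link in links for name in link.split("-")})
    vectors = [[link.split("-").count(word) for word in vocab] for link in links]
    return vocab, vectors

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(vectors, k, iterations=20):
    """Lloyd iterations; the first k distinct vectors seed the centroids."""
    centroids = []
    for v in vectors:
        if v not in centroids:
            centroids.append(list(v))
        if len(centroids) == k:
            break
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every vector.
        labels = [min(range(len(centroids)),
                      key=lambda c: squared_distance(v, centroids[c]))
                  for v in vectors]
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical link records (one entry per observed link call).
links = ["Login-Logout"] * 3 + ["Login-qryBalance-Logout"] * 2 + ["Login-Logout"]
vocab, vectors = bag_of_words(links)
labels = k_means(vectors, k=2)
category_sizes = Counter(labels)
print(vocab, labels, dict(category_sizes))
```

The category sizes correspond to the "number of service links contained in each category" that the clustering step outputs.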
For example, suppose the information storage unit 6 already holds a large number of collected service links, including those identified by the link analysis module 12. The cluster statistics module 21 runs the K-Means clustering algorithm with all service link request records stored in the information storage unit 6 and an expected number of categories of 3 as parameters (the expected number of categories may be adjusted according to the total number of service links and the clustering accuracy) and obtains 3 category sets. Specifically, the clustering shows that the service links of the system to be tested were called 10021703 times in production, of which the qryUserInfo link was called 1000000 times, the Login-Logout link 20000 times, and the Login-qryBalance-Logout link 1000 times. Thus, the three service links contribute significantly to the traffic, accounting for approximately 99.9% of the total traffic.
In the embodiment of the present invention, for the service links and service requests screened by the cluster statistics module 21, the reference scenario design module 22 may obtain service link and service request details from the traffic collection result stored in the information storage unit 6, sort the service link and service request details into a request curve (i.e. a time sequence request curve) on a time sequence according to the execution time sequence, and then obtain a total traffic variation curve and a traffic composition ratio (characterized as a reference scenario simulating traffic variation) on the time sequence by weighted summation according to the request number proportion of the service links and service requests (the request number proportion is a proportion value between the service links and service requests and the total number of requests) and the time sequence request curve.
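One possible reading of the weighted summation is sketched below: each link's per-second call curve is weighted by the number of service requests a single link call issues, the weighted curves are summed into the total flow curve, and each link's share of all requests gives the composition ratio. This interpretation of the weights, the function name `reference_scene`, and the sample curves are assumptions.

```python
def reference_scene(link_curves, requests_per_link):
    """Return (total request curve per second, share of requests per link)."""
    seconds = len(next(iter(link_curves.values())))
    total_curve = [
        sum(requests_per_link[link] * curve[t] for link, curve in link_curves.items())
        for t in range(seconds)
    ]
    grand_total = sum(total_curve)
    ratio = {
        link: requests_per_link[link] * sum(curve) / grand_total
        for link, curve in link_curves.items()
    }
    return total_curve, ratio

# Hypothetical per-second link call counts over three seconds.
link_curves = {"qryUserInfo": [10, 20, 30], "Login-Logout": [5, 5, 10]}
# Assumed number of service requests issued by one call of each link.
requests_per_link = {"qryUserInfo": 5, "Login-Logout": 2}

total_curve, ratio = reference_scene(link_curves, requests_per_link)
print(total_curve, ratio)
```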
Optionally, after characterizing the time series total flow variation curve and the flow composition ratio as the reference scene simulating the flow variation, the evaluation method includes: for a reference scene simulating flow variation, generating an initial test script based on a flow sample, wherein the initial test script at least comprises: testing behavior, testing data, assertions; receiving test data input by external equipment, and adjusting an initial test script based on the test data to obtain a target test script; and combining the service link and the target test script to obtain a link test script.
In the embodiment of the invention, for the service links and service requests involved in the reference scene, the script management unit 3 may generate pressure test scripts (i.e. initial test scripts) for the service requests from the flow samples acquired by the flow acquisition unit 1; after the scripts are completed by chaos engineering personnel, they are combined into a link test script. The specific process is as follows:
An initial test script of a service request is generated from the request and response details (i.e. the traffic samples) of the service link extracted by the traffic acquisition unit 1. The script generation module 31 may extract the network interaction actions from the request messages (i.e. the request messages in the traffic samples) to define the test behavior, and then use a content parser to parse the formatted content of the interaction actions and extract the test data.
The initial test script in the embodiment of the invention includes at least test behavior, test data, and assertions, as follows:
(1) The test behavior is a series of operations on the system to be tested that describes how a real user uses the system; replaying the test behavior during the test simulates the usage pressure that users place on the system. Taking a web service system with a B/S architecture as an example, the test behavior is a series of HTTP request calls to specific URIs of the system.
(2) The test data is the data used during the test; it carries business meaning and meets the requirements of the test case. Taking a web service system with a B/S architecture as an example, the test data is a series of parameter assignments for specific URIs of the system, expressed as form KEY-VALUE sets, JSON, XML, and the like.
(3) The assertion is a boolean expression, which can be composed of three parts of an actual value, an expected relation and an expected value, wherein the actual value and the expected value are numerical expressions or character strings, four arithmetic operations and variable references can be contained, when the assertion is checked, the variable replacement and the four arithmetic operations of the actual value and the expected value are finished first, and finally, whether the numerical values of the actual value and the expected value meet the expected relation given by the assertion is judged, if so, the assertion check is passed, otherwise, the assertion check is failed.
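The assertion check described in (3) can be sketched as follows: variables are substituted first, the four arithmetic operations are evaluated, and the expected relation is then applied. The `${name}` placeholder syntax, the relation symbols, and the function names are assumptions chosen for illustration; the embodiment does not define a concrete syntax.

```python
import ast
import operator
from string import Template

ARITHMETIC = {ast.Add: operator.add, ast.Sub: operator.sub,
              ast.Mult: operator.mul, ast.Div: operator.truediv}
RELATIONS = {"==": operator.eq, "!=": operator.ne, "<": operator.lt,
             "<=": operator.le, ">": operator.gt, ">=": operator.ge}

def _eval_node(node):
    """Evaluate numbers combined with + - * / and unary minus only."""
    if (isinstance(node, ast.Constant) and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in ARITHMETIC:
        return ARITHMETIC[type(node.op)](_eval_node(node.left), _eval_node(node.right))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_eval_node(node.operand)
    raise ValueError("unsupported expression")

def evaluate(expression, variables):
    """Replace ${name} variables, then evaluate arithmetic if possible."""
    text = Template(expression).substitute(variables)
    try:
        return _eval_node(ast.parse(text, mode="eval").body)
    except (SyntaxError, ValueError):
        return text   # not arithmetic: compare as a character string

def check_assertion(actual, relation, expected, variables):
    """True when the actual and expected values satisfy the expected relation."""
    return RELATIONS[relation](evaluate(actual, variables), evaluate(expected, variables))

status_ok = check_assertion("${status}", "==", "200", {"status": 200})
balance_ok = check_assertion("${balance} - ${fee}", ">=", "10 * 2", {"balance": 30, "fee": 5})
text_ok = check_assertion("${msg}", "==", "OK", {"msg": "FAIL"})
print(status_ok, balance_ok, text_ok)
```

Parsing with `ast` instead of calling `eval` keeps the evaluation restricted to numeric literals and the four arithmetic operations.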
In the embodiment of the present invention, the script management module 32 receives the completion of the script by the engineering implementation personnel (that is, receives the test data input by the external device, and adjusts the initial test script based on the test data) in an interactive manner, so as to obtain the target test script, and stores the target test script in the information storage unit 6.
In the embodiment of the present invention, the link script combining module 33 may combine the service request script (i.e. the target test script) generated by the script generating module 31 and perfected by the script management module 32 into the link test script (i.e. combine the service link and the target test script to obtain the link test script) according to the service link identified by the link analysis module 12 in the traffic collection unit 1.
For example, for the qryUserInfo link under the HTTP protocol, a total of 5 traffic samples are involved in script generation and link script combination, where checkSession and checkLogin use the GET method and the rest use the POST method.
Step S201, obtaining a link test script, executing the link test script, and generating a request message, wherein the link test script is obtained by combining the generated test script with a service link.
Optionally, executing the link test script, and generating the request message includes: analyzing the link test script to obtain test behaviors and test data; and adopting a preset communication protocol to package the test behavior and the test data into a request message.
In the embodiment of the present invention, after the link test script is obtained (the link test script is obtained by combining the generated test script with the service link), the flow generating module 41 may parse the link test script generated by the script management unit 3 (i.e. parse the link test script to obtain the test behavior and the test data) and execute it in a loop. On each iteration of the loop, the flow generating module 41 encapsulates the test behavior and the test data into a request message conforming to the communication protocol (i.e. a preset communication protocol is used to encapsulate the test behavior and the test data into the request message).
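Taking HTTP as the preset communication protocol, the encapsulation step might look like the sketch below. The host name `sut.example`, the URIs, and the choice of a URL-encoded form for POST data are assumptions for illustration.

```python
from urllib.parse import urlencode

def build_http_request(method, host, uri, data):
    """Encapsulate one test behavior (method + URI) and its test data."""
    query = urlencode(data)
    if method == "GET":
        target = f"{uri}?{query}" if query else uri
        body = ""
    else:
        target = uri
        body = query   # assumed: POST data is sent as a URL-encoded form
    lines = [f"{method} {target} HTTP/1.1", f"Host: {host}"]
    if body:
        lines.append("Content-Type: application/x-www-form-urlencoded")
        lines.append(f"Content-Length: {len(body.encode())}")
    return "\r\n".join(lines) + "\r\n\r\n" + body

get_message = build_http_request("GET", "sut.example", "/checkSession", {"TransID": "T1"})
post_message = build_http_request("POST", "sut.example", "/login",
                                  {"TransID": "T1", "user": "alice"})
print(get_message)
print(post_message)
```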
Step S202, a request message is sent to a system to be tested, so that a flow benchmark test and a flow contrast test are carried out on the system to be tested, and operation data in the test process are collected.
In the embodiment of the invention, after the request message is packaged, the request message can be sent to the system to be tested according to the flow composition ratio defined by the reference scene by using a specified communication protocol, and the response is received in a synchronous blocking waiting mode.
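One simple way to honor a flow composition ratio is to split the number of requests planned for a given second across the links in proportion to their shares. The sketch below uses largest-remainder rounding so that the counts add up exactly; the function name, the ratio values, and the rounding rule are assumptions, since the embodiment only states that requests are sent according to the ratio.

```python
def allocate(total_requests, ratio):
    """Split total_requests across links by ratio (largest-remainder rounding)."""
    exact = {link: total_requests * share for link, share in ratio.items()}
    counts = {link: int(value) for link, value in exact.items()}
    leftover = total_requests - sum(counts.values())
    # Hand the remaining requests to the links with the largest fractional part.
    by_remainder = sorted(exact, key=lambda link: exact[link] - counts[link], reverse=True)
    for link in by_remainder[:leftover]:
        counts[link] += 1
    return counts

ratio = {"qryUserInfo": 0.75, "Login-Logout": 0.25}   # hypothetical composition ratio
counts = allocate(10, ratio)
print(counts)
```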
In the embodiment of the present invention, the chaotic implementing unit 4 may initiate requests based on the link test script and carry out the chaos engineering test, which includes a flow benchmark test and a flow comparison test. The flow benchmark test is used to analyze the fitting result after the link request flow is fitted to the preset reference scene, and the flow comparison test is used to analyze the flow change data after abnormal flow is injected into the current network.
In addition, the information collection module 43 may continuously collect multidimensional data (i.e., running data) from the flow generation module 41 and the system to be tested through a plurality of monitoring threads during the test implementation process, and store the multidimensional data in the information storage unit 6 as a data support for chaos evaluation. In an embodiment of the present invention, the multidimensional data collected by information collection module 43 includes, but is not limited to: service success rate, service availability, service capability, queue queuing congestion, application resources, system resource consumption. For example:
(1) Service availability collection: a Health Check instruction is sent to each node of the system to be tested at regular intervals to check whether the service node can provide service normally.
(2) Service capability collection: service capability indicators are collected from the traffic generation module 41, including the success rate of a particular service (a service call is considered successful when all assertion checks pass), the average time consumption of service requests, the throughput of service requests, and the like.
(3) Application resource collection: injected monitoring instructions are used to collect data on the application software resources that support the normal operation of the system to be tested, especially resources whose total amount is limited (such as independently managed stack memory) or resources managed through a resource pool (such as a connection pool, a thread pool, or an internal queue); the usage of these resources and the waiting and congestion time of resource requesters are monitored.
(4) System resource consumption collection: the resource collection commands of the operating system are called continuously to collect hardware resource data of the server from the operating system's perspective, including but not limited to CPU, memory, network read/write volume, and disk IO.
For example, for the Linux operating system, when the information collection module 43 collects system resource consumption information, a monitoring process is defined to call the vmstat 1 command continuously, so as to obtain CPU consumption data in dimensions such as user, sys, and wa.
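As an illustration of how such output could be turned into CPU consumption records, the sketch below parses captured vmstat text and keeps the us, sy, and wa columns. The sample text is made up, and the column layout is assumed to follow the usual procps vmstat header; a real monitoring process would read the command output as a stream rather than from a string.

```python
# Made-up sample in the usual vmstat column layout (assumption).
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 812340  10240 402112    0    0     5    12  110  220  7  3 88  2  0
"""

def parse_vmstat(output):
    """Return one {'us', 'sy', 'wa'} record per data row of vmstat output."""
    rows = [line.split() for line in output.strip().splitlines()]
    header = next(row for row in rows if "us" in row and "sy" in row)
    samples = []
    for row in rows[rows.index(header) + 1:]:
        # Skip repeated header lines; keep rows that line up with the header.
        if len(row) == len(header) and row[0].isdigit():
            record = dict(zip(header, map(int, row)))
            samples.append({key: record[key] for key in ("us", "sy", "wa")})
    return samples

cpu_samples = parse_vmstat(SAMPLE)
print(cpu_samples)
```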
Optionally, the step of performing a flow benchmark test and a flow contrast test on the system to be tested includes: generating link request flow based on the link test script; fitting a reference scene of link request flow and simulated flow variation to obtain a reference test result and a concurrency time sequence curve, and completing flow reference test; and restarting the service request according to the concurrency time sequence curve, randomly injecting flow anomalies on the basis of the reference flow to obtain a flow comparison result, and completing the flow comparison test.
In the embodiment of the present invention, the flow generating module 41 generates the link request flow by executing the link test script. The flow generating module 41 may then initiate the flow benchmark test, dynamically adjusting the number of concurrent requests so that the link request flow fits the reference scene simulating flow variation, and obtain the benchmark test result and the curve of the number of concurrent requests over time (i.e. the concurrency time sequence curve), thereby completing the flow benchmark test. The flow generating module 41 may then re-initiate the service requests according to the same concurrency time sequence curve as in the benchmark test; during this test, the abnormal injection module 42 may randomly inject abnormal flow so as to increase the overall chaos of the system flow, and the flow comparison result is obtained, completing the flow comparison test.
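The embodiment does not specify how the number of concurrent requests is adjusted. One simple possibility, shown here purely as an assumption, is proportional scaling: the thread count is multiplied by the ratio of the target request rate to the measured request rate, and the resulting values are recorded as the concurrency time sequence curve. The per-thread rate of 8 requests per second is a made-up figure.

```python
def next_concurrency(current, measured_rps, target_rps, max_threads=1000):
    """Scale the number of concurrent threads toward the reference request rate."""
    if measured_rps <= 0:
        return min(current + 1, max_threads)
    scaled = round(current * target_rps / measured_rps)
    return max(1, min(scaled, max_threads))

# Hypothetical reference curve (requests per second) and per-thread request rate.
reference_curve = [80, 120, 160]
per_thread_rps = 8

concurrency = 5
concurrency_curve = []
for target in reference_curve:
    measured = concurrency * per_thread_rps   # simulated measurement
    concurrency = next_concurrency(concurrency, measured, target)
    concurrency_curve.append(concurrency)
print(concurrency_curve)
```

Replaying the recorded `concurrency_curve` in the comparison test keeps the request source identical, so differences in the results can be attributed to the injected anomalies.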
Optionally, the type of flow anomaly includes at least one of: abnormal increase of forehand service, abnormal increase of service requests, abnormal increase of request processing time, and abnormal traffic load, wherein the forehand service is a service that is earlier in the execution order of the service link.
In the embodiment of the present invention, the abnormal injection module 42 may randomly inject abnormal flow on the basis of the flow benchmark test so as to increase the overall chaos of the request flow of the system to be tested. The abnormal injection module 42 may include two parts: an instruction communication component that communicates with the flow generation module 41, and a flow forwarding component located between the flow generation module 41 and the system to be tested. The instruction communication component randomly selects service requests and issues flow control instructions to the flow generation module 41, creating anomalies such as incomplete links and flow amplification at the flow source. The flow forwarding component works at the network transport layer on the principle of proxy forwarding; it randomly selects service requests and creates anomalies such as increased time consumption and unbalanced load from the perspective of communication transmission.
The flow anomalies in the embodiment of the invention are a series of anomalies that relate to the scale, composition, load, and the like of the service flow and that affect the processing capacity and performance of the system to be tested. The types of flow anomaly may include the following, although the invention is not limited to them:
(1) Abnormal increase of forehand service: this simulates an anomaly in which the forehand service increases sharply because of the caller, congests the normal execution of the backhand services of the link, and spreads to other links through common resources. In a specific implementation, for the forehand service in the service link (i.e. the service that is earlier in the execution order of the service link), the flow generating module 41 receives the instruction of the anomaly injecting module 42 and, during execution of the link script, randomly selects some concurrent threads; these threads interrupt and return after completing the forehand service and do not execute the backhand service, thereby indirectly increasing the request volume of the forehand service.
For example, when a forehand service surge is injected into the qryUserInfo link, several concurrent threads are randomly selected; after executing the Login and checkSession requests in sequence, they interrupt and return and then execute the Login request again, which increases the request volume of Login and checkSession.
(2) Abnormal increase of service requests: this simulates an anomaly in which a large number of service requests surge because of the caller, rapidly increase the flow and the number of network connections, and spread to other links through common resources. In a specific implementation, the flow generation module 41 receives the instruction of the anomaly injection module 42 and adds some concurrent threads to initiate flow requests, which directly increases the link request volume and amplifies the overall flow scale.
For example, when a traffic surge anomaly is injected into the qryUserInfo link, the newly added concurrent threads execute the qryUserInfo link script, so that both the overall traffic scale and the share of the qryUserInfo link in the overall traffic increase.
(3) Abnormal increase of request processing time: this simulates an anomaly in which, for reasons on the network or service provider side, requests take longer, the service capacity drops, and the effect spreads to other links through common resources. When forwarding requests, the flow forwarding component of the anomaly injection module 42 randomly selects some service requests and, after receiving the response message, sleeps for a period of time before forwarding it to the caller; it also delays the sending of ACK responses and of instructions such as connection termination during communication, thereby simulating network congestion and long processing time at the service provider.
For example, when a long-processing-time anomaly is injected into the qryUserInfo link, the traffic forwarding component randomly selects an instance of the qryUserInfo link request and, on receiving the response message of the checkSession request, sleeps for 10 seconds before returning it to the source address, so that the qryUserInfo link request becomes slow, application lock resources are released slowly, and congestion occurs.
(4) Traffic load anomaly: this simulates an anomaly in which an increase in request volume or slow processing on a particular service node causes request congestion and a drop in service capacity on that node; when the traffic pressure on the node becomes too high, the node goes down, the overflowing traffic requests increase the processing pressure on the remaining nodes, and eventually all nodes are overwhelmed by traffic. The flow forwarding component of the anomaly injection module 42 selects a surviving server (Active, i.e. able to provide service normally), randomly selects some service requests when forwarding requests, and changes the destination address of those request messages to the selected server; when the selected server goes down because of excessive flow, another surviving server is selected and the above operation continues.
For example, when traffic load imbalance is injected into the qryUserInfo link, the traffic forwarding component randomly selects an instance of the qryUserInfo link request, modifies the request destination address (http:// www.) to a specific application server IP address, and causes the request traffic for that IP address to be much greater than the other servers in the cluster.
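The effect of anomaly type (1), the forehand service surge, can be illustrated with a small simulation. Randomly chosen threads stop after the forehand prefix of the link, so the forehand requests take a larger share of all requests. The prefix length of two requests, the thread count, and the injection ratio are assumptions; the sketch counts one pass per thread, whereas in a looping test the interrupted threads would also restart sooner.

```python
import random

LINK = ["Login", "checkSession", "qryUserInfo", "checkLogin", "Logout"]
FOREHAND_LENGTH = 2   # assumed: Login and checkSession form the forehand part

def run_link(link, interrupted):
    """Requests issued by one pass: only the forehand prefix if interrupted."""
    return link[:FOREHAND_LENGTH] if interrupted else link

def simulate(link, threads, inject_ratio, seed=0):
    """Count requests per name when a share of threads is interrupted."""
    rng = random.Random(seed)
    chosen = set(rng.sample(range(threads), int(threads * inject_ratio)))
    counts = {}
    for thread in range(threads):
        for name in run_link(link, thread in chosen):
            counts[name] = counts.get(name, 0) + 1
    return counts

baseline = simulate(LINK, threads=10, inject_ratio=0.0)
injected = simulate(LINK, threads=10, inject_ratio=0.3)
print(baseline)
print(injected)
```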
Step S203, receiving a response message returned after the system to be tested is tested.
Optionally, before evaluating the state of the system to be tested against the flow change in different dimensions based on the response message and the operation data, the evaluation method includes: performing assertion inspection on the response message to obtain an inspection result; in the event that the inspection results indicate that all assertion inspection passes, it is determined that link test script execution is complete.
In the embodiment of the present invention, after receiving the response message, the flow generation module 41 parses the response message, checks whether the assertion passes, and determines that the execution result of the link test script passes when all assertion checks pass.
Step S204, based on the response message and the operation data, the state of the system to be tested in different dimensions for the flow change is evaluated, and an evaluation message is obtained.
Optionally, based on the response message and the operation data, the step of evaluating the state of the system to be tested in different dimensions for the flow change comprises: determining a capacity grading threshold value corresponding to each dimension; based on the response message, the operation data and the capacity grading threshold corresponding to each dimension, analyzing the state of the system to be tested for flow change in different dimensions.
In the embodiment of the invention, the chaos engineering implementation process may be evaluated by the chaotic evaluation unit 5. Based on engineering concepts, the embodiment of the invention defines several dimensions of concern, such as system availability, service capability, abnormal phenomena, and abnormal isolation, and defines a grading threshold for each dimension (i.e. the capability grading threshold corresponding to each dimension is determined). During the evaluation, the chaotic evaluation unit 5 evaluates the capability rating of the system to be tested in each dimension of concern (i.e. analyzes the state of the system to be tested with respect to flow change in different dimensions) according to the benchmark test and comparison test results (i.e. the response messages) output by the chaotic implementation unit 4 and the multidimensional operation data from the test process, and sends prompts and early warnings to the engineering implementation personnel for dimensions with a low capability grade. The specific process is as follows:
The dimension and threshold definition module 51 defines the capability evaluation dimensions of the system to be tested for coping with flow anomalies and the capability grading thresholds of each dimension, which serve as the evaluation basis for the chaos engineering implementation. For each dimension, the capability levels and the demarcation threshold of each level are defined. For example, a set of capability evaluation dimensions and thresholds is given below; it is used only for description and ease of understanding and does not mean that the present invention can only use this set:
(1) System availability: high-grade: no system unavailability (the Health Check returns false; the same applies below) occurs at any point in the test; intermediate level: system unavailability occurs during abnormal injection, but the system becomes available again after the abnormal injection is terminated; low-grade: the system is unavailable during abnormal injection and does not recover after the abnormal injection is terminated.
(2) Service capability (flow comparison test vs. flow benchmark test): high-grade: the server has strict flow control protection, and service capabilities such as throughput and time consumption are not affected during abnormal injection; intermediate level: service capabilities such as throughput and per-request time consumption degrade by no more than 50% during abnormal injection and return to normal after the abnormal injection ends; low-grade: service capabilities such as throughput and time consumption degrade by more than 50% during abnormal injection and cannot return to normal after the abnormal injection ends.
(3) Resource consumption: high-grade: during abnormal injection, application and system resource consumption is not affected, and requests for additional resources are returned directly as application failures; intermediate level: during abnormal injection, application and system resource consumption rises by no more than 50%, the waiting time for resource applications is no more than 1 second, and the queuing depth is no more than 200% of the daily depth; low-grade: during abnormal injection, application and system resource consumption rises by more than 50%, resource applications wait for more than 1 second, and the queuing depth exceeds 200% of the daily depth.
(4) Abnormality isolation: high-grade: the server has service-level abnormality isolation, and an abnormal service does not cause abnormal service capability in other services; intermediate level: during abnormal injection, no more than 20% of irrelevant services have their service capability affected (a service is considered affected when its throughput, per-request time consumption or similar service capability degrades by more than 20%); low-grade: during abnormal injection, more than 20% of irrelevant services have their service capability affected.
(5) Abnormal phenomenon: high-grade: the system to be tested shows no unexpected abnormal phenomena; intermediate level: the system to be tested shows fewer than 5 types of unexpected abnormal phenomena; low-grade: the system to be tested shows fewer than 10 types of unexpected abnormal phenomena. Here, abnormal phenomena refer to other anomalies that do not occur during the benchmark test and cannot be covered by the dimensions above.
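As an illustration of how such thresholds might be applied, the following minimal sketch grades the abnormality isolation dimension (4) from per-service throughput measured in the two tests. The function name `grade_isolation`, the dictionary inputs and the use of throughput alone are assumptions made for the example; they are not taken from the embodiment.

```python
from typing import Dict

# Thresholds taken from example dimension (4); the names are hypothetical.
AFFECTED_DROP = 0.20       # a service is "affected" if capability drops by more than 20%
INTERMEDIATE_SHARE = 0.20  # up to 20% of irrelevant services affected -> intermediate


def grade_isolation(baseline_tps: Dict[str, float],
                    contrast_tps: Dict[str, float],
                    abnormal_service: str) -> str:
    """Grade abnormality isolation from per-service throughput (requests/s)."""
    # Irrelevant services are all services other than the one made abnormal.
    unrelated = [s for s in baseline_tps if s != abnormal_service]
    if not unrelated:
        return "high"
    affected = 0
    for service in unrelated:
        base = baseline_tps[service]
        drop = (base - contrast_tps.get(service, 0.0)) / base if base > 0 else 0.0
        if drop > AFFECTED_DROP:
            affected += 1
    if affected == 0:
        return "high"
    share = affected / len(unrelated)
    return "intermediate" if share <= INTERMEDIATE_SHARE else "low"
```

A caller would pass the throughput of each service from the flow benchmark test and the flow comparison test, plus the name of the service into which the anomaly was injected.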
In the embodiment of the invention, the grading evaluation module 52 performs the chaotic engineering grading evaluation of the system to be tested according to the data acquired by the information acquisition module 43 and the thresholds preconfigured by the dimension and threshold definition module 51. For example, the grading evaluation method of the grading evaluation module 52 is described below using the exemplary dimensions and thresholds of the dimension and threshold definition module 51:
(1) System availability evaluation: the system health check results recorded during the benchmark and comparison test process are traversed in reverse chronological order. If the first result encountered in the reverse traversal is false, the system availability is evaluated as low-grade; if the first result in reverse order is true but a false result appears during the traversal, it is evaluated as intermediate; if no false result is found by the end of the traversal, it is evaluated as high-grade.
(2) Service capability evaluation: the service test results are queried from the traffic generation module 41, aligned in time series, and then compared and evaluated from the lateral and longitudinal angles.
1) Lateral comparison: dimensions such as throughput and average response time are calculated per second, together with the ratio of change of each value relative to the previous second; the data are then traversed in chronological order and evaluated according to the grading thresholds exemplified by the dimension and threshold definition module 51.
2) Longitudinal comparison: the service capability difference ratio between the flow benchmark test and the flow comparison test at the same moment is calculated at three granularities (overall, link and service); the data are then traversed in chronological order and evaluated according to the grading thresholds exemplified by the dimension and threshold definition module 51.
(3) Resource consumption evaluation: based on the application resource consumption and system resource consumption information acquired by the information acquisition module 43, the ratio of change relative to the previous second is calculated per second; the data are then traversed in chronological order and evaluated according to the grading thresholds exemplified by the dimension and threshold definition module 51.
(4) Abnormality isolation evaluation: based on the service-granularity results of the longitudinal comparison, the abnormal service is excluded and the number of services evaluated as intermediate or low-grade (that is, the affected irrelevant services) is counted in chronological order; after counting, the evaluation is made according to the grading thresholds exemplified by the dimension and threshold definition module 51.
(5) Abnormal phenomenon evaluation: during and after the test, the number of unexpected abnormal phenomena is observed manually, and a high-grade, intermediate or low-grade evaluation is made according to the exemplary grading thresholds.
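The reverse-order traversal described in item (1) can be sketched as follows. This is a minimal illustration, assuming the health check results are available as a list of booleans ordered from oldest to newest; the function name and the treatment of an empty list as high-grade are assumptions of the example.

```python
from typing import Sequence


def grade_availability(health_checks: Sequence[bool]) -> str:
    """Grade system availability from health check results (oldest first)."""
    # Traverse newest-first, as in the reverse-order traversal of item (1).
    newest_first = list(reversed(health_checks))
    # The most recent check failing means the system never recovered.
    if newest_first and not newest_first[0]:
        return "low"
    # The system is available now, but was unavailable at some point.
    if any(not ok for ok in newest_first):
        return "intermediate"
    # No failed health check in the whole test (assumed for an empty list too).
    return "high"
```

For example, `grade_availability([True, False, True])` corresponds to a system that became unavailable during abnormal injection and then recovered.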
In the embodiment of the present invention, the prompt early warning module 53 may traverse the grading evaluation results and, if any "intermediate" or "low-grade" evaluation exists, issue an early warning to the engineering implementation personnel, for example by notifying them by e-mail.
According to the embodiment of the invention, service requests and service links that contribute heavily to the flow are identified through the acquisition, analysis and conversion of production flow, forming a reference flow scene for the chaotic engineering, which is converted into a link test script. In the evaluation implementation stage, the embodiment initiates the flow benchmark test and the flow comparison test, randomly injects anomalies during the flow comparison test to create flow anomaly scenes, and evaluates the capability level of the system for coping with chaotic flow changes in a plurality of dimensions of concern. The embodiment thereby extends the concept of traditional chaotic engineering and enriches its implementation methods and objects. It can also effectively observe and evaluate how the system copes with sudden flow anomalies, carry out graded evaluation of the distributed system in dimensions such as reliability, expandability and fault tolerance, reduce engineering risk and improve engineering quality.
Example two
FIG. 4 is a schematic diagram of an alternative flow-based chaotic engineering implementation method according to an embodiment of the present invention. As shown in FIG. 4, the method includes the following steps:
Step one: flow monitoring and acquisition:
In the embodiment of the invention, the flow acquisition unit monitors the network card data of the middleware of the TCP application layer (such as an application server) in the form of a bypass probe, acquires service request flow information, and stores it in the data storage unit.
Step two: link relation analysis:
In the embodiment of the invention, a plurality of associated requests in the traffic are identified as ordered links from a plurality of dimensions such as communication connection, session, entrance request and custom message content.
Step three: flow sample extraction:
In the embodiment of the invention, for a link that occurs for the first time, the detailed request messages and response messages of all service requests in the link are extracted and stored as traffic samples.
Step four: traffic clustering statistics:
In the embodiment of the invention, the link requests and the service requests are counted by using a clustering algorithm, the link call quantity and the request call quantity are evaluated against given thresholds, and the links and service requests that contribute heavily to the flow are screened out.
Step five: reference flow scene design:
In the embodiment of the invention, for the screened links and service requests, the call quantity information of each link and service is integrated in time sequence to form a simulation scene of the reference flow (namely a reference scene).
Step six: perfecting the service request chain script:
In the embodiment of the invention, the service request scripts are generated based on the traffic samples and perfected by a script developer, and the request scripts are combined into request chains according to the traffic scene and the link design.
Step seven: initiating a flow benchmark test:
In the embodiment of the invention, the flow generation module initiates the benchmark test, dynamically adjusts the number of concurrent requests during the test so that the link request flow fits the reference scene, and obtains the curve of the number of concurrent requests over time (namely a concurrency time sequence curve) as the benchmark result. For each service request, the server response is obtained and checked to determine whether the assertions pass.
Step eight: initiating a flow comparison test:
In the embodiment of the invention, the flow generation module initiates requests according to the concurrency time sequence curve. During the test, for each service request, the server response is likewise obtained and checked to determine whether the assertions pass.
Step nine: flow anomaly injection:
In the embodiment of the invention, the abnormal injection module continuously and randomly injects flow anomalies at the transmission layer and the application layer, thereby increasing the overall degree of chaos of the system.
Step ten: operation condition collection:
In the embodiment of the invention, multidimensional data of the system to be tested is collected during the test, including but not limited to: service success rate, service availability, service capability, queue congestion, resource consumption, and the like.
Step eleven: chaotic engineering evaluation:
In the embodiment of the invention, based on the collected multidimensional data, the chaotic engineering grading evaluation is carried out on the system to be tested according to the preset thresholds.
The embodiment of the invention provides a flow-based chaotic engineering implementation method, which comprises the steps of collecting, analyzing and converting production flow, initiating reference flow, injecting abnormality and collecting information in the chaotic engineering implementation process, and evaluating the capability of a chaotic engineering implementation object.
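Step seven leaves open how the number of concurrent requests is adjusted to fit the reference scene. The sketch below shows one simple possibility, assuming a proportional rule that estimates per-worker throughput from the previous second; the function `fit_concurrency` and the `measure_rps` callback are hypothetical and stand in for the flow generation module and the system to be tested.

```python
import math
from typing import Callable, List, Optional


def fit_concurrency(target_rps: List[float],
                    measure_rps: Callable[[int], float],
                    start: int = 1) -> List[int]:
    """Return the concurrency used in each second while tracking target_rps.

    target_rps  -- requests per second demanded by the reference scene
    measure_rps -- hypothetical callback: runs one second of load at the given
                   concurrency and returns the achieved requests per second
    """
    concurrency = max(1, start)
    per_worker: Optional[float] = None  # throughput of one worker, last second
    curve: List[int] = []
    for target in target_rps:
        # Once a throughput estimate exists, size the worker pool for the target.
        if per_worker:
            concurrency = max(1, math.ceil(target / per_worker))
        observed = measure_rps(concurrency)
        curve.append(concurrency)
        per_worker = observed / concurrency
    return curve
```

The returned list plays the role of the concurrency time sequence curve that step eight replays during the flow comparison test.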
Example three
An evaluation apparatus for a distributed system provided in this embodiment includes a plurality of implementation units, each of which corresponds to each implementation step in the above-described embodiment.
FIG. 5 is a schematic diagram of an alternative evaluation apparatus for a distributed system according to an embodiment of the present invention, as shown in FIG. 5, the evaluation apparatus may include: an acquisition unit 50, a transmission unit 51, a reception unit 52, an evaluation unit 53, wherein,
the acquisition unit 50 is configured to obtain a link test script and execute the link test script to generate a request message, where the link test script is obtained by combining the generated test script with a service link;
the transmission unit 51 is configured to send the request message to a system to be tested, so as to perform a flow benchmark test and a flow comparison test on the system to be tested, and to collect operation data in the testing process;
the receiving unit 52 is configured to receive a response message returned after the system to be tested is tested;
the evaluation unit 53 is configured to evaluate the state of the system to be tested in different dimensions for the flow change based on the response message and the operation data, and obtain an evaluation message.
With the evaluation device, the acquisition unit 50 obtains a link test script and executes it to generate a request message, where the link test script is obtained by combining the generated test script with a service link; the transmission unit 51 sends the request message to the system to be tested so as to perform the flow benchmark test and the flow comparison test on the system to be tested, and collects operation data in the test process; the receiving unit 52 receives the response message returned after the system to be tested is tested; and the evaluation unit 53 evaluates the state of the system to be tested for flow change in different dimensions based on the response message and the operation data, and obtains the evaluation message. In the embodiment of the invention, the flow benchmark test and the flow comparison test can be initiated on the system, and anomalies are randomly injected during the flow comparison test to create flow anomaly scenes, so that the performance of the system in coping with sudden flow anomalies can be effectively observed and evaluated in different dimensions, engineering risk can be reduced, and engineering quality can be improved. This solves the technical problem in the related art that the influence of flow anomalies on the system is not considered in the chaotic engineering implementation process, so that the capability of the system to cope with flow changes cannot be evaluated.
Optionally, the evaluation device includes: the first monitoring module is used for monitoring the flow data of the network module on the server by adopting a preset probe strategy before acquiring the link test script, wherein the flow data at least carries a port identifier and a service request; the first acquisition module is used for acquiring a plurality of service requests in the flow data when the port identification indicates a certain service port of the system to be tested; the first analysis module is used for analyzing the service request to obtain an analysis result; the first traversing module is used for traversing the plurality of service requests according to the time sequence by adopting a preset recognition algorithm based on the analysis result to obtain a service link.
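The time-ordered traversal that turns captured service requests into service links could look like the following minimal sketch. It assumes each parsed request carries a timestamp, a session identifier and a service name, and it uses the session identifier as the only association key; the `Request` type, its field names and this single-key rule are illustrative assumptions, not the preset recognition algorithm itself.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Request(NamedTuple):
    """Hypothetical parsed service request captured from the flow data."""
    timestamp: float
    session_id: str
    service: str


def identify_links(requests: List[Request]) -> Dict[str, Tuple[str, ...]]:
    """Group requests by session and order them in time to form service links."""
    links: Dict[str, List[str]] = defaultdict(list)
    # Traverse in time order so each link keeps the call order of its services.
    for req in sorted(requests, key=lambda r: r.timestamp):
        links[req.session_id].append(req.service)
    return {sid: tuple(services) for sid, services in links.items()}
```

A fuller implementation would also consider the other association dimensions mentioned in the description, such as communication connection, entrance request and custom message content.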
Optionally, the evaluation device further includes: the first extraction module is used for extracting request messages and response messages of all service requests in the service link under the condition that the service link appears for the first time after traversing the service requests according to time sequence by adopting a preset identification algorithm based on the analysis result to obtain the service link, so as to obtain a flow sample.
Optionally, the evaluation device further includes: the first analysis module is used for carrying out statistical analysis on the service link and the service requests by adopting a preset clustering algorithm and the number of expected categories after extracting request messages and response messages of all service requests in the service link to obtain a flow sample, so as to obtain a link call quantity and a request call quantity; the first screening module is used for screening the service links corresponding to the link call quantity larger than the first preset threshold value, and screening the service requests corresponding to the request call quantity larger than the second preset threshold value, so as to obtain screening results.
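The screening step can be illustrated with a short sketch. Note that it replaces the preset clustering algorithm with plain exact-match counting, which is a deliberate simplification for the example; the function name `screen` and its threshold parameters are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Link = Tuple[str, ...]


def screen(links: Iterable[Link],
           link_threshold: int,
           request_threshold: int) -> Tuple[List[Link], List[str]]:
    """Keep links and service requests whose call quantity exceeds a threshold."""
    link_calls: Counter = Counter()
    request_calls: Counter = Counter()
    for link in links:
        link_calls[link] += 1        # link call quantity
        request_calls.update(link)   # request call quantity, per service
    kept_links = sorted(l for l, n in link_calls.items() if n > link_threshold)
    kept_requests = sorted(s for s, n in request_calls.items() if n > request_threshold)
    return kept_links, kept_requests
```

Here `link_threshold` and `request_threshold` correspond to the first and second preset thresholds, respectively.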
Optionally, the evaluation device further includes: the first sorting module is used for sorting the screening results into a time sequence request curve according to the time sequence of the execution request after the screening results are obtained; the first calculation module is used for obtaining a time sequence total flow variation curve and a flow composition ratio based on a request number proportion and a time sequence request curve through weighted summation, wherein the request number proportion is a proportion value between a service link and service requests and the total number of requests; the first characterization module is used for characterizing the time sequence total flow variation curve and the flow composition ratio as a reference scene simulating flow variation.
Optionally, the evaluation device includes: the first generation module is configured to generate an initial test script based on a flow sample for a reference scene simulating flow variation after characterizing a time-series total flow variation curve and a flow composition ratio as the reference scene simulating flow variation, where the initial test script at least includes: testing behavior, testing data, assertions; the first adjusting module is used for receiving test data input by the external equipment and adjusting an initial test script based on the test data to obtain a target test script; and the first combination module is used for combining the service link and the target test script to obtain a link test script.
Optionally, the acquiring unit includes: the first analysis submodule is used for analyzing the link test script to obtain test behaviors and test data; and the first encapsulation submodule is used for encapsulating the test behaviors and the test data into a request message by adopting a preset communication protocol.
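A minimal sketch of the parse-then-encapsulate flow is given below. It assumes, purely for illustration, that one script step is stored as JSON with `method`, `path` and `data` fields and that the preset communication protocol is HTTP/1.1 with a JSON body; neither the script format nor the protocol is specified by the embodiment.

```python
import json


def build_request(step_json: str, host: str) -> bytes:
    """Parse one hypothetical script step and wrap it in an HTTP/1.1 request."""
    step = json.loads(step_json)                  # test behaviour + test data
    body = json.dumps(step.get("data", {})).encode("utf-8")
    head = (
        f"{step['method']} {step['path']} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/json\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body
```

The resulting bytes are what the transmission unit would send to the system to be tested under these assumptions.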
Optionally, the transmitting unit includes: the first generation sub-module is used for generating link request flow based on the link test script; the first fitting sub-module is used for fitting the link request flow and a reference scene simulating flow variation to obtain a reference test result and a concurrency time sequence curve, and completing flow reference test; the first injection submodule is used for reinitiating the service request according to the concurrency time sequence curve, randomly injecting flow anomalies on the basis of the reference flow to obtain a flow comparison result, and completing the flow comparison test.
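The relation between the concurrency time sequence curve and random anomaly injection can be sketched as a per-second plan. The anomaly labels below paraphrase the anomaly types listed in the claims; the function `contrast_plan`, the injection probability and the seeded random generator are assumptions of the example.

```python
import random
from typing import List, Optional, Tuple

# Hypothetical labels for the flow anomaly types.
ANOMALY_TYPES = (
    "forehand_service_abnormality",
    "service_request_abnormality",
    "processing_time_abnormality",
    "traffic_load_abnormality",
)


def contrast_plan(concurrency_curve: List[int],
                  inject_prob: float,
                  seed: int = 0) -> List[Tuple[int, Optional[str]]]:
    """Replay the benchmark concurrency curve, randomly adding an anomaly per second."""
    rng = random.Random(seed)  # seeded so that a test run can be reproduced
    plan: List[Tuple[int, Optional[str]]] = []
    for concurrency in concurrency_curve:
        anomaly = rng.choice(ANOMALY_TYPES) if rng.random() < inject_prob else None
        plan.append((concurrency, anomaly))
    return plan
```

Because the comparison test reuses the same concurrency curve as the benchmark test, differences between the two result sets can be attributed to the injected anomalies.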
Optionally, the type of flow anomaly includes at least one of: increased forehand service abnormality, increased service request abnormality, increased request processing time consumption abnormality, and traffic load abnormality, wherein the forehand service refers to a service whose execution order is arranged earlier in the service link.
Optionally, the evaluation device includes: the first assertion module is used for carrying out assertion check on the response message before evaluating the state of the system to be tested for flow change in different dimensions based on the response message and the operation data to obtain a check result; and the first determining module is used for determining that the execution of the link test script is completed under the condition that the checking result indicates that all assertion checks are passed.
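The assertion check can be illustrated as follows, assuming each response message has been parsed into a dictionary and each assertion is a predicate over that dictionary; the names `check_responses` and `script_completed` are hypothetical.

```python
from typing import Callable, Dict, List

Assertion = Callable[[Dict], bool]


def check_responses(responses: List[Dict], assertions: List[Assertion]) -> List[bool]:
    """Return, per response message, whether every assertion passes."""
    return [all(check(resp) for check in assertions) for resp in responses]


def script_completed(results: List[bool]) -> bool:
    """The link test script counts as completed only if all checks passed."""
    return all(results)
```

For instance, an assertion such as `lambda r: r["status"] == 200` would check a hypothetical status field of each response.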
Optionally, the evaluation unit comprises: the first determining submodule is used for determining a capacity grading threshold value corresponding to each dimension; the first analysis submodule is used for analyzing the state of the system to be tested for flow change in different dimensions based on the response message, the operation data and the capacity grading threshold corresponding to each dimension.
The above-described evaluation device may further include a processor and a memory, the above-described acquisition unit 50, transmission unit 51, reception unit 52, evaluation unit 53, and the like are stored in the memory as program units, and the processor executes the above-described program units stored in the memory to realize the corresponding functions.
The processor includes a kernel, and the kernel fetches the corresponding program unit from the memory. One or more kernels may be provided, and the state of the system to be tested for flow change in different dimensions is evaluated by adjusting kernel parameters.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or nonvolatile memory, such as read-only memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip.
The present application also provides a computer program product which, when executed on a data processing device, is adapted to execute a program initialized with the following method steps: obtaining a link test script and executing the link test script to generate a request message, wherein the link test script is obtained by combining the generated test script with a service link; sending the request message to a system to be tested, so as to perform a flow benchmark test and a flow comparison test on the system to be tested, and collecting operation data in the test process; receiving a response message returned after the system to be tested is tested; and evaluating the state of the system to be tested for flow change in different dimensions based on the response message and the operation data to obtain an evaluation message.
According to another aspect of the embodiments of the present invention, there is also provided an electronic device including one or more processors and a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the evaluation method for a distributed system of any of the above.
FIG. 6 is a block diagram of a hardware configuration of an electronic device (or mobile device) for an evaluation method of a distributed system according to an embodiment of the present invention. As shown in FIG. 6, the electronic device may include one or more processors 102 (shown as 102a, 102b, ..., 102n) and a memory 104 for storing data (the processor 102 may include, but is not limited to, a processing device such as a microprocessor MCU or a programmable logic device FPGA). In addition, the electronic device may further include: a display, an input/output interface (I/O interface), a Universal Serial Bus (USB) port (which may be included as one of the ports of the I/O interface), a network interface, a keyboard, a power supply, and/or a camera. It will be appreciated by those of ordinary skill in the art that the configuration shown in FIG. 6 is merely illustrative and is not intended to limit the configuration of the electronic device described above. For example, the electronic device may also include more or fewer components than shown in FIG. 6, or have a different configuration than shown in FIG. 6.
According to another aspect of the embodiments of the present invention, there is also provided a computer readable storage medium, including a stored computer program, where the computer program when executed controls a device in which the computer readable storage medium is located to perform the evaluation method for a distributed system according to any one of the above.
The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
In the foregoing embodiments of the present invention, the description of each embodiment has its own emphasis; for a portion that is not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed technology content may be implemented in other manners. The above-described embodiments of the apparatus are merely exemplary, and the division of the units, for example, may be a logic function division, and may be implemented in another manner, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be through some interfaces, units or modules, or may be in electrical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention, in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash disk, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or other various media capable of storing program code.
The foregoing is merely a preferred embodiment of the present invention and it should be noted that modifications and adaptations to those skilled in the art may be made without departing from the principles of the present invention, which are intended to be comprehended within the scope of the present invention.

Claims (13)

1. An evaluation method for a distributed system, comprising:
obtaining a link test script, executing the link test script to generate a request message, wherein the link test script is obtained by combining the generated test script with a service link,
the service link is obtained through the following steps: monitoring flow data of a network module on a server by adopting a preset probe strategy, wherein the flow data at least carries a port identifier and a service request; when the port identification indicates a certain service port of the system to be tested, collecting a plurality of service requests in the flow data; analyzing the service request to obtain an analysis result; traversing a plurality of service requests according to a time sequence by adopting a preset identification algorithm based on the analysis result to obtain the service links;
sending the request message to the system to be tested, so as to perform a flow benchmark test and a flow comparison test on the system to be tested, and collecting operation data in the test process, wherein the flow benchmark test is used for analyzing a fitting result after fitting a link request flow and a preset reference scene, and the flow comparison test is used for analyzing flow change data after injecting abnormal flow into a current network;
receiving a response message returned after the system to be tested is tested;
based on the response message and the operation data, the state of the system to be tested in different dimensions for flow change is evaluated, and an evaluation message is obtained.
2. The evaluation method according to claim 1, wherein after traversing the plurality of service requests in time sequence based on the analysis result using a preset recognition algorithm to obtain a service link, the evaluation method further comprises:
and under the condition that the service link appears for the first time, extracting request messages and response messages of all service requests in the service link to obtain a flow sample.
3. The method according to claim 2, wherein after extracting the request message and the response message of all service requests in the service link to obtain the traffic sample, the method further comprises:
carrying out statistical analysis on the service links and the service requests by adopting a preset clustering algorithm and the number of expected categories to obtain link call quantity and request call quantity;
and screening the service links corresponding to the link call quantity larger than the first preset threshold value, and screening the service requests corresponding to the request call quantity larger than the second preset threshold value to obtain screening results.
4. The evaluation method according to claim 3, wherein after obtaining the screening result, the evaluation method further comprises:
sorting the screening results into a time sequence request curve according to the time sequence of the execution request;
based on a request number proportion and the time sequence request curve, weighting and summing to obtain a time sequence total flow variation curve and a flow composition duty ratio, wherein the request number proportion is a proportion value between the service link and the service request and the total number of requests;
and characterizing the time sequence total flow variation curve and the flow composition ratio as a reference scene simulating flow variation.
5. The evaluation method according to claim 4, wherein after characterizing the time-series total flow variation curve and the flow composition ratio as a reference scene simulating flow variation, the evaluation method comprises:
for the reference scene simulating flow variation, generating an initial test script based on the flow sample, wherein the initial test script at least comprises: testing behavior, testing data, assertions;
receiving test data input by external equipment, and adjusting the initial test script based on the test data to obtain a target test script;
and combining the service link and the target test script to obtain a link test script.
6. The method of evaluating according to claim 1, wherein the step of executing the link test script to generate the request message comprises:
analyzing the link test script to obtain test behaviors and test data;
and adopting a preset communication protocol to package the test behavior and the test data into a request message.
7. The method of evaluating according to claim 1, wherein the step of performing a flow benchmark test and a flow contrast test on the system under test comprises:
generating a link request flow based on the link test script;
fitting the link request flow and a reference scene simulating flow variation to obtain a reference test result and a concurrency time sequence curve, and completing flow reference test;
and reinitiating the service request according to the concurrency time sequence curve, randomly injecting flow anomalies on the basis of the reference flow to obtain a flow comparison result, and completing the flow comparison test.
8. The assessment method according to claim 7, wherein the type of flow anomaly comprises at least one of: the method comprises the steps of increasing forehand service abnormality, increasing service request abnormality, increasing processing request time consumption abnormality and traffic load abnormality, wherein the forehand service refers to the service of which the execution sequence is arranged in the front in a business link.
9. The evaluation method according to claim 1, wherein before evaluating a state of the system under test against a flow change in different dimensions based on the response message and the operation data, the evaluation method comprises:
performing assertion inspection on the response message to obtain an inspection result;
and determining that the link test script execution is completed in the case that the checking result indicates that all assertion checks are passed.
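A minimal sketch of the assertion check in claim 9, assuming each assertion is an (expected field, expected value) pair and the response message has already been decoded into a dictionary. Both assumptions, and the name `check_assertions`, are illustrative only.

```python
def check_assertions(response, assertions):
    """Compare each asserted field with the response and collect mismatches."""
    failures = [
        (field, expected, response.get(field))
        for field, expected in assertions
        if response.get(field) != expected
    ]
    # The script counts as completed only when every assertion passed.
    return {"completed": not failures, "failures": failures}


ok_result = check_assertions({"status": "ok", "code": 0}, [("status", "ok"), ("code", 0)])
bad_result = check_assertions({"status": "error"}, [("status", "ok")])
```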
10. The method according to claim 1, wherein the step of evaluating the state of the system under test for traffic variation in different dimensions based on the response message and the operation data comprises:
determining a capacity grading threshold value corresponding to each dimension;
and analyzing the state of the system to be tested for flow change in different dimensions based on the response message, the operation data and the capacity grading threshold corresponding to each dimension.
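Per-dimension capacity grading, as in claim 10, can be sketched with sorted threshold lists. The dimensions, threshold values and grade names below are invented for illustration; the patent does not specify them.

```python
import bisect

# Hypothetical ascending thresholds per dimension; higher values are worse.
THRESHOLDS = {
    "response_time_ms": [200, 500, 1000],
    "error_rate": [0.001, 0.01, 0.05],
    "cpu_utilization": [0.6, 0.8, 0.95],
}
GRADES = ["good", "acceptable", "degraded", "critical"]


def grade(dimension, value):
    """Map a measured value to a grade; a value at or above a threshold moves up one grade."""
    return GRADES[bisect.bisect_right(THRESHOLDS[dimension], value)]


def evaluate(measurements):
    """Grade every measured dimension independently."""
    return {dimension: grade(dimension, value) for dimension, value in measurements.items()}


report = evaluate({"response_time_ms": 150, "error_rate": 0.02, "cpu_utilization": 0.99})
```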
11. An evaluation apparatus for a distributed system, comprising:
an obtaining unit, configured to obtain a link test script, and execute the link test script to generate a request packet, where the link test script is obtained by combining the generated test script with a service link,
the evaluation device includes: the first monitoring module is used for monitoring flow data of the network module on the server by adopting a preset probe strategy before acquiring the link test script, wherein the flow data at least carries a port identifier and a service request; the first acquisition module is used for acquiring a plurality of service requests in the flow data when the port identification indicates a certain service port of the system to be tested; the first analysis module is used for analyzing the service request to obtain an analysis result; the first traversing module is used for traversing the plurality of service requests according to time sequence by adopting a preset recognition algorithm based on the analysis result to obtain the service links;
a sending unit, configured to send the request message to a system under test so as to perform a flow benchmark test and a flow comparison test on the system under test, and to collect operation data during the test process, wherein the flow benchmark test is used for analyzing a fitting result after fitting a link request flow to a preset benchmark scene, and the flow comparison test is used for analyzing flow change data after injecting abnormal flow into a current network;
the receiving unit is used for receiving a response message returned after the system to be tested is tested;
and the evaluation unit is used for evaluating the state of the system to be tested for flow change in different dimensions based on the response message and the operation data to obtain an evaluation message.
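The service-link discovery performed by the monitoring, acquisition, analysis and traversing modules of claim 11 could look roughly like the sketch below. The claim leaves the "preset recognition algorithm" unspecified; grouping by a trace identifier and ordering by timestamp is one assumed approach, and the record fields `port`, `trace_id`, `timestamp` and `service` are hypothetical.

```python
from collections import defaultdict


def identify_service_links(flow_records, service_port):
    """Keep requests seen on the given service port and order each trace by time."""
    by_trace = defaultdict(list)
    for record in flow_records:
        if record["port"] == service_port:
            by_trace[record["trace_id"]].append(record)
    return {
        trace_id: [r["service"] for r in sorted(records, key=lambda r: r["timestamp"])]
        for trace_id, records in by_trace.items()
    }


captured = [
    {"port": 8080, "trace_id": "t1", "timestamp": 2, "service": "account"},
    {"port": 8080, "trace_id": "t1", "timestamp": 1, "service": "gateway"},
    {"port": 9090, "trace_id": "t2", "timestamp": 1, "service": "metrics"},
    {"port": 8080, "trace_id": "t1", "timestamp": 3, "service": "ledger"},
]
links = identify_service_links(captured, service_port=8080)
```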
12. A computer readable storage medium, characterized in that the computer readable storage medium comprises a stored computer program, wherein the computer program, when run, controls a device in which the computer readable storage medium is located to perform the evaluation method for a distributed system according to any one of claims 1 to 10.
13. An electronic device comprising one or more processors and a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of evaluating for a distributed system of any of claims 1-10.
CN202111349273.3A 2021-11-15 2021-11-15 Evaluation method and device for distributed system, electronic equipment and storage medium Active CN114124759B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111349273.3A CN114124759B (en) 2021-11-15 2021-11-15 Evaluation method and device for distributed system, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111349273.3A CN114124759B (en) 2021-11-15 2021-11-15 Evaluation method and device for distributed system, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN114124759A (en) 2022-03-01
CN114124759B (en) 2024-03-08

Family

ID=80396364

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111349273.3A Active CN114124759B (en) 2021-11-15 2021-11-15 Evaluation method and device for distributed system, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114124759B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114880182B (en) * 2022-06-10 2024-01-02 中国电信股份有限公司 Monitoring platform testing method and device, electronic equipment and readable storage medium
CN115168222B (en) * 2022-07-21 2023-02-28 北京同创永益科技发展有限公司 Method for producing lossless chaotic engineering experiment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1761208A (en) * 2005-11-17 2006-04-19 郭世泽 System and method for evaluating security and survivability of network information system
CN106130830A (en) * 2016-08-31 2016-11-16 北京奇虎科技有限公司 The method of testing of safety equipment stability and test device
CN110618924A (en) * 2019-09-19 2019-12-27 浙江诺诺网络科技有限公司 Link pressure testing method of web application system
CN111831569A (en) * 2020-07-22 2020-10-27 平安普惠企业管理有限公司 Test method and device based on fault injection, computer equipment and storage medium
CN113381913A (en) * 2021-08-13 2021-09-10 飞狐信息技术(天津)有限公司 Traffic processing method, gateway, traffic comparison system and device

Also Published As

Publication number Publication date
CN114124759A (en) 2022-03-01

Similar Documents

Publication Publication Date Title
US11500757B2 (en) Method and system for automatic real-time causality analysis of end user impacting system anomalies using causality rules and topological understanding of the system to effectively filter relevant monitoring data
CN114124759B (en) Evaluation method and device for distributed system, electronic equipment and storage medium
US9454450B2 (en) Modeling and testing of interactions between components of a software system
EP2871574B1 (en) Analytics for application programming interfaces
KR102522005B1 (en) Apparatus for VNF Anomaly Detection based on Machine Learning for Virtual Network Management and a method thereof
US8572226B2 (en) Enhancing network details using network monitoring scripts
CN1763778A (en) System and method for problem determination using dependency graphs and run-time behavior models
Wu et al. Zeno: Diagnosing performance problems with temporal provenance
Tang et al. An integrated framework for optimizing automatic monitoring systems in large IT infrastructures
CN105577799B (en) A kind of fault detection method and device of data-base cluster
CN110764980A (en) Log processing method and device
CN111274604A (en) Service access method, device, equipment and computer readable storage medium
CN112241350A (en) Micro-service evaluation method and device, computing device and micro-service detection system
Ma et al. Servicerank: Root cause identification of anomaly in large-scale microservice architectures
CN112559285A (en) Distributed service architecture-based micro-service monitoring method and related device
CN110365714A (en) Host-based intrusion detection method, apparatus, equipment and computer storage medium
Pranata et al. Misconfiguration discovery with principal component analysis for cloud-native services
Kumar et al. A state-space approach to SLA based management
KR101039874B1 (en) System for integration platform of information communication
CN112688947B (en) Internet-based network communication information intelligent monitoring method and system
CN113656314A (en) Pressure test processing method and device
Appleby et al. Threshold management for problem determination in transaction based e-commerce systems
CN113259878B (en) Call bill settlement method, system, electronic device and computer readable storage medium
Salva et al. Automatic web service robustness testing from WSDL descriptions
Bogardi-Meszoly et al. Performance factors in ASP.NET web applications with limited queue models

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant