CN114124759A - Evaluation method and device for distributed system, electronic equipment and storage medium - Google Patents

Evaluation method and device for distributed system, electronic equipment and storage medium

Publication number
CN114124759A
Authority
CN
China
Prior art keywords
link
flow
service
test
request
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111349273.3A
Other languages
Chinese (zh)
Other versions
CN114124759B (en)
Inventor
陈肇权
林海
马泽政
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202111349273.3A
Publication of CN114124759A
Application granted
Publication of CN114124759B
Legal status: Active
Anticipated expiration

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00: Arrangements for monitoring or testing data switching networks
    • H04L43/08: Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0876: Network utilisation, e.g. volume of load or congestion level
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/23: Clustering techniques
    • G06F18/232: Non-hierarchical techniques
    • G06F18/2321: Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213: Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Signal Processing (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Environmental & Geological Engineering (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses an evaluation method and device for a distributed system, an electronic device, and a storage medium, and relates to the technical field of chaos engineering. The evaluation method comprises: obtaining a link test script, executing the link test script, and generating a request message, wherein the link test script is obtained by combining a generated test script with a service link; sending the request message to a system under test so as to perform a flow benchmark test and a flow comparison test on the system under test, and collecting operation data during the tests; receiving a response message returned after the system under test completes the tests; and evaluating, based on the response message and the operation data, the state of the system under test in coping with traffic changes across different dimensions, to obtain an evaluation message. The method solves the technical problem in the related art that chaos engineering implementations do not consider the influence of abnormal traffic on the system and therefore cannot evaluate the system's ability to cope with traffic changes.

Description

Evaluation method and device for distributed system, electronic equipment and storage medium
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to an evaluation method and apparatus for a distributed system, an electronic device, and a storage medium.
Background
Chaos engineering is a relatively new discipline in software research and development. It performs experiments on a system in order to understand how the system copes with chaotic conditions in a production environment and to build confidence in the system's ability to withstand turbulent conditions in production. In the related art, chaos engineering implementations mostly focus on simulating resource chaos: faults are randomly injected into various resources (such as CPU, memory, and queues) to cause high resource usage, resource downtime, and similar conditions, raising the chaos level of the system environment so that the availability and robustness of the system can be evaluated under abnormal resource conditions. However, existing chaos engineering implementations do not consider the influence of conditions such as traffic storms and unreasonable transaction flows on service availability and robustness. Their evaluation of system capability therefore covers an incomplete set of dimensions, which increases engineering risk and reduces engineering quality.
In view of the above problems, no effective solution has been proposed.
Disclosure of Invention
Embodiments of the invention provide an evaluation method and device for a distributed system, an electronic device, and a storage medium, which at least solve the technical problem in the related art that chaos engineering implementations do not consider the influence of abnormal traffic on the system and cannot evaluate the system's ability to cope with traffic changes.
According to an aspect of an embodiment of the present invention, there is provided an evaluation method for a distributed system, including: obtaining a link test script, executing the link test script, and generating a request message, wherein the link test script is obtained by combining a generated test script with a service link; sending the request message to a system under test so as to perform a flow benchmark test and a flow comparison test on the system under test, and collecting operation data during the tests; receiving a response message returned after the system under test completes the tests; and evaluating, based on the response message and the operation data, the state of the system under test in coping with traffic changes across different dimensions, to obtain an evaluation message.
Optionally, before obtaining the link test script, the evaluation method includes: monitoring traffic data of a network module on a server by adopting a preset probe strategy, wherein the traffic data at least carries a port identifier and a service request; when the port identification indicates a certain service port of the system to be tested, acquiring a plurality of service requests in the flow data; analyzing the service request to obtain an analysis result; and traversing the service requests according to a time sequence by adopting a preset identification algorithm based on the analysis result to obtain the service link.
Optionally, after traversing the plurality of service requests according to a time sequence by using a preset identification algorithm based on the analysis result to obtain a service link, the evaluation method further includes: and under the condition that the service link appears for the first time, extracting the request messages and the response messages of all service requests in the service link to obtain a flow sample.
Optionally, after extracting the request messages and the response messages of all the service requests in the service link to obtain a traffic sample, the evaluation method further includes: performing statistical analysis on the service link and the service request by adopting a preset clustering algorithm and the expected category number to obtain a link call amount and a request call amount; and screening the service link corresponding to the link calling amount larger than the first preset threshold value, and screening the service request corresponding to the request calling amount larger than the second preset threshold value to obtain a screening result.
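The screening step can be illustrated with a minimal Python sketch. It is not the patented implementation: the clustering algorithm is replaced here by exact-match counting of identical links, and the function name, the tuple representation of a link, and the threshold values are all assumptions made for illustration.

```python
from collections import Counter

def screen_by_call_volume(links, link_threshold, request_threshold):
    """Count call volumes and keep only the high-volume links and requests.

    links: list of observed service links, each a tuple of request names
    in call order (hypothetical representation).
    """
    # Link call volume: how often each identical link was observed.
    link_calls = Counter(links)
    # Request call volume: how often each request appears across all links.
    request_calls = Counter(req for link in links for req in link)
    kept_links = [l for l, n in link_calls.items() if n > link_threshold]
    kept_requests = [r for r, n in request_calls.items() if n > request_threshold]
    return kept_links, kept_requests

observed = [
    ("login", "query"),
    ("login", "query"),
    ("login", "transfer"),
    ("login", "query"),
]
kept_links, kept_requests = screen_by_call_volume(
    observed, link_threshold=2, request_threshold=2
)
```

In this toy data set only the `login -> query` link and the `login` and `query` requests exceed their thresholds, so the rarely used `transfer` path is dropped from the screening result.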
Optionally, after obtaining the screening result, the evaluation method further comprises: sorting the screening results into a time sequence request curve according to the time sequence of the execution requests; based on a request number proportion and the time sequence request curve, obtaining a time sequence total flow variation curve and a flow composition ratio by weighted summation, wherein the request number proportion is a proportion value between the service link and the service request and the total number of requests; and characterizing the time sequence total flow variation curve and the flow composition ratio as a reference scene simulating flow variation.
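One possible reading of the weighted summation is sketched below in Python. It assumes each screened link has a curve of request counts per time bucket, takes the request number proportion to be the link's share of all requests, and sums the share-weighted, normalized curves. The function name, the data layout, and this particular interpretation are assumptions; the source does not fix them.

```python
def reference_scenario(curves):
    """Build a normalized total traffic curve and the traffic composition ratio.

    curves: dict mapping a link name to its request counts per time bucket.
    All curves are assumed to have the same length and a non-zero total.
    """
    totals = {name: sum(curve) for name, curve in curves.items()}
    grand_total = sum(totals.values())
    # Composition ratio: each link's share of the total number of requests.
    ratio = {name: total / grand_total for name, total in totals.items()}
    buckets = len(next(iter(curves.values())))
    total_curve = [0.0] * buckets
    for name, curve in curves.items():
        for t, count in enumerate(curve):
            # Weight each link's normalized shape by its share of requests.
            total_curve[t] += ratio[name] * (count / totals[name])
    return total_curve, ratio

curves = {
    "login->query": [10, 30, 20],
    "login->transfer": [5, 5, 30],
}
total_curve, ratio = reference_scenario(curves)
```

Under this reading the total curve sums to 1 and describes how the overall traffic is distributed over time, while `ratio` records which links make up that traffic.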
Optionally, after characterizing the time-series total flow variation curve and the flow composition fraction as a reference scenario simulating flow variation, the evaluation method includes: for the reference scenario simulating flow variation, generating an initial test script based on the flow sample, wherein the initial test script at least comprises: testing behaviors, testing data and assertions; receiving test data input by external equipment, and adjusting the initial test script based on the test data to obtain a target test script; and combining the service link and the target test script to obtain a link test script.
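The script generation, adjustment, and combination steps might look like the following Python sketch. The dictionary layout of a script step, the use of the recorded response as the assertion, and all function names are hypothetical; the source only states that a script contains test behaviors, test data, and assertions.

```python
def initial_script(sample):
    """sample: list of (service, request, response) triples from one traffic sample."""
    return [
        {"behavior": service, "data": dict(request), "assertions": dict(response)}
        for service, request, response in sample
    ]

def adjust_script(script, overrides):
    """Apply externally supplied test data; overrides maps service -> new data fields."""
    return [
        {**step, "data": {**step["data"], **overrides.get(step["behavior"], {})}}
        for step in script
    ]

def link_test_script(link, script):
    """Order the script steps according to the service link."""
    by_service = {step["behavior"]: step for step in script}
    return [by_service[service] for service in link]

sample = [
    ("login", {"user": "u1"}, {"status": "OK"}),
    ("transfer", {"amount": 1}, {"status": "OK"}),
]
script = initial_script(sample)
target = adjust_script(script, {"transfer": {"amount": 500}})
link_script = link_test_script(("login", "transfer"), target)
```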
Optionally, the step of executing the link test script and generating a request message includes: analyzing the link test script to obtain test behaviors and test data; and packaging the test behavior and the test data into a request message by adopting a preset communication protocol.
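As an illustration of the packaging step, the Python sketch below parses one script step and wraps its behavior and data in a length-prefixed JSON message. The wire format is an assumption standing in for the unspecified preset communication protocol, and the field names `action` and `payload` are hypothetical.

```python
import json
import struct

def build_request_message(script_step):
    """Package the test behavior and test data of one script step into bytes."""
    behavior = script_step["behavior"]
    data = script_step["data"]
    body = json.dumps({"action": behavior, "payload": data}, sort_keys=True).encode("utf-8")
    # 4-byte big-endian length prefix followed by the JSON body (assumed format).
    return struct.pack(">I", len(body)) + body

def parse_message(message):
    """Inverse of build_request_message, used here only to check the round trip."""
    (length,) = struct.unpack(">I", message[:4])
    return json.loads(message[4:4 + length].decode("utf-8"))

step = {"behavior": "transfer", "data": {"amount": 100}, "assertions": {"status": "OK"}}
message = build_request_message(step)
```

The assertions are deliberately left out of the message: they are kept on the test side and applied to the response later.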
Optionally, the step of performing a flow benchmark test and a flow comparison test on the system to be tested includes: generating link request flow based on the link test script; fitting the link request flow and a reference scene simulating flow change to obtain a reference test result and a concurrency degree time sequence curve, and completing flow reference test; and reinitiating the service request according to the concurrency time sequence curve, and injecting flow abnormity randomly on the basis of the reference flow to obtain a flow comparison result, thereby completing the flow comparison test.
Optionally, the type of traffic anomaly comprises at least one of: an abnormal surge in upstream services, an abnormal surge in service requests, an abnormal increase in request processing time, and an abnormal traffic load, wherein an upstream ("front-hand") service is a service that appears early in the execution order of a service link.
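A minimal Python sketch of the comparison test follows. It assumes the baseline concurrency curve is a list of concurrency values per time interval and models only one of the listed anomaly types, a surge in service requests, by scaling a randomly chosen window. The function name and the `factor`, `width`, and `seed` parameters are hypothetical.

```python
import random

def inject_request_surge(baseline, factor=3.0, width=2, seed=None):
    """Return a comparison curve with a surge injected at a random position.

    baseline: concurrency per time interval from the benchmark test.
    """
    rng = random.Random(seed)
    # Pick a random window that fits entirely inside the curve.
    start = rng.randrange(0, len(baseline) - width + 1)
    curve = list(baseline)
    for t in range(start, start + width):
        curve[t] = int(curve[t] * factor)
    return curve, start

baseline = [10, 20, 40, 20, 10]
comparison, start = inject_request_surge(baseline, seed=7)
```

Replaying `comparison` instead of `baseline` against the system under test yields the traffic comparison result; the other anomaly types could be modeled in the same way by altering which services are called or how long each request is held.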
Optionally, before evaluating the state of the system under test coping with flow change in different dimensions based on the response message and the operation data, the evaluation method includes: performing assertion check on the response message to obtain a check result; in the event that the check result indicates that all assertion checks pass, determining that the link test script execution is complete.
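The assertion check can be sketched in Python as a field-by-field comparison between each response message and the expected values recorded in the script. Treating responses and assertions as flat dictionaries is an assumption, as are the function names.

```python
def failed_assertions(response, expected):
    """Return (field, expected, actual) for every assertion that does not hold."""
    return [
        (field, want, response.get(field))
        for field, want in expected.items()
        if response.get(field) != want
    ]

def link_script_completed(responses, expectations):
    """The link test script counts as complete only if every assertion passes."""
    if len(responses) != len(expectations):
        return False
    return all(not failed_assertions(r, e) for r, e in zip(responses, expectations))

expectations = [{"status": "OK"}, {"status": "OK", "balance": 900}]
good = [{"status": "OK"}, {"status": "OK", "balance": 900}]
bad = [{"status": "OK"}, {"status": "TIMEOUT"}]
```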
Optionally, the step of evaluating the state of the system under test for handling the traffic change in different dimensions based on the response packet and the operation data includes: determining a capacity grading threshold corresponding to each dimension; and analyzing the state of the system to be tested for responding to flow change in different dimensions based on the response message, the operation data and the capacity grading threshold corresponding to each dimension.
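Grading against per-dimension thresholds might be sketched as follows in Python. The two dimensions, the threshold values, and the grade labels are invented for illustration; the source states only that each dimension has its own capability grading threshold.

```python
import bisect

# Hypothetical dimensions: each value is the degradation observed in the
# comparison test relative to the benchmark test (as a fraction).
GRADE_THRESHOLDS = {
    "success_rate_drop": [0.01, 0.05, 0.20],
    "latency_increase": [0.10, 0.50, 2.00],
}
GRADES = ["excellent", "good", "fair", "poor"]

def grade(dimension, value):
    """Map a measured degradation to a grade using that dimension's thresholds."""
    # bisect_right counts how many thresholds the value has reached or passed.
    return GRADES[bisect.bisect_right(GRADE_THRESHOLDS[dimension], value)]

report = {
    "success_rate_drop": grade("success_rate_drop", 0.25),
    "latency_increase": grade("latency_increase", 0.30),
}
```

Keeping the thresholds in one table per dimension makes it straightforward to add further dimensions without changing the grading logic.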
According to another aspect of the embodiments of the present invention, there is also provided an evaluation apparatus for a distributed system, including: an obtaining unit, configured to obtain a link test script, execute the link test script, and generate a request message, wherein the link test script is obtained by combining a generated test script with a service link; a sending unit, configured to send the request message to a system under test so as to perform a flow benchmark test and a flow comparison test on the system under test, and to collect operation data during the tests; a receiving unit, configured to receive a response message returned after the system under test completes the tests; and an evaluation unit, configured to evaluate, based on the response message and the operation data, the state of the system under test in coping with traffic changes across different dimensions, to obtain an evaluation message.
Optionally, the evaluation device comprises: a first monitoring module, configured to monitor, before the link test script is obtained, traffic data of a network module on a server by adopting a preset probe strategy, wherein the traffic data at least carries a port identifier and a service request; a first acquisition module, configured to acquire a plurality of service requests in the traffic data when the port identifier indicates a service port of the system under test; a first parsing module, configured to parse the service requests to obtain a parsing result; and a first traversal module, configured to traverse the plurality of service requests in time order by adopting a preset identification algorithm based on the parsing result, to obtain the service link.
Optionally, the evaluation device further comprises: a first extraction module, configured to, after the plurality of service requests have been traversed in time order by the preset identification algorithm to obtain the service link, extract the request messages and the response messages of all service requests in the service link to obtain a traffic sample when the service link appears for the first time.
Optionally, the evaluation device further comprises: a first analysis module, configured to, after the request messages and the response messages of all service requests in the service link have been extracted to obtain the traffic sample, perform statistical analysis on the service link and the service requests by adopting a preset clustering algorithm and an expected number of categories, to obtain a link call volume and a request call volume; and a first screening module, configured to screen out the service links whose link call volume is greater than a first preset threshold and the service requests whose request call volume is greater than a second preset threshold, to obtain a screening result.
Optionally, the evaluation device further comprises: the first sorting module is used for sorting the screening results into a time sequence request curve according to the time sequence of the execution requests after the screening results are obtained; the first calculation module is used for obtaining a time sequence total traffic variation curve and a traffic composition ratio by weighted summation based on a request number proportion and the time sequence request curve, wherein the request number proportion is a proportion value between the service link and the service request and a request total number; and the first characterization module is used for characterizing the time sequence total flow variation curve and the flow composition ratio as a reference scene simulating flow variation.
Optionally, the evaluation device comprises: a first generating module, configured to generate an initial test script based on the traffic sample for a reference scenario simulating traffic variation after characterizing the time-series total traffic variation curve and the traffic component ratio as the reference scenario simulating traffic variation, where the initial test script at least includes: testing behaviors, testing data and assertions; the first adjusting module is used for receiving test data input by external equipment and adjusting the initial test script based on the test data to obtain a target test script; and the first combination module is used for combining the service link and the target test script to obtain a link test script.
Optionally, the obtaining unit includes: the first analysis submodule is used for analyzing the link test script to obtain test behaviors and test data; and the first packaging submodule is used for packaging the test behavior and the test data into a request message by adopting a preset communication protocol.
Optionally, the sending unit includes: the first generation submodule is used for generating link request flow based on the link test script; the first fitting submodule is used for fitting the link request flow and a reference scene simulating flow change to obtain a reference test result and a concurrency degree time sequence curve and finish flow reference test; and the first injection submodule is used for reinitiating the service request according to the concurrency time sequence curve, injecting flow abnormity randomly on the basis of the reference flow, obtaining a flow comparison result and finishing the flow comparison test.
Optionally, the type of traffic anomaly comprises at least one of: an abnormal surge in upstream services, an abnormal surge in service requests, an abnormal increase in request processing time, and an abnormal traffic load, wherein an upstream ("front-hand") service is a service that appears early in the execution order of a service link.
Optionally, the evaluation device comprises: the first assertion module is used for carrying out assertion check on the response message to obtain a check result before evaluating the state of the system to be tested for dealing with flow change in different dimensions based on the response message and the operation data; a first determining module, configured to determine that the execution of the link test script is complete if the check result indicates that all assertion checks pass.
Optionally, the evaluation unit comprises: the first determining submodule is used for determining a capacity grading threshold corresponding to each dimension; and the first analysis submodule is used for analyzing the state of the system to be tested for responding to the flow change in different dimensions based on the response message, the operation data and the capability grading threshold value corresponding to each dimension.
According to another aspect of the embodiments of the present invention, there is also provided a computer-readable storage medium, where the computer-readable storage medium includes a stored computer program, and when the computer program runs, the apparatus where the computer-readable storage medium is located is controlled to execute any one of the above evaluation methods for a distributed system.
According to another aspect of embodiments of the present invention, there is also provided an electronic device, including one or more processors and a memory for storing one or more programs, wherein when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the evaluation method for a distributed system according to any one of the above.
In the present disclosure, a link test script is obtained and executed to generate a request message, wherein the link test script is obtained by combining a generated test script with a service link; the request message is sent to a system under test so as to perform a flow benchmark test and a flow comparison test on the system under test, and operation data is collected during the tests; a response message returned after the system under test completes the tests is received; and the state of the system under test in coping with traffic changes across different dimensions is evaluated based on the response message and the operation data, to obtain an evaluation message. In this way, a flow benchmark test and a flow comparison test can be initiated against the system, and anomalies are injected randomly during the flow comparison test to create an abnormal-traffic scenario. The system's performance in coping with sudden traffic anomalies can thus be observed and evaluated across different dimensions, which reduces engineering risk and improves engineering quality. This solves the technical problem in the related art that chaos engineering implementations do not consider the influence of traffic anomalies on the system and cannot evaluate the system's ability to cope with traffic changes.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a schematic diagram of an alternative chaos engineering implementation system according to an embodiment of the present invention;
FIG. 2 is a flow diagram of an alternative evaluation method for a distributed system in accordance with embodiments of the present invention;
FIG. 3 is a schematic diagram of a multi-way tree constructed in an alternative identification process according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of an alternative implementation of traffic-based chaos engineering, according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of an alternative evaluation apparatus for a distributed system in accordance with embodiments of the present invention;
FIG. 6 is a block diagram of a hardware structure of an electronic device (or a mobile device) used in an evaluation method of a distributed system according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
To facilitate understanding of the invention by those skilled in the art, some terms or nouns referred to in the embodiments of the invention are explained below:
Chaos engineering: a sophisticated technical means of improving the resilience of a technical architecture, intended to build confidence in a system's ability to withstand turbulent conditions in a production environment.
The following embodiments of the invention can be applied to the implementation of chaos engineering, and are particularly suitable for chaos engineering that evaluates a system's capacity to cope with traffic changes. FIG. 1 is a schematic diagram of an alternative chaos engineering implementation system according to an embodiment of the present invention. The system includes a traffic collection unit 1, a traffic analysis unit 2, a script management unit 3, a chaos implementation unit 4, a chaos evaluation unit 5, and an information storage unit 6. The traffic collection unit 1 includes a traffic collection module 11, a link analysis module 12, and a traffic sample extraction module 13. The traffic analysis unit 2 includes a cluster statistics module 21 and a reference scenario design module 22. The script management unit 3 includes a script generation module 31, a script management module 32, and a link script combination module 33. The chaos implementation unit 4 includes a traffic generation module 41, an anomaly injection module 42, and an information collection module 43. The chaos evaluation unit 5 includes a dimension and threshold definition module 51, a grading evaluation module 52, and a prompt and early-warning module 53.
Through the collection, analysis, and conversion of traffic data, the invention identifies the service requests and service links that contribute most to traffic, forms a reference scenario simulating traffic changes in chaos engineering, and converts these service requests and service links into link test scripts.
Example one
According to an embodiment of the present invention, an embodiment of an evaluation method for a distributed system is provided. It should be noted that the steps illustrated in the flowcharts of the drawings may be performed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in a different order.
Fig. 2 is a flow chart of an alternative evaluation method for a distributed system according to an embodiment of the present invention, as shown in fig. 2, the method comprising the steps of:
Step S201, obtaining a link test script, executing the link test script, and generating a request message, where the link test script is obtained by combining a generated test script with a service link.
Step S202, sending the request message to the system under test so as to perform a flow benchmark test and a flow comparison test on the system under test, and collecting operation data during the tests.
Step S203, receiving a response message returned after the system under test completes the tests.
Step S204, evaluating, based on the response message and the operation data, the state of the system under test in coping with traffic changes across different dimensions, to obtain an evaluation message.
Through the above steps, a link test script can be obtained and executed to generate a request message, where the link test script is obtained by combining a generated test script with a service link; the request message is sent to the system under test so as to perform a flow benchmark test and a flow comparison test, and operation data is collected during the tests; a response message returned after the system under test completes the tests is received; and the state of the system under test in coping with traffic changes across different dimensions is evaluated based on the response message and the operation data, to obtain an evaluation message. In the embodiment of the invention, a flow benchmark test and a flow comparison test can be initiated against the system, and anomalies are injected randomly during the flow comparison test to create an abnormal-traffic scenario. The system's performance in coping with sudden traffic anomalies can thus be observed and evaluated across different dimensions, which reduces engineering risk and improves engineering quality. This solves the technical problem in the related art that chaos engineering implementations do not consider the influence of traffic anomalies on the system and cannot evaluate the system's ability to cope with traffic changes.
The following describes an embodiment of the present invention in detail with reference to the above steps and fig. 1.
In the embodiment of the present invention, before obtaining the link test script, the evaluation method includes: monitoring traffic data of a network module on a server by adopting a preset probe strategy, wherein the traffic data at least carries a port identifier and a service request; when the port identification indicates a certain service port of the system to be tested, collecting a plurality of service requests in the flow data; analyzing the service request to obtain an analysis result; and traversing the plurality of service requests according to the time sequence by adopting a preset identification algorithm based on the analysis result to obtain the service link.
Optionally, after traversing the plurality of service requests according to the time sequence by using a preset identification algorithm based on the analysis result to obtain the service link, the evaluation method further includes: and under the condition that the service link appears for the first time, extracting the request messages and the response messages of all service requests in the service link to obtain a flow sample.
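Extracting a traffic sample only on the first appearance of a link can be sketched in Python as below. Identifying a link by the ordered tuple of its service names, and the dictionary layout of each call, are assumptions made for illustration.

```python
def extract_samples(links):
    """Keep the request/response pairs of each service link the first time it appears.

    links: iterable of links, each a list of dicts with the hypothetical keys
    "name", "request", and "response".
    """
    samples = {}
    for link in links:
        signature = tuple(call["name"] for call in link)
        if signature not in samples:  # first appearance of this link
            samples[signature] = [(call["request"], call["response"]) for call in link]
    return samples

observed = [
    [{"name": "login", "request": {"user": "u1"}, "response": {"status": "OK"}}],
    [{"name": "login", "request": {"user": "u2"}, "response": {"status": "OK"}}],
]
samples = extract_samples(observed)
```

The second `login` link is recognized as already seen, so only the messages of the first occurrence are stored as the sample.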
In the embodiment of the present invention, the traffic data of the network module on the server may be monitored by the traffic collection unit 1 through a probe program (i.e., a preset probe policy) bypass, a service request is collected and a service link is identified by multidimensional analysis, and a traffic sample in the service link is extracted, which includes the following specific processes:
the traffic collection module 11 may be deployed on an application server of a system to be tested, and cyclically monitor traffic data (the traffic data carries at least a port identifier and a service request) on a network card of the server (i.e., a network module on the server) by using a probe program (i.e., a preset probe policy), where the traffic collection module 11 collects the service request of which a target port is a service port of the system to be tested (i.e., when the port identifier indicates a certain service port of the system to be tested, collects multiple service requests in the traffic data), and notifies the link analysis module 12 of performing subsequent analysis, where the specific process is as follows:
After a plurality of service requests are collected, the link analysis module 12 may parse them. The parsed content includes, but is not limited to: the caller, the caller IP, the session information, and the calling and called information of the service program. Parsing the service requests yields the analysis result. The collected service requests are then sorted and traversed in time order and analyzed with a preset identification algorithm, and the associated service requests are identified as one ordered link (i.e., a service link).
The preset recognition algorithm includes, but is not limited to, the following recognition algorithms:
(1) Same-communication-connection identification algorithm: search for the SOCKET connection establishment and disconnection operations, and identify the multiple service requests that use the same communication connection between the two operations as the same link sequence (i.e., the service link referred to in this application), in time order.
(2) Same-session identification algorithm: inspect the message header of each service request, and identify the multiple service requests that use the same session as the same link sequence, in time order.
(3) Entry-request identification algorithm: analyze the upstream caller and the downstream called services of each service request. A service request whose upstream caller is an end user is regarded as an entry request, and the downstream services it calls are regarded as lower nodes on the call link. A multi-branch tree is defined to store the service call relation: each node stores one service request, the upstream caller is the parent of the service request node, and all downstream called services are its children. After the analysis is completed, each multi-branch tree is output by pre-order traversal, which gives the link sequence in time order.
The link for "querying user information" is taken as an example. A complete service process includes login (Login), querying user information (qryUserInfo), and logout (Logout). Before user information is queried, the validity of the session information must be checked (checkSession); before the logout operation, whether the user is logged in must be checked (checkLogin). The entry request is Login. Fig. 3 is a schematic diagram of a multi-branch tree constructed in an optional identification process according to an embodiment of the present invention. As shown in Fig. 3, the tree includes: the root node Login, the child node qryUserInfo, the child node checkSession, the child node Logout, and the child node checkLogin.
(4) Custom message-content identification algorithm: deserialize the message body (BODY) of each collected service request, extract and compare the contents of user-specified fields (FIELD), and identify the multiple service requests with consistent contents as the same link sequence, in time order. For example, suppose the system under test allocates a unique service number (TransID) to each service. For a service request over the HTTP protocol, the content of the HTTP message is parsed: for a GET method, the value of the TransID query parameter is read; for a POST method that submits a form, all fields of the submitted form are traversed to obtain the value of the field named TransID.
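The TransID example in item (4) can be sketched as follows. This is only a minimal illustration, not the implementation of the invention: the record layout (`time`, `name`, `method`, `url`, `body`), the host `sut.example`, and the assumption that POST bodies are URL-encoded forms are all hypothetical.

```python
from collections import defaultdict
from urllib.parse import parse_qs, urlsplit


def extract_trans_id(method, url, body=""):
    """Return the TransID carried by one captured HTTP request, or None."""
    if method.upper() == "GET":
        fields = parse_qs(urlsplit(url).query)  # query-string parameters
    else:
        # Assumption: the POST body is a URL-encoded form.
        fields = parse_qs(body)
    values = fields.get("TransID")
    return values[0] if values else None


def group_by_trans_id(requests):
    """Group request names into link sequences by TransID, in time order."""
    links = defaultdict(list)
    for req in sorted(requests, key=lambda r: r["time"]):
        trans_id = extract_trans_id(req["method"], req["url"], req.get("body", ""))
        if trans_id is not None:
            links[trans_id].append(req["name"])
    return dict(links)


# Hypothetical captured requests; only the TransID field comes from the text.
captured = [
    {"time": 3, "name": "qryUserInfo", "method": "POST",
     "url": "http://sut.example/qryUserInfo", "body": "TransID=T1&user=alice"},
    {"time": 1, "name": "Login", "method": "POST",
     "url": "http://sut.example/Login", "body": "user=alice&TransID=T1"},
    {"time": 2, "name": "checkSession", "method": "GET",
     "url": "http://sut.example/checkSession?TransID=T1"},
    {"time": 4, "name": "Login", "method": "POST",
     "url": "http://sut.example/Login", "body": "user=bob&TransID=T2"},
]

links = group_by_trans_id(captured)
```

Requests that carry the same TransID end up in one list ordered by capture time, which corresponds to one link sequence.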
In the embodiment of the invention, because different service systems have different characteristics, a plurality of identification algorithms are required to be used simultaneously for identifying and analyzing the service link, and the link identification algorithms can be correspondingly expanded according to different communication protocols, application layer compositions, service data structures and the like.
The identified service link can be described by the ordered set of service request names on the link and stored in the information storage unit 6. For example, if a service link involves five service requests, namely "login (Login)", "session check (checkSession)", "query user information (qryUserInfo)", "login check (checkLogin)", and "logout (Logout)", the service link is described as follows:
"Login-checkSession-qryUserInfo-checkLogin-Logout" (qryUserInfo link for short).
In the embodiment of the present invention, whether a service link appears for the first time can be determined from the analysis result of the link analysis module 12 and the link description. For a service link appearing for the first time, the traffic sample extraction module 13 extracts the detailed request messages and response messages (including message headers and message bodies) of all service requests in the link from the cache of the traffic collection module 11 to obtain a traffic sample, which is stored in the information storage unit 6. For a service link that has appeared before, the corresponding cache of the traffic collection module 11 is cleared.
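The first-appearance check can be illustrated with a small sketch. The class name `LinkSampleStore`, the in-memory dictionary standing in for the information storage unit 6, and the list standing in for the collection cache are assumptions made for illustration.

```python
class LinkSampleStore:
    """Keeps one traffic sample per distinct service link description."""

    def __init__(self):
        # link description -> list of (request message, response message)
        self.samples = {}

    @staticmethod
    def describe(request_names):
        """Describe a link by joining its request names in order."""
        return "-".join(request_names)

    def record(self, request_names, cache):
        """Store the cached messages if the link is new; otherwise clear the cache.

        Returns True when the link appears for the first time.
        """
        key = self.describe(request_names)
        if key in self.samples:
            cache.clear()  # link already sampled: release the cached messages
            return False
        self.samples[key] = list(cache)  # first appearance: keep a traffic sample
        return True
```

A caller would invoke `record` once per identified link; only the first occurrence of a given description produces a stored sample.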
Optionally, after extracting the request messages and the response messages of all service requests in the service link to obtain the traffic sample, the evaluation method further includes: performing statistical analysis on the service link and the service request by adopting a preset clustering algorithm and the expected category number to obtain the link call quantity and the request call quantity; and screening the service link corresponding to the link calling amount larger than the first preset threshold value, and screening the service request corresponding to the request calling amount larger than the second preset threshold value to obtain a screening result.
Optionally, after obtaining the screening result, the evaluation method further comprises: sorting the screening results into a time sequence request curve according to the time sequence of the execution requests; based on the request number proportion and the time sequence request curve, a time sequence total flow variation curve and a flow composition ratio are obtained through weighted summation, wherein the request number proportion is a proportion value between a service link and a service request and a request total number; and (3) representing the ratio of the time sequence total flow variation curve and the flow composition as a reference scene simulating flow variation.
In the embodiment of the present invention, the traffic analysis unit 2 may summarize and analyze the service request and service link call data collected by the traffic collection unit 1, screen out the service links and service requests that contribute most to the traffic by using a preset clustering algorithm and predefined threshold parameters (i.e., the first preset threshold and the second preset threshold), summarize the service request calls and service link calls in time order at a granularity of one second, and calculate the total traffic variation curve and the traffic composition ratio over time (characterized as a reference scene simulating traffic variation). During the chaos engineering test, this reference scene serves as the baseline for the link request composition and the link request volume in each second. The details are as follows:
The cluster statistics module 21 statistically screens the service requests and service links. Based on the collection and analysis results of the traffic collection unit 1, it applies the preset clustering algorithm with the expected category number to the service links and service requests, obtaining several sets of service links and service requests together with their respective call volumes (i.e., the link call volume and the request call volume). It then evaluates these call volumes against the preset thresholds to screen out the service links and service requests that contribute most to the traffic (i.e., the service links whose link call volume is greater than the first preset threshold and the service requests whose request call volume is greater than the second preset threshold), which gives the screening result.
The preset clustering algorithm may be a common machine learning algorithm, including but not limited to EM, DBSCAN, and K-Means. Because the K-Means clustering algorithm is comparatively efficient, the embodiment of the invention takes it as the example. The input of the algorithm is the vectorized service links and the expected category number, and the output is the categories and the number of service links contained in each category. Vectorization here means mapping textual behavior into a vector space through a model so that it can be used for cluster model training; common word vectorization methods include, but are not limited to, the bag-of-words model, word2vec, and n-gram.
For example, suppose a large number of service links, including those identified by the link analysis module 12, have been collected in the information storage unit 6. The cluster statistics module 21 uses the K-Means clustering algorithm with the full set of service link request records stored in the information storage unit 6 and an expected category number of 3 as parameters (the expected category number may be adjusted according to the total number of service links and the required clustering precision). Cluster learning yields 3 category sets: in production, the service links of the system to be tested were called 10021703 times in total, of which the qryUserInfo link was called 1000000 times, Login-Logout 20000 times, and Login-qrybance-Logout 1000 times. These three service links therefore contribute most of the traffic, accounting for nearly 99.9% of the total.
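The threshold screening step can be sketched as below. Note that this sketch replaces the clustering step with plain exact-match counting of link descriptions, which is a deliberate simplification; the function name, the sample records, and the threshold value are hypothetical.

```python
from collections import Counter


def screen_links(link_records, min_calls):
    """Count calls per link and keep links called more than `min_calls` times.

    Returns the kept links with their call volumes and the share of the
    total call volume that the kept links account for.
    """
    counts = Counter(link_records)
    total = sum(counts.values())
    kept = {link: n for link, n in counts.items() if n > min_calls}
    share = sum(kept.values()) / total if total else 0.0
    return kept, share


# Hypothetical link request records: one entry per observed link execution.
records = (
    ["Login-checkSession-qryUserInfo-checkLogin-Logout"] * 90
    + ["Login-Logout"] * 8
    + ["rare-link"] * 2
)

kept, share = screen_links(records, min_calls=5)
```

With these illustrative records, the rarely called link falls below the threshold and is dropped, while the two remaining links carry most of the call volume.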
In the embodiment of the present invention, for the service links and service requests screened by the cluster statistics module 21, the reference scene design module 22 may obtain their details from the traffic collection results stored in the information storage unit 6 and arrange them, in order of execution time, into a request curve over time (i.e., a chronological request curve). It then obtains, by weighted summation over the request number proportions of the service links and service requests (the request number proportion being the ratio of the requests of a service link or service request to the total number of requests) and the chronological request curves, the total traffic variation curve and the traffic composition ratio over time (characterized as a reference scene simulating traffic variation).
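One possible reading of the weighted summation is sketched below. It assumes that each link has a per-second curve of link executions and that the weight of a link is the number of service requests one execution issues; both the weighting rule and the sample numbers are assumptions, since the text does not fix them.

```python
def build_reference_scene(link_curves, requests_per_link):
    """Combine per-link curves into a total curve and a composition ratio.

    link_curves:       link name -> link executions in each second
    requests_per_link: link name -> requests issued by one link execution
    """
    seconds = len(next(iter(link_curves.values())))
    # Total requests per second: weighted sum of the per-link curves.
    total_curve = [
        sum(curve[t] * requests_per_link[link] for link, curve in link_curves.items())
        for t in range(seconds)
    ]
    # Share of each link in the overall request volume.
    link_totals = {
        link: sum(curve) * requests_per_link[link]
        for link, curve in link_curves.items()
    }
    grand_total = sum(link_totals.values())
    ratio = {link: n / grand_total for link, n in link_totals.items()}
    return total_curve, ratio


# Hypothetical three-second observation window.
curves = {"qryUserInfo": [2, 4, 6], "Login-Logout": [5, 5, 0]}
weights = {"qryUserInfo": 5, "Login-Logout": 2}

total_curve, ratio = build_reference_scene(curves, weights)
```

The pair `(total_curve, ratio)` plays the role of the reference scene: the first gives the request volume to reproduce in each second, the second the mix of links to reproduce it with.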
Optionally, after characterizing the time-series total flow variation curve and the flow component ratio as a reference scenario of the simulated flow variation, the evaluation method includes: for a reference scene simulating flow variation, generating an initial test script based on a flow sample, wherein the initial test script at least comprises: testing behaviors, testing data and assertions; receiving test data input by external equipment, and adjusting an initial test script based on the test data to obtain a target test script; and combining the service link and the target test script to obtain a link test script.
In the embodiment of the present invention, for the service links and service requests involved in the reference scene, the script management unit 3 may generate a pressure test script (i.e., an initial test script) for each service request from the traffic samples collected by the traffic collection unit 1 and, after the chaos engineering implementers have completed the scripts, combine them into a link test script. The specific process is as follows:
An initial test script of each service request is generated from the request and response details (i.e., the traffic sample) of the service link extracted by the traffic collection unit 1. The script generation module 31 extracts the network interaction from the request message in the traffic sample to define the test behavior, and then uses a content parser to deserialize the content of the interaction to extract the test data.
The initial test script in the embodiment of the invention at least comprises the following components: testing behaviors, testing data, assertions and the like, wherein the specific steps are as follows:
(1) The test behavior is a series of operations on the project under test. It describes how real users use the project under test, and replaying the test behavior during the test simulates the load that users place on it. Taking a WEB service system with a B/S (browser/server) architecture as an example, the test behavior is a series of HTTP request calls to specific URIs of the system.
(2) The test data is the data used in the test process; it has business meaning and meets the requirements of the test cases. Taking a WEB service system with a B/S architecture as an example, the test data is a series of parameter assignments for specific URIs of the system, expressed as a form KEY-VALUE set, JSON, XML, and the like.
(3) The assertion is a Boolean expression that can consist of an actual value, an expected relation, and an expected value. The actual value and the expected value are numerical expressions or character strings, which may contain the four arithmetic operations and variable references. When the assertion is checked, variable substitution and the four arithmetic operations are first completed for the actual value and the expected value; it is then judged whether the resulting values satisfy the expected relation given by the assertion. If so, the assertion check passes; otherwise, the assertion fails.
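The assertion check in item (3) can be sketched as follows. The expression syntax (Python-style arithmetic and quoted strings), the set of relations, and the variable names are assumptions chosen for illustration; the sketch evaluates expressions through the `ast` module rather than `eval` so that only constants, variable references, and the four arithmetic operations are accepted.

```python
import ast
import operator

_ARITH = {ast.Add: operator.add, ast.Sub: operator.sub,
          ast.Mult: operator.mul, ast.Div: operator.truediv}
_RELATIONS = {"==": operator.eq, "!=": operator.ne, "<": operator.lt,
              "<=": operator.le, ">": operator.gt, ">=": operator.ge}


def evaluate(expr, variables):
    """Substitute variables and apply the four arithmetic operations."""
    def walk(node):
        if isinstance(node, ast.Constant):  # number or quoted string
            return node.value
        if isinstance(node, ast.Name):      # variable reference
            return variables[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in _ARITH:
            return _ARITH[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError("unsupported expression: " + expr)
    return walk(ast.parse(expr, mode="eval").body)


def check_assertion(actual, relation, expected, variables):
    """Return True when `actual <relation> expected` holds after evaluation."""
    return _RELATIONS[relation](evaluate(actual, variables),
                                evaluate(expected, variables))


# Hypothetical variables extracted from a response message.
response_vars = {"status": 200, "before": 100, "after": 70, "amount": 30,
                 "elapsed_ms": 3500, "code": "OK"}

passed = check_assertion("after", "==", "before - amount", response_vars)
```

A link script execution would be considered successful only if every such check returns True for its response.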
In the embodiment of the present invention, the script management module 32 interactively receives the engineering implementers' refinements to the script (i.e., receives the test data input by the external device and adjusts the initial test script based on the test data), obtains the target test script, and stores it in the information storage unit 6.
In the embodiment of the present invention, the link script combination module 33 may combine the service request scripts (i.e., target test scripts) generated by the script generation module 31 and completed by the script management module 32 into the link test scripts (i.e., combine the service links and the target test scripts to obtain the link test scripts) according to the service links identified by the link analysis module 12 in the traffic collection unit 1.
For example, for a qryUserInfo link under an HTTP protocol, script generation and link script combination of 5 traffic samples are involved, wherein checkSession and checkLogin are GET methods, and the rest are POST methods.
Step S201, obtaining a link test script, executing the link test script, and generating a request message, where the link test script is obtained by combining the generated test script and the service link.
Optionally, the step of executing the link test script and generating the request message includes: analyzing the link test script to obtain test behaviors and test data; and packaging the test behavior and the test data into a request message by adopting a preset communication protocol.
In the embodiment of the present invention, after obtaining a link test script (the link test script is obtained by combining a generated test script with a service link), the traffic generation module 41 may analyze the link test script (which may be analyzed to obtain a test behavior and test data) generated by the script management unit 3 and execute the test script in a loop, and during each execution in the loop, the traffic generation module 41 encapsulates the test behavior and the test data into a request message conforming to a communication protocol (that is, encapsulates the test behavior and the test data into the request message by using a preset communication protocol).
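For an HTTP-based system, the encapsulation step might look like the sketch below, which only builds the request objects and does not send them. The script entry layout (`method`, `uri`), the base URL `http://sut.example`, and the use of URL-encoded forms for POST are assumptions.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_request(base_url, behavior, data):
    """Encapsulate one test behavior and its test data as an HTTP request.

    behavior: {"method": "GET" | "POST", "uri": "/path"}
    data:     dict of parameter assignments (the test data)
    """
    url = base_url + behavior["uri"]
    encoded = urlencode(data)
    if behavior["method"] == "GET":
        # GET: test data travels in the query string.
        return Request(url + ("?" + encoded if encoded else ""), method="GET")
    # POST: test data travels as a URL-encoded form body.
    return Request(url, data=encoded.encode("utf-8"), method="POST",
                   headers={"Content-Type": "application/x-www-form-urlencoded"})


# Hypothetical two-step excerpt of a link test script.
script = [
    ({"method": "POST", "uri": "/Login"}, {"user": "alice", "TransID": "T1"}),
    ({"method": "GET", "uri": "/checkSession"}, {"TransID": "T1"}),
]

messages = [build_request("http://sut.example", b, d) for b, d in script]
```

Each element of `messages` could then be sent with `urllib.request.urlopen`, which blocks until the response arrives.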
Step S202, the request message is sent to the system to be tested so as to carry out flow benchmark test and flow comparison test on the system to be tested and collect operation data in the test process.
In the embodiment of the invention, after being encapsulated into the request message, the request message can be sent to the system to be tested by a specified communication protocol according to the traffic composition ratio defined by the reference scene, and the response is received in a synchronous blocking waiting mode.
In the embodiment of the present invention, the chaos implementing unit 4 may initiate requests based on the link test script to carry out the chaos engineering test, which includes a traffic benchmark test and a traffic comparison test. The traffic benchmark test fits the link request traffic to the preset reference scene and analyzes the fitting result; the traffic comparison test injects abnormal traffic into the current network and analyzes the resulting traffic change data.
In addition, the information acquisition module 43 may continuously acquire multidimensional data (i.e., operating data) from the traffic generation module 41 and the system to be tested through a plurality of monitoring threads in the test implementation process, and store the multidimensional data in the information storage unit 6 as data support for chaotic evaluation. In the embodiment of the present invention, the multidimensional data collected by the information collecting module 43 includes, but is not limited to: service success rate, service availability, service capacity, queue congestion, application resources, and system resource consumption. For example:
(1) service availability acquisition: and regularly sending a Health Check (Health Check) instruction to each node of the system to be tested, and checking whether the service node can normally provide services.
(2) Service capability acquisition: collecting service capability indicators from the traffic generation module 41 includes: success rate for a particular service (assertion checks all passing are considered successful service invocation), average time consumed for service requests, throughput of service requests, etc.
(3) Collecting application resources: the method comprises the steps of utilizing an injection type monitoring instruction to collect application software resources supporting normal operation of a system to be tested, particularly resources with limited total resource amount (such as stack memories managed independently) or resources managed in a resource pool mode (such as a connection pool, a thread pool, an internal queue and the like), and monitoring the usage amount of the resources and the waiting congestion time of a resource requester.
(4) And (3) collecting system resource consumption: continuously calling a resource acquisition command of an operating system, and acquiring hardware resource data of a server system from the operating system, wherein the resource acquisition command comprises and is not limited to: CPU, internal memory, network read-write quantity, disk IO, etc.
For example, on the Linux operating system, when the information collection module 43 collects system resource consumption information, it defines a listening process that continuously calls the vmstat 1 command to obtain CPU consumption data in dimensions such as user, sys, and wa.
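A listening process of this kind has to turn the text output of vmstat into numeric samples. The sketch below parses a captured output string instead of running the command; the sample figures are invented, and the mapping of the vmstat columns `us`, `sy`, and `wa` to the user, sys, and wa dimensions is an assumption about the intended metrics.

```python
def parse_vmstat(output):
    """Parse vmstat text output into a list of {column name: int} samples."""
    rows = [line.split() for line in output.strip().splitlines()]
    # The column header is the row that names the CPU columns.
    header = next(cols for cols in rows if "us" in cols and "sy" in cols)
    samples = []
    for cols in rows:
        if len(cols) == len(header) and all(c.isdigit() for c in cols):
            samples.append(dict(zip(header, map(int, cols))))
    return samples


# Invented sample in the layout printed by `vmstat 1` on Linux.
SAMPLE = """
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 803216  51200 912340    0    0     5    12  110  230  7  3 88  2  0
 3  1      0 799104  51200 912512    0    0     0   240  480  910 35 12 41 12  0
"""

cpu = [{"user": s["us"], "sys": s["sy"], "wa": s["wa"]} for s in parse_vmstat(SAMPLE)]
```

In a live collector the same parser could be fed line by line from a `subprocess.Popen(["vmstat", "1"], ...)` pipe.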
Optionally, the step of performing a flow benchmark test and a flow comparison test on the system to be tested includes: generating link request flow based on the link test script; fitting the link request flow and a reference scene simulating flow change to obtain a reference test result and a concurrency degree time sequence curve, and completing flow reference test; and reinitiating the service request according to the concurrency time sequence curve, and injecting flow abnormity randomly on the basis of the reference flow to obtain a flow comparison result and finish the flow comparison test.
In the embodiment of the present invention, the traffic generation module 41 generates the link request traffic by executing the link test script, and then the traffic generation module 41 may initiate a traffic benchmark test, dynamically adjust the concurrency request number, control the link request traffic to be fitted to a benchmark scene simulating traffic variation, obtain a benchmark result and a concurrency request number change curve (i.e., a concurrency degree time sequence curve) on a time sequence, and complete the traffic benchmark test. The flow generation module 41 may reinitiate the service request according to the same concurrency timing curve of the benchmark test, and may inject flow anomalies randomly through the anomaly injection module 42 in the test process to increase the overall chaos degree of the system flow, obtain a flow comparison result, and complete the flow comparison test.
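The dynamic adjustment of the concurrent request number can be illustrated with a simple proportional rule. This rule, the function names, and the simulated system (each concurrent thread contributing a fixed number of requests per second) are assumptions for illustration; the text does not specify the fitting method.

```python
def fit_concurrency(reference_curve, measure, start=1):
    """Adjust concurrency each second so that traffic tracks the reference.

    reference_curve: target requests per second, one value per second
    measure:         callable returning the observed requests per second
                     at a given concurrency
    Returns the concurrency chosen for each second (the concurrency curve).
    """
    concurrency = start
    curve = []
    for target in reference_curve:
        observed = measure(concurrency)
        if observed > 0:
            # Scale concurrency by the ratio of target to observed traffic.
            concurrency = max(1, round(concurrency * target / observed))
        curve.append(concurrency)
    return curve


# Simulated system under test: every thread yields 5 requests per second.
concurrency_curve = fit_concurrency([50, 100, 75], measure=lambda c: c * 5)
```

The recorded curve can be replayed unchanged in the comparison test, so that any difference in results is attributable to the injected anomalies rather than to a different load shape.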
Optionally, the type of the traffic anomaly includes at least one of: an abnormal increase in front-hand services, an abnormal increase in service requests, an abnormal increase in request processing time, and an abnormal traffic load, where a front-hand service is a service that is executed early in the order of a service link.
In the embodiment of the present invention, on the basis of the traffic benchmark test, the anomaly injection module 42 may randomly inject traffic anomalies to increase the overall chaos degree of the request traffic of the system to be tested. The anomaly injection module 42 may include two parts: an instruction communication component that communicates with the traffic generation module 41, and a traffic forwarding component located between the traffic generation module 41 and the system to be tested. The instruction communication component randomly selects service requests and issues traffic control instructions to the traffic generation module 41, creating anomalies such as incomplete links and traffic amplification at the traffic source. The traffic forwarding component works at the network transport layer on the proxy-forwarding principle; it randomly selects service requests and creates anomalies such as long transmission time and load imbalance from the communication transmission side.
The traffic exception in the embodiment of the present invention includes a series of exceptions that are related to service traffic scale, composition, load, etc. and affect the processing capability and performance of the system to be tested, and the type of the traffic exception may include the following multiple exception types, but does not represent that the present invention only includes the following multiple traffic exceptions:
(1) Abnormal increase in front-hand services: simulates an anomaly in which the number of front-hand service calls rises sharply because of the caller, congesting the normal execution of the link's back-hand services and spreading to other links through shared resources. In a specific implementation, for the front-hand services of a service link (i.e., the services that are executed early in the link), the traffic generation module 41 receives an instruction from the anomaly injection module 42, randomly selects some concurrent threads during the execution of the link script, and has them interrupt and return after the front-hand services are executed without executing the back-hand services, which indirectly increases the request volume of the front-hand services.
For example, when the front-hand service anomaly is injected into the qryUserInfo link, some concurrent threads are randomly selected; after executing the Login and checkSession requests in sequence, they interrupt and return and then re-execute the Login request, which increases the Login and checkSession request volumes.
(2) Service request exception increases: the method simulates the exception that a large number of service requests are suddenly increased due to a calling party, so that the flow and the network connection number are rapidly increased, and the exception is spread to other links through common resources. In specific implementation, the traffic generation module 41 receives the instruction of the exception injection module 42, adds some concurrent threads to initiate traffic requests, directly increases the link request amount, and amplifies the overall traffic scale.
For example, when the traffic storm anomaly is injected into the qryUserInfo link, newly added concurrent threads execute the qryUserInfo link script, which increases both the overall traffic scale and the share of the qryUserInfo link in the overall traffic.
(3) Abnormal increase in request processing time: simulates an anomaly in which, for network or service provider reasons, request time lengthens, service capability drops, and the effect spreads to other links through shared resources. When forwarding requests, the traffic forwarding component of the anomaly injection module 42 randomly selects some service requests, sleeps for a period of time after receiving the response message, and only then forwards the response message to the caller. It may also delay ACK responses, connection termination instructions, and other instructions in the communication process, thereby simulating network congestion and long processing time at the service provider.
For example, when the processing-time anomaly is injected into the qryUserInfo link, the traffic forwarding component randomly selects an instance of a qryUserInfo link request and, on receiving the response message of its checkSession request, sleeps for 10 seconds before returning it to the source address. The qryUserInfo link request therefore slows down, application lock resources are released slowly, and congestion occurs.
(4) Abnormal traffic load: simulates an anomaly in which an increased request volume or slow processing on a specific service node causes request congestion and reduced service capability on that node; when the node goes down under excessive traffic pressure, the overflowing traffic requests increase the processing pressure on the remaining nodes, until all nodes are finally brought down by the traffic. The traffic forwarding component of the anomaly injection module 42 selects a surviving (Active, normally serving) server, randomly selects some service requests when forwarding requests, and changes the target address of those request messages to the selected server. After the selected server fails under the excessive traffic, the traffic forwarding component selects another surviving server and continues the above operation.
For example, when the traffic load imbalance anomaly is injected into the qryUserInfo link, the traffic forwarding component randomly selects an instance of the qryUserInfo link request, modifies the request target address (http:// www.
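The effect of the front-hand service anomaly described in item (1) above can be illustrated with a small simulation. The per-thread request budget, the selection ratio, and the function names are assumptions; the sketch only counts which requests would be issued and does not send any traffic.

```python
import random
from itertools import cycle, islice

LINK = ["Login", "checkSession", "qryUserInfo", "checkLogin", "Logout"]


def thread_requests(steps, front_hand_count, truncate, budget):
    """Requests one thread issues within a fixed budget of requests.

    A truncated thread loops over the front-hand services only and never
    reaches the back-hand services.
    """
    plan = steps[:front_hand_count] if truncate else steps
    return list(islice(cycle(plan), budget))


def simulate(steps, front_hand_count, threads, ratio, budget, rng):
    """Count requests per service when a `ratio` of threads is truncated."""
    counts = {name: 0 for name in steps}
    for _ in range(threads):
        truncate = rng.random() < ratio  # random selection of affected threads
        for name in thread_requests(steps, front_hand_count, truncate, budget):
            counts[name] += 1
    return counts


baseline = simulate(LINK, 2, threads=4, ratio=0.0, budget=10, rng=random.Random(0))
injected = simulate(LINK, 2, threads=4, ratio=1.0, budget=10, rng=random.Random(0))
```

Comparing `baseline` with `injected` shows the Login and checkSession volumes rising while the back-hand services receive no requests from the affected threads.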
Step S203, receiving a response message returned after the system to be tested finishes testing.
Optionally, before evaluating the state of the system to be tested in different dimensions for responding to the flow change based on the response message and the operation data, the evaluation method includes: performing assertion check on the response message to obtain a check result; in the event that the check result indicates that all assertion checks pass, it is determined that the link test script execution is complete.
In the embodiment of the present invention, after receiving the response message, the traffic generation module 41 parses the response message, checks whether the assertions pass, and determines that the execution result of the link test script passes when all the assertions pass.
And step S204, evaluating the state of the system to be tested in different dimensions corresponding to the flow change based on the response message and the operation data to obtain an evaluation message.
Optionally, the step of evaluating the state of the system to be tested in response to the flow change in different dimensions based on the response message and the operation data includes: determining a capacity grading threshold corresponding to each dimension; and analyzing the state of the system to be tested in different dimensions corresponding to the flow change based on the response message, the operation data and the capacity grading threshold corresponding to each dimension.
In the embodiment of the present invention, a chaos evaluation unit 5 may be used to evaluate the chaos engineering implementation process. Based on the engineering concept, the embodiment defines multiple dimensions of concern, such as system availability, service capability, abnormal phenomena, and anomaly isolation, and defines a grading threshold for each dimension (i.e., determines the capability grading threshold corresponding to each dimension). During the evaluation, the chaos evaluation unit 5 rates the capability of the system to be tested in each dimension of concern (i.e., analyzes the state of the system to be tested in responding to traffic changes in the different dimensions) according to the benchmark test and comparison test results (i.e., the response messages) output by the chaos implementing unit 4 and the multidimensional operation data collected during the test, and sends prompts and early warnings to the engineering implementers for dimensions with a low capability rating. The specific process is as follows:
The dimension and threshold definition module 51 defines the dimensions for evaluating the capability of the system to be tested to cope with traffic anomalies, together with the capability grading threshold of each dimension, as the evaluation basis for the chaos engineering implementation. For each dimension, capability levels are defined along with the demarcation threshold of each level. The following capability evaluation dimensions and thresholds are given only for illustration and are not the only ones that can be used with the present invention:
(1) System availability: high level: the system never becomes unavailable during the whole process (unavailable meaning that the Health Check returns false; the same applies below); medium level: the system becomes unavailable during anomaly injection but recovers after the injection is terminated; low level: the system becomes unavailable during anomaly injection and does not recover after the injection is terminated.
(2) Service capability (flow comparison test versus flow benchmark test): high level: the server has strict flow control protection, and service capabilities such as throughput and per-request time consumption are not affected during anomaly injection; medium level: service capabilities such as throughput and per-request time consumption degrade by no more than 50% during anomaly injection and return to normal after the injection ends; low level: service capabilities such as throughput and per-request time consumption degrade by more than 50% during anomaly injection and cannot return to normal after the injection ends.
(3) Resource consumption: high level: during anomaly injection, application and system resource consumption is not affected, and additional resource requests directly return a request failure; medium level: during anomaly injection, application and system resource consumption does not exceed 50%, the resource request waiting time does not exceed 1 second, and the queuing depth does not exceed 200% of the everyday depth; low level: during anomaly injection, application and system resource consumption exceeds 50%, the resource request waiting time exceeds 1 second, and the queuing depth exceeds 200% of the everyday depth.
(4) Exception isolation: high level: the server has service-level exception isolation, and an abnormal service does not cause service capability anomalies in other services; medium level: during anomaly injection, the service capability of no more than 20% of unrelated services is affected (a service is considered affected when service capabilities such as throughput and per-request time consumption degrade by more than 20%); low level: during anomaly injection, the service capability of more than 20% of unrelated services is affected.
(5) Abnormal phenomena: high level: the system to be tested shows no unexpected abnormal phenomena; medium level: the system to be tested shows fewer than 5 types of unexpected abnormal phenomena; low level: the system to be tested shows fewer than 10 types of unexpected abnormal phenomena. Here, abnormal phenomena refer to other anomalies that did not occur during the benchmark test and cannot be covered by the preceding dimensions.
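For illustration only, the exemplary thresholds of the service capability dimension might be encoded as in the following Python sketch. The names `Grade` and `grade_service_capability`, the use of throughput as the only metric, and the treatment of cases the example thresholds leave unspecified (here rated low level) are assumptions made for this sketch and are not part of the embodiment.

```python
from enum import Enum


class Grade(Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


def grade_service_capability(baseline_tps: float,
                             injected_tps: float,
                             recovered: bool,
                             max_drop: float = 0.5) -> Grade:
    """Grade service capability from throughput measured in the flow
    benchmark test (baseline) and during anomaly injection (injected).

    max_drop is the exemplary 50% threshold; recovered states whether the
    service capability returned to normal after the injection ended.
    """
    if baseline_tps <= 0:
        raise ValueError("baseline throughput must be positive")
    # Relative throughput drop during anomaly injection (never negative).
    drop = max(0.0, (baseline_tps - injected_tps) / baseline_tps)
    if drop == 0.0:
        return Grade.HIGH
    if drop <= max_drop and recovered:
        return Grade.MEDIUM
    # Assumption: every remaining case is treated as low level.
    return Grade.LOW
```

The other dimensions could be graded in the same way by replacing the metric and the threshold value.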
In the embodiment of the present invention, the hierarchical evaluation module 52 performs hierarchical evaluation on the chaotic engineering of the system to be measured according to the data acquired by the information acquisition module 43 and the threshold value pre-configured by the dimension and threshold value definition module 51. For example, the hierarchical evaluation method of the hierarchical evaluation module 52 is explained based on the exemplary dimensions and thresholds of the dimension and threshold definition module 51:
(1) System availability evaluation: the system health check results recorded during the benchmark and comparison test process are traversed in reverse chronological order. If the first result encountered in the reverse traversal (i.e., the latest one) is false, system availability is rated low level; if that first result is true but a false result is encountered during the traversal, it is rated medium level; if no false result has been encountered when the traversal completes, it is rated high level.
(2) Service capability evaluation: the service test results of the flow generation module 41 are queried and aligned in time order, and are then compared and evaluated from both the horizontal and the vertical perspective.
1) Horizontal comparison: metrics such as throughput and average response time are computed per second, the change ratio relative to the previous second is calculated, and the results are traversed in time order and evaluated against the grading thresholds exemplified by the dimension and threshold definition module 51.
2) Vertical comparison: the service capability difference ratio between the flow benchmark test and the flow comparison test at the same point in time is calculated at three granularities (overall, link, and service), and the results are then traversed in time order and evaluated against the grading thresholds exemplified by the dimension and threshold definition module 51.
(3) Resource consumption evaluation: based on the application resource consumption and system resource consumption information acquired by the information acquisition module 43, the change ratio relative to the previous second is calculated per second. The results are then traversed in time order and evaluated against the grading thresholds exemplified by the dimension and threshold definition module 51.
(4) Exception isolation evaluation: based on the service-granularity results of the vertical comparison, the services into which anomalies were injected at the same point in time are removed in time order, and the number of remaining services rated medium level or low level, i.e., unrelated services that were affected, is counted. After counting, an evaluation is made against the grading thresholds exemplified by the dimension and threshold definition module 51.
(5) Abnormal phenomena evaluation: during and after the test, the number of unexpected abnormal phenomena is observed manually, and a high-, medium- or low-level evaluation is made according to the exemplified grading thresholds.
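The system availability evaluation and the exception isolation evaluation described above may be sketched in Python as follows. The function names, the string labels for the levels, and the representation of health check results as a time-ordered list of booleans are assumptions made for illustration only.

```python
from typing import Sequence


def grade_availability(health_checks: Sequence[bool]) -> str:
    """health_checks: time-ordered health check results (True = available)."""
    if not health_checks:
        raise ValueError("no health check results")
    # The first result in reverse order is the latest one: if it is False,
    # the system did not recover after the anomaly injection ended.
    if not health_checks[-1]:
        return "low"
    # The system is available at the end but was unavailable at some point.
    if any(not ok for ok in reversed(health_checks)):
        return "medium"
    return "high"


def grade_isolation(unrelated_grades: Sequence[str],
                    max_ratio: float = 0.2) -> str:
    """unrelated_grades: service capability grades of the services that
    remain after the services with injected anomalies have been removed."""
    affected = sum(1 for g in unrelated_grades if g in ("medium", "low"))
    if affected == 0:
        return "high"
    ratio = affected / len(unrelated_grades)
    # max_ratio is the exemplary 20% threshold for affected unrelated services.
    return "medium" if ratio <= max_ratio else "low"
```

For example, `grade_availability([True, False, True])` is rated medium level because the system was unavailable once but is available at the end of the test.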
In the embodiment of the present invention, the prompt and early warning module 53 may traverse the hierarchical evaluation results and, if any dimension is rated "medium level" or "low level", warn the engineering implementers, for example by e-mail.
The embodiment of the invention collects, analyzes and converts production flow to identify the service requests and service links that contribute most to the flow, forms the basic flow scenario for chaos engineering, and converts it into link test scripts. In the evaluation implementation stage, a flow benchmark test and a flow comparison test are initiated, and anomalies are randomly injected during the flow comparison test to create abnormal flow scenarios, which are used to evaluate the capability level of the system in coping with chaotic flow changes in multiple dimensions of concern. The embodiment thus usefully extends the concept of traditional chaos engineering and enriches its implementation methods and objects. It also makes it possible to effectively observe and evaluate how the system behaves under sudden flow anomalies and to grade the distributed system in multiple dimensions such as reliability, scalability and fault tolerance, thereby reducing engineering risk and improving engineering quality.
Example two
Fig. 4 is a schematic diagram of an alternative implementation method of chaos engineering based on flow according to an embodiment of the present invention, as shown in fig. 4, including the following steps:
Step one: flow monitoring and collection:
In the embodiment of the invention, the flow collection unit monitors, in the form of a bypass probe, the network card data of middleware at the TCP application layer (such as an application server), collects service request flow information, and stores it in the data storage unit.
Step two: link relation analysis:
In the embodiment of the invention, multiple associated requests in the flow are identified as ordered links based on several dimensions such as communication connection, session, entry request, and user-defined message content.
Step three: flow sample extraction:
In the embodiment of the invention, for a link that appears for the first time, the detailed request messages and response messages of all service requests in the link are extracted and stored as a flow sample.
Step four: flow clustering statistics:
In the embodiment of the invention, a clustering algorithm is used to count the link requests and the service requests, the link call volume and the request call volume are evaluated against preset thresholds, and the links and service requests that contribute most to the flow are screened out.
Step five: designing a reference flow scenario:
In the embodiment of the invention, for the screened links and service requests, the call volume information of each link and service is organized in time order to form a simulation scenario of the reference flow (i.e., a reference scenario).
Step six: completing the service request chain scripts:
In the embodiment of the invention, service request scripts are generated from the flow samples and refined by script developers, and the request scripts are combined into request chains according to the flow scenario and the link design.
Step seven: initiating a flow benchmark test:
In the embodiment of the invention, the flow generation module initiates a benchmark test, dynamically adjusts the number of concurrent requests during the test so that the link request flow fits the reference scenario, and obtains the curve of the number of concurrent requests over time (i.e., the concurrency time-series curve) as the benchmark result. For each service request, the server response is obtained and it is checked whether the assertions pass.
Step eight: initiating a flow comparison test:
In the embodiment of the invention, the flow generation module initiates requests according to the concurrency time-series curve. During the test, for each service request, the server response is likewise obtained and it is checked whether the assertions pass.
Step nine: abnormal flow injection:
In the embodiment of the invention, the anomaly injection module randomly and continuously injects flow anomalies at the transport layer and the application layer, thereby increasing the overall degree of chaos of the system.
Step ten: collecting the operation condition:
In the embodiment of the present invention, multi-dimensional data of the system to be tested is collected during the test, including but not limited to: service success rate, service availability, service capability, queue congestion, resource consumption, etc.
Step eleven: chaos engineering evaluation:
In the embodiment of the invention, based on the collected multi-dimensional data, a graded chaos engineering evaluation of the system to be tested is performed according to the pre-configured thresholds.
The embodiment of the invention provides a chaos engineering implementation method based on flow, which comprises the steps of collecting, analyzing and converting production flow, initiating reference flow, injecting abnormity and acquiring information in the chaos engineering implementation process, and evaluating the capacity of a chaos engineering implementation object.
EXAMPLE III
The evaluation apparatus for a distributed system provided in this embodiment includes a plurality of implementation units, each implementation unit corresponding to a respective implementation step in the first embodiment.
Fig. 5 is a schematic diagram of an alternative evaluation apparatus for a distributed system according to an embodiment of the present invention. As shown in fig. 5, the evaluation apparatus may include: an acquisition unit 50, a sending unit 51, a receiving unit 52, and an evaluation unit 53, wherein,
the acquisition unit 50 is configured to acquire a link test script, execute the link test script, and generate a request message, where the link test script is obtained by combining the generated test script with a service link;
the sending unit 51 is configured to send the request message to a system to be tested, so as to perform a flow benchmark test and a flow comparison test on the system to be tested, and to collect operation data during the test;
the receiving unit 52 is configured to receive a response message returned after the test of the system to be tested is completed;
and the evaluation unit 53 is configured to evaluate the state of the system to be tested in response to flow changes in different dimensions based on the response message and the operation data, so as to obtain an evaluation message.
In the evaluation device, the acquisition unit 50 acquires the link test script, executes it, and generates the request message, wherein the link test script is obtained by combining the generated test script and the service link. The sending unit 51 sends the request message to the system to be tested so as to perform the flow benchmark test and the flow comparison test on the system to be tested, and the operation data during the test is collected. The receiving unit 52 receives the response message returned after the test of the system to be tested is completed, and the evaluation unit 53 evaluates the state of the system to be tested in response to flow changes in different dimensions based on the response message and the operation data, so as to obtain the evaluation message. In the embodiment of the invention, a flow benchmark test and a flow comparison test can be initiated for the system, and anomalies are randomly injected during the flow comparison test to create abnormal flow scenarios, so that the behavior of the system under sudden flow anomalies can be effectively observed and evaluated in different dimensions. This reduces engineering risk and improves engineering quality, and solves the technical problem in the related art that the influence of flow anomalies on the system is not considered during chaos engineering implementation, so that the capability of the system to cope with flow changes cannot be evaluated.
Optionally, the evaluation device further comprises: a first monitoring module, configured to monitor, before the link test script is acquired, flow data of a network module on a server by adopting a preset probe strategy, wherein the flow data at least carries a port identifier and a service request; a first acquisition module, configured to acquire a plurality of service requests in the flow data when the port identifier indicates a service port of the system to be tested; a first analysis module, configured to analyze the service requests to obtain an analysis result; and a first traversal module, configured to traverse the plurality of service requests in time order by adopting a preset identification algorithm based on the analysis result, to obtain the service link.
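As a minimal sketch of how the first traversal module might derive ordered service links, the following Python code groups the parsed service requests by session and orders them by time. Using the session identifier as the only association dimension, as well as the names `ServiceRequest` and `identify_links`, are assumptions made for illustration; the preset identification algorithm of the embodiment may also use communication connections, entry requests, or user-defined message content.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class ServiceRequest(NamedTuple):
    timestamp: float   # capture time of the request
    session_id: str    # hypothetical association key from the analysis result
    service: str       # name of the requested service


def identify_links(requests: List[ServiceRequest]) -> Dict[str, Tuple[str, ...]]:
    """Traverse the requests in time order and return, for each session,
    the ordered sequence of services, i.e. one service link per session."""
    by_session: Dict[str, List[str]] = defaultdict(list)
    for req in sorted(requests, key=lambda r: r.timestamp):
        by_session[req.session_id].append(req.service)
    return {sid: tuple(services) for sid, services in by_session.items()}
```

Each resulting tuple is an ordered link; a link seen for the first time would then be the trigger for storing its request and response messages as a flow sample.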
Optionally, the evaluation device further comprises: a first extraction module, configured to, after the plurality of service requests are traversed in time order by adopting the preset identification algorithm based on the analysis result to obtain the service link, extract the request messages and response messages of all service requests in the service link when the service link appears for the first time, to obtain a flow sample.
Optionally, the evaluation device further comprises: the first analysis module is used for performing statistical analysis on the service link and the service request by adopting a preset clustering algorithm and expected category number after extracting the request messages and the response messages of all service requests in the service link to obtain a flow sample, so as to obtain the link call quantity and the request call quantity; and the first screening module is used for screening the service link corresponding to the link calling amount larger than the first preset threshold value and screening the service request corresponding to the request calling amount larger than the second preset threshold value to obtain a screening result.
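The screening step may be illustrated with the following Python sketch. It replaces the preset clustering algorithm with plain frequency counting, which is a simplification chosen for this example; the function name `screen_by_call_volume` and the representation of a link as a tuple of service names are likewise assumptions.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Link = Tuple[str, ...]


def screen_by_call_volume(links: Iterable[Link],
                          link_threshold: int,
                          request_threshold: int) -> Tuple[List[Link], List[str]]:
    """Count how often each link and each service request is called and keep
    those whose call volume exceeds the respective preset threshold."""
    observed = list(links)
    link_calls = Counter(observed)
    request_calls = Counter(service for link in observed for service in link)
    kept_links = sorted(l for l, n in link_calls.items() if n > link_threshold)
    kept_requests = sorted(s for s, n in request_calls.items() if n > request_threshold)
    return kept_links, kept_requests
```

Here `link_threshold` and `request_threshold` play the roles of the first and second preset thresholds.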
Optionally, the evaluation device further comprises: the first sorting module is used for sorting the screening results into a time sequence request curve according to the time sequence of the execution requests after the screening results are obtained; the first calculation module is used for obtaining a time sequence total flow variation curve and a flow composition ratio by weighted summation based on a request number proportion and a time sequence request curve, wherein the request number proportion is a proportion value between a service link and a service request and a request total number; and the first characterization module is used for characterizing the time sequence total flow variation curve and the flow composition ratio as a reference scene simulating flow variation.
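One possible reading of the weighted summation is sketched below in Python. It assumes that each screened link or service request has a per-second request curve of equal length and a weight equal to its request number proportion; the function name `build_reference_scenario` and this data layout are assumptions made for illustration, and other interpretations of the weighting are possible.

```python
from typing import Dict, List, Tuple


def build_reference_scenario(curves: Dict[str, List[float]],
                             weights: Dict[str, float]
                             ) -> Tuple[List[float], Dict[str, float]]:
    """Return the time-series total flow curve (weighted sum of the
    per-item request curves) and the flow composition ratio per item."""
    if not curves:
        raise ValueError("at least one curve is required")
    length = len(next(iter(curves.values())))
    if any(len(c) != length for c in curves.values()):
        raise ValueError("all curves must cover the same time steps")
    # Weighted sum of all curves at each time step.
    total_curve = [sum(weights[name] * curve[t] for name, curve in curves.items())
                   for t in range(length)]
    # Share of each item in the total weighted volume.
    volume = {name: weights[name] * sum(curve) for name, curve in curves.items()}
    grand_total = sum(volume.values())
    if grand_total == 0:
        raise ValueError("total flow is zero")
    composition = {name: v / grand_total for name, v in volume.items()}
    return total_curve, composition
```

The pair of return values corresponds to what the first characterization module would treat as the reference scenario simulating flow variation.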
Optionally, the evaluation device comprises: a first generating module, configured to generate an initial test script based on a traffic sample for a reference scenario simulating traffic variation after characterizing a timing total traffic variation curve and a traffic composition ratio as the reference scenario simulating traffic variation, where the initial test script at least includes: testing behaviors, testing data and assertions; the first adjusting module is used for receiving test data input by external equipment and adjusting the initial test script based on the test data to obtain a target test script; and the first combination module is used for combining the service link and the target test script to obtain a link test script.
Optionally, the acquisition unit includes: a first analysis submodule, configured to parse the link test script to obtain test behaviors and test data; and a first packaging submodule, configured to package the test behaviors and the test data into a request message by adopting a preset communication protocol.
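As an example of the packaging step, the following Python sketch builds a request message from a test behavior and test data. HTTP/1.1 with a JSON body is assumed here as the preset communication protocol purely for illustration; the function name `build_request_message` and the keys `method` and `path` of the behavior are hypothetical.

```python
import json
from typing import Any, Dict


def build_request_message(behavior: Dict[str, str],
                          data: Dict[str, Any],
                          host: str) -> bytes:
    """Package a test behavior (method and path) and its test data
    into an HTTP/1.1 request message with a JSON body."""
    body = json.dumps(data, sort_keys=True).encode("utf-8")
    head = (
        f"{behavior['method']} {behavior['path']} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/json\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    ).encode("ascii")
    return head + body
```

A different protocol would only change how the header and body are serialized; the split into behavior and data stays the same.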
Optionally, the sending unit includes: a first generation submodule, configured to generate link request flow based on the link test script; a first fitting submodule, configured to fit the link request flow to the reference scenario simulating flow variation to obtain a benchmark test result and a concurrency time-series curve, thereby completing the flow benchmark test; and a first injection submodule, configured to re-initiate the service requests according to the concurrency time-series curve and randomly inject flow anomalies on top of the reference flow to obtain a flow comparison result, thereby completing the flow comparison test.
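The random injection performed for the flow comparison test might be sketched in Python as follows. The sketch models only one anomaly type, an abnormal increase of service requests, as a multiplication of the concurrency at randomly chosen time steps; the function name and the parameters `probability`, `burst_factor` and `seed` are assumptions made for illustration.

```python
import random
from typing import List


def inject_traffic_anomalies(concurrency_curve: List[int],
                             probability: float,
                             burst_factor: float,
                             seed: int) -> List[int]:
    """Derive the concurrency schedule of the comparison test from the
    concurrency time-series curve obtained in the benchmark test.

    At each time step an anomaly is injected with the given probability by
    multiplying the benchmark concurrency by burst_factor.
    """
    rng = random.Random(seed)  # seeded so that a test run can be repeated
    schedule = []
    for level in concurrency_curve:
        if rng.random() < probability:
            schedule.append(int(round(level * burst_factor)))
        else:
            schedule.append(level)
    return schedule
```

Keeping the benchmark curve unchanged wherever no anomaly is injected is what allows the comparison test results to be compared with the benchmark results at the same point in time.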
Optionally, the type of the flow anomaly includes at least one of: abnormal increase of preceding services, abnormal increase of service requests, abnormal increase of request processing time, and abnormal flow load, wherein the preceding services refer to the services that come earlier in the execution order of a service link.
Optionally, the evaluation device comprises: the first assertion module is used for carrying out assertion check on the response message before evaluating the state of the system to be tested for responding to flow change in different dimensions based on the response message and the operation data to obtain a check result; and the first determination module is used for determining that the execution of the link test script is finished under the condition that the check result indicates that all assertion checks pass.
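The assertion check may be illustrated by the following Python sketch, in which each assertion is modeled as a predicate on one response message. The names `Assertion` and `link_script_completed` and the representation of a response message as a dictionary are assumptions made for this example.

```python
from typing import Any, Callable, Dict, List

Assertion = Callable[[Dict[str, Any]], bool]


def link_script_completed(responses: List[Dict[str, Any]],
                          assertions: List[Assertion]) -> bool:
    """Check every response message against its assertion and report
    whether execution of the link test script can be considered complete."""
    if len(responses) != len(assertions):
        raise ValueError("one assertion is expected per response message")
    return all(check(resp) for resp, check in zip(responses, assertions))
```

For instance, an assertion such as `lambda r: r["status"] == 200` would pass only for response messages carrying a success status.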
Optionally, the evaluation unit includes: a first determining submodule, configured to determine the capability grading threshold corresponding to each dimension; and a first analysis submodule, configured to analyze the state of the system to be tested in response to flow changes in different dimensions based on the response message, the operation data and the capability grading threshold corresponding to each dimension.
The above-mentioned evaluation device may further comprise a processor and a memory, wherein the above-mentioned acquisition unit 50, sending unit 51, receiving unit 52, evaluation unit 53, etc. are stored in the memory as program units, and the processor executes the program units stored in the memory to realize the corresponding functions.
The processor comprises a kernel, and the kernel calls a corresponding program unit from the memory. The kernel can be set to be one or more than one, and the state of the system to be tested, which is corresponding to the flow change in different dimensions, is evaluated by adjusting the kernel parameters.
The memory may include, among computer-readable media, volatile memory such as Random Access Memory (RAM) and/or non-volatile memory such as Read Only Memory (ROM) or flash memory (flash RAM), and the memory includes at least one memory chip.
The present application further provides a computer program product which, when executed on a data processing device, is adapted to execute a program initialized with the following method steps: acquiring a link test script, executing the link test script and generating a request message, wherein the link test script is obtained by combining the generated test script and a service link; sending the request message to a system to be tested so as to perform a flow benchmark test and a flow comparison test on the system to be tested, and collecting operation data during the test; receiving a response message returned after the test of the system to be tested is completed; and evaluating the state of the system to be tested in response to flow changes in different dimensions based on the response message and the operation data to obtain an evaluation message.
According to another aspect of embodiments of the present invention, there is also provided an electronic device including one or more processors and a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the evaluation method for a distributed system of any one of the above.
Fig. 6 is a block diagram of a hardware structure of an electronic device (or a mobile device) used in an evaluation method of a distributed system according to an embodiment of the present invention. As shown in fig. 6, the electronic device may include one or more processors 102 (shown as 102a, 102b, ..., 102n; the processors 102 may include, but are not limited to, a processing device such as a microprocessor MCU or a programmable logic device FPGA), and a memory 104 for storing data. In addition, the electronic device may further include: a display, an input/output interface (I/O interface), a Universal Serial Bus (USB) port (which may be included as one of the ports of the I/O interface), a network interface, a keyboard, a power supply, and/or a camera. It will be understood by those skilled in the art that the structure shown in fig. 6 is only an illustration and is not intended to limit the structure of the electronic device. For example, the electronic device may also include more or fewer components than shown in fig. 6, or have a different configuration than shown in fig. 6.
According to another aspect of the embodiments of the present invention, there is also provided a computer-readable storage medium, which includes a stored computer program, wherein when the computer program runs, the apparatus on which the computer-readable storage medium is located is controlled to execute any one of the above evaluation methods for a distributed system.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various modifications and refinements without departing from the principle of the present invention, and these modifications and refinements should also be regarded as falling within the protection scope of the present invention.

Claims (14)

1. An evaluation method for a distributed system, comprising:
acquiring a link test script, executing the link test script and generating a request message, wherein the link test script is obtained by combining the generated test script and a service link;
sending the request message to a system to be tested so as to perform flow benchmark test and flow comparison test on the system to be tested and acquire operation data in the test process;
receiving a response message returned after the system to be tested finishes testing;
and evaluating the state of the system to be tested in response to flow changes in different dimensions based on the response message and the operation data to obtain an evaluation message.
2. The evaluation method of claim 1, wherein prior to obtaining the link test script, the evaluation method comprises:
monitoring traffic data of a network module on a server by adopting a preset probe strategy, wherein the traffic data at least carries a port identifier and a service request;
when the port identification indicates a certain service port of the system to be tested, acquiring a plurality of service requests in the flow data;
analyzing the service request to obtain an analysis result;
and traversing the service requests according to a time sequence by adopting a preset identification algorithm based on the analysis result to obtain the service link.
3. The evaluation method according to claim 2, wherein after traversing the plurality of service requests in time sequence by using a preset identification algorithm based on the analysis result to obtain a service link, the evaluation method further comprises:
and under the condition that the service link appears for the first time, extracting the request messages and the response messages of all service requests in the service link to obtain a flow sample.
4. The evaluation method according to claim 3, wherein after extracting the request messages and the response messages of all the service requests in the service link to obtain the traffic samples, the evaluation method further comprises:
performing statistical analysis on the service link and the service request by adopting a preset clustering algorithm and the expected category number to obtain a link call amount and a request call amount;
and screening the service link corresponding to the link calling amount larger than the first preset threshold value, and screening the service request corresponding to the request calling amount larger than the second preset threshold value to obtain a screening result.
5. The method of claim 4, wherein after obtaining the screening results, the method further comprises:
sorting the screening results into a time sequence request curve according to the time sequence of the execution requests;
based on a request number proportion and the time sequence request curve, obtaining a time sequence total flow variation curve and a flow composition ratio by weighted summation, wherein the request number proportion is a proportion value between the service link and the service request and the total number of requests;
and characterizing the time sequence total flow variation curve and the flow composition ratio as a reference scene simulating flow variation.
6. The evaluation method according to claim 5, wherein after characterizing the time-series total flow variation curve and the flow composition fraction as a reference scenario simulating flow variation, the evaluation method comprises:
for the reference scenario simulating flow variation, generating an initial test script based on the flow sample, wherein the initial test script at least comprises: testing behaviors, testing data and assertions;
receiving test data input by external equipment, and adjusting the initial test script based on the test data to obtain a target test script;
and combining the service link and the target test script to obtain a link test script.
7. The method according to claim 1, wherein the step of executing the link test script to generate the request message comprises:
analyzing the link test script to obtain test behaviors and test data;
and packaging the test behavior and the test data into a request message by adopting a preset communication protocol.
8. The evaluation method of claim 1, wherein the step of performing a flow benchmark test and a flow comparison test on the system under test comprises:
generating link request flow based on the link test script;
fitting the link request flow to the reference scenario simulating flow variation to obtain a benchmark test result and a concurrency time-series curve, thereby completing the flow benchmark test;
and re-initiating the service requests according to the concurrency time-series curve while randomly injecting flow anomalies on top of the benchmark flow to obtain a flow comparison result, thereby completing the flow comparison test.
9. The evaluation method of claim 8, wherein the type of flow anomaly comprises at least one of: an abnormal increase in front-hand services, an abnormal increase in service requests, an abnormal increase in request processing time, and an abnormal traffic load, wherein a front-hand service is a service positioned early in the execution order of a service link.
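The random anomaly injection of claims 8 and 9 can be sketched as tagging points on the concurrency time-series curve. The four enum members mirror the anomaly types listed in claim 9. The injection probability, the function name `inject_anomalies`, and the decision to only tag each time step (leaving the effect of each anomaly to the load generator) are assumptions.

```python
import random
from enum import Enum
from typing import List, Optional, Tuple


class FlowAnomaly(Enum):
    FRONT_HAND_SERVICE_SURGE = "abnormal increase in front-hand services"
    SERVICE_REQUEST_SURGE = "abnormal increase in service requests"
    PROCESSING_TIME_SURGE = "abnormal increase in request processing time"
    ABNORMAL_LOAD = "abnormal traffic load"


def inject_anomalies(
    concurrency_curve: List[int],
    probability: float,
    rng: random.Random,
) -> List[Tuple[int, Optional[FlowAnomaly]]]:
    """Replay the benchmark concurrency curve, tagging random steps with an anomaly.

    Each time step keeps its benchmark concurrency level; with the given
    probability it is additionally marked with a randomly chosen anomaly type.
    """
    plan = []
    for level in concurrency_curve:
        anomaly = rng.choice(list(FlowAnomaly)) if rng.random() < probability else None
        plan.append((level, anomaly))
    return plan
```

Passing a seeded `random.Random` instance makes a comparison run reproducible, which helps when a comparison result has to be checked against the benchmark result.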
10. The evaluation method according to claim 1, wherein, before evaluating the state of the system under test in responding to flow changes in different dimensions based on the response message and the operation data, the evaluation method further comprises:
performing assertion checks on the response message to obtain a check result;
and, in the event that the check result indicates that all assertion checks pass, determining that execution of the link test script is complete.
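The assertion check of claim 10 can be sketched as follows, assuming that the response message has already been decoded into a dictionary and that each assertion is a simple field-equality expectation. Both assumptions, and the function names, are illustrative only.

```python
from typing import Any, Dict


def check_assertions(response: Dict[str, Any], assertions: Dict[str, Any]) -> Dict[str, bool]:
    """Compare each asserted field with the response; a missing field fails."""
    missing = object()  # sentinel so that an expected None still requires the field
    return {
        field: response.get(field, missing) == expected
        for field, expected in assertions.items()
    }


def script_execution_complete(check_result: Dict[str, bool]) -> bool:
    """Execution counts as complete only when every assertion check passed."""
    return all(check_result.values())
```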
11. The evaluation method according to claim 1, wherein the step of evaluating the state of the system under test in responding to flow changes in different dimensions based on the response message and the operation data comprises:
determining a capacity grading threshold corresponding to each dimension;
and analyzing the state of the system under test in responding to flow changes in different dimensions based on the response message, the operation data, and the capacity grading threshold corresponding to each dimension.
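Per-dimension capacity grading as in claim 11 can be sketched with a threshold lookup. The dimension names, threshold values, and grade labels below are invented for illustration; the claim only states that each dimension has its own capacity grading threshold.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence


def grade(value: float, thresholds: Sequence[float], labels: Sequence[str]) -> str:
    """Map a measured value to a grade.

    `thresholds` must be ascending and `labels` must have one more entry
    than `thresholds`. A value equal to a threshold falls into the higher grade.
    """
    if len(labels) != len(thresholds) + 1:
        raise ValueError("labels must have exactly one more entry than thresholds")
    return labels[bisect_right(thresholds, value)]


def grade_dimensions(
    measurements: Dict[str, float],
    thresholds: Dict[str, List[float]],
    labels: Sequence[str],
) -> Dict[str, str]:
    """Grade every measured dimension against its own thresholds."""
    return {dim: grade(value, thresholds[dim], labels) for dim, value in measurements.items()}
```

For example, with hypothetical latency thresholds of 200 ms and 500 ms and the labels `normal`, `degraded`, `overloaded`, a measured latency of 350 ms would be graded `degraded`.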
12. An evaluation apparatus for a distributed system, comprising:
an acquisition unit, configured to acquire a link test script and execute the link test script to generate a request message, wherein the link test script is obtained by combining a generated test script with a service link;
a sending unit, configured to send the request message to a system under test so as to perform a flow benchmark test and a flow comparison test on the system under test and to acquire operation data during the test;
a receiving unit, configured to receive a response message returned after the system under test completes the test;
and an evaluation unit, configured to evaluate the state of the system under test in responding to flow changes in different dimensions based on the response message and the operation data, to obtain an evaluation message.
13. A computer-readable storage medium, comprising a stored computer program, wherein, when the computer program runs, a device on which the computer-readable storage medium resides is controlled to execute the evaluation method for a distributed system according to any one of claims 1 to 11.
14. An electronic device comprising one or more processors and memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the evaluation method for a distributed system of any one of claims 1 to 11.
CN202111349273.3A 2021-11-15 2021-11-15 Evaluation method and device for distributed system, electronic equipment and storage medium Active CN114124759B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111349273.3A CN114124759B (en) 2021-11-15 2021-11-15 Evaluation method and device for distributed system, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111349273.3A CN114124759B (en) 2021-11-15 2021-11-15 Evaluation method and device for distributed system, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN114124759A (en) 2022-03-01
CN114124759B (en) 2024-03-08

Family

ID=80396364

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111349273.3A Active CN114124759B (en) 2021-11-15 2021-11-15 Evaluation method and device for distributed system, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114124759B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114880182A (en) * 2022-06-10 2022-08-09 中国电信股份有限公司 Monitoring platform test method and device, electronic equipment and readable storage medium
CN115168222A (en) * 2022-07-21 2022-10-11 北京同创永益科技发展有限公司 Method for producing lossless chaotic engineering experiment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1761208A (en) * 2005-11-17 2006-04-19 郭世泽 System and method for evaluating security and survivability of network information system
CN106130830A (en) * 2016-08-31 2016-11-16 北京奇虎科技有限公司 The method of testing of safety equipment stability and test device
CN110618924A (en) * 2019-09-19 2019-12-27 浙江诺诺网络科技有限公司 Link pressure testing method of web application system
CN111831569A (en) * 2020-07-22 2020-10-27 平安普惠企业管理有限公司 Test method and device based on fault injection, computer equipment and storage medium
CN113381913A (en) * 2021-08-13 2021-09-10 飞狐信息技术(天津)有限公司 Traffic processing method, gateway, traffic comparison system and device

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114880182A (en) * 2022-06-10 2022-08-09 中国电信股份有限公司 Monitoring platform test method and device, electronic equipment and readable storage medium
CN114880182B (en) * 2022-06-10 2024-01-02 中国电信股份有限公司 Monitoring platform testing method and device, electronic equipment and readable storage medium
CN115168222A (en) * 2022-07-21 2022-10-11 北京同创永益科技发展有限公司 Method for producing lossless chaotic engineering experiment
CN115168222B (en) * 2022-07-21 2023-02-28 北京同创永益科技发展有限公司 Method for producing lossless chaotic engineering experiment

Also Published As

Publication number Publication date
CN114124759B (en) 2024-03-08

Similar Documents

Publication Publication Date Title
EP2871574B1 (en) Analytics for application programming interfaces
US8572226B2 (en) Enhancing network details using network monitoring scripts
KR102522005B1 (en) Apparatus for VNF Anomaly Detection based on Machine Learning for Virtual Network Management and a method thereof
Ghosh et al. A framework for quantitative analysis of cascades on networks
CN114124759A (en) Evaluation method and device for distributed system, electronic equipment and storage medium
US20130116976A1 (en) Method, computer program, and information processing apparatus for analyzing performance of computer system
CN106502907A (en) Distributed software anomaly diagnosis method based on execution trace tracking
CN109586952A (en) Server capacity expansion method and device
Wu et al. Zeno: Diagnosing performance problems with temporal provenance
Tang et al. An integrated framework for optimizing automatic monitoring systems in large IT infrastructures
US10217073B2 (en) Monitoring transactions from distributed applications and using selective metrics
CN104584483A (en) Method and apparatus for automatically determining causes of service quality degradation
CN110764980A (en) Log processing method and device
CN109495291B (en) Calling abnormity positioning method and device and server
US20140282422A1 (en) Using canary instances for software analysis
CN113986746A (en) Performance test method and device and computer readable storage medium
CN111881185A (en) Data monitoring method, device, equipment and storage medium
CN115271736A (en) Method, device, equipment, storage medium and product for verifying transaction consistency
Pranata et al. Misconfiguration discovery with principal component analysis for cloud-native services
CN106502887A (en) Stability test method, test controller and system
WO2016085443A1 (en) Application management based on data correlations
CN114064757A (en) Application program optimization method, device, equipment and medium
Plestys et al. The measurement of grid QoS parameters
CN111111211B (en) Game data reporting method, device, system, equipment and storage medium
CN113259878B (en) Call bill settlement method, system, electronic device and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant