WO2021047575A1

WO2021047575A1 - Load testing method and apparatus, and electronic device and computer-readable storage medium

Info

Publication number: WO2021047575A1
Application number: PCT/CN2020/114411
Authority: WO
Inventors: 林江彬; 王勇; 陈金富
Original assignee: 阿里巴巴集团控股有限公司
Priority date: 2019-09-12
Filing date: 2020-09-10
Publication date: 2021-03-18
Also published as: CN112486738B; CN112486738A

Abstract

A load testing method and apparatus, and an electronic device and a computer-readable storage medium. The method comprises: obtaining load data and determining operation data sequences of an operator on the basis of the load data (S101); clustering the operation data sequences of the operator to obtain one or more operator classification groups (S102); and determining a target operator satisfying a preset condition in the operator classification groups, and generating load testing data according to a data sequence of the target operator to perform load testing (S103). According to the method, a sequence combined by an operator event and context information is used for helping recover workload of a load, thereby implementing workload recovery in different operator behavior granularity levels; in addition, a representative operator is explored by means of a clustering method, so that the generation of workload can be implemented with the help of a small number of users.

Description

Load testing method, device, electronic equipment and computer readable storage medium

This application claims priority to Chinese patent application No. 201910866125.5, filed on September 12, 2019 and entitled "Load test method, device, electronic equipment, and computer-readable storage medium", the entire content of which is incorporated herein by reference.

Technical field

The embodiment of the present invention relates to the technical field of data testing, in particular to a load testing method, device, electronic equipment, and computer-readable storage medium.

Background technique

With the development of data technology and Internet technology, more and more service providers deliver services to users through software systems. Many of these systems, such as Amazon AWS, Google Gmail, and Netflix, serve a very large number of users and have a major impact on the daily lives of billions of people around the world. The stable operation of such large-scale software systems is therefore very important: even minor faults can lead to a poor user experience, data loss, and lost revenue. For this reason, load testing is commonly used in practice to ensure the operating quality of a software system under load.

The goal of load testing is to ensure that the software system performs well under actual workloads. To achieve this goal, the workload must first be recovered, and the load test is then designed according to the recovered workload. Recovering a workload is challenging because a balance must be struck between the level of granularity of the workload and the cost of using that workload for load testing. If the recovered workload is too coarse, that is, oversimplified, it cannot capture differences in user behavior, and the load test loses its representativeness. For example, the SPECweb96 benchmark defines a workload that specifies only the probability of accessing files, such as "files smaller than 1 KB account for 35% of all requests". Conversely, if the workload replays the exact field workload step by step, the exact user behavior can be reproduced, but the cost of maintaining the workload is very high. This is because the software system has a large number of users: replaying the exact workload requires the load test to simulate a large amount of contextual information for each user, and simulation code must be developed for each specific sequence of events. Moreover, it is almost impossible to observe exactly the same workload twice, so the workload must be updated constantly.

To reach a suitable level of workload granularity, the prior art usually designs the workload around the representative behaviors of a small number of user clusters, and users are usually clustered according to how frequently each user performs different operations. However, because users of large-scale software systems vary widely, considering only event frequency is too coarse. The order and context of user operations can make the workload more representative. For example, one user repeatedly reads small blocks of data from a file and then writes each block back to the file, while another user alternately reads and writes a large number of small blocks of data in the file. If only the frequency of operations such as reads and writes is considered, the workloads of these two users cannot be distinguished. On the other hand, adding more detailed information about these user operations results in high recovery, execution, and maintenance costs.
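The limitation of frequency-only comparison can be illustrated with a small Python sketch. The two event traces below are hypothetical and are not taken from the application; they only show that two users can have identical operation counts while their operation order differs. Counting adjacent event pairs is one simple order-aware feature chosen here for illustration.

```python
from collections import Counter

# Hypothetical event traces for two users (illustrative only).
user_a = ["read", "read", "read", "write", "write", "write"]
user_b = ["read", "write", "read", "write", "read", "write"]


def bigrams(events):
    """Count adjacent event pairs, a simple order-aware feature."""
    return Counter(zip(events, events[1:]))


# Frequency-only view: both users perform 3 reads and 3 writes.
same_frequencies = Counter(user_a) == Counter(user_b)

# Order-aware view: the adjacent-pair profiles differ.
same_order_profile = bigrams(user_a) == bigrams(user_b)

print(same_frequencies, same_order_profile)
```

Under these assumptions, the frequency counts match while the pair profiles do not, which is the distinction that a sequence-based representation is meant to preserve.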

Summary of the invention

The embodiments of the present invention provide a load testing method, device, electronic equipment, and computer-readable storage medium.

In the first aspect, an embodiment of the present invention provides a load test method.

Specifically, the load test method includes:

Acquiring load data, and determining an operator operation data sequence based on the load data;

Perform clustering on the operator operation data sequence to obtain one or more operator clusters;

Determine the target operator in the operator class group that meets the preset condition, and generate load test data according to the data sequence of the target operator to perform the load test.

With reference to the first aspect, in the first implementation manner of the first aspect of the embodiment of the present invention, the load data is load log data or simulated load data or real-time load data.

Combining the first aspect and the first implementation manner of the first aspect, in the second implementation manner of the first aspect of the embodiment of the present invention, the acquiring load data, and determining the operator operation data sequence based on the load data, include:

Obtain load log data;

Determine the operator identification information in the load log data;

The load log data corresponding to the operator identification information is acquired based on the operator identification information, and the operator operation data sequence corresponding to the operator identification information is obtained.

With reference to the first aspect, the first implementation manner of the first aspect, and the second implementation manner of the first aspect, in a third implementation manner of the first aspect of the present disclosure, performing clustering on the operator operation data sequence to obtain one or more operator class groups includes:

Calculating the distance matrix of the operation data sequence of the operator;

Perform clustering on the operator operation data sequence according to the distance matrix to obtain one or more operator class groups.

With reference to the first implementation manner of the first aspect, the second implementation manner of the first aspect, and the third implementation manner of the first aspect, in a fourth implementation manner of the first aspect of the present disclosure, calculating the distance matrix of the operator operation data sequence includes:

Constructing an operator's operation data sequence matrix based on the operator's operation data sequence;

Generating a similarity matrix of the operator's operation data sequence based on the operator's operation data sequence;

The operator's operation data sequence matrix and the operator's operation data sequence similarity matrix are multiplied to obtain the distance matrix of the operator.

With reference to the first aspect, the first implementation manner of the first aspect, the second implementation manner of the first aspect, the third implementation manner of the first aspect, and the fourth implementation manner of the first aspect, in a fifth implementation manner of the first aspect of the present disclosure, generating the operator operation data sequence similarity matrix based on the operator operation data sequence is implemented as:

Determine frequent sequences in the operator's operation data sequence;

Calculating the edit distance between the frequent sequences;

The similarity between the operator's operation data sequences is determined according to the edit distance between the frequent sequences, and the operator's operation data sequence similarity matrix is generated.
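As a minimal sketch of the last two steps above, the following Python code computes the edit distance (Levenshtein distance) between frequent sequences and turns it into a similarity matrix. The text does not specify how similarity is derived from the edit distance, so the normalization `1 - distance / max_length` is an assumption, and the example frequent sequences are hypothetical.

```python
from typing import List, Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two event sequences (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity_matrix(seqs: List[Sequence[str]]) -> List[List[float]]:
    """Pairwise similarity; the normalization is an assumption, not from the text."""
    n = len(seqs)
    sim = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            longest = max(len(seqs[i]), len(seqs[j])) or 1
            s = 1.0 - edit_distance(seqs[i], seqs[j]) / longest
            sim[i][j] = sim[j][i] = s
    return sim


# Hypothetical frequent sequences mined from operator logs.
frequent = [("read", "write"), ("read", "read", "write"), ("search", "delete")]
S = similarity_matrix(frequent)
```

With this normalization, identical sequences have similarity 1.0 and sequences that share no events have similarity 0.0. Other mappings from distance to similarity would serve equally well.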

With reference to the first aspect and the first through fifth implementation manners of the first aspect, in a sixth implementation manner of the first aspect of the present disclosure, determining the target operator in the operator class group that meets a preset condition, and generating load test data according to the data sequence of the target operator to perform the load test, includes:

Determine a target operator in the operator group that meets a preset condition;

Acquiring the frequent sequence and frequency of occurrence of the target operator;

Calculating the occurrence probability of the frequent sequence according to the occurrence frequency of the frequent sequence;

Generating load test data according to the occurrence probability of the frequent sequence;

Play back and run the load test data to perform a load test.
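The probability and generation steps listed above could be sketched as follows. This is an illustration under stated assumptions rather than the claimed implementation: the frequent sequences and their counts are hypothetical, occurrence probability is taken as relative frequency, and test data is produced by weighted random sampling with Python's standard `random.choices`.

```python
import random
from typing import Dict, List, Tuple

Sequence = Tuple[str, ...]


def occurrence_probabilities(freq: Dict[Sequence, int]) -> Dict[Sequence, float]:
    """Turn occurrence counts into probabilities (relative frequency)."""
    total = sum(freq.values())
    return {seq: count / total for seq, count in freq.items()}


def generate_load(freq: Dict[Sequence, int], n_draws: int, seed: int = 0) -> List[str]:
    """Draw frequent sequences by probability and concatenate their events."""
    probs = occurrence_probabilities(freq)
    rng = random.Random(seed)  # seeded so a test run can be repeated
    seqs = list(probs)
    weights = [probs[s] for s in seqs]
    events: List[str] = []
    for seq in rng.choices(seqs, weights=weights, k=n_draws):
        events.extend(seq)
    return events


# Hypothetical frequent sequences of a target operator and their counts.
frequent = {("read", "write"): 6, ("search", "edit"): 3, ("delete",): 1}
probs = occurrence_probabilities(frequent)
events = generate_load(frequent, n_draws=20, seed=42)
```

The resulting event list would then be handed to whatever replay mechanism the load test uses. Replay itself is outside the scope of this sketch.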

In the second aspect, an embodiment of the present invention provides a load clustering method.

Specifically, the load clustering method includes:

Perform clustering on the operator operation data sequence to obtain one or more operator clusters.

In the third aspect, an embodiment of the present invention provides a load testing device.

Specifically, the load test device includes:

A determining module, configured to obtain load data, and determine an operator operation data sequence based on the load data;

The first clustering module is configured to perform clustering on the operator operation data sequence to obtain one or more operator clusters;

The generating module is configured to determine a target operator in the operator group that meets a preset condition, and generate load test data according to the data sequence of the target operator to perform the load test.

With reference to the third aspect, in the first implementation manner of the third aspect in an embodiment of the present invention, the load data is load log data.

With reference to the third aspect and the first implementation manner of the third aspect, in the second implementation manner of the third aspect of the embodiment of the present invention, the determining module includes:

The first obtaining submodule is configured to obtain load log data;

The first determining submodule is configured to determine the operator identification information in the load log data;

The second acquisition submodule is configured to acquire load log data corresponding to the operator identification information based on the operator identification information, and obtain an operator operation data sequence corresponding to the operator identification information.

With reference to the third aspect, the first implementation manner of the third aspect, and the second implementation manner of the third aspect, in the third implementation manner of the third aspect of the present disclosure, the first clustering module includes:

The first calculation sub-module is configured to calculate the distance matrix of the operation data sequence of the operator;

The clustering sub-module is configured to perform clustering on the operator operation data sequence according to the distance matrix to obtain one or more operator clusters.

With reference to the third aspect, the first implementation manner of the third aspect, the second implementation manner of the third aspect, and the third implementation manner of the third aspect, in a fourth implementation manner of the third aspect of the present disclosure, the first calculation sub-module includes:

A construction sub-module configured to construct an operator operation data sequence matrix based on the operator operation data sequence;

The first generating sub-module is configured to generate an operator operation data sequence similarity matrix based on the operator operation data sequence;

The multiplication sub-module is configured to multiply the operator's operation data sequence matrix and the operator's operation data sequence similarity matrix to obtain the distance matrix of the operator.

With reference to the third aspect, the first implementation manner of the third aspect, the second implementation manner of the third aspect, the third implementation manner of the third aspect, and the fourth implementation manner of the third aspect, in a fifth implementation manner of the third aspect of the present disclosure, the first generating sub-module includes:

The second determining submodule is configured to determine frequent sequences in the operator's operation data sequence;

The second calculation sub-module is configured to calculate the edit distance between the frequent sequences;

The second generation sub-module is configured to determine the similarity between the operator operation data sequences according to the edit distance between the frequent sequences, and generate the operator operation data sequence similarity matrix.

With reference to the third aspect and the first through fifth implementation manners of the third aspect, in a sixth implementation manner of the third aspect of the present disclosure, the generating module includes:

The third determining submodule is configured to determine a target operator in the operator class group that meets a preset condition;

The third obtaining sub-module is configured to obtain the frequent sequence of the target operator and its appearance frequency;

The third calculation sub-module is configured to calculate the occurrence probability of the frequent sequence according to the occurrence frequency of the frequent sequence;

The third generation sub-module is configured to generate load test data according to the occurrence probability of the frequent sequence;

The test sub-module is configured to replay and run the load test data for load test.

In the fourth aspect, an embodiment of the present invention provides a load clustering device.

Specifically, the load clustering device includes:

An obtaining module configured to obtain load data, and determine an operator operation data sequence based on the load data;

The second clustering module is configured to cluster the operator operation data sequence to obtain one or more operator clusters.

In a fifth aspect, an embodiment of the present invention provides an electronic device, including a memory and a processor. The memory is used to store one or more computer instructions that support the load testing device/load clustering device in executing the load testing method/load clustering method described above, and the processor is configured to execute the computer instructions stored in the memory. The load testing device/load clustering device may further include a communication interface for the load testing device/load clustering device to communicate with other equipment or a communication network.

In a sixth aspect, an embodiment of the present invention provides a computer-readable storage medium for storing computer instructions used by the load testing device/load clustering device, including the computer instructions involved in executing the load testing method/load clustering method described above.

The technical solutions provided by the embodiments of the present invention may include the following beneficial effects:

The above technical solution obtains operator operation data sequences based on load data, obtains one or more operator class groups by clustering the operator operation data sequences, and generates load test data for load testing according to the data sequence of a target operator that meets a preset condition in the operator class groups. The technical solution uses sequences that combine operator events with context information to help recover the workload, so that the workload can be recovered at different levels of operator behavior granularity. In addition, a clustering method is used to mine representative operators, so that workload generation can be achieved with a small number of users.

It should be understood that the above general description and the following detailed description are only exemplary and explanatory, and cannot limit the embodiments of the present invention.

Description of the drawings

Other features, objectives, and advantages of the embodiments of the present invention will become more apparent from the following detailed description of non-limiting implementation manners, taken with reference to the accompanying drawings. In the drawings:

Fig. 1 shows a flowchart of a load test method according to an embodiment of the present invention;

FIG. 2 shows a flowchart of step S101 of the load testing method according to the embodiment shown in FIG. 1;

FIG. 3 shows a flowchart of step S102 of the load testing method according to the embodiment shown in FIG. 1;

FIG. 4 shows a flowchart of step S301 of the load test method according to the embodiment shown in FIG. 3;

FIG. 5 shows a flowchart of step S103 of the load testing method according to the embodiment shown in FIG. 1;

Fig. 6 shows a flowchart of a load clustering method according to an embodiment of the present invention;

FIG. 7 shows a structural block diagram of a load testing device according to an embodiment of the present invention;

FIG. 8 shows a structural block diagram of the determination module 701 of the load test device according to the embodiment shown in FIG. 7;

FIG. 9 shows a structural block diagram of the first clustering module 702 of the load testing device according to the embodiment shown in FIG. 7;

FIG. 10 shows a structural block diagram of the first calculation sub-module 901 of the load testing device according to the embodiment shown in FIG. 9;

FIG. 11 shows a structural block diagram of the generation module 703 of the load test device according to the embodiment shown in FIG. 7;

FIG. 12 shows a structural block diagram of a load clustering device according to an embodiment of the present invention;

FIG. 13 shows a schematic diagram of an application scenario according to an embodiment of the present invention;

FIG. 14 shows a structural block diagram of an electronic device according to an embodiment of the present invention;

FIG. 15 is a schematic structural diagram of a computer system suitable for implementing a load test method according to an embodiment of the present invention.

Detailed description

Hereinafter, exemplary implementations of the embodiments of the present invention will be described in detail with reference to the accompanying drawings so that those skilled in the art can easily implement them. In addition, for the sake of clarity, parts that are not related to the description of the exemplary embodiments are omitted in the drawings.

In the embodiments of the present invention, it should be understood that terms such as "including" or "having" are intended to indicate the existence of the features, numbers, steps, behaviors, components, parts, or combinations thereof disclosed in this specification, and are not intended to exclude the possibility that one or more other features, numbers, steps, behaviors, components, parts, or combinations thereof exist or are added.

In addition, it should be noted that the embodiments of the present invention and the features in the embodiments can be combined with each other if there is no conflict. The embodiments of the present invention will be described in detail below with reference to the drawings and in conjunction with the embodiments.

The technical solution provided by the embodiments of the present invention obtains operator operation data sequences based on load data, obtains one or more operator class groups by clustering the operator operation data sequences, and generates load test data for load testing according to the data sequence of a target operator that meets a preset condition in the operator class groups. The technical solution uses sequences that combine operator events with context information to help recover the workload, so that the workload can be recovered at different levels of operator behavior granularity. In addition, a clustering method is used to mine representative operators, so that workload generation can be achieved with a small number of users.

Fig. 1 shows a flow chart of a load test method according to an embodiment of the present invention. As shown in Fig. 1, the load test method includes the following steps S101-S103:

In step S101, obtain load data, and determine an operator operation data sequence based on the load data;

In step S102, perform clustering on the operator operation data sequence to obtain one or more operator clusters;

In step S103, a target operator in the operator group that meets a preset condition is determined, and load test data is generated according to the data sequence of the target operator to perform the load test.

As mentioned above, with the development of data technology and Internet technology, more and more service providers deliver services to users through software systems. To ensure the stable operation of these systems, load testing is commonly used in practice to ensure the operating quality of a software system under load. The goal of load testing is to ensure that the software system performs well under actual workloads. To achieve this goal, the workload must first be recovered, and the load test is then designed according to the recovered workload. Recovering a workload is challenging because a balance must be struck between the level of granularity of the workload and the cost of using that workload for load testing. If the recovered workload is too coarse, that is, oversimplified, it cannot capture differences in user behavior, and the load test loses its representativeness. If the workload replays the exact field workload step by step, the exact user behavior can be reproduced, but the cost of maintaining the workload is very high. In the prior art, the workload is usually designed around the representative behaviors of a small number of user clusters, and users are usually clustered according to how frequently each user performs different operations. However, because users of large-scale software systems vary widely, considering only event frequency is too coarse, while adding more detailed information about user operations leads to excessive recovery, execution, and maintenance costs.

In view of the above problems, this embodiment proposes a load test method. The method obtains operator operation data sequences based on load data, obtains one or more operator class groups by clustering the operator operation data sequences, and generates load test data for load testing according to the data sequence of a target operator that meets a preset condition in the operator class groups. The technical solution uses sequences that combine operator events with context information to help recover the workload, so that the workload can be recovered at different levels of operator behavior granularity. In addition, a clustering method is used to mine representative operators, so that workload generation can be achieved with a small number of users.

In an embodiment of the present invention, the load data refers to data that is produced by, or generated based on, the operations of an operator within a preset time period. The operator refers to an entity that performs load operations, such as a user, a machine, or a resource. The operation of the operator may be, for example, a search operation, a delete operation, an add operation, an edit operation, and the like.

In an embodiment of the present invention, the load data may include one or more of the following: operator identification information, and load data corresponding to the operator identification information, such as the workload, the load content, the load processing result, and so on. The operator identification information is used to uniquely identify the operator.

In an embodiment of the present invention, the operator operation data sequence refers to a data sequence composed of the load data generated by a specific operator. The operator operation data sequence reflects the characteristics of the operator's operation events as well as context information associated with those operation events. The characteristics of an operation event may include, for example, one or more of the following: the purpose of the operation event, the content of the operation event, and the effect of the operation event. The context information may include, for example, one or more of the following: operation event information of other operators related to the operator, and information on other operation events of the operator that have a temporal relationship with the operation event. Based on the rich information in the data sequence, data with different levels of operator behavior granularity can be obtained according to actual application requirements. The operator operation data sequence may be arranged according to a preset rule, for example in chronological order, in the order in which a certain field appears, or in order of the frequency with which a certain field appears. For example, when arranged in the order in which a certain field appears, the operator operation data sequence may be a search-delete-add sequence, a search-edit sequence, an add-edit sequence, and so on.

Log data records the course of events in detail and is comprehensive, so operator load data can be obtained more completely and accurately from log data. Therefore, in an embodiment of the present invention, the load data is load log data. Of course, the load data may also be simulated load data or real-time load data.

In an embodiment of the present invention, as shown in FIG. 2, the step S101, which is the step of obtaining load data, and determining the operator's operation data sequence based on the load data, includes the following steps S201-S203:

In step S201, obtain load log data;

In step S202, determine the operator identification information in the load log data;

In step S203, the load log data corresponding to the operator identification information is obtained based on the operator identification information, and the operator operation data sequence corresponding to the operator identification information is obtained.

In order to obtain the operator's operation data sequence, and then obtain the operator's behavior information, in this embodiment, the operator's operation data sequence is analyzed and mined from the load data. Specifically, first obtain load log data; then determine the operator identification information appearing in the load log data; finally obtain the load log data corresponding to the operator identification information based on the operator identification information, and compare it with the The load log data corresponding to the operator identification information is combined to obtain the operator operation data sequence corresponding to the operator identification information.

In an embodiment of the present invention, the load log data corresponding to the operator identification information may be combined in chronological order, in the order in which a certain field appears, or in order of the frequency with which a certain field appears. Those skilled in the art can select an appropriate data combination method according to the needs of the actual application and the characteristics of the combined data, which is not specifically limited in the present disclosure.
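Steps S201 to S203 can be sketched in Python as follows. The record layout is an assumption made for illustration: the field names `operator_id`, `ts`, and `event` are hypothetical, and chronological ordering is used because it is one of the combination orders mentioned above.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def build_operator_sequences(records: Iterable[dict]) -> Dict[str, List[str]]:
    """Group log records by operator ID and order each group by timestamp."""
    by_operator = defaultdict(list)
    for rec in records:
        # S202: the operator identification information found in the log
        by_operator[rec["operator_id"]].append(rec)
    # S203: combine each operator's records into one operation data sequence
    return {
        op: [r["event"] for r in sorted(recs, key=lambda r: r["ts"])]
        for op, recs in by_operator.items()
    }


# S201: hypothetical load log data, deliberately out of order.
logs = [
    {"operator_id": "u1", "ts": 3, "event": "edit"},
    {"operator_id": "u2", "ts": 1, "event": "search"},
    {"operator_id": "u1", "ts": 1, "event": "search"},
    {"operator_id": "u2", "ts": 2, "event": "delete"},
    {"operator_id": "u1", "ts": 2, "event": "add"},
]
sequences = build_operator_sequences(logs)
```

Swapping the sort key is enough to switch to a different combination order, for example ordering by a field other than the timestamp.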

In an embodiment of the present invention, as shown in FIG. 3, the step S102, which is the step of clustering the operator operation data sequence to obtain one or more operator class groups, includes the following steps S301-S302:

In step S301, the distance matrix of the operation data sequence of the operator is calculated;

In step S302, cluster the operator operation data sequence according to the distance matrix to obtain one or more operator class groups.

To ensure the correctness of the load test data while using as little operator operation data as possible to generate it, thereby reducing the computation required for load testing and improving its efficiency, this embodiment clusters the operators. Clustering makes it possible to subsequently obtain representative operators and their corresponding load data from the resulting class groups, and finally to obtain the load test data used for the load test. Clustering refers to the process of dividing a set of physical or abstract objects into multiple classes composed of similar objects; that is, a cluster generated by the clustering operation is a set of data objects that are similar to one another but different from the objects in other clusters.

In one embodiment of the present invention, a hierarchical clustering method based on the Pearson distance is used to cluster the operator operation data sequences, and the final clustering result can be displayed as a dendrogram. During clustering, the Calinski-Harabasz stopping rule is used to cut the dendrogram and determine the final number of clusters. Specifically, clustering is implemented in the following steps: first, the distance matrix of the operator operation data sequences is calculated; then the operator operation data sequences are clustered according to the distance matrix to obtain one or more operator class groups. Operators in the same operator class group can be considered to behave similarly, so a representative operator can then be selected from each operator class group to represent the behavior of that group.
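A rough, self-contained Python sketch of this clustering scheme is given below. Several details are assumptions, since the text does not fix them: each operator is represented by a hypothetical numeric feature vector, the Pearson distance is taken as one minus the Pearson correlation, average linkage is used for merging, and the Calinski-Harabasz index is evaluated in its usual form (between-cluster versus within-cluster squared Euclidean dispersion) for every candidate number of clusters from 2 to n-1.

```python
from math import sqrt


def pearson_distance(x, y):
    """1 - Pearson correlation; constant vectors are treated as uncorrelated."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 1.0
    return 1.0 - cov / (sx * sy)


def agglomerate(points, k):
    """Average-linkage agglomerative clustering, stopped at k clusters."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        total = sum(pearson_distance(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > k:
        pairs = [(linkage(clusters[i], clusters[j]), i, j)
                 for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        _, i, j = min(pairs)  # merge the two closest clusters
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def calinski_harabasz(points, clusters):
    """Ratio of between-cluster to within-cluster dispersion (higher is better)."""
    n, k, dim = len(points), len(clusters), len(points[0])
    overall = [sum(p[d] for p in points) / n for d in range(dim)]
    between = within = 0.0
    for members in clusters:
        centroid = [sum(points[i][d] for i in members) / len(members)
                    for d in range(dim)]
        between += len(members) * sum((c - o) ** 2 for c, o in zip(centroid, overall))
        within += sum((points[i][d] - centroid[d]) ** 2
                      for i in members for d in range(dim))
    if within == 0:
        return float("inf")
    return (between / (k - 1)) / (within / (n - k))


def cluster_operators(points):
    """Pick the cluster count with the highest Calinski-Harabasz index."""
    best_k = max(range(2, len(points)),
                 key=lambda k: calinski_harabasz(points, agglomerate(points, k)))
    return agglomerate(points, best_k)


# Hypothetical operator feature vectors: three read-heavy, three write-heavy.
points = [[10, 9, 1, 0], [12, 11, 0, 1], [11, 10, 1, 1],
          [0, 1, 9, 10], [1, 0, 12, 11], [1, 1, 10, 12]]
groups = cluster_operators(points)
```

For larger data sets, a library implementation of hierarchical clustering would normally replace this quadratic-per-merge loop. The sketch only makes the stopping rule concrete.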

In an embodiment of the present invention, before clustering, the parameters used by the clustering method can also be initialized.

Of course, the above-mentioned hierarchical clustering method is only illustrative. In practical applications, other clustering methods can also be used, such as partition-based clustering algorithms, density-based clustering algorithms, distribution-based clustering algorithms, and so on. For deciding when to stop clustering, in addition to the Calinski-Harabasz stopping rule in the example above, other rules can also be used, such as the silhouette coefficient stopping rule, the Davies-Bouldin stopping rule, and so on. Those skilled in the art can select an appropriate clustering method and stopping rule according to actual application requirements and the characteristics of the objects to be clustered, which are not particularly limited in the present disclosure.

In an embodiment of the present invention, as shown in FIG. 4, the step S301, that is, the step of calculating the distance matrix of the operator's operation data sequence, includes the following steps S401-S403:

In step S401, construct an operator operation data sequence matrix based on the operator operation data sequence;

In step S402, a similarity matrix of the operator's operation data sequence is generated based on the operator's operation data sequence;

In step S403, the operator's operation data sequence matrix and the operator's operation data sequence similarity matrix are multiplied to obtain the distance matrix of the operator.

In this embodiment, when calculating the distance matrix of the operator operation data sequences, an operator operation data sequence matrix is first constructed based on the operator operation data sequences; for example, the operation data sequences of the corresponding operators can be combined into an operator operation data sequence matrix according to the operator identification information. An operator operation data sequence similarity matrix is then generated based on the operator operation data sequences, where the similarity matrix is used to characterize the degree of similarity between the operator operation data sequences. Finally, the operator operation data sequence matrix is multiplied by the operator operation data sequence similarity matrix to obtain the distance matrix of the operators. The distance matrix obtained in this way takes into account not only the similarity between operators but also the similarity between the operation data sequences of all operators.
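The multiplication in step S403 can be sketched as an ordinary matrix product. This embodiment does not specify the matrix shapes, so the sketch assumes that each row of the sequence matrix holds one operator's counts of each frequent sequence and that the similarity matrix is square over those frequent sequences. The helper name `matmul` and the example values are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions do not match")
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


# Assumed layout: 2 operators x 3 frequent sequences (occurrence counts).
sequence_matrix = [
    [2, 1, 1],
    [0, 3, 1],
]
# Assumed layout: 3 x 3 similarity between the frequent sequences.
similarity = [
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.5],
    [0.0, 0.5, 1.0],
]

# Each operator's counts are spread across similar sequences, so two operators
# that use different but similar sequences end up with closer rows.
weighted = matmul(sequence_matrix, similarity)
```

Under these assumptions, the resulting rows could then be compared pairwise (for example with the Pearson distance) during clustering.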

In an embodiment of the present invention, the step S402, that is, the step of generating a similarity matrix of the operator's operation data sequence based on the operator's operation data sequence, can be implemented as:

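Determining frequent sequences in the operator's operation data sequence;

Calculating the edit distance between the frequent sequences;

Determining the similarity between the operator's operation data sequences according to the edit distance between the frequent sequences, and generating the operator's operation data sequence similarity matrix.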

Frequent sequences are representative to a certain extent and can reflect the characteristics of the data in the data sequence set; therefore, in this embodiment, the operator operation data sequence similarity matrix is generated based on frequent sequences. Specifically, the frequent sequences in the operator operation data sequences are first determined, where a frequent sequence is a data sequence whose frequency of occurrence is higher than a preset frequency threshold. The edit distance between the frequent sequences is then calculated; for example, the Levenshtein method may be used, although other edit distance calculation methods may also be used, and the present disclosure does not limit the specific edit distance calculation method. Finally, the similarity between the operator operation data sequences is calculated according to the edit distance between the frequent sequences: the smaller the edit distance, the higher the similarity. The operator operation data sequence similarity matrix can then be generated from the similarities between the operator operation data sequences.

In the above example, the Levenshtein edit distance is used to calculate similarity. Of course, other string similarity calculation methods can also be used, such as cosine similarity and the Jaccard coefficient. An appropriate similarity calculation method can be selected according to actual application requirements and the characteristics of the data sequences, which is not specifically limited in the present disclosure.
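The following sketch illustrates the edit distance and similarity calculation. It treats each frequent sequence as a list of operation names and converts distance to similarity as 1 minus the distance divided by the longer length. That normalisation, the function names, and the example sequences are assumptions made for illustration, since this embodiment only states that a smaller edit distance means a higher similarity.

```python
def levenshtein(a, b):
    """Levenshtein edit distance between two sequences (lists or strings)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(
                prev[j] + 1,         # delete x
                curr[j - 1] + 1,     # insert y
                prev[j - 1] + cost,  # substitute x with y (or keep if equal)
            ))
        prev = curr
    return prev[-1]


def similarity_matrix(sequences):
    """Pairwise similarity in [0, 1]; the normalisation is an assumed choice."""
    n = len(sequences)
    sim = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            longest = max(len(sequences[i]), len(sequences[j])) or 1
            score = 1.0 - levenshtein(sequences[i], sequences[j]) / longest
            sim[i][j] = sim[j][i] = score
    return sim


# Hypothetical frequent sequences of operation names.
frequent = [
    ["search", "delete", "add"],
    ["search", "edit"],
    ["add", "edit"],
]
sim = similarity_matrix(frequent)
```

Because the distance is computed over operation names rather than characters, replacing one operation with another counts as a single edit regardless of how the operations are spelled.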

In an embodiment of the present invention, as shown in FIG. 5, the step S103, that is, the step of determining a target operator in the operator class group that meets a preset condition and generating load test data according to the data sequence of the target operator to perform a load test, includes the following steps S501-S505:

In step S501, determine a target operator in the operator group that meets a preset condition;

In step S502, obtain the frequent sequence of the target operator and its appearance frequency;

In step S503, the occurrence probability of the frequent sequence is calculated according to the occurrence frequency of the frequent sequence;

In step S504, load test data is generated according to the occurrence probability of the frequent sequence;

In step S505, the load test data is replayed and run to perform a load test.

As mentioned above, in order to generate the load test data from as little operator operation data as possible while ensuring the correctness of the load test data, thereby reducing the amount of load test calculation and improving the efficiency of the load test work, the present disclosure clusters the operators and obtains representative operators and their corresponding load test data based on the resulting operator class groups, so as to obtain the load test data finally used for the load test. In this embodiment, after one or more operator class groups are obtained, a target operator that meets a preset condition is first selected in each operator class group, where the preset condition is a preset representative point condition; that is, the selected target operator is a representative operator in the corresponding operator class group. The frequent sequences of the target operator and their occurrence frequencies are then obtained. The frequent sequences can be obtained according to the method described above, which is not repeated here, and the occurrence frequency of each frequent sequence can be obtained at the same time. Next, the occurrence probability of each frequent sequence is calculated according to its occurrence frequency; for example, the occurrence frequency of a frequent sequence of the target operator is divided by the total number of data sequences of the target operator to obtain the occurrence probability of that frequent sequence. The load test data is then generated according to the occurrence probabilities of the frequent sequences. Finally, the load test data is replayed and run to implement the load test.
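The probability calculation in steps S502 and S503 can be sketched as follows. A sequence is treated as frequent when its count exceeds a threshold, and its probability is its count divided by the total number of the target operator's sequences, as described above. The function name, the `threshold` parameter, and the sample data are hypothetical.

```python
from collections import Counter


def occurrence_probabilities(sequences, threshold=0):
    """Map each frequent sequence to count / total number of sequences.

    A sequence is considered frequent when its count is strictly greater
    than `threshold` (the preset frequency threshold).
    """
    counts = Counter(tuple(seq) for seq in sequences)
    total = len(sequences)
    return {seq: n / total for seq, n in counts.items() if n > threshold}


# Hypothetical data sequences observed for one target operator.
observed = [
    ["search", "delete", "add"],
    ["search", "edit"],
    ["search", "delete", "add"],
    ["add", "edit"],
]
profile = occurrence_probabilities(observed)
```

With a nonzero threshold, the probabilities of the retained sequences may sum to less than 1, because infrequent sequences are dropped while the denominator still counts them.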

In an embodiment of the present invention, the Partitioning Around Medoids (PAM) algorithm can be used to identify the representative operator in each operator class group. PAM is a clustering algorithm based on k medoids (center points) and has strong robustness and accuracy. Of course, other methods can also be used to select the representative operator. Those skilled in the art can select an appropriate method for identifying the representative operator according to actual application requirements and the characteristics of the operator class group data, which is not specifically limited in the present disclosure.
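As a minimal sketch of representative selection, the code below picks the medoid of an already formed operator class group: the member whose total distance to the other members is smallest. This shows only the medoid criterion that PAM is built on, not the full PAM swap procedure, and the function name and distance values are hypothetical.

```python
def medoid(members, dist):
    """Return the member index with the smallest total distance to the group.

    `members` is a list of operator indices in one class group and `dist`
    is a symmetric distance matrix indexed by operator.
    """
    return min(members, key=lambda i: sum(dist[i][j] for j in members))


# Hypothetical pairwise distances between three operators in one class group.
distances = [
    [0.0, 1.0, 4.0],
    [1.0, 0.0, 2.0],
    [4.0, 2.0, 0.0],
]
representative = medoid([0, 1, 2], distances)
```

Unlike a centroid, the medoid is always an actual operator, so its recorded sequences can be used directly to generate load test data.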

In an embodiment of the present invention, after the occurrence probabilities of the frequent sequences of the target operator are obtained, since the target operator is representative of its operator class group, the frequent sequences of the target operator and their occurrence probabilities can be used to replace the operation data sequences and occurrence probabilities of the other operators in the corresponding operator class group, thereby generating the load test data used for the load test. For example, if the frequent sequences of the target operator of a certain operator class group are the search-delete-add sequence, the search-edit sequence, and the add-edit sequence, with corresponding occurrence probabilities of 50%, 25%, and 25%, then these frequent sequences and their occurrence probabilities can replace the operation data sequences and occurrence probabilities of the other operators in the operator class group. Assuming that there are two operators besides the target operator, operator 1 and operator 2, the finally generated load test data can be:

Target operator: search-delete-add sequence, 50%; search-edit sequence, 25%; add-edit sequence, 25%;

Operator 1: search-delete-add sequence, 50%; search-edit sequence, 25%; add-edit sequence, 25%;

Operator 2: search-delete-add sequence, 50%; search-edit sequence, 25%; add-edit sequence, 25%.
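The replacement described above can be sketched as copying the target operator's distribution to every member of the class group and then drawing sequences from it. The function names, the operator labels, and the use of seeded random sampling to produce a concrete workload are assumptions made for illustration.

```python
import random


def build_load_test_data(group, representative_profile):
    """Give every operator in the group a copy of the representative's profile."""
    return {operator: dict(representative_profile) for operator in group}


def sample_workload(profile, n, seed=0):
    """Draw n sequences according to their occurrence probabilities."""
    rng = random.Random(seed)  # seeded so a replay run is repeatable
    sequences = list(profile)
    weights = [profile[s] for s in sequences]
    return rng.choices(sequences, weights=weights, k=n)


# Distribution of the target operator from the example above.
target_profile = {
    ("search", "delete", "add"): 0.50,
    ("search", "edit"): 0.25,
    ("add", "edit"): 0.25,
}
group = ["target operator", "operator 1", "operator 2"]

load_test_data = build_load_test_data(group, target_profile)
workload = {op: sample_workload(p, n=8) for op, p in load_test_data.items()}
```

Each operator receives its own copy of the profile, so later per-operator adjustments would not affect the other members of the group.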

In one embodiment of the present invention, the load test data can be replayed and run with the help of a replay tool such as FIO or JMeter to implement the load test. During the load test, after the load test data is replayed and run, the test performance data of the system under test is recorded and compared with the performance data of the original load operation to obtain the load test result.
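The comparison at the end of the load test can be sketched as a per-metric relative error between the original run and the replayed run. This embodiment does not say which metrics are recorded or how they are compared, so the metric names, the values, and the use of relative error are all assumptions of this sketch.

```python
def relative_error(original, replayed):
    """Relative difference of each metric in the replayed run vs the original."""
    return {
        metric: abs(replayed[metric] - original[metric]) / original[metric]
        for metric in original
    }


# Hypothetical performance data; metric names and values are illustrative only.
original_run = {"throughput_ops": 1000.0, "mean_latency_ms": 20.0}
replayed_run = {"throughput_ops": 950.0, "mean_latency_ms": 21.0}

load_test_result = relative_error(original_run, replayed_run)
```

A small relative error on every metric would suggest that the generated load test data reproduces the original workload closely.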

Fig. 6 shows a flowchart of a load clustering method according to an embodiment of the present invention. As shown in Fig. 6, the load clustering method includes the following steps S601-S602:

In step S601, obtain load data, and determine an operator operation data sequence based on the load data;

In step S602, perform clustering on the operator operation data sequence to obtain one or more operator clusters.

The above-mentioned technical features in this embodiment have been explained in detail above, and the present disclosure will not repeat them here.

The following are device embodiments of the present invention, which can be used to implement the method embodiments of the present invention.

Fig. 7 shows a structural block diagram of a load testing device according to an embodiment of the present invention. The device can be implemented as part or all of an electronic device through software, hardware, or a combination of both. As shown in Figure 7, the load test device includes:

The determining module 701 is configured to obtain load data, and determine an operator operation data sequence based on the load data;

The first clustering module 702 is configured to cluster the operator operation data sequence to obtain one or more operator clusters;

The generating module 703 is configured to determine a target operator in the operator group that meets a preset condition, and generate load test data for load test according to the data sequence of the target operator.

Considering the above problems, this embodiment proposes a load testing device. The device obtains the operator operation data sequences based on load data, clusters the operator operation data sequences to obtain one or more operator class groups, and generates load test data for the load test according to the data sequence of the target operator that meets the preset condition in each operator class group. This technical solution uses sequences composed of operator events and context information to help restore the workload, so as to achieve workload recovery at different levels of operator behavior granularity. In addition, a clustering method is used to mine representative operators, so that workload generation can be achieved with a small number of users.

In an embodiment of the present invention, as shown in FIG. 8, the determining module 701 includes:

The first obtaining submodule 801 is configured to obtain load log data;

The first determining submodule 802 is configured to determine the operator identification information in the load log data;

The second acquisition submodule 803 is configured to acquire load log data corresponding to the operator identification information based on the operator identification information, and obtain an operator operation data sequence corresponding to the operator identification information.

In order to obtain the operator's operation data sequence, and thereby the operator's behavior information, in this embodiment the operator's operation data sequence is analyzed and mined from the load data. Specifically, the first obtaining submodule 801 obtains load log data; the first determining submodule 802 determines the operator identification information appearing in the load log data; and the second obtaining submodule 803 obtains the load log data corresponding to the operator identification information based on the operator identification information, and combines that load log data to obtain the operator operation data sequence corresponding to the operator identification information.

In an embodiment of the present invention, when the second obtaining submodule 803 combines the load log data corresponding to the operator identification information, the combination may be performed in chronological order, in the order in which a certain field appears, or in the order of the frequency of occurrence of a certain field. Those skilled in the art can select an appropriate data combination method according to the needs of the actual application and the characteristics of the combined data, which is not specifically limited in the present disclosure.
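The chronological combination can be sketched as grouping log records by operator identification information after sorting by time. The record layout `(timestamp, operator_id, operation)`, the function name, and the sample records are hypothetical, since the format of the load log data is not specified here.

```python
from collections import defaultdict


def operator_sequences(log_records):
    """Build one chronological operation sequence per operator.

    `log_records` is an iterable of (timestamp, operator_id, operation)
    tuples; this layout is an assumption of the sketch.
    """
    by_operator = defaultdict(list)
    for _, operator_id, operation in sorted(log_records, key=lambda r: r[0]):
        by_operator[operator_id].append(operation)
    return dict(by_operator)


# Hypothetical load log records, deliberately out of time order.
records = [
    (3, "op-A", "add"),
    (1, "op-A", "search"),
    (2, "op-B", "search"),
    (2, "op-A", "delete"),
    (4, "op-B", "edit"),
]
sequences = operator_sequences(records)
```

Sorting by a different key, such as a field's first appearance or its frequency, would give the other combination orders mentioned above.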

In an embodiment of the present invention, as shown in FIG. 9, the first clustering module 702 includes:

The first calculation sub-module 901 is configured to calculate the distance matrix of the operation data sequence of the operator;

The clustering sub-module 902 is configured to cluster the operator operation data sequence according to the distance matrix to obtain one or more operator clusters.

In order to generate the load test data from as little operator operation data as possible while ensuring the correctness of the load test data, thereby reducing the amount of load test calculation and improving the efficiency of the load test work, in this embodiment the first clustering module 702 clusters the operators. This facilitates the subsequent acquisition of representative operators and their corresponding load test data based on the operator class groups obtained by clustering, and finally yields the load test data used for the load test. Here, clustering refers to the process of dividing a collection of physical or abstract objects into multiple classes composed of similar objects; that is, each cluster generated by the clustering operation is a collection of data objects. Objects in one cluster are similar to each other, but different from objects in other clusters.

In an embodiment of the present invention, the first clustering module 702 uses a hierarchical clustering method to cluster the operator operation data sequences based on the Pearson distance, and the final clustering result can be displayed as a dendrogram. During clustering, the Calinski-Harabasz stopping rule is used to cut the dendrogram and determine the final number of clusters. Specifically, clustering is implemented as follows: the first calculation submodule 901 calculates the distance matrix of the operator operation data sequences; the clustering submodule 902 clusters the operator operation data sequences according to the distance matrix to obtain one or more operator class groups. Operators in the same operator class group can be considered to have relatively similar operation behaviors, so a representative operator can then be obtained from each operator class group to represent the behavior of that group.

In an embodiment of the present invention, as shown in FIG. 10, the first calculation submodule 901 includes:

The construction sub-module 1001 is configured to construct an operator operation data sequence matrix based on the operator operation data sequence;

The first generating sub-module 1002 is configured to generate an operator operation data sequence similarity matrix based on the operator operation data sequence;

The multiplication sub-module 1003 is configured to multiply the operation data sequence matrix of the operator and the similarity matrix of the operation data sequence of the operator to obtain the distance matrix of the operator.

In this embodiment, when the first calculation submodule 901 calculates the distance matrix of the operator operation data sequences, the construction submodule 1001 first constructs the operator operation data sequence matrix based on the operator operation data sequences; for example, the operation data sequences of the corresponding operators can be combined into an operator operation data sequence matrix according to the operator identification information. The first generation submodule 1002 then generates the operator operation data sequence similarity matrix based on the operator operation data sequences, where the similarity matrix is used to characterize the degree of similarity between the operator operation data sequences. Finally, the multiplication submodule 1003 multiplies the operator operation data sequence matrix by the operator operation data sequence similarity matrix to obtain the distance matrix of the operators. The distance matrix obtained in this way takes into account not only the similarity between operators but also the similarity between the operation data sequences of all operators.

In an embodiment of the present invention, the first generation submodule 1002 may be configured as:

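Determine frequent sequences in the operator's operation data sequence;

Calculate the edit distance between the frequent sequences;

Determine the similarity between the operator's operation data sequences according to the edit distance between the frequent sequences, and generate the operator's operation data sequence similarity matrix.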

Frequent sequences are representative to a certain extent and can reflect the characteristics of the data in the data sequence set; therefore, in this embodiment, the first generation submodule 1002 generates the operator operation data sequence similarity matrix based on frequent sequences. Specifically, the frequent sequences in the operator operation data sequences are first determined, where a frequent sequence is a data sequence whose frequency of occurrence is higher than a preset frequency threshold. The edit distance between the frequent sequences is then calculated; for example, the Levenshtein method may be used, although other edit distance calculation methods may also be used, and the present disclosure does not limit the specific edit distance calculation method. Finally, the similarity between the operator operation data sequences is calculated according to the edit distance between the frequent sequences: the smaller the edit distance, the higher the similarity. The operator operation data sequence similarity matrix can then be generated from the similarities between the operator operation data sequences.

In an embodiment of the present invention, as shown in FIG. 11, the generating module 703 includes:

The third determining submodule 1101 is configured to determine a target operator in the operator group that meets a preset condition;

The third obtaining submodule 1102 is configured to obtain the frequent sequence of the target operator and its appearance frequency;

The third calculation submodule 1103 is configured to calculate the occurrence probability of the frequent sequence according to the occurrence frequency of the frequent sequence;

The third generation submodule 1104 is configured to generate load test data according to the occurrence probability of the frequent sequence;

The test sub-module 1105 is configured to replay and run the load test data to perform a load test.

As mentioned above, in order to generate the load test data from as little operator operation data as possible while ensuring the correctness of the load test data, thereby reducing the amount of load test calculation and improving the efficiency of the load test work, the present disclosure clusters the operators and obtains representative operators and their corresponding load test data based on the resulting operator class groups, so as to obtain the load test data finally used for the load test. In this embodiment, after one or more operator class groups are obtained, the third determining submodule 1101 first selects a target operator that meets a preset condition in each operator class group, where the preset condition is a preset representative point condition; that is, the selected target operator is a representative operator in the corresponding operator class group. The third obtaining submodule 1102 then obtains the frequent sequences of the target operator and their occurrence frequencies. The frequent sequences can be obtained according to the method described above, which is not repeated here, and the occurrence frequency of each frequent sequence can be obtained at the same time. The third calculation submodule 1103 calculates the occurrence probability of each frequent sequence according to its occurrence frequency; for example, the occurrence frequency of a frequent sequence of the target operator is divided by the total number of data sequences of the target operator to obtain the occurrence probability of that frequent sequence. The third generation submodule 1104 then generates load test data according to the occurrence probabilities of the frequent sequences. Finally, the test submodule 1105 replays and runs the load test data to implement the load test.

In an embodiment of the present invention, the third determining submodule 1101 may use the Partitioning Around Medoids (PAM) algorithm to identify the representative operator in each operator class group. PAM is a clustering algorithm based on k medoids (center points) and has strong robustness and accuracy. Of course, other methods can also be used to select the representative operator. Those skilled in the art can select an appropriate method for identifying the representative operator according to actual application requirements and the characteristics of the operator class group data, which is not specifically limited in the present disclosure.

In an embodiment of the present invention, after the occurrence probabilities of the frequent sequences of the target operator are obtained, since the target operator is representative of its operator class group, the third generation submodule 1104 can use the frequent sequences of the target operator and their occurrence probabilities to replace the operation data sequences and occurrence probabilities of the other operators in the corresponding operator class group, thereby generating the load test data used for the load test. For example, if the frequent sequences of the target operator of a certain operator class group are the search-delete-add sequence, the search-edit sequence, and the add-edit sequence, with corresponding occurrence probabilities of 50%, 25%, and 25%, then these frequent sequences and their occurrence probabilities can replace the operation data sequences and occurrence probabilities of the other operators in the operator class group. Assuming that there are two operators besides the target operator, operator 1 and operator 2, the finally generated load test data assigns the same distribution to the target operator, operator 1, and operator 2: search-delete-add sequence, 50%; search-edit sequence, 25%; add-edit sequence, 25%.

In one embodiment of the present invention, the test submodule 1105 can use a replay tool such as FIO or JMeter to replay and run the load test data to implement the load test. During the load test, after the load test data is replayed and run, the test performance data of the system under test is recorded and compared with the performance data of the original load operation to obtain the load test result.

Fig. 12 shows a structural block diagram of a load clustering device according to an embodiment of the present invention. The device can be implemented as part or all of an electronic device through software, hardware, or a combination of the two. As shown in FIG. 12, the load clustering device includes:

The obtaining module 1201 is configured to obtain load data, and determine an operator operation data sequence based on the load data;

The second clustering module 1202 is configured to perform clustering on the operator operation data sequence to obtain one or more operator clusters.

Next, an application scenario is taken as an example to further illustrate the technical solution of the present invention. As shown in FIG. 13, in this application scenario, the load test device can be deployed in a distributed data system to perform load testing on one or more distributed data devices in the system, such as the client 1301. In the distributed data system, multiple clients 1301 are each connected to a database 1302. The load test device 1303 obtains load data from the database 1302; the determining module 1304 in the load test device 1303 determines the operator operation data sequences based on the load data; the first clustering module 1305 in the load test device 1303 clusters the operator operation data sequences to obtain one or more operator class groups; and the generating module 1306 in the load test device 1303 determines the target operator in each operator class group that meets the preset condition and generates load test data according to the data sequence of the target operator to perform the load test, finally obtaining the load test result.

The embodiment of the present invention also discloses an electronic device. FIG. 14 shows a structural block diagram of an electronic device according to an embodiment of the present invention. As shown in FIG. 14, the electronic device 1400 includes a memory 1401 and a processor 1402, wherein:

The memory 1401 is used to store one or more computer instructions, wherein the one or more computer instructions are executed by the processor 1402 to implement any of the above method steps.

FIG. 15 is a schematic structural diagram of a computer system suitable for implementing the load test method according to the embodiment of the present invention.

As shown in FIG. 15, the computer system 1500 includes a processing unit 1501, which can execute the various processes in the above-mentioned embodiments according to a program stored in a read-only memory (ROM) 1502 or a program loaded from a storage portion 1508 into a random access memory (RAM) 1503. The RAM 1503 also stores various programs and data required for the operation of the system 1500. The processing unit 1501, ROM 1502, and RAM 1503 are connected to each other through a bus 1504. An input/output (I/O) interface 1505 is also connected to the bus 1504.

The following components are connected to the I/O interface 1505: an input part 1506 including a keyboard, a mouse, etc.; an output part 1507 including a cathode ray tube (CRT), a liquid crystal display (LCD), etc., and speakers, etc.; a storage part 1508 including a hard disk, etc.; and a communication section 1509 including a network interface card such as a LAN card, a modem, and the like. The communication section 1509 performs communication processing via a network such as the Internet. The drive 1510 is also connected to the I/O interface 1505 as needed. A removable medium 1511, such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, etc., is installed on the drive 1510 as required, so that the computer program read therefrom is installed into the storage part 1508 as required. The processing unit 1501 may be implemented as a processing unit such as a CPU, GPU, FPGA, or NPU.

In particular, according to the embodiments of the present invention, the method described above may be implemented as a computer software program. For example, the embodiment of the present invention includes a computer program product, which includes a computer program tangibly contained on a readable medium thereof, and the computer program includes program code for executing the load test method. In such an embodiment, the computer program may be downloaded and installed from the network through the communication part 1509, and/or installed from the removable medium 1511.

The flowcharts and block diagrams in the drawings illustrate the possible implementation architecture, functions, and operations of the system, method, and computer program product according to various embodiments of the present invention. In this regard, each block in the flowcharts or block diagrams may represent a module, program segment, or part of the code, and the module, program segment, or part of the code contains one or more executable instructions for realizing the specified logical function. It should also be noted that in some alternative implementations, the functions marked in the blocks may also occur in a different order from the order marked in the drawings. For example, two blocks shown in succession can actually be executed substantially in parallel, and they can sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.

The units or modules involved in the embodiments described in the present invention can be implemented in software or hardware. The described units or modules may also be provided in the processor, and the names of these units or modules do not constitute a limitation on the units or modules themselves under certain circumstances.

As another aspect, the embodiments of the present invention also provide a computer-readable storage medium. The computer-readable storage medium may be the computer-readable storage medium included in the device described in the above-mentioned embodiment, or it may be a computer-readable storage medium that exists alone and is not installed in the device. The computer-readable storage medium stores one or more programs, and the programs are used by one or more processors to execute the methods described in the embodiments of the present invention.

The above description is only a preferred embodiment of the present invention and an explanation of the applied technical principles. Those skilled in the art should understand that the scope of the invention involved in the embodiments of the present invention is not limited to the technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalent features without departing from the inventive concept, for example, a technical solution formed by replacing the above-mentioned features with technical features having similar functions that are disclosed in (but not limited to) the embodiments of the present invention.

Claims

A load testing method, characterized in that it comprises:

Acquiring load data, and determining an operator operation data sequence based on the load data;

Perform clustering on the operator operation data sequence to obtain one or more operator clusters;

Determine the target operator in the operator class group that meets the preset condition, and generate load test data according to the data sequence of the target operator to perform the load test.
The method according to claim 1, wherein the load data is load log data or simulated load data or real-time load data.
The method according to claim 2, wherein said obtaining load data and determining an operator operation data sequence based on said load data comprises:

Obtain load log data;

Determine the operator identification information in the load log data;

The load log data corresponding to the operator identification information is acquired based on the operator identification information, and the operator operation data sequence corresponding to the operator identification information is obtained.
The method according to any one of claims 1 to 3, wherein the clustering the operator operation data sequence to obtain one or more operator clusters comprises:

calculating a distance matrix of the operator operation data sequence;

clustering the operator operation data sequence according to the distance matrix to obtain one or more operator clusters.
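
The claim does not fix a particular clustering algorithm. Purely as a sketch, the following Python code groups operators whose pairwise distance is at or below a threshold (single-linkage grouping with a union-find structure). Both the choice of algorithm and the `threshold` parameter are assumptions made for illustration, and any clustering method that accepts a precomputed distance matrix could be substituted.

```python
def cluster_by_distance(distance, threshold):
    """Group item indices whose pairwise distance is <= threshold.

    `distance` is a symmetric square matrix given as a list of lists.
    This is a stand-in technique, not necessarily the one used in the source.
    """
    n = len(distance)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if distance[i][j] <= threshold:
                parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


distance = [
    [0.0, 0.1, 0.9],
    [0.1, 0.0, 0.8],
    [0.9, 0.8, 0.0],
]
groups = cluster_by_distance(distance, threshold=0.2)
```

With this toy matrix, operators 0 and 1 fall into one cluster and operator 2 forms its own.
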
The method according to claim 4, wherein the calculating a distance matrix of the operator operation data sequence comprises:

constructing an operator operation data sequence matrix based on the operator operation data sequence;

generating an operator operation data sequence similarity matrix based on the operator operation data sequence;

multiplying the operator operation data sequence matrix by the operator operation data sequence similarity matrix to obtain the distance matrix of the operator operation data sequence.
The method according to claim 5, wherein the generating an operator operation data sequence similarity matrix based on the operator operation data sequence comprises:

determining frequent sequences in the operator operation data sequence;

calculating the edit distance between the frequent sequences;

determining the similarity between the operator operation data sequences according to the edit distance between the frequent sequences, and generating the operator operation data sequence similarity matrix.
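
As a hedged sketch of the edit-distance step, the Python code below computes the Levenshtein distance between frequent sequences and converts it into a similarity matrix. It assumes the frequent sequences have already been mined, and the normalization `1 - distance / max_length` is an assumption chosen for illustration, since the claim does not specify how the edit distance is mapped to a similarity.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def similarity_matrix(frequent_sequences):
    """Pairwise similarity in [0, 1]; 1.0 means identical sequences.

    Assumption: similarity = 1 - edit_distance / length of the longer sequence.
    """
    n = len(frequent_sequences)
    matrix = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            a, b = frequent_sequences[i], frequent_sequences[j]
            longest = max(len(a), len(b)) or 1  # avoid division by zero
            sim = 1.0 - edit_distance(a, b) / longest
            matrix[i][j] = matrix[j][i] = sim
    return matrix


frequent = [
    ("open", "read", "close"),
    ("open", "write", "close"),
    ("open", "close"),
]
sim = similarity_matrix(frequent)
```

Each pair of sequences in `frequent` differs by a single edit, so every off-diagonal entry of `sim` is 1 - 1/3.
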
The method according to claim 1, wherein the determining a target operator that meets a preset condition in the operator clusters, and generating load test data according to the data sequence of the target operator to perform a load test, comprises:

determining a target operator that meets a preset condition in the operator clusters;

acquiring the frequent sequences of the target operator and their occurrence frequencies;

calculating the occurrence probability of each frequent sequence according to its occurrence frequency;

generating load test data according to the occurrence probabilities of the frequent sequences;

replaying the load test data to perform a load test.
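
The probability and generation steps can be sketched as follows, assuming that the occurrence probability of a frequent sequence is its frequency divided by the total frequency, and that test data is produced by weighted random sampling. Both are assumptions for illustration. The function names and the `num_sequences` and `seed` parameters are hypothetical, and the replay step is omitted.

```python
import random


def occurrence_probabilities(frequencies):
    """Convert {sequence: occurrence count} into {sequence: probability}."""
    total = sum(frequencies.values())
    if total <= 0:
        raise ValueError("frequencies must contain at least one positive count")
    return {seq: count / total for seq, count in frequencies.items()}


def generate_test_data(frequencies, num_sequences, seed=None):
    """Sample frequent sequences in proportion to their occurrence probability."""
    probabilities = occurrence_probabilities(frequencies)
    rng = random.Random(seed)  # seeded generator for reproducible test data
    sequences = list(probabilities)
    weights = [probabilities[seq] for seq in sequences]
    return rng.choices(sequences, weights=weights, k=num_sequences)


target_frequencies = {
    ("open", "read", "close"): 3,
    ("open", "write", "close"): 1,
}
probabilities = occurrence_probabilities(target_frequencies)
test_data = generate_test_data(target_frequencies, num_sequences=5, seed=0)
```

The resulting `test_data` is a list of operation sequences that a replay tool could then issue against the system under test.
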
A load clustering method, characterized in that it comprises:

acquiring load data, and determining an operator operation data sequence based on the load data;

clustering the operator operation data sequence to obtain one or more operator clusters.
A load testing device, characterized in that it comprises:

a determining module, configured to acquire load data and determine an operator operation data sequence based on the load data;

a first clustering module, configured to cluster the operator operation data sequence to obtain one or more operator clusters;

a generating module, configured to determine a target operator that meets a preset condition in the operator clusters, and generate load test data according to the data sequence of the target operator to perform a load test.
The device according to claim 9, wherein the load data is load log data, simulated load data, or real-time load data.
The device according to claim 10, wherein the determining module comprises:

a first acquiring submodule, configured to acquire load log data;

a first determining submodule, configured to determine operator identification information in the load log data;

a second acquiring submodule, configured to acquire, based on the operator identification information, the load log data corresponding to the operator identification information, to obtain the operator operation data sequence corresponding to the operator identification information.
The device according to any one of claims 9-11, wherein the first clustering module comprises:

a first calculation submodule, configured to calculate a distance matrix of the operator operation data sequence;

a clustering submodule, configured to cluster the operator operation data sequence according to the distance matrix to obtain one or more operator clusters.
The device according to claim 12, wherein the first calculation submodule comprises:

a construction submodule, configured to construct an operator operation data sequence matrix based on the operator operation data sequence;

a first generating submodule, configured to generate an operator operation data sequence similarity matrix based on the operator operation data sequence;

a multiplication submodule, configured to multiply the operator operation data sequence matrix by the operator operation data sequence similarity matrix to obtain the distance matrix of the operator operation data sequence.
The device according to claim 13, wherein the first generating submodule comprises:

a second determining submodule, configured to determine frequent sequences in the operator operation data sequence;

a second calculation submodule, configured to calculate the edit distance between the frequent sequences;

a second generating submodule, configured to determine the similarity between the operator operation data sequences according to the edit distance between the frequent sequences, and generate the operator operation data sequence similarity matrix.
The device according to claim 9, wherein the generating module comprises:

a third determining submodule, configured to determine a target operator that meets a preset condition in the operator clusters;

a third acquiring submodule, configured to acquire the frequent sequences of the target operator and their occurrence frequencies;

a third calculation submodule, configured to calculate the occurrence probability of each frequent sequence according to its occurrence frequency;

a third generating submodule, configured to generate load test data according to the occurrence probabilities of the frequent sequences;

a test submodule, configured to replay the load test data to perform a load test.
A load clustering device, characterized in that it comprises:

an acquiring module, configured to acquire load data and determine an operator operation data sequence based on the load data;

a second clustering module, configured to cluster the operator operation data sequence to obtain one or more operator clusters.
An electronic device, characterized in that it comprises a memory and a processor, wherein:

the memory is used to store one or more computer instructions, and the one or more computer instructions are executed by the processor to implement the method steps of any one of claims 1-8.
A computer-readable storage medium having computer instructions stored thereon, wherein the computer instructions, when executed by a processor, implement the method steps of any one of claims 1-8.