WO2021047575A1 - Load testing method and apparatus, and electronic device and computer-readable storage medium - Google Patents

Load testing method and apparatus, and electronic device and computer-readable storage medium

Info

Publication number
WO2021047575A1
WO2021047575A1 PCT/CN2020/114411 CN2020114411W WO2021047575A1 WO 2021047575 A1 WO2021047575 A1 WO 2021047575A1 CN 2020114411 W CN2020114411 W CN 2020114411W WO 2021047575 A1 WO2021047575 A1 WO 2021047575A1
Authority
WO
WIPO (PCT)
Prior art keywords
operator
load
operation data
data sequence
sequence
Prior art date
Application number
PCT/CN2020/114411
Other languages
French (fr)
Chinese (zh)
Inventor
林江彬
王勇
陈金富
Original Assignee
阿里巴巴集团控股有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司
Publication of WO2021047575A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2053 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant
    • G06F11/2056 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant by mirroring
    • G06F11/2066 Optimisation of the communication load
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/36Preventing errors by testing or debugging software

Definitions

  • the embodiment of the present invention relates to the technical field of data testing, in particular to a load testing method, device, electronic equipment, and computer-readable storage medium.
  • the goal of load testing is to ensure that the software system performs well under actual workloads.
  • the workload needs to be restored first, and then the load test is designed according to the restored workload.
  • Recovering workloads is a challenging task because it requires a balance between the level of granularity of the workload and the cost of using that workload for load testing. If the restored workload is too coarse, that is, oversimplified, it cannot capture differences in user behavior, and the load test loses its representativeness. For example, the SPECweb96 benchmark defines a workload that only specifies the probability of accessing files, such as "files smaller than 1KB account for 35% of all requests".
  • if instead the workload replays the exact field workload step by step, the exact user behavior can be reproduced, but the cost of maintaining the workload is very high. This is because the software system has a large number of users: replaying the exact workload requires the load test to simulate a large amount of contextual information for each user, and simulation code must be developed for each specific sequence of events. At the same time, it is almost impossible to observe exactly the same workload twice, so the workload must be updated constantly.
  • the prior art usually designs the workload based on the representative user behaviors of a small number of clusters, and when aggregating users, it is usually implemented based on the frequency of operations of different users.
  • the sequence and context of user operations can make the workload more representative. For example, one user repeatedly reads small blocks of data from a file and then writes each small block back to the file, while another user alternately reads and writes a large number of small blocks of data in the file. If only the frequency of operations such as reads and writes is considered, the workloads of these two users cannot be distinguished, but adding more detailed information about these user operations results in high recovery, execution, and maintenance costs.
  • the embodiments of the present invention provide a load testing method, device, electronic equipment, and computer-readable storage medium.
  • the load test method includes:
  • the load data is load log data or simulated load data or real-time load data.
  • acquiring the load data and determining the operator operation data sequence based on the load data includes:
  • clustering the operator operation data sequence to obtain one or more operator groups includes:
  • calculating the distance matrix of the operator operation data sequence includes:
  • the present disclosure is described in the first aspect.
  • the generation of the operator operation data sequence similarity matrix based on the operator operation data sequence is implemented as:
  • the similarity between the operator's operation data sequences is determined according to the edit distance between the frequent sequences, and the operator's operation data sequence similarity matrix is generated.
  • determining the target operator in the operator class group that meets a preset condition, and generating load test data according to the data sequence of the target operator to perform the load test, includes:
  • an embodiment of the present invention provides a load clustering method.
  • the load clustering method includes:
  • an embodiment of the present invention provides a load testing device.
  • the load test device includes:
  • a determining module configured to obtain load data, and determine an operator operation data sequence based on the load data
  • the first clustering module is configured to perform clustering on the operator operation data sequence to obtain one or more operator clusters
  • the generating module is configured to determine a target operator in the operator group that meets a preset condition, and generate load test data according to the data sequence of the target operator to perform the load test.
  • the load data is load log data.
  • the first obtaining submodule is configured to obtain load log data
  • the second acquisition submodule is configured to acquire load log data corresponding to the operator identification information based on the operator identification information, and obtain an operator operation data sequence corresponding to the operator identification information.
  • the first clustering module includes:
  • the first calculation sub-module is configured to calculate the distance matrix of the operation data sequence of the operator
  • the first calculation sub-module includes:
  • a construction sub-module configured to construct an operator operation data sequence matrix based on the operator operation data sequence
  • the first generating sub-module is configured to generate an operator operation data sequence similarity matrix based on the operator operation data sequence
  • the first generation submodule is configured to:
  • the second determining submodule is configured to determine frequent sequences in the operator's operation data sequence
  • the second calculation sub-module is configured to calculate the edit distance between the frequent sequences
  • the second generation sub-module is configured to determine the similarity between the operator operation data sequences according to the edit distance between the frequent sequences, and generate the operator operation data sequence similarity matrix.
  • the generating module includes:
  • the third obtaining sub-module is configured to obtain the frequent sequence of the target operator and its appearance frequency
  • the third calculation sub-module is configured to calculate the occurrence probability of the frequent sequence according to the occurrence frequency of the frequent sequence
  • the third generation sub-module is configured to generate load test data according to the occurrence probability of the frequent sequence
  • the test sub-module is configured to replay and run the load test data for load test.
  • an embodiment of the present invention provides a load testing device.
  • the load test device includes:
  • An obtaining module configured to obtain load data, and determine an operator operation data sequence based on the load data
  • the second clustering module is configured to cluster the operator operation data sequence to obtain one or more operator clusters.
  • an embodiment of the present invention provides an electronic device, including a memory and a processor, where the memory is used to store one or more computer instructions that support the load testing device/load clustering device in executing the load testing method/load clustering method described above
  • the processor is configured to execute the computer instructions stored in the memory.
  • the load testing device/load clustering device may further include a communication interface for the load testing device/load clustering device to communicate with other equipment or a communication network.
  • an embodiment of the present invention provides a computer-readable storage medium for storing computer instructions used by the load testing device/load clustering device, including the computer instructions involved in executing the load testing method/load clustering method described above.
  • the above technical solution obtains operator operation data sequences based on the load data, obtains one or more operator groups by clustering the operator operation data sequences, and generates load test data for load testing according to the data sequence of the target operator that meets the preset condition in the operator group.
  • This technical solution uses sequences of operator events and context information to help restore the workload, so that workload recovery can be achieved at different levels of operator behavior granularity.
  • clustering methods are used to mine representative operators, so that workload generation can be achieved with a small number of users.
  • Fig. 1 shows a flowchart of a load test method according to an embodiment of the present invention
  • FIG. 2 shows a flowchart of step S101 of the load testing method according to the embodiment shown in FIG. 1;
  • FIG. 3 shows a flowchart of step S102 of the load testing method according to the embodiment shown in FIG. 1;
  • FIG. 4 shows a flowchart of step S301 of the load test method according to the embodiment shown in FIG. 3;
  • FIG. 5 shows a flowchart of step S103 of the load testing method according to the embodiment shown in FIG. 1;
  • Fig. 6 shows a flowchart of a load clustering method according to an embodiment of the present invention
  • FIG. 7 shows a structural block diagram of a load testing device according to an embodiment of the present invention.
  • FIG. 8 shows a structural block diagram of the determination module 701 of the load test device according to the embodiment shown in FIG. 7;
  • FIG. 9 shows a structural block diagram of the first clustering module 702 of the load testing device according to the embodiment shown in FIG. 7;
  • FIG. 10 shows a structural block diagram of the first calculation sub-module 901 of the load testing device according to the embodiment shown in FIG. 9;
  • FIG. 11 shows a structural block diagram of the generation module 703 of the load test device according to the embodiment shown in FIG. 7;
  • FIG. 12 shows a structural block diagram of a load clustering device according to an embodiment of the present invention.
  • FIG. 13 shows a schematic diagram of an application scenario according to an embodiment of the present invention.
  • FIG. 14 shows a structural block diagram of an electronic device according to an embodiment of the present invention.
  • FIG. 15 is a schematic structural diagram of a computer system suitable for implementing a load test method according to an embodiment of the present invention.
  • the technical solution provided by the embodiment of the present invention obtains operator operation data sequences based on load data, obtains one or more operator class groups by clustering the operator operation data sequences, and generates load test data for load testing according to the data sequence of the target operator that meets the preset condition in the operator class group.
  • This technical solution uses sequences of operator events and context information to help restore the workload, so that workload recovery can be achieved at different levels of operator behavior granularity.
  • clustering methods are used to mine representative operators, so that workload generation can be achieved with a small number of users.
  • Fig. 1 shows a flow chart of a load test method according to an embodiment of the present invention. As shown in Fig. 1, the load test method includes the following steps S101-S103:
  • step S101 obtain load data, and determine an operator operation data sequence based on the load data
  • step S102 perform clustering on the operator operation data sequence to obtain one or more operator clusters
  • step S103 a target operator in the operator group that meets a preset condition is determined, and load test data is generated according to the data sequence of the target operator to perform the load test.
  • load testing is usually used in practice to ensure the running quality of software systems under load. The goal of load testing is to ensure that the software system performs well under actual workloads.
  • the workload needs to be restored first, and then the load test is designed according to the restored workload. Recovering workloads is a challenging task because it requires a balance between the level of granularity of the workload and the cost of using that workload for load testing.
  • if the restored workload is too coarse, that is, oversimplified, it cannot capture differences in user behavior, and the load test loses its representativeness; if the workload replays the exact field workload step by step, the exact user behavior can be reproduced, but the cost of maintaining the workload is very high.
  • workloads are usually designed based on representative user behaviors of a small number of clusters, and when users are aggregated, they are usually implemented based on the frequency of operations of different users.
  • it is too rough to only consider the frequency of events. Adding more detailed information about user operations will lead to excessive recovery, execution, and maintenance costs.
  • a load testing method is proposed that obtains operator operation data sequences based on load data, obtains one or more operator class groups by clustering the operator operation data sequences, and generates load test data for load testing according to the data sequence of the target operator that meets the preset condition in the operator class group.
  • This technical solution uses sequences of operator events and context information to help restore the workload, so that workload recovery can be achieved at different levels of operator behavior granularity.
  • clustering methods are used to mine representative operators, so that workload generation can be achieved with a small number of users.
  • the load data refers to load data generated based on the operation of the operator within a preset time period.
  • the operator refers to an operator such as a load operation user, a load operation machine, or a load operation resource.
  • the operation of the operator may be, for example, a search operation, a delete operation, an add operation, an edit operation, and the like.
  • the load data may include one or more of the following data: operator identification information and load data corresponding to the operator identification information, including load workload, load content, load processing results, and so on.
  • the operator identification information is used to uniquely identify the operator.
  • the operator's operation data sequence refers to a data sequence composed of load data generated by a specific operator. The sequence can reflect the characteristics of the operator's operation events as well as contextual information associated with those events. The characteristics of an operation event may include, for example, one or more of the following: the purpose of the operation event, the content of the operation event, and the effect of the operation event. The context information may include, for example, one or more of the following: operation event information of other operators related to the operator, and information on other operation events that have a temporal relationship with the operator's operation event.
  • the operator's operation data sequence may be arranged according to a preset rule, for example, in chronological order, in the order in which a certain field appears, or in the order of the occurrence frequency of a certain field. For example, arranged in the order in which a certain field appears, the operator operation data sequence may be a search-delete-add sequence, a search-edit sequence, an add-edit sequence, and so on.
  • the load data is load log data.
  • the load data may also be simulated load data or real-time load data.
  • the step S101 which is the step of obtaining load data, and determining the operator's operation data sequence based on the load data, includes the following steps S201-S203:
  • step S201 obtain load log data
  • step S202 determine the operator identification information in the load log data
  • step S203 the load log data corresponding to the operator identification information is obtained based on the operator identification information, and the operator operation data sequence corresponding to the operator identification information is obtained.
  • the operator's operation data sequence is analyzed and mined from the load data. Specifically, first obtain the load log data; then determine the operator identification information appearing in the load log data; finally, obtain the load log data corresponding to the operator identification information based on the operator identification information, and combine that load log data to obtain the operator operation data sequence corresponding to the operator identification information.
  • when combining the load log data corresponding to the operator identification information, the data can be combined in chronological order, in the order in which a certain field appears, or in the order of the occurrence frequency of a certain field.
  • Those skilled in the art can select an appropriate data combination method according to the needs of the actual application and the characteristics of the combined data, which is not specifically limited in the present disclosure.
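  • As a minimal sketch of steps S201-S203, the following groups log records by operator identifier and orders each group chronologically. The `LogRecord` fields and the sample records are hypothetical; the source does not define a log schema, and chronological ordering is only one of the combination orders it mentions.

```python
from collections import defaultdict
from typing import NamedTuple


class LogRecord(NamedTuple):
    # Hypothetical log schema; field names are assumptions for illustration.
    timestamp: float
    operator_id: str
    operation: str


def build_operator_sequences(records):
    """Group log records by operator ID and order each group by time."""
    grouped = defaultdict(list)
    for rec in records:
        # S202/S203: identify the operator and collect that operator's records.
        grouped[rec.operator_id].append(rec)
    return {
        op_id: [r.operation for r in sorted(recs, key=lambda r: r.timestamp)]
        for op_id, recs in grouped.items()
    }


# S201: the "load log data" here is a small in-memory sample.
records = [
    LogRecord(3.0, "op1", "add"),
    LogRecord(1.0, "op1", "search"),
    LogRecord(2.0, "op2", "search"),
    LogRecord(2.5, "op1", "delete"),
    LogRecord(4.0, "op2", "edit"),
]
sequences = build_operator_sequences(records)
```

  • Sorting by a different key (for example, a field's order of appearance or frequency) would only require changing the `key` function.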
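  • the step S102, that is, the step of clustering the operator operation data sequence to obtain one or more operator class groups, includes the following steps S301-S302:
  • step S301 the distance matrix of the operation data sequence of the operator is calculated
  • step S302 the operator operation data sequence is clustered according to the distance matrix to obtain one or more operator class groups.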
  • clustering refers to the process of dividing a collection of physical or abstract objects into multiple classes composed of similar objects, that is, the cluster generated by the clustering operation is a collection of a set of data objects. Objects in one cluster are similar to each other, but different from objects in other clusters.
  • a hierarchical clustering method is used to cluster the operator operation data sequences based on the Pearson distance, and the final clustering result can be displayed as a dendrogram.
  • during the clustering process, the Calinski-Harabasz stopping rule is used to cut the dendrogram and determine the final number of clusters.
  • the following steps are adopted to implement clustering: first, the distance matrix of the operator's operation data sequence is calculated; then the operator's operation data sequence is clustered according to the distance matrix to obtain one or more operator class groups. Operators in the same operator group can be considered to behave relatively similarly, so a representative operator can then be obtained from each operator group to represent the behavior of that group.
  • before clustering, the parameters used by the clustering method can also be initialized.
  • the above clustering method is only illustrative. In practical applications, other clustering methods can also be used, such as partition-based clustering algorithms, density-based clustering algorithms, and distribution-based clustering algorithms.
  • for the cluster stop judgment, in addition to the Calinski-Harabasz stopping rule in the example above, other cluster stop judgment methods can also be used, such as the silhouette coefficient stopping rule and the Davies-Bouldin stopping rule.
  • Those skilled in the art can select an appropriate clustering method and cluster stop judgment rule according to actual application requirements and the characteristics of the objects to be clustered, which are not particularly limited in the present disclosure.
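  • The clustering described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the patented implementation: it uses average linkage (the source does not name a linkage), represents each operator by a hypothetical numeric feature vector, and stops at a fixed number of clusters `k` instead of applying the Calinski-Harabasz stopping rule.

```python
from math import sqrt


def pearson_distance(x, y):
    """Return 1 - Pearson correlation of two equal-length numeric vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 1.0  # assumption: treat constant vectors as uncorrelated
    return 1.0 - cov / (sx * sy)


def agglomerative(vectors, k):
    """Average-linkage agglomerative clustering that stops at k clusters."""
    dist = [[pearson_distance(a, b) for b in vectors] for a in vectors]
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        # Mean pairwise distance between the members of two clusters.
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > k:
        # Find and merge the closest pair of clusters.
        _, p, q = min(
            (linkage(clusters[p], clusters[q]), p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return clusters


# Hypothetical per-operator feature vectors (one row per operator).
vectors = [
    [2.5, 2.0, 0.5],
    [2.4, 2.1, 0.4],
    [0.5, 2.0, 2.5],
    [0.6, 1.9, 2.6],
]
groups = agglomerative(vectors, k=2)
```

  • In this sample the first two operators and the last two operators end up in separate groups, since their vectors are strongly correlated within each pair and negatively correlated across pairs.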
  • the step S301 that is, the step of calculating the distance matrix of the operator's operation data sequence, includes the following steps S401-S403:
  • step S401 construct an operator operation data sequence matrix based on the operator operation data sequence
  • step S402 a similarity matrix of the operator's operation data sequence is generated based on the operator's operation data sequence
  • step S403 the operator's operation data sequence matrix and the operator's operation data sequence similarity matrix are multiplied to obtain the distance matrix of the operator.
  • the operator's operation data sequence matrix is first constructed based on the operator's operation data sequence. For example, the operation data sequences corresponding to each operator can be combined into an operator operation data sequence matrix according to the operator identification information.
  • an operator operation data sequence similarity matrix is then generated based on the operator operation data sequence, where the similarity matrix represents the degree of similarity between the operator operation data sequences. Finally, the operator's operation data sequence matrix is multiplied by the operator's operation data sequence similarity matrix to obtain the distance matrix of the operator. A distance matrix obtained in this way considers not only the similarity between operators but also the similarity between the operation data sequences of all operators.
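  • One possible reading of steps S401-S403 is sketched below. It assumes the operator operation data sequence matrix has one row per operator and one column per frequent sequence (holding occurrence counts), and that the similarity matrix is square over the same frequent sequences. Both shapes and all sample values are assumptions made for illustration; the source does not state the matrix dimensions.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
        for row in a
    ]


# Hypothetical counts: rows are operators, columns are frequent sequences.
sequence_matrix = [
    [2, 1, 0],
    [0, 1, 2],
]

# Hypothetical sequence-to-sequence similarities (symmetric, 1.0 on the diagonal).
similarity_matrix = [
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.5],
    [0.0, 0.5, 1.0],
]

# S403: the product spreads each operator's counts onto similar sequences.
operator_matrix = matmul(sequence_matrix, similarity_matrix)
```

  • Under this reading, two operators who use different but similar sequences receive closer rows than raw counts alone would give them.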
  • the step S402, that is, the step of generating a similarity matrix of the operator's operation data sequence based on the operator's operation data sequence, can be implemented as:
  • the frequent sequences in the operator's operation data sequence are determined;
  • the edit distance between the frequent sequences is calculated;
  • the similarity between the operator's operation data sequences is determined according to the edit distance between the frequent sequences, and the operator's operation data sequence similarity matrix is generated.
  • the Levenshtein edit distance similarity calculation method is used.
  • other string similarity calculation methods can also be used, such as the cosine similarity calculation method and the Jaccard coefficient similarity calculation method.
  • those skilled in the art can select an appropriate similarity calculation method according to the needs of the actual application and the characteristics of the data sequence, which is not specifically limited in the present disclosure.
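  • A minimal sketch of an edit-distance-based similarity matrix follows. The Levenshtein distance itself is standard; the normalisation `1 - distance / max_length` is an assumption, since the source does not specify how distance is mapped to similarity, and the sample frequent sequences are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute, free if equal
            ))
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Assumed normalisation: 1 - distance / length of the longer sequence."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def similarity_matrix(frequent_sequences):
    """Pairwise similarity between every pair of frequent sequences."""
    return [[similarity(a, b) for b in frequent_sequences]
            for a in frequent_sequences]


# Hypothetical frequent sequences of operations.
frequent = [
    ("search", "delete", "add"),
    ("search", "edit"),
    ("add", "edit"),
]
sim = similarity_matrix(frequent)
```

  • Because the function works on any sequence type, the same code handles plain strings as well as tuples of operation names.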
  • the step S103, that is, the step of determining a target operator in the operator group that meets a preset condition and generating load test data according to the data sequence of the target operator to perform the load test, includes the following steps S501-S505:
  • step S501 determine a target operator in the operator group that meets a preset condition
  • step S502 obtain the frequent sequence of the target operator and its appearance frequency
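  • step S503 the occurrence probability of the frequent sequence is calculated according to the occurrence frequency of the frequent sequence
  • step S504 load test data is generated according to the occurrence probability of the frequent sequence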
  • step S505 the load test data is replayed and run to perform a load test.
  • the present disclosure clusters the operators and, based on the resulting class groups, obtains representative operators and their corresponding data to produce the load test data for the final load test.
  • specifically, a target operator that meets a preset condition is first determined in the operator group, where the preset condition is a preset representativeness condition, that is, the selected target operator is a representative operator of the corresponding operator group. The frequent sequences of the target operator and their occurrence frequencies are then obtained; the frequent sequences can be obtained according to the method described above, and their occurrence frequencies can be obtained accordingly.
  • the occurrence probability of each frequent sequence is calculated according to its occurrence frequency, for example, by dividing the occurrence frequency of the target operator's frequent sequence by the total number of the target operator's data sequences. Load test data is then generated according to the occurrence probabilities of the frequent sequences, and finally the load test data is replayed and run to carry out the load test.
  • the central point algorithm (Partitioning Around Medoids, PAM) can be used to select the representative operator.
  • the central point algorithm is a clustering algorithm based on k central points (medoids) and has strong robustness and accuracy.
  • other methods can also be used to select the representative operator. Those skilled in the art can select an appropriate method for identifying the representative operator according to actual application requirements and characteristics of operator group data, which is not specifically limited in the present disclosure.
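  • The idea of picking a representative operator as a central point can be sketched as below. This shows only medoid selection within one already-formed group (the member with the smallest total distance to the other members); it is not the full PAM procedure with its swap phase, and the distance values are hypothetical.

```python
def medoid(members, dist):
    """Return the member with the smallest total distance to the group."""
    return min(members, key=lambda i: sum(dist[i][j] for j in members))


# Hypothetical pairwise distances between three operators in one group.
dist = [
    [0.0, 0.2, 0.9],
    [0.2, 0.0, 0.3],
    [0.9, 0.3, 0.0],
]
representative = medoid([0, 1, 2], dist)
```

  • Here operator 1 is selected, because its total distance to the others (0.5) is smaller than that of operator 0 (1.1) or operator 2 (1.2).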
  • Target operator: search-delete-add sequence, 50%; search-edit sequence, 25%; add-edit sequence, 25%;
  • Operator 1: search-delete-add sequence, 50%; search-edit sequence, 25%; add-edit sequence, 25%;
  • Operator 2: search-delete-add sequence, 50%; search-edit sequence, 25%; add-edit sequence, 25%.
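  • The probability calculation and load generation of steps S503-S504 can be sketched as follows, using counts that reproduce the 50% / 25% / 25% split in the example above. Weighted random sampling is an assumed way of turning probabilities into load test data, and the function names and the fixed seed are illustrative only.

```python
import random
from collections import Counter


def occurrence_probabilities(sequence_counts):
    """Divide each frequent sequence's count by the total count."""
    total = sum(sequence_counts.values())
    return {seq: n / total for seq, n in sequence_counts.items()}


def generate_load(probabilities, n_sequences, seed=0):
    """Sample operation sequences in proportion to their probabilities."""
    rng = random.Random(seed)  # fixed seed so the plan is reproducible
    seqs = list(probabilities)
    weights = [probabilities[s] for s in seqs]
    return rng.choices(seqs, weights=weights, k=n_sequences)


# Hypothetical frequent-sequence counts for the target operator.
counts = Counter({
    ("search", "delete", "add"): 2,
    ("search", "edit"): 1,
    ("add", "edit"): 1,
})
probs = occurrence_probabilities(counts)
plan = generate_load(probs, n_sequences=8)
```

  • Each entry in `plan` is one operation sequence that a replay tool would then execute against the system under test.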
  • the load test data can be replayed and run with the help of the replay tool FIO or JMeter to realize the load test.
  • the test performance data of the test system is recorded and compared with the performance data of the original load operation to obtain the load test result.
  • step S602 perform clustering on the operator operation data sequence to obtain one or more operator clusters.
  • Fig. 7 shows a structural block diagram of a load testing device according to an embodiment of the present invention.
  • the device can be implemented as part or all of an electronic device through software, hardware, or a combination of both.
  • the load test device includes:
  • the first clustering module 702 is configured to cluster the operator operation data sequence to obtain one or more operator clusters;
  • the generating module 703 is configured to determine a target operator in the operator group that meets a preset condition, and generate load test data for load test according to the data sequence of the target operator.
  • load testing is usually used in practice to ensure the running quality of software systems under load. The goal of load testing is to ensure that the software system performs well under actual workloads.
  • the workload needs to be restored first, and then the load test is designed according to the restored workload. Recovering workloads is a challenging task because it requires a balance between the level of granularity of the workload and the cost of using that workload for load testing.
  • if the restored workload is too coarse, that is, oversimplified, it cannot capture differences in user behavior, and the load test loses its representativeness; if the workload replays the exact field workload step by step, the exact user behavior can be reproduced, but the cost of maintaining the workload is very high.
  • workloads are usually designed based on representative user behaviors of a small number of clusters, and when users are aggregated, they are usually implemented based on the frequency of operations of different users.
  • it is too rough to only consider the frequency of events. Adding more detailed information about user operations will lead to excessive recovery, execution, and maintenance costs.
  • a load testing device is proposed that obtains operator operation data sequences based on load data, obtains one or more operator class groups by clustering the operator operation data sequences, and generates load test data for load testing according to the data sequence of the target operator that meets the preset condition in the operator class group.
  • This technical solution uses sequences of operator events and context information to help restore the workload, so that workload recovery can be achieved at different levels of operator behavior granularity.
  • clustering methods are used to mine representative operators, so that workload generation can be achieved with a small number of users.
  • the load data refers to load data generated based on the operation of the operator within a preset time period.
  • the operator refers to an operator such as a load operation user, a load operation machine, or a load operation resource.
  • the operation of the operator may be, for example, a search operation, a delete operation, an add operation, an edit operation, and the like.
  • the load data may include one or more of the following data: operator identification information and load data corresponding to the operator identification information, including load workload, load content, load processing results, and so on.
  • the operator identification information is used to uniquely identify the operator.
  • the operator's operation data sequence refers to a data sequence composed of load data generated by a specific operator. It reflects the characteristics of the operator's operation events and some context information associated with those events. The characteristics of an operation event may include, for example, one or more of the following: the purpose, the content, and the effect of the operation event. The context information may include, for example, one or more of the following: operation event information of other operators related to the operator, and information on other operation events of the operator that have a temporal relationship with the operation event.
  • the load data is load log data.
  • the load data may also be simulated load data or real-time load data.
  • the first obtaining submodule 801 is configured to obtain load log data
  • the first determining submodule 802 is configured to determine the operator identification information in the load log data
  • the second acquisition submodule 803 is configured to acquire load log data corresponding to the operator identification information based on the operator identification information, and obtain an operator operation data sequence corresponding to the operator identification information.
  • the operator's operation data sequence is obtained by analyzing and mining the load data. Specifically, the first obtaining submodule 801 obtains load log data; the first determining submodule 802 determines the operator identification information appearing in the load log data; and the second obtaining submodule 803 obtains the load log data corresponding to the operator identification information and combines it to obtain the operator operation data sequence corresponding to that identification information.
  • when the second obtaining submodule 803 combines the load log data corresponding to the operator identification information, the combination may be performed in chronological order, in the order in which a certain field appears, or in the order of the frequency of occurrence of a certain field.
  • Those skilled in the art can select an appropriate data combination method according to the needs of the actual application and the characteristics of the combined data, which is not specifically limited in the present disclosure.
  • the first clustering module 702 includes:
  • the first calculation sub-module 901 is configured to calculate the distance matrix of the operation data sequence of the operator
  • the clustering sub-module 902 is configured to cluster the operator operation data sequence according to the distance matrix to obtain one or more operator clusters.
  • the first clustering module 702 clusters the operators so that representative operators and their corresponding load test data can subsequently be obtained from the cluster groups, finally yielding the load test data for the load test.
  • clustering refers to the process of dividing a collection of physical or abstract objects into multiple classes composed of similar objects, that is, the cluster generated by the clustering operation is a collection of a set of data objects. Objects in one cluster are similar to each other, but different from objects in other clusters.
  • the first clustering module 702 uses a hierarchical clustering method to cluster the operator operation data sequence based on the Pearson distance, and the final clustering result can be displayed in a tree diagram.
  • the Calinski-Harabasz stopping rule is used to cut the dendrogram and determine the final number of clusters.
  • the first calculation submodule 901 calculates the distance matrix of the operator operation data sequences; the clustering submodule 902 clusters the operator operation data sequences according to the distance matrix to obtain one or more operator class groups. Operators in the same operator class group can be considered to have similar operating behaviors, so a representative operator can be obtained from each operator class group to represent the behavior of that group.
  • the parameters used by the clustering method can also be initialized before clustering.
  • the above clustering method is only illustrative. In practical applications, other clustering methods can also be used, such as partition-based, density-based, and distribution-based clustering algorithms.
  • for the cluster stop judgment, in addition to the Calinski-Harabasz stopping rule in the example above, other methods can also be used, such as the Silhouette coefficient stopping rule and the Davies-Bouldin stopping rule.
  • Those skilled in the art can select an appropriate clustering method and cluster stop judgment rule according to actual application requirements and the characteristics of the objects to be clustered, which are not particularly limited in the present disclosure.
  • the first calculation submodule 901 includes:
  • the construction sub-module 1001 is configured to construct an operator operation data sequence matrix based on the operator operation data sequence;
  • the first generating sub-module 1002 is configured to generate an operator operation data sequence similarity matrix based on the operator operation data sequence;
  • the multiplication sub-module 1003 is configured to multiply the operation data sequence matrix of the operator and the similarity matrix of the operation data sequence of the operator to obtain the distance matrix of the operator.
  • the construction submodule 1001 first constructs the operator operation data sequence matrix based on the operator operation data sequences; for example, the corresponding operator operation data sequences can be combined into an operator operation data sequence matrix according to the operator identification information. The first generating submodule 1002 then generates the operator operation data sequence similarity matrix based on the operator operation data sequences, where the similarity matrix characterizes the similarity between the operator operation data sequences. Finally, the multiplication submodule 1003 multiplies the operator operation data sequence matrix by the operator operation data sequence similarity matrix to obtain the distance matrix of the operators. The distance matrix obtained in this way considers not only the similarity between operators but also the similarity between the operation data sequences of all operators.
  • the first generation submodule 1002 may be configured as:
  • the similarity between the operator's operation data sequences is determined according to the edit distance between the frequent sequences, and the operator's operation data sequence similarity matrix is generated.
  • the first generation submodule 1002 generates the operator operation data sequence similarity matrix based on frequent sequences. Specifically, it first determines the frequent sequences in the operator operation data sequences, where a frequent sequence is a data sequence whose frequency of occurrence is higher than a preset frequency threshold and which can therefore represent the characteristics of the data to a certain extent. It then calculates the edit distance between the frequent sequences; for example, the Levenshtein method may be used, although other edit distance calculation methods may also be used, and the present disclosure does not limit the specific method. Finally, it calculates the similarity between the operator operation data sequences according to the edit distance between the frequent sequences, where a smaller edit distance means a higher similarity, and generates the operator operation data sequence similarity matrix from these similarities.
  • in the above example, the Levenshtein edit distance similarity calculation method is used. Other string similarity calculation methods can also be used, such as the cosine similarity calculation method and the Jaccard coefficient similarity calculation method. An appropriate similarity calculation method can be selected according to actual application needs and the characteristics of the data sequences, which is not specifically limited in the present disclosure.
  • the generating module 703 includes:
  • the third determining submodule 1101 is configured to determine a target operator in the operator group that meets a preset condition
  • the third obtaining submodule 1102 is configured to obtain the frequent sequence of the target operator and its appearance frequency
  • the third calculation submodule 1103 is configured to calculate the occurrence probability of the frequent sequence according to the occurrence frequency of the frequent sequence;
  • the present disclosure clusters the operators, obtains representative operators and their corresponding load test data based on the cluster groups, and thereby obtains the load test data for the final load test.
  • the third determining submodule 1101 first selects a target operator in the operator class group that meets a preset condition, where the preset condition is a preset representative point condition; that is, the selected target operator is a representative operator of the corresponding operator class group. The third obtaining submodule 1102 then obtains the frequent sequences of the target operator and their occurrence frequencies, where the frequent sequences can be obtained by the method described above, which is not repeated here.
  • once the frequent sequences are determined, their occurrence frequencies can be obtained accordingly. The third calculation submodule 1103 calculates the occurrence probability of each frequent sequence from its occurrence frequency, for example by dividing the occurrence frequency of a frequent sequence of the target operator by the total number of data sequences of the target operator. The third generation submodule 1104 then generates load test data according to the occurrence probabilities of the frequent sequences, and the test submodule 1105 finally replays the load test data to perform the load test.
  • the third determining submodule 1101 may use the Partitioning Around Medoids (PAM) algorithm to identify the representative operator in each operator class group. PAM is a clustering algorithm based on k-medoids and has strong robustness and accuracy.
  • Other methods can also be used to select the representative operator. Those skilled in the art can select an appropriate method for identifying the representative operator according to actual application requirements and characteristics of operator group data, which is not specifically limited in the present disclosure.
  • the third generation submodule 1104 can use the frequent sequences of the target operator and their occurrence probabilities to replace the operation data sequences and occurrence probabilities of the other operators in the corresponding operator class group, thereby generating the load test data used to perform the load test. For example, if the frequent sequences of the target operator of a certain operator class group are the search-delete-new sequence, the search-edit sequence, and the new-edit sequence, with occurrence probabilities of 50%, 25%, and 25% respectively, these frequent sequences and probabilities can replace the operation data sequences and occurrence probabilities of the other operators in the group. Assuming that there are two operators besides the target operator, operator 1 and operator 2, the generated load test data can be:
  • Target operator: search-delete-new sequence, 50%; search-edit sequence, 25%; new-edit sequence, 25%;
  • Operator 1: search-delete-new sequence, 50%; search-edit sequence, 25%; new-edit sequence, 25%;
  • Operator 2: search-delete-new sequence, 50%; search-edit sequence, 25%; new-edit sequence, 25%.
  • the test sub-module 1105 can use the playback tool FIO or JMeter to replay and run the load test data to implement the load test.
  • the performance data of the system is recorded during the test and compared with the performance data of the original load run to obtain the load test result.
  • Fig. 12 shows a structural block diagram of a load clustering device according to an embodiment of the present invention.
  • the device can be implemented as part or all of an electronic device through software, hardware, or a combination of the two.
  • the load clustering device includes:
  • the obtaining module 1201 is configured to obtain load data, and determine an operator operation data sequence based on the load data;
  • the second clustering module 1202 is configured to perform clustering on the operator operation data sequence to obtain one or more operator clusters.
  • the load test device can be deployed in a distributed data system to perform load testing on one or more distributed data devices in the system, such as client 1301.
  • multiple clients 1301 are each connected to a database 1302, and the load test device 1303 obtains load data from the database 1302. The determination module 1304 in the load test device 1303 determines the operator operation data sequences based on the load data; the first clustering module 1305 clusters the operator operation data sequences to obtain one or more operator class groups; and the generating module 1306 determines the target operator in each operator class group that meets the preset condition and generates load test data according to the data sequence of the target operator to perform the load test, finally obtaining the load test result.
  • FIG. 14 shows a structural block diagram of an electronic device according to an embodiment of the present invention.
  • the electronic device 1400 includes a memory 1401 and a processor 1402; among them,
  • the memory 1401 is used to store one or more computer instructions, wherein the one or more computer instructions are executed by the processor 1402 to implement any of the above method steps.
  • FIG. 15 is a schematic structural diagram of a computer system suitable for implementing the load test method according to the embodiment of the present invention.
  • the computer system 1500 includes a processing unit 1501, which can execute the various processes of the above embodiments according to a program stored in a read-only memory (ROM) 1502 or a program loaded from a storage portion 1508 into a random access memory (RAM) 1503. The RAM 1503 also stores various programs and data required for the operation of the system 1500.
  • the processing unit 1501, ROM 1502, and RAM 1503 are connected to each other through a bus 1504.
  • An input/output (I/O) interface 1505 is also connected to the bus 1504.
  • the following components are connected to the I/O interface 1505: an input portion 1506 including a keyboard, a mouse, etc.; an output portion 1507 including a cathode ray tube (CRT), a liquid crystal display (LCD), speakers, etc.; a storage portion 1508 including a hard disk, etc.; and a communication portion 1509 including a network interface card such as a LAN card or a modem. The communication portion 1509 performs communication processing via a network such as the Internet.
  • the drive 1510 is also connected to the I/O interface 1505 as needed.
  • a removable medium 1511, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1510 as needed, so that the computer program read from it can be installed into the storage portion 1508 as needed.
  • the processing unit 1501 may be implemented as a processing unit such as a CPU, GPU, FPGA, or NPU.
  • the method described above may be implemented as a computer software program.
  • the embodiment of the present invention includes a computer program product, which includes a computer program tangibly embodied on a readable medium, and the computer program includes program code for executing the load test method.
  • the computer program may be downloaded and installed from the network through the communication part 1509, and/or installed from the removable medium 1511.
  • each block in the flowchart or block diagram may represent a module, program segment, or part of code, which contains one or more executable instructions for realizing the specified logic function.
  • the functions marked in the blocks may also occur in a different order from the order marked in the drawings. For example, two blocks shown in succession can actually be executed substantially in parallel, and they can sometimes be executed in the reverse order, depending on the functions involved.
  • each block in the block diagram and/or flowchart, and each combination of blocks in the block diagram and/or flowchart, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
  • the units or modules involved in the embodiments described in the present invention can be implemented in software or hardware.
  • the described units or modules may also be provided in the processor, and the names of these units or modules do not constitute a limitation on the units or modules themselves under certain circumstances.
  • the embodiments of the present invention also provide a computer-readable storage medium.
  • the computer-readable storage medium may be the computer-readable storage medium included in the device described in the above-mentioned embodiment; or it may exist alone.
  • the computer-readable storage medium stores one or more programs, and the programs are used by one or more processors to execute the methods described in the embodiments of the present invention.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Debugging And Monitoring (AREA)
  • Test And Diagnosis Of Digital Computers (AREA)

Abstract

A load testing method and apparatus, and an electronic device and a computer-readable storage medium. The method comprises: obtaining load data and determining operation data sequences of an operator on the basis of the load data (S101); clustering the operation data sequences of the operator to obtain one or more operator classification groups (S102); and determining a target operator satisfying a preset condition in the operator classification groups, and generating load testing data according to a data sequence of the target operator to perform load testing (S103). According to the method, a sequence combined by an operator event and context information is used for helping recover workload of a load, thereby implementing workload recovery in different operator behavior granularity levels; in addition, a representative operator is explored by means of a clustering method, so that the generation of workload can be implemented with the help of a small number of users.

Description

Load testing method, device, electronic equipment, and computer-readable storage medium
This application claims priority to the Chinese patent application No. 201910866125.5, filed on September 12, 2019 and entitled "Load testing method, device, electronic equipment, and computer-readable storage medium", the entire content of which is incorporated herein by reference.
Technical field
The embodiments of the present invention relate to the technical field of data testing, and in particular to a load testing method, device, electronic equipment, and computer-readable storage medium.
Background
With the development of data technology and Internet technology, more and more service providers serve users through software systems. These include large software systems that serve huge numbers of users and significantly affect the daily lives of billions of users worldwide, such as Amazon AWS, Google Gmail, and Netflix. The stable operation of these large software systems is clearly very important: even a minor fault can cause a poor user experience, data loss, and lost revenue. In practice, therefore, load testing is usually used to ensure the operating quality of a software system under load.
The goal of load testing is to ensure that a software system performs well under realistic workloads. To achieve this, the workload must first be recovered, and the load test is then designed according to the recovered workload. Recovering the workload is challenging because a balance must be struck between the granularity of the workload and the cost of using such a workload for load testing. If the recovered workload is too coarse, that is, oversimplified, it cannot capture differences in user behavior, and the load test loses its representativeness. For example, the SPECweb96 benchmark defines a workload that only specifies the probability of accessing files, such as "files smaller than 1 KB account for 35% of all requests". If instead the workload replays the exact field workload step by step, the exact user behavior can be reproduced, but maintaining the workload is very expensive. A software system has a large number of users, so replaying the exact workload requires the load test to simulate a large amount of context information for each user and requires simulation code to be developed for each specific sequence of events. Moreover, because it is almost impossible to observe exactly the same workload twice, the workload must be updated continually.
To reach the ideal level of workload granularity, the prior art usually designs workloads based on the representative user behaviors of a small number of clusters, and users are usually clustered based on the frequency of their different operations. However, because users of large software systems are highly variable, considering only the frequency of events is too coarse. The order and context of user operations can make the workload more representative. For example, one user repeatedly reads a small block of data from a file and then writes each block back to the file, while another user interactively reads and writes a large number of small blocks to the file. If only the frequency of operations such as reads and writes is considered, the workloads of these two users cannot be distinguished, yet adding more detailed information about these user operations leads to excessive recovery, execution, and maintenance costs.
Summary of the invention
The embodiments of the present invention provide a load testing method, device, electronic equipment, and computer-readable storage medium.
In a first aspect, an embodiment of the present invention provides a load testing method.
Specifically, the load testing method includes:
obtaining load data, and determining operator operation data sequences based on the load data;
clustering the operator operation data sequences to obtain one or more operator class groups;
determining a target operator in each operator class group that meets a preset condition, and generating load test data according to the data sequence of the target operator to perform a load test.
With reference to the first aspect, in a first implementation of the first aspect, the load data is load log data, simulated load data, or real-time load data.
With reference to the first aspect and the first implementation of the first aspect, in a second implementation of the first aspect, obtaining the load data and determining the operator operation data sequences based on the load data includes:
obtaining load log data;
determining the operator identification information in the load log data;
obtaining, based on the operator identification information, the load log data corresponding to the operator identification information, to obtain the operator operation data sequence corresponding to the operator identification information.
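The three steps above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the record fields `operator_id`, `timestamp`, and `operation` are hypothetical names, and the chronological ordering is only one of the combination orders the disclosure mentions.

```python
from collections import defaultdict


def build_operation_sequences(log_records):
    """Group log records by operator ID and order each group by timestamp.

    The field names used here are assumptions made for illustration.
    """
    grouped = defaultdict(list)
    for record in log_records:
        grouped[record["operator_id"]].append(record)

    sequences = {}
    for operator_id, records in grouped.items():
        ordered = sorted(records, key=lambda r: r["timestamp"])
        sequences[operator_id] = [r["operation"] for r in ordered]
    return sequences


# Hypothetical load log data with two operators.
log_records = [
    {"operator_id": "u1", "timestamp": 3, "operation": "new"},
    {"operator_id": "u2", "timestamp": 1, "operation": "search"},
    {"operator_id": "u1", "timestamp": 1, "operation": "search"},
    {"operator_id": "u1", "timestamp": 2, "operation": "delete"},
    {"operator_id": "u2", "timestamp": 2, "operation": "edit"},
]
sequences = build_operation_sequences(log_records)
print(sequences)
```

Ordering by another field, or by field frequency, would only require a different sort key.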
With reference to the first aspect, the first implementation of the first aspect, and the second implementation of the first aspect, in a third implementation of the first aspect of the present disclosure, clustering the operator operation data sequences to obtain one or more operator class groups includes:
calculating the distance matrix of the operator operation data sequences;
clustering the operator operation data sequences according to the distance matrix to obtain one or more operator class groups.
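As a rough sketch of clustering from a distance matrix, the Python below computes a Pearson distance (1 minus the correlation coefficient) between per-operator feature vectors and merges clusters bottom-up with average linkage. Several things here are assumptions: the feature vectors are made-up values, average linkage is one possible linkage, and the number of clusters is passed in directly instead of being chosen by a stopping rule such as Calinski-Harabasz.

```python
import math


def pearson_distance(x, y):
    """Return 1 - Pearson correlation of two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 1.0  # correlation undefined; treat as uncorrelated
    return 1.0 - cov / math.sqrt(var_x * var_y)


def distance_matrix(vectors):
    """Pairwise Pearson distances, with zeros on the diagonal."""
    n = len(vectors)
    return [
        [0.0 if i == j else pearson_distance(vectors[i], vectors[j]) for j in range(n)]
        for i in range(n)
    ]


def agglomerative_clusters(dist, num_clusters):
    """Average-linkage agglomerative clustering down to num_clusters groups."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > num_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                avg = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or avg < best[0]:
                    best = (avg, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical per-operator feature vectors (one row per operator).
vectors = [[5, 1, 0], [10, 2, 1], [0, 1, 6], [1, 3, 9]]
clusters = agglomerative_clusters(distance_matrix(vectors), 2)
print(clusters)
```

Operators 0 and 1 have nearly proportional vectors, as do operators 2 and 3, so each pair ends up in its own group.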
With reference to the first implementation of the first aspect, the second implementation of the first aspect, and the third implementation of the first aspect, in a fourth implementation of the first aspect of the present disclosure, calculating the distance matrix of the operator operation data sequences includes:
constructing an operator operation data sequence matrix based on the operator operation data sequences;
generating an operator operation data sequence similarity matrix based on the operator operation data sequences;
multiplying the operator operation data sequence matrix by the operator operation data sequence similarity matrix to obtain the distance matrix of the operators.
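The multiplication step can be illustrated with a small Python sketch. The disclosure does not state the shapes of the two matrices, so this example assumes that the sequence matrix has one row per operator and one column per frequent sequence (holding occurrence counts), and that the similarity matrix is square over the same frequent sequences. All values are invented.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions do not match")
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


# Assumed layout: rows are operators, columns are frequent sequences.
sequence_matrix = [
    [2, 0, 1],
    [0, 3, 0],
]
# Assumed layout: pairwise similarity between the three frequent sequences.
similarity_matrix = [
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.25],
    [0.0, 0.25, 1.0],
]
weighted = matmul(sequence_matrix, similarity_matrix)
print(weighted)
```

Under these assumptions, each operator's row is spread over similar sequences, so two operators who use different but similar sequences end up with closer rows than raw counts alone would give.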
With reference to the first aspect and the first to fourth implementations of the first aspect, in a fifth implementation of the first aspect of the present disclosure, generating the operator operation data sequence similarity matrix based on the operator operation data sequences is implemented as:
determining the frequent sequences in the operator operation data sequences;
calculating the edit distance between the frequent sequences;
determining the similarity between the operator operation data sequences according to the edit distance between the frequent sequences, and generating the operator operation data sequence similarity matrix.
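A minimal Python sketch of the edit distance and similarity steps follows. The Levenshtein distance is computed over sequences of operation names. Converting a distance to a similarity by normalizing with the longer sequence length is an assumption made for this example; the disclosure only states that a smaller edit distance means a higher similarity.

```python
def levenshtein(a, b):
    """Levenshtein edit distance between two sequences (strings or tuples)."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Map an edit distance to [0, 1]; the normalization is an assumption."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def similarity_matrix(sequences):
    """Pairwise similarity between all given sequences."""
    return [[similarity(x, y) for y in sequences] for x in sequences]


# Hypothetical frequent sequences, reusing the operation names from the text.
frequent_sequences = [
    ("search", "delete", "new"),
    ("search", "edit"),
    ("new", "edit"),
]
matrix = similarity_matrix(frequent_sequences)
for row in matrix:
    print(row)
```

Treating each operation as one symbol means that replacing `delete` with `edit` costs a single edit, regardless of how the operation names are spelled.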
With reference to the first aspect and the first to fifth implementations of the first aspect, in a sixth implementation of the first aspect of the present disclosure, determining the target operator in the operator class group that meets the preset condition and generating load test data according to the data sequence of the target operator to perform the load test includes:
determining a target operator in the operator class group that meets a preset condition;
obtaining the frequent sequences of the target operator and their occurrence frequencies;
calculating the occurrence probability of each frequent sequence according to its occurrence frequency;
generating load test data according to the occurrence probabilities of the frequent sequences;
replaying the load test data to perform the load test.
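The first four steps can be sketched in Python as below. The sketch picks the representative operator as the medoid of a group (the member with the smallest total distance to the others, which is the selection criterion underlying k-medoids methods such as PAM), turns frequencies into probabilities, and samples a test plan. The distance values, the counts, and the function names are hypothetical, and the final replay with a tool such as FIO or JMeter is not shown.

```python
import random


def pick_medoid(members, dist):
    """Return the member with the smallest total distance to all members."""
    return min(members, key=lambda m: sum(dist[m][other] for other in members))


def occurrence_probabilities(frequent_counts, total_sequences):
    """Divide each frequent sequence's count by the operator's total sequences."""
    return {seq: count / total_sequences for seq, count in frequent_counts.items()}


def generate_load_test_data(probabilities, num_items, seed=0):
    """Sample sequences to replay, weighted by their occurrence probability."""
    rng = random.Random(seed)  # fixed seed so that a plan is reproducible
    candidates = list(probabilities)
    weights = [probabilities[seq] for seq in candidates]
    return rng.choices(candidates, weights=weights, k=num_items)


# Hypothetical pairwise distances between three operators in one group.
dist = [
    [0.0, 0.2, 0.3],
    [0.2, 0.0, 0.4],
    [0.3, 0.4, 0.0],
]
medoid = pick_medoid([0, 1, 2], dist)

# Hypothetical frequent-sequence counts for the representative operator.
counts = {
    ("search", "delete", "new"): 2,
    ("search", "edit"): 1,
    ("new", "edit"): 1,
}
probabilities = occurrence_probabilities(counts, total_sequences=4)
plan = generate_load_test_data(probabilities, num_items=8)
print(medoid, probabilities)
print(plan)
```

With these counts the probabilities are 50%, 25%, and 25%, matching the worked example in the text; the sampled plan can then be applied to every operator in the group.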
In a second aspect, an embodiment of the present invention provides a load clustering method.
Specifically, the load clustering method includes:
obtaining load data, and determining operator operation data sequences based on the load data;
clustering the operator operation data sequences to obtain one or more operator class groups.
In a third aspect, an embodiment of the present invention provides a load testing apparatus.
Specifically, the load testing apparatus includes:
a determining module, configured to acquire load data and determine operator operation data sequences based on the load data;
a first clustering module, configured to cluster the operator operation data sequences to obtain one or more operator clusters;
a generating module, configured to determine a target operator in the operator cluster that meets a preset condition, and to generate load test data according to the data sequence of the target operator to perform a load test.
With reference to the third aspect, in a first implementation manner of the third aspect of the embodiment of the present invention, the load data is load log data.
With reference to the third aspect and the first implementation manner of the third aspect, in a second implementation manner of the third aspect of the embodiment of the present invention, the determining module includes:
a first acquiring submodule, configured to acquire load log data;
a first determining submodule, configured to determine the operator identification information in the load log data;
a second acquiring submodule, configured to acquire, based on the operator identification information, the load log data corresponding to the operator identification information, to obtain the operator operation data sequence corresponding to the operator identification information.
With reference to the third aspect, the first implementation manner of the third aspect, and the second implementation manner of the third aspect, in a third implementation manner of the third aspect of the present disclosure, the first clustering module includes:
a first calculating submodule, configured to calculate the distance matrix of the operator operation data sequences;
a clustering submodule, configured to cluster the operator operation data sequences according to the distance matrix to obtain one or more operator clusters.
With reference to the third aspect, the first implementation manner of the third aspect, the second implementation manner of the third aspect, and the third implementation manner of the third aspect, in a fourth implementation manner of the third aspect of the present disclosure, the first calculating submodule includes:
a constructing submodule, configured to construct an operator operation data sequence matrix based on the operator operation data sequences;
a first generating submodule, configured to generate an operator operation data sequence similarity matrix based on the operator operation data sequences;
a multiplying submodule, configured to multiply the operator operation data sequence matrix by the operator operation data sequence similarity matrix to obtain the distance matrix of the operators.
With reference to the third aspect, the first implementation manner of the third aspect, the second implementation manner of the third aspect, the third implementation manner of the third aspect, and the fourth implementation manner of the third aspect, in a fifth implementation manner of the third aspect of the present disclosure, the first generating submodule includes:
a second determining submodule, configured to determine the frequent sequences in the operator operation data sequences;
a second calculating submodule, configured to calculate the edit distance between the frequent sequences;
a second generating submodule, configured to determine the similarity between the operator operation data sequences according to the edit distance between the frequent sequences, and to generate the operator operation data sequence similarity matrix.
With reference to the third aspect, the first implementation manner of the third aspect, the second implementation manner of the third aspect, the third implementation manner of the third aspect, the fourth implementation manner of the third aspect, and the fifth implementation manner of the third aspect, in a sixth implementation manner of the third aspect of the present disclosure, the generating module includes:
a third determining submodule, configured to determine a target operator in the operator cluster that meets a preset condition;
a third acquiring submodule, configured to acquire the frequent sequences of the target operator and their occurrence frequencies;
a third calculating submodule, configured to calculate the occurrence probability of each frequent sequence according to its occurrence frequency;
a third generating submodule, configured to generate load test data according to the occurrence probabilities of the frequent sequences;
a testing submodule, configured to replay the load test data to perform the load test.
In a fourth aspect, an embodiment of the present invention provides a load clustering apparatus.
Specifically, the load clustering apparatus includes:
an acquiring module, configured to acquire load data and determine operator operation data sequences based on the load data;
a second clustering module, configured to cluster the operator operation data sequences to obtain one or more operator clusters.
In a fifth aspect, an embodiment of the present invention provides an electronic device including a memory and a processor. The memory is configured to store one or more computer instructions that support the load testing apparatus/load clustering apparatus in executing the above load testing method/load clustering method, and the processor is configured to execute the computer instructions stored in the memory. The load testing apparatus/load clustering apparatus may further include a communication interface for communicating with other devices or with a communication network.
In a sixth aspect, an embodiment of the present invention provides a computer-readable storage medium for storing computer instructions used by the load testing apparatus/load clustering apparatus, including the computer instructions involved in executing the above load testing method/load clustering method.
The technical solutions provided by the embodiments of the present invention may include the following beneficial effects:
The above technical solution obtains operator operation data sequences from load data, clusters the operator operation data sequences to obtain one or more operator clusters, and generates load test data for load testing according to the data sequence of a target operator in the operator cluster that meets a preset condition. The technical solution uses sequences that combine operator events with context information to help recover the workload, so that the workload can be recovered at different granularity levels of operator behavior. In addition, it uses clustering to mine representative operators, so that a workload can be generated from a small number of users.
It should be understood that the above general description and the following detailed description are only exemplary and explanatory, and do not limit the embodiments of the present invention.
Description of the drawings
Other features, objectives, and advantages of the embodiments of the present invention will become more apparent from the following detailed description of non-limiting implementations, read in conjunction with the accompanying drawings. In the drawings:
FIG. 1 shows a flowchart of a load testing method according to an embodiment of the present invention;
FIG. 2 shows a flowchart of step S101 of the load testing method according to the embodiment shown in FIG. 1;
FIG. 3 shows a flowchart of step S102 of the load testing method according to the embodiment shown in FIG. 1;
FIG. 4 shows a flowchart of step S301 of the load testing method according to the embodiment shown in FIG. 3;
FIG. 5 shows a flowchart of step S103 of the load testing method according to the embodiment shown in FIG. 1;
FIG. 6 shows a flowchart of a load clustering method according to an embodiment of the present invention;
FIG. 7 shows a structural block diagram of a load testing apparatus according to an embodiment of the present invention;
FIG. 8 shows a structural block diagram of the determining module 701 of the load testing apparatus according to the embodiment shown in FIG. 7;
FIG. 9 shows a structural block diagram of the first clustering module 702 of the load testing apparatus according to the embodiment shown in FIG. 7;
FIG. 10 shows a structural block diagram of the first calculating submodule 901 of the load testing apparatus according to the embodiment shown in FIG. 9;
FIG. 11 shows a structural block diagram of the generating module 703 of the load testing apparatus according to the embodiment shown in FIG. 7;
FIG. 12 shows a structural block diagram of a load clustering apparatus according to an embodiment of the present invention;
FIG. 13 shows a schematic diagram of an application scenario according to an embodiment of the present invention;
FIG. 14 shows a structural block diagram of an electronic device according to an embodiment of the present invention;
FIG. 15 is a schematic structural diagram of a computer system suitable for implementing a load testing method according to an embodiment of the present invention.
Detailed description
Hereinafter, exemplary implementations of the embodiments of the present invention are described in detail with reference to the accompanying drawings, so that those skilled in the art can readily implement them. For clarity, parts unrelated to the description of the exemplary implementations are omitted from the drawings.
In the embodiments of the present invention, it should be understood that terms such as "including" or "having" are intended to indicate the presence of the features, numbers, steps, actions, components, parts, or combinations thereof disclosed in this specification, and are not intended to exclude the possibility that one or more other features, numbers, steps, actions, components, parts, or combinations thereof exist or are added.
It should also be noted that, where no conflict arises, the embodiments of the present invention and the features in the embodiments may be combined with each other. The embodiments of the present invention are described in detail below with reference to the drawings and in conjunction with the embodiments.
The technical solution provided by the embodiments of the present invention obtains operator operation data sequences from load data, clusters the operator operation data sequences to obtain one or more operator clusters, and generates load test data for load testing according to the data sequence of a target operator in the operator cluster that meets a preset condition. The technical solution uses sequences that combine operator events with context information to help recover the workload, so that the workload can be recovered at different granularity levels of operator behavior. In addition, it uses clustering to mine representative operators, so that a workload can be generated from a small number of users.
FIG. 1 shows a flowchart of a load testing method according to an embodiment of the present invention. As shown in FIG. 1, the load testing method includes the following steps S101-S103:
In step S101, load data is acquired, and operator operation data sequences are determined based on the load data;
In step S102, the operator operation data sequences are clustered to obtain one or more operator clusters;
In step S103, a target operator in the operator cluster that meets a preset condition is determined, and load test data is generated according to the data sequence of the target operator to perform a load test.
As mentioned above, with the development of data technology and Internet technology, more and more service providers serve users through software systems. To keep software systems running stably, load testing is commonly used in practice to ensure the quality of a software system under load. The goal of load testing is to ensure that the software system performs well under realistic workloads. To achieve this goal, the workload must first be recovered, and the load test is then designed according to the recovered workload. Recovering the workload is challenging, because a balance must be struck between the granularity of the workload and the cost of load testing with it. If the recovered workload is too coarse, that is, oversimplified, it cannot capture differences in user behavior, and the load test loses its representativeness. If the workload replays the exact field workload step by step, the exact user behavior can be reproduced, but maintaining such a workload is very costly. The prior art usually designs workloads based on the representative user behavior of a small number of clusters, and users are usually clustered according to the frequencies of their different operations. However, because users of large software systems are highly variable, considering only the frequency of events is too coarse, while adding more detailed information about user operations makes recovery, execution, and maintenance too costly.
In view of the above problems, this embodiment proposes a load testing method. The method obtains operator operation data sequences from load data, clusters the operator operation data sequences to obtain one or more operator clusters, and generates load test data for load testing according to the data sequence of a target operator in the operator cluster that meets a preset condition. The technical solution uses sequences that combine operator events with context information to help recover the workload, so that the workload can be recovered at different granularity levels of operator behavior. In addition, it uses clustering to mine representative operators, so that a workload can be generated from a small number of users.
In an embodiment of the present invention, the load data refers to load data generated by the operations of operators within a preset time period. The operator refers to a party that performs load operations, such as a user, a machine, or a resource. The operation of the operator may be, for example, a search operation, a delete operation, an add operation, or an edit operation.
In an embodiment of the present invention, the load data may include one or more of the following: operator identification information, and load data corresponding to the operator identification information, including the workload, the load content, the load processing result, and so on. The operator identification information uniquely identifies the operator.
In an embodiment of the present invention, the operator operation data sequence refers to a data sequence composed of the load data generated by a specific operator. The operator operation data sequence reflects the characteristics of the operator's operation events as well as context information associated with those events. The operation event characteristics may include, for example, one or more of the following: the purpose of the operation event, the content of the operation event, and the effect of the operation event. The context information may include, for example, one or more of the following: operation event information of other operators related to the operator, and information on other operation events of the same operator that precede or follow the operation event in time. Because the data sequence carries this information, data at different granularity levels of operator behavior can be obtained as the actual application requires. The operator operation data sequence may be arranged according to a preset rule, for example in chronological order, in the order in which a certain field appears, or in order of the frequency with which a certain field appears. For example, ordered by the appearance of a certain field, the operator operation data sequence may be a search-delete-add sequence, a search-edit sequence, an add-edit sequence, and so on.
Log data records the course of events in detail and is complete, so operator load data can be obtained from it more completely and accurately. Therefore, in an embodiment of the present invention, the load data is load log data. Of course, the load data may also be simulated load data or real-time load data.
In an embodiment of the present invention, as shown in FIG. 2, step S101, that is, the step of acquiring load data and determining operator operation data sequences based on the load data, includes the following steps S201-S203:
In step S201, load log data is acquired;
In step S202, the operator identification information in the load log data is determined;
In step S203, the load log data corresponding to the operator identification information is acquired based on the operator identification information, to obtain the operator operation data sequence corresponding to the operator identification information.
To obtain the operator operation data sequences, and from them the behavior information of the operators, this embodiment analyzes the load data and mines the operator operation data sequences from it. Specifically, the load log data is acquired first; the operator identification information appearing in the load log data is then determined; finally, the load log data corresponding to each piece of operator identification information is acquired and combined, which yields the operator operation data sequence corresponding to that operator identification information.
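Steps S201-S203 can be illustrated with a minimal sketch. The record layout `(timestamp, operator_id, operation)`, the sample log, and the function name `build_operator_sequences` are assumptions made for illustration and are not taken from the disclosure; the sketch combines each operator's records in chronological order, which is only one of the combination orders the embodiment allows.

```python
from collections import defaultdict

# Hypothetical log records: (timestamp, operator_id, operation).
# The field layout is an assumption for illustration only.
LOG = [
    (3, "u1", "edit"),
    (1, "u1", "search"),
    (2, "u2", "search"),
    (4, "u2", "delete"),
    (5, "u1", "add"),
]


def build_operator_sequences(records):
    """Group log records by operator ID and order each group by timestamp."""
    grouped = defaultdict(list)
    for timestamp, operator_id, operation in records:
        grouped[operator_id].append((timestamp, operation))
    # Sorting the (timestamp, operation) pairs yields chronological order.
    return {
        operator_id: [operation for _, operation in sorted(events)]
        for operator_id, events in grouped.items()
    }


sequences = build_operator_sequences(LOG)
```

Here `sequences` maps each operator ID to its ordered list of operations, which is the form the later sketches of clustering and load generation would consume.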
In an embodiment of the present invention, the load log data corresponding to the operator identification information may be combined in chronological order, in the order in which a certain field appears, or in order of the frequency with which a certain field appears. Those skilled in the art may select a suitable combination method according to the needs of the actual application and the characteristics of the data being combined, which is not specifically limited in the present disclosure.
In an embodiment of the present invention, as shown in FIG. 3, step S102, that is, the step of clustering the operator operation data sequences to obtain one or more operator clusters, includes the following steps S301-S302:
In step S301, the distance matrix of the operator operation data sequences is calculated;
In step S302, the operator operation data sequences are clustered according to the distance matrix to obtain one or more operator clusters.
To generate load test data from as little operator operation data as possible while keeping the load test data correct, thereby reducing the computation required for load testing and improving its efficiency, this embodiment clusters the operators so that representative operators and their corresponding load test data can subsequently be obtained from the clusters, which finally yields the load test data used for the load test. Clustering refers to the process of dividing a set of physical or abstract objects into multiple classes of similar objects. In other words, each cluster produced by the clustering operation is a set of data objects that are similar to the other objects in the same cluster and dissimilar to the objects in other clusters.
In an embodiment of the present invention, hierarchical clustering based on the Pearson distance is used to cluster the operator operation data sequences. The final clustering result can be presented as a dendrogram, and during clustering the Calinski-Harabasz stopping rule is used to cut the dendrogram and determine the final number of clusters. Specifically, clustering is implemented by the following steps: first, the distance matrix of the operator operation data sequences is calculated; then the operator operation data sequences are clustered according to the distance matrix to obtain one or more operator clusters. Operators in the same operator cluster can be regarded as behaving similarly, so a representative operator can be selected from each operator cluster to represent the behavior of that cluster.
In an embodiment of the present invention, the parameters used by the clustering method may also be initialized before clustering.
Of course, the above hierarchical clustering method is only an example. In practical applications, other clustering methods may also be used, such as partition-based clustering algorithms, density-based clustering algorithms, and distribution-based clustering algorithms. For deciding when to stop clustering, other rules may be used in addition to the Calinski-Harabasz stopping rule in the example above, such as the Silhouette coefficient stopping rule and the Davies-Bouldin stopping rule. Those skilled in the art may select a suitable clustering method and stopping rule according to the needs of the actual application and the characteristics of the objects to be clustered, which is not particularly limited in the present disclosure.
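The hierarchical clustering described above can be sketched in plain Python. This is an illustrative simplification under stated assumptions, not the disclosed implementation: each operator is assumed to be represented by a numeric row, average linkage is assumed as the merge criterion, and the number of clusters is passed in directly instead of being chosen by the Calinski-Harabasz stopping rule. The names `pearson_distance` and `agglomerate` are hypothetical.

```python
from math import sqrt


def pearson_distance(x, y):
    """Return 1 - Pearson correlation; values near 0 mean similar shape."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 1.0  # correlation is undefined for a constant row
    return 1.0 - cov / (norm_x * norm_y)


def agglomerate(rows, num_clusters):
    """Average-linkage agglomerative clustering down to num_clusters groups."""
    clusters = [[i] for i in range(len(rows))]
    dist = [[pearson_distance(a, b) for b in rows] for a in rows]

    def linkage(c1, c2):
        # Mean pairwise distance between the members of two clusters.
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > num_clusters:
        # Find the closest pair of clusters and merge the second into the first.
        _, a, b = min(
            (linkage(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


rows = [
    [5, 1, 0],   # operator 0
    [10, 2, 1],  # operator 1: similar profile to operator 0
    [0, 1, 6],   # operator 2
    [1, 2, 9],   # operator 3: similar profile to operator 2
]
groups = agglomerate(rows, num_clusters=2)
```

With this toy input, operators 0 and 1 end up in one group and operators 2 and 3 in the other. A fuller version would evaluate a stopping rule at each candidate cut of the dendrogram rather than fixing `num_clusters`.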
In an embodiment of the present invention, as shown in FIG. 4, step S301, that is, the step of calculating the distance matrix of the operator operation data sequences, includes the following steps S401-S403:
In step S401, an operator operation data sequence matrix is constructed based on the operator operation data sequences;
In step S402, an operator operation data sequence similarity matrix is generated based on the operator operation data sequences;
In step S403, the operator operation data sequence matrix is multiplied by the operator operation data sequence similarity matrix to obtain the distance matrix of the operators.
In this embodiment, the distance matrix of the operator operation data sequences is calculated as follows. First, an operator operation data sequence matrix is constructed based on the operator operation data sequences; for example, the operator operation data sequences may be combined into the matrix according to their operator identification information. Next, an operator operation data sequence similarity matrix is generated based on the operator operation data sequences, where the similarity matrix represents how similar the operator operation data sequences are to one another. Finally, the operator operation data sequence matrix is multiplied by the operator operation data sequence similarity matrix to obtain the distance matrix of the operators. The distance matrix obtained in this way takes into account not only the similarity between operators but also the similarity between all operator operation data sequences.
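The multiplication in step S403 can be sketched as follows. The disclosure does not fix the layout of the two matrices, so the layout here is an assumption: rows of the sequence matrix are operators, its columns count how often each frequent sequence occurs for that operator, and the similarity matrix is square over the same frequent sequences. All names and values are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


# Assumed layout: one row per operator, one column per frequent sequence,
# each entry counting occurrences of that sequence for that operator.
operator_sequence_matrix = [
    [2, 0, 1],
    [0, 3, 0],
]

# Assumed symmetric similarity between the three frequent sequences.
sequence_similarity_matrix = [
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

operator_matrix = matmul(operator_sequence_matrix, sequence_similarity_matrix)
```

Under this layout, the product spreads each operator's counts onto similar sequences: the first operator never performs the second sequence, yet receives a nonzero value in that column because the second sequence resembles the first. Rows of the product could then be compared pairwise when clustering operators.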
In an embodiment of the present invention, step S402, that is, the step of generating the operator operation data sequence similarity matrix based on the operator operation data sequences, may be implemented as:
determining the frequent sequences in the operator operation data sequences;
calculating the edit distance between the frequent sequences;
determining the similarity between the operator operation data sequences according to the edit distance between the frequent sequences, and generating the operator operation data sequence similarity matrix.
Frequent sequences are representative to a certain extent and reflect how data occurs in the set of data sequences. Therefore, in this embodiment, the operator operation data sequence similarity matrix is generated based on frequent sequences. Specifically, the frequent sequences in the operator operation data sequences are determined first, where a frequent sequence is a data sequence whose occurrence frequency is higher than a preset frequency threshold and which can therefore represent the characteristics of the data to a certain extent. The edit distance between the frequent sequences is then calculated, for example with the Levenshtein method; other edit distance calculation methods may also be used, and the present disclosure does not limit the specific method. Finally, the similarity between the operator operation data sequences is calculated according to the edit distance between the frequent sequences: the smaller the edit distance between two frequent sequences, the higher their similarity. The operator operation data sequence similarity matrix is then generated from these similarities.
在上述示例中,采用的是Levenshtein编辑距离相似度计算方法,当然也可以采用其他字符串相似度计算方法,比如余弦相似度计算方法和杰卡德系数相似度计算方法等等,具体可根据实际应用的需要以及数据序列的特点选择合适的相似度计算方法,本公开对其不作具体限定。In the above example, the Levenshtein edit distance similarity calculation method is used. Of course, other string similarity calculation methods can also be used, such as the cosine similarity calculation method and the Jaccard coefficient similarity calculation method, etc., depending on the actual situation. The application needs and the characteristics of the data sequence select an appropriate similarity calculation method, which is not specifically limited in the present disclosure.
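The edit-distance step described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function names and the mapping from distance to similarity (1 minus the distance divided by the longer length) are assumptions, since the disclosure only states that a smaller edit distance means a higher similarity.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (strings or tuples of operations)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute x with y
        prev = curr
    return prev[-1]


def similarity_matrix(sequences):
    """Pairwise similarity; the normalisation is an assumed choice."""
    n = len(sequences)
    sim = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            longest = max(len(sequences[i]), len(sequences[j])) or 1
            s = 1.0 - levenshtein(sequences[i], sequences[j]) / longest
            sim[i][j] = sim[j][i] = s
    return sim


# Hypothetical frequent sequences, each a tuple of operation names.
frequent = [("search", "delete", "add"), ("search", "edit"), ("add", "edit")]
sim = similarity_matrix(frequent)
```

Treating each operation as one symbol, rather than comparing the operation names character by character, keeps the distance tied to operator behavior; that is also an assumption of this sketch.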
In an embodiment of the present invention, as shown in FIG. 5, step S103, namely the step of determining a target operator in the operator class group that meets a preset condition and generating load test data according to the data sequence of the target operator to perform a load test, includes the following steps S501-S505:
In step S501, determining a target operator in the operator class group that meets a preset condition;
In step S502, obtaining the frequent sequences of the target operator and their occurrence frequencies;
In step S503, calculating the occurrence probability of each frequent sequence according to its occurrence frequency;
In step S504, generating load test data according to the occurrence probabilities of the frequent sequences;
In step S505, replaying the load test data to perform the load test.
As mentioned above, in order to generate load test data from as little operator operation data as possible while keeping the load test data correct, thereby reducing the computation required for load testing and improving its efficiency, the present disclosure clusters the operators and, based on the resulting class groups, obtains representative operators and their corresponding load test data to produce the load test data that is finally used for the load test. In this embodiment, after one or more operator class groups are obtained, a target operator that meets a preset condition is first selected from each operator class group, where the preset condition is a preset representative-point condition; that is, the selected target operator is a representative operator of the corresponding operator class group. The frequent sequences of the target operator and their occurrence frequencies are then obtained. The frequent sequences can be obtained by the method described above, which is not repeated here, and their occurrence frequencies can be obtained at the same time. The occurrence probability of each frequent sequence is then calculated according to its occurrence frequency; for example, dividing the occurrence frequency of a frequent sequence of the target operator by the total number of data sequences of the target operator gives the occurrence probability of that frequent sequence. Load test data is then generated according to the occurrence probabilities of the frequent sequences, and finally the load test data is replayed to perform the load test.
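The probability calculation in step S503 can be sketched as below. The function name, the threshold parameter, and the sample data are hypothetical; the only rule taken from the text is that the occurrence frequency of a frequent sequence is divided by the total number of the target operator's data sequences.

```python
from collections import Counter


def occurrence_probabilities(sequences, min_frequency):
    """Map each frequent sequence of one operator to its occurrence probability.

    A sequence counts as frequent when it occurs more than `min_frequency`
    times (the preset frequency threshold; the value used here is assumed).
    """
    counts = Counter(sequences)
    total = len(sequences)
    return {seq: n / total for seq, n in counts.items() if n > min_frequency}


# Hypothetical data sequences recorded for one target operator.
observed = (
    [("search", "delete", "add")] * 4
    + [("search", "edit")] * 2
    + [("add", "edit")] * 2
)
probabilities = occurrence_probabilities(observed, min_frequency=1)
```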
In an embodiment of the present invention, the medoid algorithm (Partitioning Around Medoids, PAM) may be used to identify the representative operator in each operator class group. PAM is a clustering algorithm based on k-medoids and is comparatively robust and accurate. Other methods may also be used to select the representative operator; those skilled in the art may select a suitable method for identifying the representative operator according to the needs of the actual application and the characteristics of the operator class group data, and the present disclosure does not specifically limit it.
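For a class group that has already been formed, choosing its representative reduces to finding the medoid, that is, the member with the smallest total distance to the other members. The sketch below shows only this selection step, not the full PAM procedure with its swap phase; the function name and the distance values are illustrative assumptions.

```python
def medoid(members, distance):
    """Return the member whose summed distance to all members is smallest.

    `members` holds operator indices and `distance` is a square matrix
    (list of lists) indexed by those operator indices.
    """
    return min(members, key=lambda m: sum(distance[m][o] for o in members))


# Hypothetical pairwise distances between three operators of one class group.
distance = [
    [0.0, 1.0, 4.0],
    [1.0, 0.0, 2.0],
    [4.0, 2.0, 0.0],
]
representative = medoid([0, 1, 2], distance)
```

Because the medoid is always an actual member of the class group, its recorded sequences can be reused directly, which a computed centroid would not allow.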
In an embodiment of the present invention, after the occurrence probabilities of the frequent sequences of the target operator are obtained, because the target operator is representative of its operator class group, the frequent sequences of the target operator and their occurrence probabilities may be used to replace the operation data sequences and occurrence probabilities of the other operators in the same operator class group, thereby generating the load test data to be used for the load test. For example, suppose the frequent sequences of the target operator of a certain operator class group are a search-delete-add sequence, a search-edit sequence, and an add-edit sequence, with occurrence probabilities of 50%, 25%, and 25% respectively. These frequent sequences and their occurrence probabilities can then replace the operation data sequences and occurrence probabilities of the other operators in the class group. Assuming that the class group contains two operators besides the target operator, operator 1 and operator 2, the load test data finally generated may be:
Target operator: search-delete-add sequence, 50%; search-edit sequence, 25%; add-edit sequence, 25%;
Operator 1: search-delete-add sequence, 50%; search-edit sequence, 25%; add-edit sequence, 25%;
Operator 2: search-delete-add sequence, 50%; search-edit sequence, 25%; add-edit sequence, 25%.
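The replacement described in this example can be sketched as follows. The helper names, the operator identifiers, and the use of weighted random sampling to turn probabilities into a concrete sequence list are assumptions made for illustration; the disclosure does not prescribe how the probabilities are turned into replayable data.

```python
import random


def build_load_profile(class_group, representative_profile):
    """Give every operator in the class group a copy of the representative's profile."""
    return {operator: dict(representative_profile) for operator in class_group}


def sample_sequences(profile, count, seed=0):
    """Draw `count` sequences according to their occurrence probabilities."""
    rng = random.Random(seed)  # seeded so that a generated test plan is repeatable
    sequences = list(profile)
    weights = [profile[s] for s in sequences]
    return rng.choices(sequences, weights=weights, k=count)


representative_profile = {
    "search-delete-add": 0.50,
    "search-edit": 0.25,
    "add-edit": 0.25,
}
load_profile = build_load_profile(
    ["target operator", "operator 1", "operator 2"], representative_profile
)
plan = sample_sequences(load_profile["operator 1"], count=20)
```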
In an embodiment of the present invention, the load test may be performed by replaying the load test data with a replay tool such as FIO or JMeter. During the load test, after the load test data has been replayed, the performance data of the system under test is recorded and compared with the performance data of the original load run to obtain the load test result.
FIG. 6 shows a flowchart of a load clustering method according to an embodiment of the present invention. As shown in FIG. 6, the load clustering method includes the following steps S601-S602:
In step S601, obtaining load data, and determining operator operation data sequences based on the load data;
In step S602, clustering the operator operation data sequences to obtain one or more operator class groups.
The technical features of this embodiment have been explained in detail above and are not repeated here.
The following are apparatus embodiments of the present invention, which can be used to carry out the method embodiments of the present invention.
FIG. 7 shows a structural block diagram of a load testing apparatus according to an embodiment of the present invention. The apparatus can be implemented as part or all of an electronic device through software, hardware, or a combination of both. As shown in FIG. 7, the load testing apparatus includes:
a determining module 701, configured to obtain load data and determine operator operation data sequences based on the load data;
a first clustering module 702, configured to cluster the operator operation data sequences to obtain one or more operator class groups;
a generating module 703, configured to determine a target operator in the operator class group that meets a preset condition, and generate load test data according to the data sequence of the target operator to perform a load test.
As mentioned above, with the development of data technology and Internet technology, more and more service providers provide services to users through software systems. To keep a software system running stably, load testing is commonly used in practice to ensure the quality of the software system's operation under load. The goal of load testing is to ensure that the software system performs well under realistic workloads. To achieve this goal, the workload must first be recovered, and the load test is then designed according to the recovered workload. Recovering the workload is challenging, because a balance must be struck between the granularity of the workload and the cost of load testing with such a workload. If the recovered workload is too coarse, that is, oversimplified, it cannot capture differences in user behavior, and the load test is no longer representative. If the workload replays the exact field workload step by step, the exact user behavior can be reproduced, but maintaining such a workload is very costly. The prior art usually designs workloads based on the representative user behavior of a small number of clusters, and usually clusters users based on how frequently different users perform operations. However, because users of large software systems are highly variable, considering only the frequency of events is too coarse, while adding more detail about user operations makes recovery, execution, and maintenance too costly.
In view of the above problems, this embodiment proposes a load testing apparatus. The apparatus obtains operator operation data sequences based on load data, clusters the operator operation data sequences to obtain one or more operator class groups, and generates load test data for load testing according to the data sequence of a target operator in the operator class group that meets a preset condition. This technical solution uses sequences that combine operator events with context information to help recover the workload, so that the workload can be recovered at different granularity levels of operator behavior. In addition, clustering is used to identify representative operators, so that a workload can be generated from a small number of users.
In an embodiment of the present invention, the load data refers to load data produced or generated by the operations of operators within a preset time period. An operator may be, for example, a user, a machine, or a resource that performs load operations. The operations of an operator may be, for example, search operations, delete operations, add operations, edit operations, and the like.
In an embodiment of the present invention, the load data may include one or more of the following: operator identification information and the load data corresponding to the operator identification information, including load workload, load content, load processing results, and so on. The operator identification information uniquely identifies an operator.
In an embodiment of the present invention, an operator operation data sequence is a data sequence composed of the load data generated by one particular operator. The operator operation data sequence can reflect the characteristics of the operator's operation events and the context information associated with those events. The characteristics of an operation event may include, for example, one or more of the following: the purpose of the operation event, the content of the operation event, the effect of the operation event, and so on. The context information may include, for example, one or more of the following: operation event information of other operators related to the operator, information on other operation events of the same operator that precede or follow the operation event in time, and so on. Because the data sequence carries this rich information, data at different granularity levels of operator behavior can be obtained according to the needs of the actual application. The operator operation data sequence may be arranged according to a preset rule, for example in chronological order, in the order in which a certain field appears, or in the order of the occurrence frequency of a certain field. For example, arranged in the order in which a certain field appears, the operator operation data sequences may be a search-delete-add sequence, a search-edit sequence, an add-edit sequence, and so on.
Because log data records the course of events in detail and is complete, operator load data can be obtained from it more completely and accurately. Therefore, in an embodiment of the present invention, the load data is load log data. Of course, the load data may also be simulated load data or real-time load data.
In an embodiment of the present invention, as shown in FIG. 8, the determining module 701 includes:
a first obtaining submodule 801, configured to obtain load log data;
a first determining submodule 802, configured to determine the operator identification information in the load log data;
a second obtaining submodule 803, configured to obtain, based on the operator identification information, the load log data corresponding to the operator identification information, to obtain the operator operation data sequence corresponding to the operator identification information.
To obtain the operator operation data sequences, and thus the behavior information of the operators, in this embodiment the operator operation data sequences are analyzed and mined from the load data. Specifically, the first obtaining submodule 801 obtains load log data; the first determining submodule 802 determines the operator identification information that appears in the load log data; and the second obtaining submodule 803 obtains, based on the operator identification information, the load log data corresponding to that identification information and combines it to obtain the operator operation data sequence corresponding to the operator identification information.
In an embodiment of the present invention, when combining the load log data corresponding to the operator identification information, the second obtaining submodule 803 may combine it in chronological order, in the order in which a certain field appears, or in the order of the occurrence frequency of a certain field. Those skilled in the art may select a suitable combination method according to the needs of the actual application and the characteristics of the data being combined, and the present disclosure does not specifically limit it.
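The work of submodules 801 to 803 can be sketched as grouping log records by operator identifier and ordering them chronologically. The record layout (timestamp, operator identifier, operation) and the function name are assumptions for illustration; real load logs would need their own parsing.

```python
from collections import defaultdict


def build_operator_sequences(records):
    """Group log records by operator identifier, in chronological order.

    Each record is assumed to be a (timestamp, operator_id, operation) tuple.
    """
    grouped = defaultdict(list)
    for _timestamp, operator_id, operation in sorted(records, key=lambda r: r[0]):
        grouped[operator_id].append(operation)
    return dict(grouped)


# Hypothetical load log records, deliberately out of order.
log_records = [
    (3, "u1", "edit"),
    (1, "u1", "search"),
    (2, "u2", "add"),
    (4, "u2", "edit"),
]
operator_sequences = build_operator_sequences(log_records)
```

Chronological ordering is only one of the combination orders mentioned above; ordering by a field or by field frequency would change the sort key.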
In an embodiment of the present invention, as shown in FIG. 9, the first clustering module 702 includes:
a first calculation submodule 901, configured to calculate the distance matrix of the operator operation data sequences;
a clustering submodule 902, configured to cluster the operator operation data sequences according to the distance matrix to obtain one or more operator class groups.
In order to generate load test data from as little operator operation data as possible while keeping the load test data correct, thereby reducing the computation required for load testing and improving its efficiency, in this embodiment the first clustering module 702 clusters the operators so that representative operators and their corresponding load test data can subsequently be obtained from the class groups, finally yielding the load test data used for the load test. Clustering is the process of dividing a set of physical or abstract objects into multiple classes of similar objects; that is, a class group produced by clustering is a set of data objects that are similar to the other objects in the same class group and dissimilar to the objects in other class groups.
In an embodiment of the present invention, the first clustering module 702 uses hierarchical clustering to cluster the operator operation data sequences based on the Pearson distance, and the final clustering result can be displayed as a dendrogram. During clustering, the Calinski-Harabasz stopping rule is used to cut the dendrogram and determine the final number of class groups. Specifically, clustering is implemented as follows: the first calculation submodule 901 calculates the distance matrix of the operator operation data sequences; the clustering submodule 902 clusters the operator operation data sequences according to the distance matrix to obtain one or more operator class groups. Operators in the same operator class group can be regarded as having similar operation behavior, so a representative operator can be obtained from each operator class group to represent the behavior of that class group.
In an embodiment of the present invention, the parameters used by the clustering method may also be initialized before clustering.
Of course, the above hierarchical clustering method is only an example. In practical applications, other clustering methods may also be used, such as partition-based, density-based, and distribution-based clustering algorithms. For deciding when to stop clustering, besides the Calinski-Harabasz stopping rule in the example above, other rules may be used, such as the silhouette coefficient stopping rule and the Davies-Bouldin stopping rule. Those skilled in the art may select a suitable clustering method and stopping rule according to the needs of the actual application and the characteristics of the objects to be clustered, and the present disclosure does not specifically limit them.
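Hierarchical clustering on a Pearson distance can be sketched as below. This is a simplified illustration under stated assumptions: each operator is represented by a numeric vector (for example, counts of each frequent sequence), average linkage is used, and the number of class groups `k` is passed in directly instead of being chosen by the Calinski-Harabasz stopping rule, which is omitted here for brevity.

```python
from math import sqrt


def pearson_distance(x, y):
    """1 minus the Pearson correlation of two equal-length numeric vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 1.0  # correlation is undefined for a constant vector; treat as uncorrelated
    return 1.0 - cov / (sx * sy)


def agglomerate(vectors, k):
    """Average-linkage agglomerative clustering, merged down to k class groups."""
    clusters = [[i] for i in range(len(vectors))]
    dist = [[pearson_distance(a, b) for b in vectors] for a in vectors]

    def linkage(c1, c2):
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > k:
        # Find the closest pair of class groups and merge them.
        a, b = min(
            ((p, q) for p in range(len(clusters)) for q in range(p + 1, len(clusters))),
            key=lambda pq: linkage(clusters[pq[0]], clusters[pq[1]]),
        )
        clusters[a] += clusters[b]
        del clusters[b]
    return clusters


# Hypothetical per-operator counts of three frequent sequences.
operators = [[5, 1, 0], [10, 2, 1], [0, 4, 6], [1, 5, 9]]
class_groups = agglomerate(operators, k=2)
```

The first two operators favor the first sequence and the last two favor the third, so the Pearson distance separates them even though their absolute counts differ.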
In an embodiment of the present invention, as shown in FIG. 10, the first calculation submodule 901 includes:
a construction submodule 1001, configured to construct an operator operation data sequence matrix based on the operator operation data sequences;
a first generation submodule 1002, configured to generate an operator operation data sequence similarity matrix based on the operator operation data sequences;
a multiplication submodule 1003, configured to multiply the operator operation data sequence matrix by the operator operation data sequence similarity matrix to obtain the distance matrix of the operators.
In this embodiment, when the first calculation submodule 901 calculates the distance matrix of the operator operation data sequences, the construction submodule 1001 first constructs the operator operation data sequence matrix based on the operator operation data sequences; for example, the operator operation data sequences may be combined into the matrix according to their operator identification information. The first generation submodule 1002 then generates the operator operation data sequence similarity matrix based on the operator operation data sequences, where the similarity matrix characterizes how similar the operator operation data sequences are to one another. Finally, the multiplication submodule 1003 multiplies the operator operation data sequence matrix by the operator operation data sequence similarity matrix to obtain the distance matrix of the operators. The distance matrix obtained in this way takes into account not only the similarity between operators but also the similarity between the operation data sequences of all operators.
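The multiplication performed by submodule 1003 can be sketched as an ordinary matrix product. The shapes are an assumed reading of the text: the sequence matrix has one row per operator and one column per frequent sequence (holding occurrence counts), and the similarity matrix is square over the frequent sequences. All values are hypothetical.

```python
def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


# Rows: operators; columns: occurrence counts of three frequent sequences.
sequence_matrix = [
    [4, 2, 2],
    [0, 3, 1],
]
# Pairwise similarity between the three frequent sequences.
similarity = [
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.5],
    [0.0, 0.5, 1.0],
]
weighted = matmul(sequence_matrix, similarity)
```

Under this reading, each operator's count for a sequence is spread onto similar sequences, so two operators who use different but similar sequences end up with closer rows than their raw counts would suggest.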
In an embodiment of the present invention, the first generation submodule 1002 may be configured to perform:
determining the frequent sequences in the operator operation data sequences;
calculating the edit distance between the frequent sequences;
determining the similarity between the operator operation data sequences according to the edit distance between the frequent sequences, and generating the operator operation data sequence similarity matrix.
Because frequent sequences are representative to a certain extent and can reflect how data occurs in the set of data sequences, in this embodiment the first generation submodule 1002 generates the operator operation data sequence similarity matrix based on frequent sequences. Specifically, the frequent sequences in the operator operation data sequences are determined first, where a frequent sequence is a data sequence whose occurrence frequency is higher than a preset frequency threshold and can therefore represent the characteristics of the data to a certain extent. The edit distance between the frequent sequences is then calculated; for example, the Levenshtein method may be used, although other edit distance calculation methods may also be used, and the present disclosure does not specifically limit the edit distance calculation method. Finally, the similarity between the operator operation data sequences is calculated according to the edit distance between the frequent sequences: the smaller the edit distance between two frequent sequences, the higher their similarity. The operator operation data sequence similarity matrix can then be generated from the similarities between the operator operation data sequences.
The above example uses the Levenshtein edit distance to calculate similarity. Other string similarity calculation methods may also be used, such as cosine similarity or the Jaccard coefficient. A suitable similarity calculation method may be selected according to the needs of the actual application and the characteristics of the data sequences, and the present disclosure does not specifically limit it.
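Determining frequent sequences can be sketched as counting fixed-length windows over an operator's chronological operations and keeping those that exceed the threshold. The disclosure does not specify how candidate sequences are delimited, so the sliding window, its length, the threshold value, and the function name are all assumptions of this sketch.

```python
from collections import Counter


def frequent_sequences(events, length, min_count):
    """Return {window: count} for windows occurring more than `min_count` times."""
    windows = [tuple(events[i:i + length]) for i in range(len(events) - length + 1)]
    counts = Counter(windows)
    return {seq: n for seq, n in counts.items() if n > min_count}


# Hypothetical chronological operations of one operator.
events = ["search", "edit", "search", "edit", "add", "edit", "search", "edit"]
frequent = frequent_sequences(events, length=2, min_count=1)
```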
In an embodiment of the present invention, as shown in FIG. 11, the generating module 703 includes:
a third determining submodule 1101, configured to determine a target operator in the operator class group that meets a preset condition;
a third obtaining submodule 1102, configured to obtain the frequent sequences of the target operator and their occurrence frequencies;
a third calculation submodule 1103, configured to calculate the occurrence probability of each frequent sequence according to its occurrence frequency;
a third generation submodule 1104, configured to generate load test data according to the occurrence probabilities of the frequent sequences;
a test submodule 1105, configured to replay the load test data to perform the load test.
As mentioned above, in order to generate load test data from as little operator operation data as possible while keeping the load test data correct, thereby reducing the computation required for load testing and improving its efficiency, the present disclosure clusters the operators and, based on the resulting class groups, obtains representative operators and their corresponding load test data to produce the load test data that is finally used for the load test. In this embodiment, after one or more operator class groups are obtained, the third determining submodule 1101 first selects from each operator class group a target operator that meets a preset condition, where the preset condition is a preset representative-point condition; that is, the selected target operator is a representative operator of the corresponding operator class group. The third obtaining submodule 1102 then obtains the frequent sequences of the target operator and their occurrence frequencies. The frequent sequences can be obtained by the method described above, which is not repeated here, and their occurrence frequencies can be obtained at the same time. The third calculation submodule 1103 calculates the occurrence probability of each frequent sequence according to its occurrence frequency; for example, dividing the occurrence frequency of a frequent sequence of the target operator by the total number of data sequences of the target operator gives the occurrence probability of that frequent sequence. The third generation submodule 1104 then generates load test data according to the occurrence probabilities of the frequent sequences, and the test submodule 1105 finally replays the load test data to perform the load test.
In an embodiment of the present invention, the third determining submodule 1101 may use the medoid algorithm (Partitioning Around Medoids, PAM) to identify the representative operator in each operator class group. PAM is a clustering algorithm based on k-medoids and is comparatively robust and accurate. Other methods may also be used to select the representative operator; those skilled in the art may select a suitable method for identifying the representative operator according to the needs of the actual application and the characteristics of the operator class group data, and the present disclosure does not specifically limit it.
In an embodiment of the present invention, once the occurrence probabilities of the target operator's frequent sequences have been obtained, and because the target operator is representative of its operator class group, the third generating sub-module 1104 may use the target operator's frequent sequences and their occurrence probabilities to replace the operation data sequences and occurrence probabilities of the other operators in the same operator class group, thereby generating the load test data used for the load test. For example, suppose the frequent sequences of the target operator of an operator class group are a search-delete-add sequence, a search-edit sequence, and an add-edit sequence, with occurrence probabilities of 50%, 25%, and 25% respectively. These frequent sequences and probabilities can then replace the operation data sequences and probabilities of the other operators in the group. Assuming the group contains two operators besides the target operator, operator 1 and operator 2, the resulting load test data may be:
Target operator: search-delete-add sequence, 50%; search-edit sequence, 25%; add-edit sequence, 25%;
Operator 1: search-delete-add sequence, 50%; search-edit sequence, 25%; add-edit sequence, 25%;
Operator 2: search-delete-add sequence, 50%; search-edit sequence, 25%; add-edit sequence, 25%.
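The example above can be sketched in a few lines. The sketch assumes that occurrence probabilities are obtained by normalizing raw occurrence counts, and that every operator in the class group simply receives a copy of the target operator's profile. The function names and the counts (2, 1, 1) are hypothetical and chosen only to reproduce the 50%/25%/25% split.

```python
def to_probabilities(counts):
    """Normalize occurrence counts of frequent sequences into probabilities."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no occurrences to normalize")
    return {sequence: n / total for sequence, n in counts.items()}


def build_load_test_data(target_profile, group_members):
    """Give every operator in the class group a copy of the target operator's profile."""
    return {member: dict(target_profile) for member in group_members}


# Hypothetical occurrence counts for the target operator's frequent sequences.
target_counts = {"search-delete-add": 2, "search-edit": 1, "add-edit": 1}
target_profile = to_probabilities(target_counts)

load_test_data = build_load_test_data(
    target_profile, ["target operator", "operator 1", "operator 2"]
)
```

Each operator gets its own copy of the profile, so later per-operator adjustments would not affect the others.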
In an embodiment of the present invention, the test sub-module 1105 may replay the load test data with a replay tool such as FIO or JMeter to carry out the load test. During the load test, after the load test data has been replayed, the performance data of the system under test is recorded and compared with the performance data of the original load run to obtain the load test result.
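The comparison step could, for instance, report how far each replayed metric deviates from the original run. The sketch below is an assumption rather than the method of the disclosure: the metric names, the sample values, and the use of relative deviation as the comparison measure are all hypothetical.

```python
def compare_metrics(original, replayed):
    """Relative deviation of each replayed metric from the original run.

    Metrics missing from the replay, or with an original value of zero,
    are skipped because no relative deviation can be computed for them.
    """
    return {
        name: (replayed[name] - original[name]) / original[name]
        for name in original
        if name in replayed and original[name] != 0
    }


# Hypothetical performance data from the original load and from the replay.
original_run = {"throughput_ops": 1000.0, "p99_latency_ms": 20.0}
replayed_run = {"throughput_ops": 950.0, "p99_latency_ms": 22.0}

deviation = compare_metrics(original_run, replayed_run)
```

A small deviation on every metric would suggest that the generated load test data reproduces the behavior of the original load reasonably well.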
FIG. 12 shows a structural block diagram of a load clustering apparatus according to an embodiment of the present invention. The apparatus may be implemented as part or all of an electronic device through software, hardware, or a combination of the two. As shown in FIG. 12, the load clustering apparatus includes:
an obtaining module 1201, configured to obtain load data and determine operator operation data sequences based on the load data;
a second clustering module 1202, configured to cluster the operator operation data sequences to obtain one or more operator class groups.
The technical features of this embodiment have been explained in detail above and are not repeated here.
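Clustering operation sequences requires some measure of how far apart two sequences are, and the claims describe computing the edit distance between frequent sequences. A minimal sketch of that measure follows, treating each operation as one symbol. The normalization in `similarity` is an assumption made here for illustration, not a formula taken from the disclosure.

```python
def edit_distance(a, b):
    """Levenshtein distance between two operation sequences (lists of operations)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Assumed normalization: 1.0 for identical sequences, 0.0 for fully different ones."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


d = edit_distance(["search", "delete", "add"], ["search", "edit"])
```

Evaluating such a measure for every pair of sequences yields a symmetric matrix that a distance-based clustering algorithm can consume.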
The technical solution of the present invention is further described below using an application scenario as an example. As shown in FIG. 13, in this scenario the load testing apparatus may be deployed in a distributed data system to load test one or more distributed data devices in that system, such as clients 1301. In the distributed data system, multiple clients 1301 are each connected to a database 1302, and the load testing apparatus 1303 obtains load data from the database 1302. The determining module 1304 in the load testing apparatus 1303 determines operator operation data sequences based on the load data. The first clustering module 1305 in the load testing apparatus 1303 clusters the operator operation data sequences to obtain one or more operator class groups. The generating module 1306 in the load testing apparatus 1303 determines the target operator in each operator class group that meets a preset condition, generates load test data from the target operator's data sequence, and performs the load test to obtain the load test result.
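The first stage of this scenario, turning load data into per-operator operation sequences, might look like the sketch below. The record layout with `operator_id`, `timestamp`, and `operation` fields is a hypothetical log format; the disclosure does not specify one.

```python
from collections import defaultdict


def build_operation_sequences(log_records):
    """Group log records by operator and order each operator's operations by time."""
    sequences = defaultdict(list)
    for record in sorted(log_records, key=lambda r: r["timestamp"]):
        sequences[record["operator_id"]].append(record["operation"])
    return dict(sequences)


# Hypothetical load log records, deliberately out of chronological order.
log_records = [
    {"operator_id": "client-1", "timestamp": 3, "operation": "add"},
    {"operator_id": "client-2", "timestamp": 1, "operation": "search"},
    {"operator_id": "client-1", "timestamp": 1, "operation": "search"},
    {"operator_id": "client-1", "timestamp": 2, "operation": "delete"},
    {"operator_id": "client-2", "timestamp": 2, "operation": "edit"},
]

sequences = build_operation_sequences(log_records)
```

The resulting mapping from operator identifier to ordered operation list is the input that the clustering stage would work on.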
An embodiment of the present invention also discloses an electronic device. FIG. 14 shows a structural block diagram of an electronic device according to an embodiment of the present invention. As shown in FIG. 14, the electronic device 1400 includes a memory 1401 and a processor 1402, wherein
the memory 1401 is used to store one or more computer instructions, and the one or more computer instructions are executed by the processor 1402 to implement any of the method steps described above.
FIG. 15 is a schematic structural diagram of a computer system suitable for implementing the load testing method according to an embodiment of the present invention.
As shown in FIG. 15, the computer system 1500 includes a processing unit 1501, which can perform the various processes of the above embodiments according to a program stored in a read-only memory (ROM) 1502 or a program loaded from a storage section 1508 into a random access memory (RAM) 1503. The RAM 1503 also stores the various programs and data required for the operation of the system 1500. The processing unit 1501, the ROM 1502, and the RAM 1503 are connected to each other through a bus 1504. An input/output (I/O) interface 1505 is also connected to the bus 1504.
The following components are connected to the I/O interface 1505: an input section 1506 including a keyboard, a mouse, and the like; an output section 1507 including a cathode ray tube (CRT) or liquid crystal display (LCD), a speaker, and the like; a storage section 1508 including a hard disk and the like; and a communication section 1509 including a network interface card such as a LAN card or a modem. The communication section 1509 performs communication processing via a network such as the Internet. A drive 1510 is also connected to the I/O interface 1505 as needed. A removable medium 1511, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1510 as needed, so that a computer program read from it can be installed into the storage section 1508 as needed. The processing unit 1501 may be implemented as a CPU, GPU, FPGA, NPU, or similar processing unit.
In particular, according to embodiments of the present invention, the method described above may be implemented as a computer software program. For example, an embodiment of the present invention includes a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program containing program code for executing the load testing method. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 1509, and/or installed from the removable medium 1511.
The flowcharts and block diagrams in the drawings illustrate the architecture, functions, and operations of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of such blocks, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units or modules described in the embodiments of the present invention may be implemented in software or in hardware. The described units or modules may also be provided in a processor, and in some cases their names do not constitute a limitation on the units or modules themselves.
In another aspect, an embodiment of the present invention also provides a computer-readable storage medium. The computer-readable storage medium may be the one included in the apparatus described in the above embodiments, or it may exist separately without being assembled into a device. The computer-readable storage medium stores one or more programs, which are used by one or more processors to execute the methods described in the embodiments of the present invention.
The above description covers only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in the embodiments of the present invention is not limited to technical solutions formed by the specific combination of the above technical features. It also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features with similar functions disclosed in the embodiments of the present invention.

Claims (18)

  1. A load testing method, characterized by comprising:
    obtaining load data, and determining operator operation data sequences based on the load data;
    clustering the operator operation data sequences to obtain one or more operator class groups;
    determining a target operator in the operator class group that meets a preset condition, and generating load test data according to the data sequence of the target operator to perform a load test.
  2. The method according to claim 1, wherein the load data is load log data, simulated load data, or real-time load data.
  3. The method according to claim 2, wherein obtaining load data and determining operator operation data sequences based on the load data comprises:
    obtaining load log data;
    determining operator identification information in the load log data;
    obtaining, based on the operator identification information, the load log data corresponding to the operator identification information, to obtain the operator operation data sequence corresponding to the operator identification information.
  4. The method according to any one of claims 1-3, wherein clustering the operator operation data sequences to obtain one or more operator class groups comprises:
    calculating a distance matrix of the operator operation data sequences;
    clustering the operator operation data sequences according to the distance matrix to obtain one or more operator class groups.
  5. The method according to claim 4, wherein calculating the distance matrix of the operator operation data sequences comprises:
    constructing an operator operation data sequence matrix based on the operator operation data sequences;
    generating an operator operation data sequence similarity matrix based on the operator operation data sequences;
    multiplying the operator operation data sequence matrix by the operator operation data sequence similarity matrix to obtain the distance matrix of the operators.
  6. The method according to claim 5, wherein generating the operator operation data sequence similarity matrix based on the operator operation data sequences is implemented as:
    determining frequent sequences in the operator operation data sequences;
    calculating the edit distance between the frequent sequences;
    determining the similarity between the operator operation data sequences according to the edit distance between the frequent sequences, and generating the operator operation data sequence similarity matrix.
  7. The method according to claim 1, wherein determining a target operator in the operator class group that meets a preset condition, and generating load test data according to the data sequence of the target operator to perform a load test, comprises:
    determining a target operator in the operator class group that meets a preset condition;
    obtaining the frequent sequences of the target operator and their occurrence frequencies;
    calculating the occurrence probability of each frequent sequence according to its occurrence frequency;
    generating load test data according to the occurrence probabilities of the frequent sequences;
    replaying the load test data to perform the load test.
  8. A load clustering method, characterized by comprising:
    obtaining load data, and determining operator operation data sequences based on the load data;
    clustering the operator operation data sequences to obtain one or more operator class groups.
  9. A load testing apparatus, characterized by comprising:
    a determining module, configured to obtain load data and determine operator operation data sequences based on the load data;
    a first clustering module, configured to cluster the operator operation data sequences to obtain one or more operator class groups;
    a generating module, configured to determine a target operator in the operator class group that meets a preset condition, and generate load test data according to the data sequence of the target operator to perform a load test.
  10. The apparatus according to claim 9, wherein the load data is load log data, simulated load data, or real-time load data.
  11. The apparatus according to claim 10, wherein the determining module comprises:
    a first obtaining sub-module, configured to obtain load log data;
    a first determining sub-module, configured to determine operator identification information in the load log data;
    a second obtaining sub-module, configured to obtain, based on the operator identification information, the load log data corresponding to the operator identification information, to obtain the operator operation data sequence corresponding to the operator identification information.
  12. The apparatus according to any one of claims 9-11, wherein the first clustering module comprises:
    a first calculation sub-module, configured to calculate a distance matrix of the operator operation data sequences;
    a clustering sub-module, configured to cluster the operator operation data sequences according to the distance matrix to obtain one or more operator class groups.
  13. The apparatus according to claim 12, wherein the first calculation sub-module comprises:
    a construction sub-module, configured to construct an operator operation data sequence matrix based on the operator operation data sequences;
    a first generating sub-module, configured to generate an operator operation data sequence similarity matrix based on the operator operation data sequences;
    a multiplication sub-module, configured to multiply the operator operation data sequence matrix by the operator operation data sequence similarity matrix to obtain the distance matrix of the operators.
  14. The apparatus according to claim 13, wherein the first generating sub-module comprises:
    a second determining sub-module, configured to determine frequent sequences in the operator operation data sequences;
    a second calculation sub-module, configured to calculate the edit distance between the frequent sequences;
    a second generating sub-module, configured to determine the similarity between the operator operation data sequences according to the edit distance between the frequent sequences, and generate the operator operation data sequence similarity matrix.
  15. The apparatus according to claim 9, wherein the generating module comprises:
    a third determining sub-module, configured to determine a target operator in the operator class group that meets a preset condition;
    a third obtaining sub-module, configured to obtain the frequent sequences of the target operator and their occurrence frequencies;
    a third calculation sub-module, configured to calculate the occurrence probability of each frequent sequence according to its occurrence frequency;
    a third generating sub-module, configured to generate load test data according to the occurrence probabilities of the frequent sequences;
    a test sub-module, configured to replay the load test data to perform the load test.
  16. A load clustering apparatus, characterized by comprising:
    an obtaining module, configured to obtain load data and determine operator operation data sequences based on the load data;
    a second clustering module, configured to cluster the operator operation data sequences to obtain one or more operator class groups.
  17. An electronic device, characterized by comprising a memory and a processor, wherein
    the memory is used to store one or more computer instructions, and the one or more computer instructions are executed by the processor to implement the method steps of any one of claims 1-8.
  18. A computer-readable storage medium having computer instructions stored thereon, characterized in that the computer instructions, when executed by a processor, implement the method steps of any one of claims 1-8.
PCT/CN2020/114411 2019-09-12 2020-09-10 Load testing method and apparatus, and electronic device and computer-readable storage medium WO2021047575A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910866125.5 2019-09-12
CN201910866125.5A CN112486738B (en) 2019-09-12 2019-09-12 Load testing method and device, electronic equipment and computer readable storage medium

Publications (1)

Publication Number Publication Date
WO2021047575A1 true WO2021047575A1 (en) 2021-03-18

Family

ID=74867273

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/114411 WO2021047575A1 (en) 2019-09-12 2020-09-10 Load testing method and apparatus, and electronic device and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN112486738B (en)
WO (1) WO2021047575A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110314341A1 (en) * 2010-06-21 2011-12-22 Salesforce.Com, Inc. Method and systems for a dashboard testing framework in an online demand service environment
CN103092751A (en) * 2012-12-13 2013-05-08 华中科技大学 Web application performance test system based on customer behavior model in cloud environment
CN103207804A (en) * 2013-04-07 2013-07-17 杭州电子科技大学 MapReduce load simulation method based on cluster job logging
CN107193744A (en) * 2017-05-25 2017-09-22 中央民族大学 A kind of Web application performance test flows based on daily record describe method
CN107491384A (en) * 2016-06-12 2017-12-19 富士通株式会社 Information processor, information processing method and message processing device
CN107665165A (en) * 2016-07-27 2018-02-06 中兴通讯股份有限公司 Ambient noise generation method and device, method for testing pressure and device

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB0020488D0 (en) * 2000-08-18 2000-10-11 Hewlett Packard Co Trusted status rollback
CN101441595B (en) * 2007-11-21 2010-11-03 英业达股份有限公司 Load monitoring apparatus and test structure and load monitoring method and test method thereof
CN103176973B (en) * 2011-12-20 2016-04-20 国际商业机器公司 For generating the system and method for the test job load of database
CN103530190B (en) * 2013-10-14 2016-08-17 北京邮电大学 A kind of load predicting method and device
CN107450968B (en) * 2016-05-31 2020-09-08 华为技术有限公司 Load reduction method, device and equipment
CN107480015B (en) * 2017-07-04 2020-12-01 网易(杭州)网络有限公司 Load testing method, device and system, storage medium and pressure testing server
CN108376982B (en) * 2017-11-24 2021-03-26 上海泰豪迈能能源科技有限公司 Load phase sequence identification method and device
CN108021509B (en) * 2017-12-27 2020-08-18 西安交通大学 Test case dynamic sequencing method based on program behavior network aggregation
CN108415777A (en) * 2018-03-21 2018-08-17 常州信息职业技术学院 A kind of cloud computing cluster task load predicting method based on cluster feature extraction
CN109558315B (en) * 2018-11-14 2022-02-15 泰康保险集团股份有限公司 Method, device and equipment for determining test range

Also Published As

Publication number Publication date
CN112486738B (en) 2022-04-26
CN112486738A (en) 2021-03-12

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20862853

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20862853

Country of ref document: EP

Kind code of ref document: A1