CN112631922A

CN112631922A - Flow playback data selection method, system and storage medium

Info

Publication number: CN112631922A
Application number: CN202011584777.9A
Authority: CN
Inventors: 袁丽莉; 梁北才; 杨浩文
Original assignee: Guangzhou Pinwei Software Co Ltd
Current assignee: Guangzhou Pinwei Software Co Ltd
Priority date: 2020-12-28
Filing date: 2020-12-28
Publication date: 2021-04-09

Abstract

The invention discloses a method, a system and a storage medium for selecting flow playback data. The method comprises the following steps: capturing interface request data according to a preset time interval; calculating a signature value of each interface request data according to a simhash algorithm, and marking all obtained signature values as an interface data total set; dividing the interface data total set into k interface data subsets according to a k-means algorithm; and respectively selecting n/k signature values from each interface data subset as playback data. For the same amount of playback data, the invention achieves larger coverage than the traditional playback mode of randomly selecting interface request data, which effectively avoids test judgments being affected by playback data with small coverage. For the same coverage, the invention needs far less playback data than the traditional playback mode, which effectively increases the playback speed and thereby improves the test efficiency.

Description

Flow playback data selection method, system and storage medium

Technical Field

The invention relates to the field of software testing, in particular to a flow playback data selection method, a flow playback data selection system and a storage medium.

Background

Flow playback is a vital method for checking the quality of pre-release code before a new software version goes online. Existing flow playback mainly captures existing interface request data from the online (production) environment at random and plays it back directly.

However, randomly selecting a certain amount of interface request data may cause insufficient coverage of the played back interface request data, and in order to compensate for the insufficient coverage, it is generally necessary to select as many interface request data as possible for playing back, so as to avoid that the test result of the pre-release code is affected due to insufficient coverage of the selected interface request data. However, if the data amount of the interface request data is too large, the playback time is too long, and the playback efficiency is seriously affected.

Disclosure of Invention

The invention aims to provide a flow playback data selection method, which can have a larger coverage area compared with the traditional playback mode of randomly selecting interface request data under the condition of the same amount of playback data, can effectively avoid the influence on test judgment caused by small coverage area of the playback data, and can effectively improve the playback speed to greatly improve the test efficiency because the amount of the playback data required to be selected by the method is far smaller than that of the traditional playback method when the coverage area of the method is the same as that of the traditional playback mode of randomly selecting the interface request data.

Another objective of the present invention is to provide a flow playback data selection system, which has a larger coverage area than the playback manner of the conventional random selection interface request data under the condition of the same amount of playback data, and can effectively avoid the influence on the test judgment due to a small coverage area of the playback data, and when the coverage area of the method is the same as the playback manner of the conventional random selection interface request data, the amount of the playback data required to be selected by the method is much smaller than that of the conventional playback method, and the playback speed can be effectively increased to greatly improve the test efficiency.

A further object of the present invention is to provide a storage medium, which has a larger coverage area than the conventional playback manner of randomly selecting interface request data under the condition of the same amount of playback data, and can effectively avoid the influence on test judgment due to a small coverage area of the playback data.

In order to achieve the aim, the invention discloses a flow playback data selection method, which comprises the following steps:

S1, capturing interface request data according to a preset time interval;

S2, calculating the signature value of each interface request data according to a simhash algorithm, and marking all the obtained signature values as an interface data aggregate;

S3, dividing the interface data total set into k interface data subsets according to a k-means algorithm;

and S4, respectively selecting n/k signature values from each interface data subset as playback data, wherein n is the total number of interface request data needing to be played back.

Compared with the prior art, the signature value of each interface request data is calculated through a simhash algorithm, all the obtained signature values are marked as an interface data aggregate, the interface data aggregate is divided into k interface data subsets according to a k-means algorithm, so that the k interface data subsets have the same or similar weight, and the data in each interface data subset can be considered to have the same playback value, so that when n/k signature values are respectively selected from each interface data subset as playback data, under the condition of the same number of playback data, the method has a larger coverage compared with the traditional playback mode of randomly selecting the interface request data, the method has more representative significance and stronger reliability, and the influence on test judgment caused by the small coverage of the playback data can be effectively avoided; in addition, when the coverage of the method is the same as the playback mode of the traditional random selection interface request data, the number of the playback data required to be selected by the method is far smaller than that of the traditional playback method, so that the playback speed is effectively increased to greatly improve the test efficiency.

Preferably, the step (2) specifically includes:

S21, dividing the current interface request data into a plurality of interface words according to the parameter values and the corresponding values of the current interface request data;

S22, respectively calculating the hash value of each interface word to obtain the vector characteristic value of the current interface word, and carrying out vector combination on all vector characteristic values corresponding to the current interface request data to obtain the signature value of the current interface request data;

and S23, marking all the obtained signature values as an interface data aggregate.

Preferably, the step (22) specifically comprises:

S221, respectively calculating the hash value of each interface word to obtain a first vector characteristic value of the current interface word;

and S222, carrying out vector combination on all first vector characteristic values corresponding to the current interface request data to obtain a signature value of the current interface request data.

Preferably, the step (222) specifically includes:

S2221, weighting each first vector characteristic value to obtain a second vector characteristic value;

S2222, vector combination is carried out on all second vector characteristic values corresponding to the current interface request data, and a signature value of the current interface request data is obtained.

Preferably, the step (2222) specifically includes:

S22221, vector combination is carried out on all second vector characteristic values corresponding to the current interface request data to obtain a third vector characteristic value;

S22222, the third vector characteristic value is subjected to dimensionality reduction to obtain a signature value of the current interface request data.

Preferably, the step (3) specifically includes:

S31, randomly selecting k signature values from the interface data total set as virtual center points;

S32, dividing the interface data total set into k interface data subsets, wherein each interface data subset only comprises one virtual center point;

S33, respectively calculating the Hamming distance from all signature values except the virtual center points in the interface data total set to each virtual center point, and classifying each signature value into the interface data subset whose virtual center point has the smallest Hamming distance to the current signature value;

S34, calculating the central point of each interface data subset, and taking the central point as a new virtual central point of the current interface data subset;

and S35, repeatedly iterating the interface data aggregate according to the new virtual central point to obtain k converged interface data subsets.

Preferably, the flow playback data selection method is respectively executed in the pre-release code and the comparison code;

the step (4) is followed by:

S5, acquiring playback data of the pre-issued codes and playback data of the comparison codes;

S6, performing differential analysis on the playback data of the pre-issued codes and the playback data of the comparison codes;

and S7, judging whether the pre-issued codes have problems according to the analysis result.

Preferably, the value of k is between A and B, wherein A, B and k are natural numbers.

Correspondingly, the invention also discloses a flow playback data selection system, which comprises:

the data capturing module is used for capturing the interface request data according to a preset time interval;

the first processing module is used for calculating a signature value of each interface request data according to a simhash algorithm and marking all obtained signature values as an interface data aggregate;

the second processing module is used for dividing the interface data total set into k interface data subsets according to a k-means algorithm;

and the execution module is used for respectively selecting n/k signature values from each interface data subset as playback data, wherein n is the total number of the interface request data needing to be played back.

Correspondingly, the invention also discloses a storage medium for storing a computer program, and the program realizes the flow playback data selection method when being executed by a processor.

Drawings

FIG. 1 is a block flow diagram of a flow playback data selection method of the present invention;

FIG. 2 is a block flow diagram of step (2) of the flow playback data selection method of the present invention;

FIG. 3 is a block flow diagram of step (3) of the flow playback data selection method of the present invention;

FIG. 4 is a block diagram of the flow playback data selection system of the present invention.

Detailed Description

In order to explain technical contents, structural features, and objects and effects of the present invention in detail, the following detailed description is given with reference to the accompanying drawings in conjunction with the embodiments.

Referring to fig. 1 to fig. 3, the present invention discloses a method for selecting flow playback data, which includes the following steps:

S1, capturing the interface request data according to the preset time interval.

It can be understood that each capture collects all interface request data generated between two adjacent capture times, so multiple captures yield multiple batches of interface request data. For example, if the preset time interval is 100 ms, 100 captures are performed, and each capture returns one hundred thousand interface request data, then ten million interface request data need to be processed. It should be noted that the interface request data is generated dynamically, so the number captured each time is not necessarily fixed: for example, one hundred thousand pieces the first time, fifty thousand pieces the second time, twelve thousand pieces the third time, and so on.

And S2, calculating the signature value of each interface request data according to a simhash algorithm, and marking all the obtained signature values as an interface data aggregate.

The simhash algorithm is an algorithm for calculating a locality-sensitive hash value. A locality-sensitive hash value preserves similarity: if two strings are similar to a certain degree, the hash values calculated from them remain similar. An ordinary hash value does not have this locality-sensitive property. The simhash algorithm is mainly applied to the deduplication of massive data in search engines.

Because interface request data are inconvenient to process and calculate directly, the method uses the simhash algorithm to calculate a signature value for each interface request data. This converts each interface request data into a signature value that is convenient to calculate with, while largely preserving the similarity between interface request data after the conversion.

And S3, dividing the interface data total set into k interface data subsets according to a k-means algorithm.

The k-means algorithm is an iterative clustering analysis algorithm. Iterating over the k interface data subsets locally minimizes the sum of squared errors within the subsets, so that the k interface data subsets can be regarded as having the same weight.

Since the k interface data subsets obtained in step (3) have the same weight, n/k signature values are then selected from each interface data subset as playback data in step (4), for a total of n playback data. It can be understood that each piece of playback data has the same weight, and the selected n pieces of playback data can theoretically cover all interface request data.
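The per-subset selection can be sketched in Python as follows. The source does not say how the n/k signature values are picked inside a subset, nor what happens when n is not a multiple of k, so this sketch assumes uniform random sampling and integer division. The name select_playback and the seed parameter are hypothetical.

```python
import random


def select_playback(subsets, n, seed=0):
    """Pick about n/k signature values from each of the k subsets.

    Assumptions (not stated in the source): sampling inside a subset is
    uniform and random, n/k is rounded down, and a subset smaller than
    the quota simply contributes all of its members.
    """
    rng = random.Random(seed)
    quota = n // len(subsets)
    selected = []
    for subset in subsets:
        selected.extend(rng.sample(subset, min(quota, len(subset))))
    return selected


# Three subsets (k = 3) and a playback budget of n = 6, so the quota is 2.
subsets = [[1, 2, 3, 4], [5, 6, 7], [8]]
playback = select_playback(subsets, n=6)
print(len(playback))  # 2 + 2 + 1 = 5, because the last subset has one member
```

Because every subset contributes, even a small cluster of unusual requests is represented in the playback data, which is the coverage argument made above.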

Preferably, the step (2) specifically includes:

S21, dividing the current interface request data into a plurality of interface words according to the parameter values and the corresponding values of the current interface request data.

It is understood that, for example, if the current interface request data is http://mapi.v. com/viss-mobile/rest/favorite/store/status?timestamp=1577329909&user_token=12345, the current interface request data can be divided into two interface words, i.e., timestamp=1577329909 and user_token=12345. Of course, for other interface request data, the number of interface words may also be one, three, or four; the actual number is determined by the parameters and the corresponding values of the current interface request data.
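A minimal Python sketch of this splitting step is given below. It assumes the interface request data is a URL whose parameters are carried in a standard query string; the function name split_interface_words and the host example.invalid are hypothetical and only serve the illustration.

```python
from urllib.parse import parse_qsl, urlsplit


def split_interface_words(request_url: str) -> list[str]:
    """Return one 'name=value' interface word per query parameter."""
    query = urlsplit(request_url).query
    pairs = parse_qsl(query, keep_blank_values=True)
    return [f"{name}={value}" for name, value in pairs]


words = split_interface_words(
    "http://example.invalid/rest/favorite/store/status"
    "?timestamp=1577329909&user_token=12345"
)
print(words)  # ['timestamp=1577329909', 'user_token=12345']
```

Requests that carry their parameters elsewhere (for example in a POST body) would need a different parser; the source does not cover that case.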

S22, respectively calculating the hash value of each interface word to obtain the vector characteristic value of the current interface word, and carrying out vector combination on all vector characteristic values corresponding to the current interface request data to obtain the signature value of the current interface request data.

Calculating the hash value of an interface word yields the vector characteristic value of that interface word, where the components of the vector characteristic value are the binary digits of the hash value. For example, when HASH1(timestamp=1577329909) = 100101 and HASH2(user_token=12345) = 101011, the vector characteristic value T1 of HASH1 is (1, 0, 0, 1, 0, 1), and the vector characteristic value T2 of HASH2 is (1, 0, 1, 0, 1, 1).
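The conversion from a hash value to a vector characteristic value can be sketched in Python as follows. The source does not name the hash function or the hash width, so the use of MD5 truncated to 64 bits in word_hash is an assumption, and both function names are hypothetical. The 6-bit values are the ones used in the example above.

```python
import hashlib


def word_hash(word: str, bits: int = 64) -> int:
    """Hash an interface word to a `bits`-wide integer (assumed: MD5, top bits)."""
    digest = hashlib.md5(word.encode("utf-8")).digest()  # 128-bit digest
    return int.from_bytes(digest, "big") >> (128 - bits)


def to_bit_vector(value: int, bits: int) -> list[int]:
    """Expand an integer into its binary digits, most significant bit first."""
    return [(value >> (bits - 1 - i)) & 1 for i in range(bits)]


# The 6-bit hash values from the text, written as binary literals.
print(to_bit_vector(0b100101, 6))  # [1, 0, 0, 1, 0, 1]
print(to_bit_vector(0b101011, 6))  # [1, 0, 1, 0, 1, 1]

# A real interface word goes through the (assumed) hash function first.
t1 = to_bit_vector(word_hash("timestamp=1577329909"), 64)
print(len(t1))  # 64
```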

Preferably, the step (22) specifically comprises:

S221, respectively calculating the hash value of each interface word to obtain a first vector characteristic value of the current interface word.

Continuing the example of step (22), the first vector characteristic value T1 of HASH1 in step (221) is (1, 0, 0, 1, 0, 1), and the first vector characteristic value T2 of HASH2 is (1, 0, 1, 0, 1, 1).

Preferably, the step (222) specifically includes:

S2221, weighting each first vector characteristic value to obtain a second vector characteristic value.

Specifically, each first vector characteristic value is weighted according to the formula W = HASH × WEIGHT: where a bit is 1, the weight is applied with a positive sign, and where a bit is 0, the weight is applied with a negative sign, so as to obtain the second vector characteristic values T1' and T2'.
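The weighting rule can be sketched in Python as follows. The source gives no concrete weights, so the values 2 and 1 used below are hypothetical, as is the function name weight_vector.

```python
def weight_vector(bits, weight):
    """Map each bit to +weight (bit is 1) or -weight (bit is 0)."""
    return [weight if bit == 1 else -weight for bit in bits]


t1 = [1, 0, 0, 1, 0, 1]  # first vector characteristic value of HASH1
t2 = [1, 0, 1, 0, 1, 1]  # first vector characteristic value of HASH2

# Hypothetical weights: 2 for the first interface word, 1 for the second.
t1_weighted = weight_vector(t1, 2)
t2_weighted = weight_vector(t2, 1)
print(t1_weighted)  # [2, -2, -2, 2, -2, 2]
print(t2_weighted)  # [1, -1, 1, -1, 1, 1]
```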

Preferably, the step (2222) specifically includes:

S22221, vector combination is carried out on all second vector characteristic values corresponding to the current interface request data, and a third vector characteristic value is obtained. That is, the third vector characteristic value T' = T1' + T2'.

Specifically, in the dimensionality reduction of step (22222), each component of T' is set to 1 if it is greater than 0 and to 0 otherwise, which yields the signature value T of the current interface request data. This step makes every component of T' either 0 or 1, so as to simplify subsequent calculation.
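The merge and reduction steps can be sketched in Python as follows. The input vectors assume hypothetical weights of 2 and 1 applied to the example bit vectors (1, 0, 0, 1, 0, 1) and (1, 0, 1, 0, 1, 1); the function names merge_vectors and reduce_to_signature are likewise hypothetical.

```python
def merge_vectors(vectors):
    """Component-wise sum of the weighted vectors (T' = T1' + T2' + ...)."""
    return [sum(column) for column in zip(*vectors)]


def reduce_to_signature(merged):
    """Set each component to 1 if it is greater than 0, otherwise to 0."""
    return [1 if value > 0 else 0 for value in merged]


t1_weighted = [2, -2, -2, 2, -2, 2]  # (1, 0, 0, 1, 0, 1) with assumed weight 2
t2_weighted = [1, -1, 1, -1, 1, 1]   # (1, 0, 1, 0, 1, 1) with assumed weight 1

merged = merge_vectors([t1_weighted, t2_weighted])
signature = reduce_to_signature(merged)
print(merged)     # [3, -3, -1, 1, -1, 3]
print(signature)  # [1, 0, 0, 1, 0, 1]
```

With these assumed weights the heavier interface word dominates wherever the two words disagree, so the signature ends up equal to the bit vector of the first word.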

Preferably, the step (3) specifically includes:

S31, randomly selecting k signature values from the interface data total set as virtual center points.

Preferably, the value of k is between A and B, wherein A, B and k are natural numbers. If A equals 10 and B equals 50, the value of k lies within the interval [10, 50]; e.g., k equals 15, 18, 20, 25 or 30. It should be noted that the values of A and B are selected according to actual requirements, and are not limited herein.

S32, dividing the interface data total set into k interface data subsets, wherein each interface data subset only comprises one virtual center point.

S33, respectively calculating the Hamming distance from all signature values except the virtual center points in the interface data total set to each virtual center point, and classifying each signature value into the interface data subset whose virtual center point has the smallest Hamming distance to the current signature value.

S34, calculating the central point of each interface data subset, and taking the central point as a new virtual central point of the current interface data subset.

It is understood that step (32), step (33) and step (34) are repeatedly performed until all k subsets of interface data converge, at which time the division of the total set of interface data forms k subsets of interface data, and the importance of each subset of interface data can be considered to be the same or similar, i.e. each subset of interface data has the same weight.
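The iteration described above can be sketched in Python as follows. Signature values are represented as integers whose binary digits are the signature bits. The source does not say how the central point of a subset is calculated, so this sketch assumes a bitwise majority vote; the function names, the max_iter cap and the seed parameter are also assumptions. Unlike step S33, the sketch does not treat the center points separately: every signature value, including the initial centers, is assigned to its nearest center.

```python
import random


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two signature values differ."""
    return bin(a ^ b).count("1")


def majority_center(members, bits):
    """Assumed central point: each bit is the majority bit of the subset."""
    center = 0
    for i in range(bits):
        ones = sum((m >> i) & 1 for m in members)
        if 2 * ones > len(members):
            center |= 1 << i
    return center


def cluster_signatures(signatures, k, bits, max_iter=50, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(signatures, k)              # S31: random initial centers
    subsets = []
    for _ in range(max_iter):
        subsets = [[] for _ in range(k)]             # S32: one subset per center
        for s in signatures:                         # S33: nearest center wins
            nearest = min(range(k), key=lambda c: hamming(s, centers[c]))
            subsets[nearest].append(s)
        new_centers = [                              # S34: recompute the centers
            majority_center(sub, bits) if sub else centers[i]
            for i, sub in enumerate(subsets)
        ]
        if new_centers == centers:                   # S35: stop once converged
            break
        centers = new_centers
    return subsets


signatures = [0b000000, 0b000001, 0b000010, 0b111111, 0b111110, 0b111101]
subsets = cluster_signatures(signatures, k=2, bits=6)
print([len(sub) for sub in subsets])
```

As with any k-means variant, the result depends on the random initial centers and is only a local optimum.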

Preferably, the flow playback data selection method is executed in the pre-release code and the comparison code respectively.

The pre-release code refers to the code to be tested, the comparison code refers to the code which can normally run after the early-stage test, and corresponding playback data can be obtained by executing the method in the pre-release code and the comparison code.

The step (4) is followed by:

S5, obtaining the playback data of the pre-issued codes and the playback data of the comparison codes.

S6, performing differential analysis on the playback data of the pre-issued codes and the playback data of the comparison codes.

It is understood that the differential analysis means comparing, manually or automatically, the signature value of each interface under the pre-release code with the corresponding signature value under the comparison code, and determining the difference between them. When the difference value is larger than a preset threshold value, the pre-release code can be judged to have a problem, and the programmer is reminded that the pre-release code needs to be modified.
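An automatic version of this comparison can be sketched in Python as follows. The source does not define how the difference value is measured, so this sketch assumes the Hamming distance between the two signature values of the same interface; the function names, the interface names and the threshold of 2 are hypothetical.

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two signature values differ."""
    return bin(a ^ b).count("1")


def find_suspect_interfaces(pre_release, comparison, threshold):
    """Return interfaces whose signature difference exceeds the threshold.

    Both arguments map an interface name to the signature value obtained
    from playback; only interfaces present in both mappings are compared.
    """
    shared = sorted(pre_release.keys() & comparison.keys())
    return [
        name
        for name in shared
        if hamming(pre_release[name], comparison[name]) > threshold
    ]


pre_release = {"favorite/status": 0b100101, "cart/list": 0b111000}
comparison = {"favorite/status": 0b100111, "cart/list": 0b000111}

# 'favorite/status' differs in 1 bit, 'cart/list' differs in all 6 bits.
print(find_suspect_interfaces(pre_release, comparison, threshold=2))
# ['cart/list']
```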

Referring to fig. 4, correspondingly, the present invention also discloses a flow playback data selecting system, which includes:

the data grabbing module 10 is configured to grab the interface request data according to a preset time interval;

the first processing module 20 is configured to calculate a signature value of each interface request data according to a simhash algorithm, and mark all obtained signature values as an interface data aggregate;

the second processing module 30 is configured to divide the total set of interface data into k interface data subsets according to a k-means algorithm;

and the execution module 40 is configured to select n/k signature values from each of the interface data subsets as playback data, where n is a total number of interface request data to be played back.

With reference to fig. 1 to 4, the signature value of each interface request data is calculated by a simhash algorithm, all the obtained signature values are marked as an interface data aggregate, the interface data aggregate is divided into k interface data subsets according to a k-means algorithm, so that the k interface data subsets have the same or similar weights, and it can be considered that the data in each interface data subset have the same playback value, so that when n/k signature values are respectively selected from each interface data subset as playback data, under the condition of the same number of playback data, the method has a larger coverage compared with the traditional playback mode of randomly selecting interface request data, has more representative significance and stronger reliability, and can effectively avoid influence on test judgment due to small coverage of the playback data; in addition, when the coverage of the method is the same as the playback mode of the traditional random selection interface request data, the number of the playback data required to be selected by the method is far smaller than that of the traditional playback method, so that the playback speed is effectively increased to greatly improve the test efficiency.

The above disclosure is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the scope of the present invention; therefore, equivalent changes made in accordance with the appended claims still fall within the scope of the present invention.

Claims

1. A method for selecting flow playback data is characterized by comprising the following steps:

capturing interface request data according to a preset time interval;

calculating a signature value of each interface request data according to a simhash algorithm, and marking all obtained signature values as an interface data aggregate;

dividing the interface data total set into k interface data subsets according to a k-means algorithm;

and respectively selecting n/k signature values from each interface data subset as playback data, wherein n is the total number of interface request data needing to be played back.

2. The method for selecting flow playback data according to claim 1, wherein the calculating a signature value of each interface request data according to a simhash algorithm and marking all obtained signature values as an interface data aggregate specifically includes:

dividing the current interface request data into a plurality of interface words according to the input parameter value and the corresponding value of the current interface request data;

respectively calculating the hash value of each interface word to obtain a vector characteristic value of the current interface word, and carrying out vector combination on all vector characteristic values corresponding to the current interface request data to obtain a signature value of the current interface request data;

all the obtained signature values are marked as an interface data aggregate.

3. The method for selecting flow playback data according to claim 2, wherein the calculating the hash value of each interface word respectively to obtain a vector feature value of a current interface word, and performing vector combination on all vector feature values corresponding to current interface request data to obtain a signature value of the current interface request data specifically includes:

respectively calculating the hash value of each interface word to obtain a first vector characteristic value of the current interface word;

and carrying out vector combination on all first vector characteristic values corresponding to the current interface request data to obtain a signature value of the current interface request data.

4. The method for selecting flow playback data according to claim 3, wherein the vector merging is performed on all first vector feature values corresponding to the current interface request data to obtain a signature value of the current interface request data, and specifically includes:

weighting each first vector characteristic value to obtain a second vector characteristic value;

and carrying out vector combination on all second vector characteristic values corresponding to the current interface request data to obtain a signature value of the current interface request data.

5. The method for selecting flow playback data according to claim 4, wherein the vector merging is performed on all second vector feature values corresponding to the current interface request data to obtain a signature value of the current interface request data, and specifically includes:

vector combination is carried out on all second vector characteristic values corresponding to the current interface request data to obtain third vector characteristic values;

and performing dimensionality reduction processing on the third vector characteristic value to obtain a signature value of the current interface request data.

6. The method for selecting flow playback data according to claim 1, wherein the dividing the aggregate of interface data into k subsets of interface data according to a k-means algorithm specifically comprises:

randomly selecting k signature values from the interface data total set as virtual center points;

dividing the interface data total set into k interface data subsets, wherein each interface data subset only comprises one virtual center point;

respectively calculating the Hamming distance from all signature values except the virtual center point in the interface data total set to each virtual center point, and classifying each signature value into an interface data subset in which the virtual center point with the smallest Hamming distance from the current signature value to all the virtual center points is located;

calculating the central point of each interface data subset, and taking the central point as a new virtual central point of the current interface data subset;

and repeatedly iterating the interface data aggregate according to the new virtual central point to obtain k converged interface data subsets.

7. The flow playback data selection method according to claim 1, wherein the flow playback data selection method is performed in a pre-release code and a comparison code, respectively;

after respectively selecting n/k signature values from each interface data subset as playback data, wherein n is the total number of interface request data needing to be played back, the method further comprises:

acquiring playback data of the pre-release codes and playback data of the comparison codes;

performing differential analysis on the playback data of the pre-release codes and the playback data of the comparison codes;

and judging whether the pre-issued code has a problem or not according to the analysis result.

8. The method for selecting flow playback data according to claim 1, wherein k has a value between A and B, where A, B, and k are natural numbers.

9. A traffic playback data selection system, comprising:

10. A storage medium for storing a computer program, characterized in that: the program is executed by a processor to realize the flow playback data selection method according to any one of claims 1 to 8.