CN111552649A - Packet testing method and device

Info

Publication number: CN111552649A
Application number: CN202010421737.6A
Authority: CN
Inventors: 程大曦; 梁琛; 蔡天驰; 刘子奇
Original assignee: Alipay Hangzhou Information Technology Co Ltd
Current assignee: Alipay Hangzhou Information Technology Co Ltd
Priority date: 2020-05-18
Filing date: 2020-05-18
Publication date: 2020-08-18
Anticipated expiration: 2040-05-18
Also published as: CN111552649B

Abstract

The specification discloses a grouping (packet) test method and device. The method comprises: dividing a graph structure into a plurality of sub-graphs according to a determined graph partitioning algorithm and the initial parameter values of the algorithm; determining a parameter value update strategy according to the current division result; and executing the following operations in a loop until a preset loop stop condition is met: for the current division result, calculating the corresponding current loss function value using a preset loss function; judging whether the current loss function value meets the preset loop stop condition; if not, updating the current parameter values according to the determined update strategy and dividing the graph structure into a plurality of sub-graphs according to the updated parameter values. After the loop ends, the user subsets corresponding to at least two sub-graphs in the current division result are determined as the target users of the random test.

Description

Packet testing method and device

Technical Field

The embodiment of the specification relates to the field of testing, in particular to a packet testing method and device.

Background

At present, before a random test is run, some users are selected from the full user population as the users to be tested and are divided into several groups of the same or similar composition. Different software or process versions are then tested on different groups, so that the user test index values corresponding to each version can be obtained objectively as test results and the better-performing version can be identified. For example, suppose the test index is how much new users like the current software interface, and it is to be measured in an experimental group and a control group. New users in the experimental group are given the new interface and new users in the control group are given the old interface; analysing the test index values then shows which interface new users prefer.

A common way of grouping at present is to directly divide the users into several groups containing the same number of users.

However, in a random test involving functions such as communication and social networking, users in different groups may interact with each other, which can introduce a large error into the test result. For example, a user in the experimental group may share the new software interface with a user in the control group, which may cause the control-group user to dislike the old interface. This distorts the "liking for the software interface" test index in the control group and biases the final test result.

Disclosure of Invention

In order to reduce as much as possible the deviation caused by user grouping in random tests relating to functions such as communication, social contact and the like, the specification discloses a grouping test method and a grouping test device, and the technical scheme is as follows:

A grouping test method, in which a set of users to be divided is initialized in advance as a graph structure, each user corresponding to a node in the graph and each association between users corresponding to an edge in the graph; the method comprises the following steps:

determining a graph segmentation algorithm and initial parameter values of the algorithm, and dividing the graph structure into a plurality of sub-graphs according to the determined initial parameter values;

determining a parameter value updating strategy according to the current division result;

executing the following operations in a loop until a preset loop stop condition is met:

for the current division result, calculating the corresponding current loss function value using a preset loss function, wherein, for any division result, the value of the loss function is positively correlated with the degree of user association between sub-graphs and negatively correlated with the degree of uniformity of user distribution within each sub-graph;

judging whether the current loss function value meets the preset loop stop condition; if not, updating the current parameter values according to the determined parameter value update strategy, and dividing the graph structure into a plurality of sub-graphs according to the updated parameter values;

and, after the loop ends, determining the user subsets corresponding to at least two sub-graphs in the current division result as the target users of the random test.

A grouping test device, in which a set of users to be divided is initialized in advance as a graph structure, each user corresponding to a node in the graph and each association between users corresponding to an edge in the graph; the device comprises:

the initialization module is used for determining a graph segmentation algorithm and initial parameter values of the algorithm and dividing the graph structure into a plurality of sub-graphs according to the determined initial parameter values;

the strategy determining module is used for determining a parameter value updating strategy according to the current division result;

the loop module is used for executing the following operations in a loop until a preset loop stop condition is met:

and the test object determining module is used for determining, after the loop ends, the user subsets corresponding to at least two sub-graphs in the current division result as the target users of the random test.

In this technical scheme, the division result of the graph structure is evaluated by a constructed loss function, so that the graph partitioning algorithm iterates in the direction of decreasing loss and a division result with as small a loss function value as possible is determined. This reduces both the interaction between users in different groups and the deviation caused by users' test index values being unevenly distributed across groups, which facilitates the random test.

Drawings

In order to more clearly illustrate the embodiments of the present specification or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments described in the embodiments of the present specification, and other drawings can be obtained by those skilled in the art according to the drawings.

FIG. 1 is a flowchart of a graph partitioning algorithm for user grouping according to an embodiment of the present disclosure;

FIG. 2 is a schematic flow chart of a packet testing method provided in an embodiment of the present disclosure;

FIG. 3 is a schematic structural diagram of a packet testing apparatus according to an embodiment of the present disclosure;

FIG. 4 is a schematic structural diagram of an apparatus for configuring a method according to an embodiment of the present disclosure.

Detailed Description

In order to make those skilled in the art better understand the technical solutions in the embodiments of the present specification, the technical solutions in the embodiments of the present specification will be described in detail below with reference to the drawings in the embodiments of the present specification, and it is obvious that the described embodiments are only a part of the embodiments of the present specification, and not all the embodiments. All other embodiments that can be derived by one of ordinary skill in the art from the embodiments given herein are intended to be within the scope of protection.

Currently, when version iteration of software, or replacement of an interface, or optimization of an interactive process, etc. is performed to update a product, there may be multiple versions of software, interfaces, processes, etc. In order to select a final version of a product for updating, the quality of each version of the product needs to be evaluated.

Since individual judgment is too subjective, random tests (also called A/B tests) are usually used in practice to test different versions of a product. The random test is described below.

First, before the random test, some users are selected from the full user population as the users to be tested, and the selected users are divided into at least one experimental group and one control group. A quantitative index for analysis and evaluation also needs to be determined, so that the product version that performs best under that index can be selected. For example, to analyse the influence of product features on user stickiness, user stickiness can be used as the index and quantified by counting how often users use the product, so that the version with the best user stickiness is selected. As another example, the index may be how much users like the current interface, quantified as a score on a 100-point scale, so that the interface users like best is selected.

In this specification, a quantitative index used for analysis and evaluation in a random test is referred to as a test index, and it should be noted that the test index may be an inherent attribute of a user, each user may have a test index value before the random test, and a change in the test index value before and after the random test may be used for analysis and evaluation. Of course, the test metrics may be more than one, such as user stickiness and user retention rate. For different test indexes, the same user group can be used for testing, and different user groups can also be used for testing.

As described in detail later, this specification also takes into account, when grouping users, how uniformly the users' test index values are distributed across the groups. For different test indexes, the same user grouping is therefore likely to differ in how uniformly the index values are distributed.

For this reason, in this specification different user groupings are used for different test indexes.

That is, the random test method in this specification performs user grouping for a single test index, and it can be understood that, for a plurality of test indexes, user grouping is performed for each test index by using the random test method provided in this specification, and an obtained user grouping result is used for testing. The grouping of users for different test metrics may not be the same.

Specifically, when the random test is performed, the product version of the control group can be kept unchanged, and the experimental group experiences the product of the new version, wherein different experimental groups can experience the product of different new versions.

And finally, obtaining test index values of the experimental group and the control group for analyzing and evaluating the quality degrees of different versions.

For example, suppose there are two versions, a and b, of a new interface. To evaluate which version is better, the users in experimental group 1 experience new interface version a, the users in experimental group 2 experience new interface version b, and the users in the control group keep the original interface.

All users in experimental groups 1 and 2 and in the control group then score how much they like their current interface, and the mean scores of experimental group 1, experimental group 2 and the control group are found to be 81, 74 and 92 respectively. The interface users like best is therefore taken to be the original interface.

As another example, suppose there are two versions, a and b, of a new product. To evaluate which version performs better on user stickiness, the users in experimental group 1 experience new product version a, the users in experimental group 2 experience new product version b, and the users in the control group keep the original product.

The frequency with which each user opens the product they are currently experiencing is then counted, and the mean opening frequencies of experimental group 1, experimental group 2 and the control group are found to be 5.3 times/day, 9.5 times/day and 6.9 times/day respectively. New product version b is therefore considered to perform better on user stickiness.

Of course, the above examples use the average value of the test indexes of the experimental group and the control group to evaluate the quality of different versions, and there may be many specific analysis and evaluation methods.

For example, after the user test index values of one experimental group and one control group are obtained and the two distributions are found to be approximately normal, the significance of the difference between the two groups' test index values can be checked with a t-test or another test. This helps determine whether the difference is a deviation caused by user selection or other factors, or a genuine difference caused by testing different product versions.

For example, suppose the mean test index is 8.1 times/day in the experimental group and 8.2 times/day in the control group. If a t-test or another test shows that the product version difference accounts for only a small share of the difference between the two means, then the new and old versions do not differ much on this test index.
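The comparison described above can be illustrated with a minimal sketch. It uses Welch's form of the t-test, which is one possible choice and is not prescribed by this specification; the function name `welch_t` and the sample values are hypothetical and were chosen only so that the group means are 8.1 and 8.2 times/day.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard errors of the two sample means (sample variance / n).
    se2_a = statistics.variance(sample_a) / n_a
    se2_b = statistics.variance(sample_b) / n_b
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical opening frequencies (times/day) for five users per group.
experimental = [8.0, 8.3, 7.9, 8.2, 8.1]   # mean 8.1
control = [8.1, 8.4, 8.0, 8.3, 8.2]        # mean 8.2

t, df = welch_t(experimental, control)
# |t| is about 1 with about 8 degrees of freedom, below the two-sided 5%
# critical value of roughly 2.31, so the difference is not significant here.
```

With these made-up samples the statistic stays well inside the usual acceptance region, which corresponds to the conclusion that the two versions do not differ much on the index.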

It should be noted that, when dividing users into groups for a random test, the distribution of users' test index values should be made as uniform as possible across the groups.

For example, because sampling is random, it may happen that all users in experimental group 1 have high stickiness to the original product while all users in the control group have low stickiness to it. The stickiness measured for experimental group 1 on the new product version is then more likely to be high regardless of the version, and the stickiness measured for the control group on the original product is more likely to be low. Comparing experimental group 1 with the control group therefore gives an inaccurate evaluation of the new product version on user stickiness, the deviation is large, and the test result is inaccurate.

When the distribution of users' test index values is uniform across the groups, the deviation caused by user selection or other factors is small when the test index values of the experimental and control groups are analysed, so the strengths and weaknesses of different product versions on the test index are reflected more accurately.

For example, suppose the user distributions in experimental group 1, experimental group 2 and the control group are identical: 50% of users open the original product 5 times/day and 50% of users open it 9 times/day. When the test index values of the three groups are compared in the random test, the deviation is small, and the difference between new product versions a and b and the original product on the test index can be seen more accurately.

Based on the above analysis, it can be seen that the user grouping before the random test is very important, and if the test index values of the users in the groups are not uniformly distributed when the user grouping is divided, the subsequent analysis and evaluation steps may have great difficulty, and it is difficult to estimate the influence of different versions of the product on the test index.

In the practical application of the random test, the test index value of each user to be tested can be counted in advance, and then reasonable grouping is carried out, so that the distribution of the users in the groups among the groups on the test index value is as uniform as possible.
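One simple way to obtain such a grouping is sketched below. The technique (sorting users by their pre-test index value and dealing them out round-robin) is an assumption made for illustration and is not the method of this specification; the function name `assign_groups` and the user data are hypothetical.

```python
def assign_groups(index_values, n_groups):
    """Sort users by pre-test index value and deal them out round-robin.

    index_values maps each user to the test index value counted in advance.
    """
    ordered = sorted(index_values, key=index_values.get)
    groups = [[] for _ in range(n_groups)]
    for position, user in enumerate(ordered):
        groups[position % n_groups].append(user)
    return groups


# Hypothetical users: half open the product 5 times/day, half 9 times/day.
opens_per_day = {"u1": 5, "u2": 9, "u3": 5, "u4": 9, "u5": 5, "u6": 9}
groups = assign_groups(opens_per_day, n_groups=3)
# Each of the three groups receives one 5 times/day user and one
# 9 times/day user, so the index distribution is the same in every group.
```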

The users to be tested can also be directly divided into several groups with the same number, and the deviation caused by user distribution or other reasons can be estimated by adopting the t test or other significance test modes during analysis and evaluation.

The users to be tested can also be divided into a large number of groups, for example 200 groups, of which 100 are used as experimental groups and the other 100 as control groups. Analysing the test indexes over this large number of groups reduces the deviation caused by user distribution or other factors.

In addition to the basic random test described above, there is also the random test with a network effect, which is explained below on the basis of the random test described above.

When the random test involves functions such as communication and social networking, that is, when the test index has social attributes, users in different groups may interact with each other, and the resulting test index values may contain a large error.

For example, suppose the test index is "the number of times the user shares the current page" and the experimental and control groups see different versions of the current page. If a user in the experimental group shares the current page with a user in the control group, the control-group user may come to prefer the experimental group's page version and dislike their own, and so share the current page less. The test index value of that control-group user is then too low, which introduces a large error.

This error is caused by user interaction between groups rather than by the distribution of users across groups. Even if the user distribution across groups is perfectly uniform, user interaction between groups still exists.

To address the deviation caused by user interaction between groups in random tests involving communication, social networking and similar functions, this specification provides the following technical scheme: users are grouped using a graph partitioning algorithm so as to reduce the deviation.

In the method, a graph structure corresponding to a user set to be divided is constructed in advance, wherein each node corresponds to one user, and edges among the nodes correspond to the relationship among the users. The relationship between users may be determined according to the test requirements of the random test, specifically may be a relationship reflecting the interaction possibility between users, and of course, may also be determined with reference to the situation of the test index. The relationship can also distinguish strength, i.e. the interaction possibility, and can be reflected by the weight on the edge in the graph structure.

For example, if two users are friends on a certain application, an edge is constructed between the nodes corresponding to the two users in the graph structure, because two users who are friends may interact.

Alternatively, when the test index is "the number of times the user shares the current page", two users who have shared any page with each other at least once are considered to have a strong sharing relationship, and an edge is constructed between the corresponding nodes in the graph structure, because two users with a sharing history are likely to interact again.

Two users without a sharing history may also interact, but to limit computational complexity the relationship between such users may be ignored.
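The graph construction described above can be sketched as follows. The adjacency-dictionary representation, the function name `build_user_graph` and the two default weights are assumptions made for illustration; the specification only states that edge weights may reflect how likely two users are to interact.

```python
def build_user_graph(users, friend_pairs, share_pairs,
                     friend_weight=0.5, share_weight=1.0):
    """Build an undirected weighted graph as {user: {neighbour: weight}}.

    friend_weight and share_weight are hypothetical values: a sharing history
    is treated as a stronger relationship than friendship alone.
    """
    graph = {user: {} for user in users}   # users without relations stay isolated

    def connect(a, b, weight):
        if a != b:
            # Keep the strongest relationship seen between the two users.
            strongest = max(weight, graph[a].get(b, 0.0))
            graph[a][b] = graph[b][a] = strongest

    for a, b in friend_pairs:
        connect(a, b, friend_weight)
    for a, b in share_pairs:
        connect(a, b, share_weight)
    return graph


graph = build_user_graph(
    users=["a", "b", "c", "d"],
    friend_pairs=[("a", "b"), ("b", "c")],
    share_pairs=[("b", "a")],          # b once shared a page with a
)
```

In this sketch user `d` has neither friends nor a sharing history and therefore remains a node with no edges.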

The users to be divided can be the full amount of users or part of users randomly selected from the full amount of users.

Then, any graph partitioning algorithm can be applied to the graph structure constructed for the set of users to be divided, and the users corresponding to several sub-graphs in the division result are used as the resulting groups.

Of course, when the users to be divided are the full number of users, the user set corresponding to each sub-graph in the partial sub-graphs can be determined from the division result as the divided group; when the users to be divided are partial users, the user set corresponding to each sub-graph in all sub-graphs can be directly determined from the division result as the divided group.

The graph partitioning algorithm may specifically be a threshold-based algorithm, a region-based algorithm, and so on. An example implementation of a specific graph partitioning algorithm follows.

FIG. 1 is a schematic flow chart of a graph partitioning algorithm for user grouping provided in the present specification. It should be noted that the following is only an example of a user grouping process executed with one specific graph partitioning algorithm, and the algorithm in this example does not limit the graph partitioning algorithm of this specification.

A set of users to be divided is initialized in advance as a graph structure, in which each user corresponds to a node, friend relationships between users correspond to edges, and the likelihood of interaction between users corresponds to the edge weight, which may take any value from 0 to 1.

An evaluation function for evaluating the division result is also set: the fewer edges connect nodes in different sub-graphs of the division result, the higher the value of the evaluation function.

Step one: remove edges according to their weights, deleting every edge whose weight is smaller than a specific threshold, and divide the remaining graph structure into a plurality of sub-graphs.

Step two: when the evaluation function value of the division result is not less than a preset value, stop; the current division result can be determined as the resulting groups. Otherwise, execute step three.

Step three: increase the specific threshold and execute step one again.

The particular threshold may be slowly increased from an initial 0.1 to 0.2, 0.3, 0.4, etc.
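The three steps above can be sketched as follows. The specification does not give the evaluation function in closed form, so `evaluate` below is a hypothetical stand-in: it returns the share of edges that stay inside a single sub-graph and, as an additional assumption, scores an undivided graph as 0 so that the trivial result is not accepted. Sub-graphs are taken to be the connected components that remain after edge removal, which is also an assumption.

```python
def components(nodes, edges):
    """Connected components of an undirected graph given as {(a, b): weight}."""
    neighbours = {node: set() for node in nodes}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, parts = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, part = [start], set()
        while stack:
            node = stack.pop()
            part.add(node)
            for nxt in neighbours[node] - seen:
                seen.add(nxt)
                stack.append(nxt)
        parts.append(part)
    return parts


def evaluate(parts, edges):
    """Hypothetical evaluation: fewer edges between sub-graphs gives a higher value."""
    if len(parts) < 2 or not edges:      # assumption: one sub-graph is not a grouping
        return 0.0
    owner = {node: i for i, part in enumerate(parts) for node in part}
    crossing = sum(1 for a, b in edges if owner[a] != owner[b])
    return 1.0 - crossing / len(edges)


def threshold_partition(nodes, edges, target, start=0.1, step=0.1, limit=1.0):
    threshold = start
    while threshold <= limit:
        kept = {pair: w for pair, w in edges.items() if w >= threshold}  # step one
        parts = components(nodes, kept)
        if evaluate(parts, edges) >= target:                             # step two
            return threshold, parts
        threshold = round(threshold + step, 10)                          # step three
    return None, components(nodes, edges)   # no threshold reached the target


nodes = ["a", "b", "c", "d", "e", "f"]
edges = {("a", "b"): 0.9, ("b", "c"): 0.8, ("c", "d"): 0.15,
         ("d", "e"): 0.9, ("e", "f"): 0.7}
threshold, parts = threshold_partition(nodes, edges, target=0.75)
```

With this made-up graph the threshold 0.1 removes nothing, while 0.2 removes only the weak `c`-`d` edge and yields two sub-graphs of three users each.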

However, because the evaluation function in this graph partitioning algorithm only considers the edges that connect nodes in different sub-graphs, a user without friends may be placed in a sub-graph alone: one sub-graph may contain only the node of a single user without friends while another sub-graph contains many nodes. This can make the distribution of users' test index values uneven across the groups.

To solve this problem, the present specification further provides a grouping test method based on a graph partitioning algorithm, which evaluates the graph division result with a constructed loss function. The value of the loss function is positively correlated with the degree of user association between sub-graphs and negatively correlated with the degree of uniformity of user distribution within each sub-graph. A lower loss function value means less user interaction between groups and a more uniform distribution of users' test index values across the groups.

In this method, the result of dividing the graph structure is evaluated by the loss function, and the graph partitioning algorithm is iterated in the direction of decreasing loss, so that a division result with as small a loss function value as possible is determined. Applied to a random test with a network effect, the method helps make the user grouping as good as possible, reducing both the user interaction between groups and the deviation caused by an uneven distribution of users' test index values across groups, which facilitates the random test.

Of course, the method can also be applied to random tests without a network effect.

As shown in fig. 2, which is a schematic flow chart of a packet testing method provided in this specification, a set of users to be divided is initialized to a graph structure in advance, each user corresponds to a node in the graph, and an association between users corresponds to an edge in the graph.

In addition, a loss function is preset to measure the quality of the division result produced by the graph partitioning algorithm. The lower the loss function value, the better the user grouping: there is less interaction between users in different groups, and the users' test index values are distributed more uniformly across the groups.

If the division result obtained from each run of the graph partitioning algorithm is denoted x, the loss function in this embodiment can be written as f(x).

The graph partitioning algorithm may be any graph partitioning algorithm, and the embodiments of this specification place no limit on it. The algorithm may have one parameter or several; for convenience, this embodiment refers to them collectively as "the parameter", and several parameters can be represented as a vector. Different parameter values lead to different division results.

For example, in a graph partitioning algorithm based on k-means clustering, different values of the parameter k give division results with different numbers of sub-graphs.

By adjusting the parameter, the graph partitioning algorithm can obtain different division results and keep reducing the loss function value until it meets the test requirement, specifically until the loss function value converges or is not greater than a preset threshold.

If the parameter of the graph partitioning algorithm is denoted z, the division result obtained by applying the algorithm to the graph structure can be written as g(z).

Since the parameter value is updated repeatedly, the parameter value at iteration i can be written as A_i, i ∈ [0, n], and the update strategy for the parameter value can be written as A_{i+1} = A_i + d(i).
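The notation above (division result g(z), loss f(x), update A_{i+1} = A_i + d(i)) can be tied together in a minimal sketch of the iteration. The function name `optimise_partition`, the iteration cap and the toy stand-ins for g and f are assumptions made for illustration; in particular the toy loss only measures group size imbalance and is not the loss function of this specification.

```python
def optimise_partition(g, f, a0, d, loss_threshold, max_iters=100):
    """Repeat A_{i+1} = A_i + d(i) until f(g(A_i)) <= loss_threshold."""
    a = a0
    partition = g(a)                      # initial division result g(A_0)
    for i in range(max_iters):
        loss = f(partition)               # current loss f(g(A_i))
        if loss <= loss_threshold:        # stop condition met
            return a, partition, loss
        a = a + d(i)                      # apply the update strategy
        partition = g(a)                  # divide again with the new value
    return a, partition, f(partition)     # iteration cap reached (assumption)


# Toy stand-ins, not from the specification: g splits a user list at
# position z, f measures only the size imbalance between the two groups.
users = list(range(10))
g = lambda z: (users[:z], users[z:])
f = lambda parts: abs(len(parts[0]) - len(parts[1]))

best_a, best_parts, best_loss = optimise_partition(
    g, f, a0=2, d=lambda i: 1, loss_threshold=0)
```

Starting from a split position of 2, the toy loss falls from 6 to 0 as the position moves to 5, at which point the loop stops.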

The steps of the packet testing method may include:

S101: determine a graph partitioning algorithm and the initial parameter values of the algorithm, and divide the graph structure into a plurality of sub-graphs according to the determined initial parameter values.

That is, the graph partitioning algorithm and A₀ are determined, and the graph structure is divided into a plurality of sub-graphs to obtain the division result g(A₀).

S102: determine a parameter value update strategy according to the current division result.

That is, A_{i+1} = A_i + d(i) is determined according to the current division result g(A₀).

Specifically, the current parameter value may be increased or decreased within a fixed range, and the update direction of the parameter value may be determined from how the loss function value changes when the parameter value is increased or decreased.

That is, A₀ is increased or decreased within a fixed range, and d(i), or d(0), is determined from f(g(A₀ ± nΔ)) - f(g(A₀)), where nΔ lies within the fixed range.

The direction of parameter value update can be specifically determined in the following two ways:

Mode 1: determine, by numerical differentiation, the direction in which the loss function value decreases fastest at the current parameter value, and use that direction as the update direction of the parameter value.

The loss function is normally used to measure the quality of the division result of the graph partitioning algorithm and does not directly contain the algorithm's parameters; in other words, the loss function is an implicit function of those parameters. The direction of fastest decrease at the current parameter value therefore has to be determined by numerical differentiation rather than by solving for the first derivative directly.

Numerical differentiation here means increasing or decreasing the initial value x₀ by small amounts and directly evaluating the corresponding difference quotients (the change in f(x) divided by the change in x), which estimate the first derivative of f(x) at x₀.

For example, for x₀ = 4, Δx = 0.01 and a fixed range of ±0.03, the difference quotients for the smallest step are found to be 3.4 and 3.6, those for the next step are 3.3 and 3.7, and those for the largest step are 3.2 and 3.8, so the first derivative of f(x) at x = 4 is estimated to be 3.5.

Or, for x₀ = (4, 5), Δx₁ = (0.01, 0.01), Δx₂ = (0, 0.01), Δx₃ = (-0.01, 0.01), Δx₄ ... Δx₉ = (-0.01, -0.01) and a fixed range of ±0.03, the corresponding difference quotients are evaluated directly, giving 9 estimated first-derivative values of f(x) at x = (4, 5).

It should be noted that, in the above numerical derivation, f (x) is only an example and is not a loss function.

The direction in which the loss function value decreases fastest at the current parameter value is then determined from the derivative value obtained by numerical differentiation.

For example, when there is a single parameter and the derivative value equals 5, the loss function value decreases fastest, at the current parameter value, in the direction of decreasing parameter value. Decreasing the parameter value is therefore determined as the update direction. The magnitude of the decrease may be set to 5|p|, where |p| is a preset, fixed step size, giving d(0) = -5|p|.

When there are two parameters, the largest of the 9 first-derivative values is determined, for example the one corresponding to Δx₉. Then (-0.01, -0.01) is determined as the direction in which the loss function value decreases fastest at the current parameter values, and d(0) = (-0.01|p|, -0.01|p|).
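For the single-parameter case of Mode 1, the idea can be sketched as follows. Averaging forward and backward difference quotients over several step sizes is an assumption about how the individual quotients are combined; the names `estimate_derivative` and `mode1_step` and the example function are hypothetical, and `loss_of_param` stands in for the composed function f(g(·)).

```python
def estimate_derivative(loss_of_param, x0, delta, steps=3):
    """Average forward and backward difference quotients at x0.

    Offsets k * delta for k = 1..steps stay inside the fixed range
    +/- steps * delta around x0.
    """
    quotients = []
    for k in range(1, steps + 1):
        offset = k * delta
        quotients.append((loss_of_param(x0 + offset) - loss_of_param(x0)) / offset)
        quotients.append((loss_of_param(x0) - loss_of_param(x0 - offset)) / offset)
    return sum(quotients) / len(quotients)


def mode1_step(loss_of_param, x0, delta, p, steps=3):
    """Return d(0) = -derivative * |p|, a step against the estimated slope."""
    return -estimate_derivative(loss_of_param, x0, delta, steps) * abs(p)


# Hypothetical stand-in for f(g(.)): its true derivative at 2.5 is 5.
example = lambda x: x ** 2
slope = estimate_derivative(example, x0=2.5, delta=0.01)
d0 = mode1_step(example, x0=2.5, delta=0.01, p=0.1)
# slope is approximately 5, so d0 is approximately -5 * |p| = -0.5:
# the parameter value is decreased.
```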

Mode 2: increasing or decreasing the initial parameter value within a fixed range to obtain at least two alternative parameter values; determining an alternative parameter value from the alternative parameter value set, so that the difference value obtained by subtracting the loss function value corresponding to the alternative parameter value from the loss function value corresponding to the initial parameter value is the largest in the loss function values corresponding to all the alternative parameter values in the alternative parameter value set; and determining the direction of the initial parameter value changing to the determined alternative parameter value as the direction of the parameter value updating.

For example, for an initial parameter value of 5 and a fixed range of ±0.3, six candidate parameter values are obtained: 4.7, 4.8, 4.9, 5.1, 5.2 and 5.3. The loss function values corresponding to the six candidate parameter values are calculated as f(g(4.7)), f(g(4.8)), f(g(4.9)), f(g(5.1)), f(g(5.2)) and f(g(5.3)), and one candidate parameter value x_n is determined such that the difference f(g(5)) - f(g(x_n)) is the largest among all candidate parameter values in the set. x_n may be, for example, 5.3, in which case an increase of the parameter value is determined as the direction of the parameter value update, i.e. d(i) = |p|.

As another example, for the initial parameter (5, 6), x_n is determined by the same method to be (6, 7). Increasing both parameter values contained in the parameter by the same amount is then determined as the direction of the parameter value update, i.e. d(i) = (|p|, |p|).

It should be noted that, in Mode 2, once the direction of the parameter value update has been determined from the initial parameter value, that direction does not change.
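Mode 2 for a single parameter can be sketched as below. This is an illustration under assumptions: the candidates are spaced by a hypothetical `step` of 0.1 inside the fixed range, and the function names and the toy loss in the usage note are not part of the specification.

```python
from typing import Callable, List


def candidate_values(x0: float, max_delta: float, step: float) -> List[float]:
    """Candidates x0 +/- step, x0 +/- 2*step, ... within +/- max_delta (x0 itself excluded)."""
    n = int(round(max_delta / step))
    # Rounding only removes floating-point noise such as 4.699999999999999.
    return [round(x0 + k * step, 10) for k in range(-n, n + 1) if k != 0]


def fixed_direction(
    loss: Callable[[float], float],
    x0: float,
    p_abs: float,
    max_delta: float = 0.3,
    step: float = 0.1,
) -> float:
    """Pick the candidate maximising loss(x0) - loss(candidate); return d = +|p| or -|p|."""
    base = loss(x0)
    best = max(candidate_values(x0, max_delta, step), key=lambda c: base - loss(c))
    return p_abs if best > x0 else -p_abs
```

For instance, with the toy loss `lambda x: (x - 7) ** 2` and x0 = 5, the best candidate is 5.3, so the returned direction is +|p| and stays fixed for all later updates.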

S103: For the current division result, calculate the current loss function value corresponding to the current division result by using a preset loss function.

That is, f(g(A_i)) is calculated, where A_i may be A₀ from S101, or A_i (i ≠ 0) obtained on returning from S105.

For any division result, the loss function value is positively correlated with the degree of user association between sub-graphs and negatively correlated with the degree of user distribution uniformity inside each sub-graph. The degree of user association between sub-graphs may be denoted u. The degree of user distribution uniformity inside each sub-graph may be represented by v, the variance (or variance estimate) of the distribution statistics of the sub-graphs described below; a smaller v corresponds to a more uniform distribution.

The formula for the loss function may be:

f(g(A_i)) = t₁*u + t₂*v

where t₁ represents the weight of the degree of user association between sub-graphs, and t₂ represents the weight of the degree of user distribution uniformity inside each sub-graph. By configuring different weights, the loss function can strike a balance between the degree of user association between sub-graphs and the degree of user distribution uniformity inside each sub-graph, so that the final division result meets the requirements of the random test.
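The effect of the weights can be illustrated with a small sketch. The function name `loss_value` and all numeric values below are invented for illustration only; they are not taken from the specification.

```python
def loss_value(u: float, v: float, t1: float, t2: float) -> float:
    """Weighted sum f = t1 * u + t2 * v from the formula above."""
    return t1 * u + t2 * v


# Two hypothetical division results (values invented for illustration only).
result_a = {"u": 0.10, "v": 0.08}  # few cross-group edges, less uniform groups
result_b = {"u": 0.30, "v": 0.02}  # more cross-group edges, more uniform groups

# Equal weights favour result_a; weighting v five times as much favours result_b.
equal = [loss_value(r["u"], r["v"], 1.0, 1.0) for r in (result_a, result_b)]
heavy_v = [loss_value(r["u"], r["v"], 1.0, 5.0) for r in (result_a, result_b)]
```

Under equal weights the first result has the smaller loss, while a larger t₂ reverses the preference, which is the balance the weights are meant to control.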

The user association degree between sub-graphs and the user distribution uniformity degree inside each sub-graph are explained below.

1. Degree of user association between subgraphs:

This reflects how closely users in different groups interact with one another, among the user groups corresponding to the sub-graphs of the division result. The weaker the user interaction between groups, the better the grouping effect and the smaller the loss function value. That is, the loss function value is positively correlated with the degree of user association between sub-graphs.

The degree of user association u between sub-graphs can be determined in either of the following two ways:

1) Calculate the ratio of the number of edges between sub-graphs in the current partitioning result to the number of all edges in the graph structure, and determine the ratio as the degree of user association between sub-graphs.

The specific formula may be:

where w represents the node corresponding to a user, W represents the set of nodes corresponding to all users, the term summed in the numerator represents the number of edges connecting node w to nodes in other sub-graphs, and a_w represents the number of edges connecting node w to other nodes.

In the above formula, the numerator accumulates, over the nodes corresponding to all users, the number of edges connecting each node to nodes in other sub-graphs, which is twice the number of edges between sub-graphs in the current partitioning result.

The denominator, ∑_{w∈W} a_w, accumulates, over the nodes corresponding to all users, the number of edges connecting each node to other nodes, which is twice the number of all edges in the graph structure.
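The formula itself is not reproduced in this text. A form consistent with the two sums described above would be the following, where c_w is a placeholder symbol assumed here for the number of edges connecting node w to nodes in other sub-graphs (the symbol used in the original formula is not known):

```latex
u = \frac{\sum_{w \in W} c_w}{\sum_{w \in W} a_w}
```

Because each sum counts every edge twice, the factor of two cancels and u equals the number of edges between sub-graphs divided by the number of all edges.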

2) Calculate the ratio of the number of nodes at the two ends of edges between sub-graphs in the current partitioning result to the number of all nodes in the graph structure, and determine the ratio as the degree of user association between sub-graphs.

Obviously, the lower the ratio, the fewer edges between subgraphs in the partitioning result, and the lower the interaction tightness of users between groups in the corresponding user group.
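Both ways of determining u can be sketched as follows. This is a minimal illustration, assuming an undirected graph given as a non-empty edge list with each edge listed once, and a hypothetical mapping `part` from every node to the index of its sub-graph.

```python
from typing import Dict, Hashable, Iterable, Tuple

Edge = Tuple[Hashable, Hashable]


def edge_cut_ratio(edges: Iterable[Edge], part: Dict[Hashable, int]) -> float:
    """Way 1): edges between sub-graphs divided by all edges."""
    edge_list = list(edges)
    cut = sum(1 for a, b in edge_list if part[a] != part[b])
    return cut / len(edge_list)


def boundary_node_ratio(edges: Iterable[Edge], part: Dict[Hashable, int]) -> float:
    """Way 2): endpoints of between-sub-graph edges divided by all nodes."""
    boundary = set()
    for a, b in edges:
        if part[a] != part[b]:
            boundary.update((a, b))
    return len(boundary) / len(part)


# A path 1-2-3-4-5 split into sub-graph 0 = {1, 2, 3} and sub-graph 1 = {4, 5}.
path_edges = [(1, 2), (2, 3), (3, 4), (4, 5)]
path_part = {1: 0, 2: 0, 3: 0, 4: 1, 5: 1}
```

In this toy graph only the edge (3, 4) crosses between sub-graphs, so way 1) gives 1/4 and way 2) gives 2/5.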

2. Degree of user distribution uniformity inside each sub-graph:

This reflects how uniformly the users are distributed across the groups with respect to the test index values, among the user groups corresponding to the sub-graphs of the division result. The more uniform the distribution of users across the groups on the test index values, the better the grouping effect and the smaller the loss function value. That is, the loss function value is negatively correlated with the degree of user distribution uniformity inside each sub-graph.

The degree of user distribution uniformity inside each sub-graph can be determined from the variance, or an estimate of the variance, of the distribution statistics of the sub-graphs.

The distribution statistic of a sub-graph is specifically the mean test index value of the users corresponding to all nodes in that sub-graph, which may be the ratio of the sum of the test index values of those users to the number of nodes in the sub-graph.

Because the users are randomly drawn, the test index values of the users in a single sub-graph follow a normal distribution with a mean and a variance.

When two sub-graphs have the same mean, i.e. the same test index mean, the two normal distributions overlap to a larger extent and the users' test index value distributions are approximately similar.

Thus, for multiple sub-graphs, the degree of user distribution uniformity inside each sub-graph can be measured by the variance or variance estimate of the distribution statistics.

The smaller the variance, the smaller the dispersion of the distribution statistics, the closer the test index means of the sub-graphs are to one another, and the closer the users' distributions on the test index values are; that is, the more uniform the distribution of users across the groups on the test index values.

The smaller the variance, the more uniform the distribution, and therefore, the loss function value is positively correlated with the variance or variance estimate of the distribution statistic.

The variance of the distribution statistics may be calculated after the distribution statistics of all sub-graphs in the partitioning result have been obtained. However, when the partitioning result contains many sub-graphs, it is difficult to compute the distribution statistics of all of them, so for convenience of statistics and calculation the variance v of the distribution statistics can be estimated instead. The estimation method is not limited in this specification, which provides the following two methods:

1) Calculate the sample variance among the distribution statistics of some of the sub-graphs in the current partitioning result, and determine the sample variance as the estimate of the variance of the distribution statistics of the sub-graphs.

The degree of dispersion of the distribution statistics of the sub-graphs can be estimated to a certain extent simply by taking some of the sub-graphs in the division result and calculating the sample variance. Of course, the more sub-graphs are included, the more accurate the estimate of the degree of dispersion.
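Method 1) can be sketched as follows, assuming a hypothetical mapping `metric` from each node to its user's test index value and a mapping `part` from each node to its sub-graph index. The random choice of which sub-graphs to sample is also an assumption; the specification only says that some of the sub-graphs are used.

```python
import random
import statistics
from typing import Dict, Hashable


def subgraph_means(metric: Dict[Hashable, float], part: Dict[Hashable, int]) -> Dict[int, float]:
    """Distribution statistic per sub-graph: sum of test index values / number of nodes."""
    sums: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for node, sg in part.items():
        sums[sg] = sums.get(sg, 0.0) + metric[node]
        counts[sg] = counts.get(sg, 0) + 1
    return {sg: sums[sg] / counts[sg] for sg in sums}


def sample_variance_of_means(means: Dict[int, float], sample_size: int, seed: int = 0) -> float:
    """Sample variance over a random subset of sub-graph means (needs sample_size >= 2)."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(means), sample_size)
    return statistics.variance([means[sg] for sg in chosen])
```

A larger `sample_size` trades more computation for a more accurate estimate, as noted above.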

2) Input the number of users of each sub-graph in a part of the sub-graphs, the sum of the test index values of all users of each of those sub-graphs, and the number of sub-graphs in that part into a preset model, and determine the output of the preset model as the estimate of the variance of the distribution statistics of the sub-graphs.

In this way, through a preset overall variance estimation model, the variance of the distribution statistics of each sub-graph can be simply estimated by using some attributes corresponding to a small number of sub-graphs.

The preset model may include the following formula:

where v represents the variance of the mean test index value of the users corresponding to all nodes in each sub-graph, k represents the number of sub-graphs, μ_N represents the mean, across sub-graphs, of the number of users contained in a single sub-graph, σ_N represents the variance, across sub-graphs, of the number of users contained in a single sub-graph, μ_S represents the mean, across sub-graphs, of the sum of the test index values of all users contained in a single sub-graph, σ_S represents the variance, across sub-graphs, of that sum, and σ_SN represents the covariance between the number of users in a single sub-graph and the sum of the test index values of all users in that sub-graph.

μ_N, σ_N, μ_S, σ_S and σ_SN can be calculated from the number of users in each of a part of the sub-graphs in the current partitioning result, the sum of the test index values of all users in each of those sub-graphs, and the number of sub-graphs in that part.

Therefore, the input of the preset model is the number of users in each of a part of the sub-graphs in the current partitioning result, the sum of the test index values of all users in each of those sub-graphs, and the number of sub-graphs in that part. After μ_N, σ_N, μ_S, σ_S and σ_SN are obtained by calculation, the variance of the distribution statistics of the sub-graphs can be estimated using the above formula.

The output of the preset model is the variance of the estimated distribution statistics of each sub-graph.
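The formula of the preset model is not reproduced in this text. One estimator that uses exactly the quantities listed above is the delta-method approximation for the variance of a ratio of means, v ≈ (σ_S - 2(μ_S/μ_N)σ_SN + (μ_S/μ_N)²σ_N) / (k·μ_N²), with σ_N and σ_S taken as variances as defined above. The sketch below assumes that form. It is an illustrative stand-in rather than the confirmed formula of the specification, and the function name is hypothetical.

```python
import statistics
from typing import Sequence


def ratio_variance_estimate(n_users: Sequence[float], metric_sums: Sequence[float]) -> float:
    """Delta-method variance of (mean of sums) / (mean of user counts); an assumed model.

    n_users[j] is the number of users in sampled sub-graph j and metric_sums[j]
    is the sum of their test index values. At least two sub-graphs are required.
    """
    k = len(n_users)
    mu_n = statistics.fmean(n_users)
    mu_s = statistics.fmean(metric_sums)
    var_n = statistics.variance(n_users)      # sigma_N, a variance
    var_s = statistics.variance(metric_sums)  # sigma_S, a variance
    cov_sn = sum((n - mu_n) * (s - mu_s) for n, s in zip(n_users, metric_sums)) / (k - 1)
    r = mu_s / mu_n
    return (var_s - 2.0 * r * cov_sn + r * r * var_n) / (k * mu_n ** 2)
```

Two sanity checks follow from the form of the expression: if every sub-graph's sum is exactly proportional to its user count the estimate is zero, and if all sub-graphs have the same size it reduces to the sample variance of the per-sub-graph means divided by k.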

S104: Judge whether the current loss function value meets the preset loop stop condition. If not, execute S105; if so, execute S106.

The preset loop stop condition may be either of the following two cases.

1. If the current loss function value is not greater than a preset threshold, the loop is stopped.

A threshold can be preset according to the requirements of the random test. When the loss function value has decreased to no more than the preset threshold, the current division result already meets the requirements of the random test, so the loop may be stopped.

2. If the loss function value corresponding to the parameter value obtained after updating according to the parameter value updating strategy is not less than the current loss function value, the loop is stopped.

Since there are two kinds of parameter value updating strategy, this case is described for each in turn:

1) For the numerical differentiation strategy, a parameter value update direction is determined for the current partitioning result after each update, so the update direction may change between two updates.

For example, after d(0) = -5|p| is determined, A₁ = A₀ + d(0) is obtained. The direction in which the loss function f(g(x)) decreases fastest at x = A₁ is then determined, and this direction may change, for example to d(1) = 3|p|.

When the loss function value corresponding to the parameter value obtained after updating according to the parameter value updating strategy equals the current loss function value, the current loss function value is considered to be the global minimum or a local minimum and cannot be reduced further, so the loop is stopped.

2) For the strategy that determines the direction from the initial parameter value, the direction of change of the parameter value is fixed, so the loss function value may first decrease and then increase as the parameter value is changed step by step. When the loss function value begins to increase, the loss function value corresponding to the current parameter value is already close to the global minimum or a local minimum of the loss function, and changing the parameter value further would only increase the loss function value, so the loop is stopped.

Therefore, the loop may be stopped as soon as the current loss function value corresponding to the current parameter value is determined either to be not greater than the preset threshold, thus meeting the requirements of the random test, or to be close to the global minimum or a local minimum of the loss function.

S105: Update the current parameter value according to the determined parameter value updating strategy, divide the graph structure into a plurality of sub-graphs according to the updated parameter value, and execute S103.

That is, A_{i+1} = A_i + d(i) is calculated, the division result g(A_{i+1}) is obtained according to A_{i+1}, and execution returns to S103, completing one loop.
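The loop formed by S103 to S105 can be sketched for a single parameter as follows. This is one possible reading under assumptions: `loss_of(a)` stands for f(g(a)), i.e. it partitions the graph with parameter value a and evaluates the loss, `step_of(a)` returns d(i), and `max_iters` is a safety cap added here that the specification does not mention. All names are hypothetical.

```python
from typing import Callable, Tuple


def optimise_parameter(
    loss_of: Callable[[float], float],
    step_of: Callable[[float], float],
    a0: float,
    threshold: float,
    max_iters: int = 1000,
) -> Tuple[float, float]:
    """Iterate A_{i+1} = A_i + d(i) until one of the two stop conditions holds."""
    a, current = a0, loss_of(a0)     # S103 for the initial division result
    for _ in range(max_iters):       # safety cap, not part of the specification
        if current <= threshold:     # stop condition 1: loss not greater than threshold
            break
        a_next = a + step_of(a)      # S105: update the parameter value
        next_loss = loss_of(a_next)  # S103 on the new division result
        if next_loss >= current:     # stop condition 2: loss no longer decreases
            break
        a, current = a_next, next_loss
    return a, current
```

For example, with the toy loss `lambda x: (x - 3) ** 2`, a fixed Mode 2 step of +0.5 and a starting value of 0, the loop ends at the parameter value 3.0, either because the threshold is reached or because the next step would increase the loss.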

S106: Determine, according to the current partitioning result, the user subsets corresponding to at least two sub-graphs as the object users of the random test.

After the loop stops, the user set corresponding to each of the sub-graphs in the partitioning result can be used as a user group for the random test.

Alternatively, the user sets corresponding to only some of the sub-graphs in the division result, at least two sub-graphs, can each be used as a user group for the random test.

With the above method embodiment, in view of the requirements of random tests with network effects, a loss function is adopted that jointly considers the degree of user association between sub-graphs and the degree of user distribution uniformity inside each sub-graph. The division result of the graph structure is evaluated by this loss function, so that the graph segmentation algorithm iterates its parameter values in the direction of decreasing loss function value, and a division result with as small a loss function value as possible is determined.

Applied to random tests with network effects, the method makes the user grouping effect as good as possible, reducing both the interaction between users of different groups and the bias caused by uneven distribution of users across groups on the test index values. This facilitates random tests with network effects and allows the influence of different product versions to be reflected accurately.

In addition, the method embodiment uses variance estimation to simplify the computation during user grouping: the degree of user distribution uniformity inside each sub-graph can be estimated from a small number of sub-graphs, so the user grouping result can be obtained more quickly.

Fig. 3 is a schematic structural diagram of a packet testing apparatus according to an embodiment of the present disclosure. The user set to be divided is initialized into a graph structure in advance, each user corresponds to one node in the graph, and the association between the users corresponds to an edge in the graph.

The packet testing device may include the following modules:

The initialization module 201 is configured to determine a graph segmentation algorithm and initial parameter values of the algorithm, and to divide the graph structure into a plurality of sub-graphs according to the determined initial parameter values.

The policy determination module 202 is configured to determine a parameter value updating strategy according to the current division result.

The loop module 203 is configured to execute the following operations in a loop until a preset loop stop condition is met:

for the current division result, calculating a current loss function value corresponding to the current division result by using a preset loss function, wherein, for any division result, the loss function value is positively correlated with the degree of user association between sub-graphs and negatively correlated with the degree of user distribution uniformity inside each sub-graph;

updating the current parameter value according to the determined parameter value updating strategy; and dividing the graph structure into a plurality of sub-graphs according to the updated parameter values.

The test object determination module 204 is configured to determine, after the loop ends and according to the current partitioning result, the user subsets corresponding to at least two sub-graphs as the object users of the random test.

For detailed explanation and effects of the embodiments of the apparatus, reference may be made to the embodiments of the method described above, which are not described herein again.

The embodiments of the present specification also provide a computer device, which at least includes a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor executes the program to implement one of the packet testing methods in the above-mentioned method embodiments.

Fig. 4 is a schematic diagram illustrating a more specific hardware structure of a computing device according to an embodiment of the present disclosure, where the computing device may include: a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040, and a bus 1050. Wherein the processor 1010, memory 1020, input/output interface 1030, and communication interface 1040 are communicatively coupled to each other within the device via bus 1050.

The processor 1010 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits, and is configured to execute related programs to implement the technical solutions provided in the embodiments of the present disclosure.

The memory 1020 may be implemented in the form of a ROM (Read Only Memory), a RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 1020 may store an operating system and other application programs, and when the technical solution provided by the embodiments of the present specification is implemented by software or firmware, the relevant program code is stored in the memory 1020 and called and executed by the processor 1010.

The input/output interface 1030 is used for connecting an input/output module to input and output information. The input/output module may be configured as a component in the device (not shown in the drawings) or may be external to the device to provide a corresponding function. The input devices may include a keyboard, a mouse, a touch screen, a microphone, various sensors, etc., and the output devices may include a display, a speaker, a vibrator, an indicator light, etc.

The communication interface 1040 is used for connecting a communication module (not shown in the drawings) to implement communication interaction between the present apparatus and other apparatuses. The communication module can realize communication in a wired mode (such as USB, network cable and the like) and also can realize communication in a wireless mode (such as mobile network, WIFI, Bluetooth and the like).

Bus 1050 includes a path that transfers information between various components of the device, such as processor 1010, memory 1020, input/output interface 1030, and communication interface 1040.

It should be noted that although the above-mentioned device only shows the processor 1010, the memory 1020, the input/output interface 1030, the communication interface 1040 and the bus 1050, in a specific implementation, the device may also include other components necessary for normal operation. In addition, those skilled in the art will appreciate that the above-described apparatus may also include only those components necessary to implement the embodiments of the present description, and not necessarily all of the components shown in the figures.

Embodiments of the present specification also provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements one of the packet testing methods in the above-described method embodiments.

Computer-readable media include both permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer readable media do not include transitory computer readable media such as modulated data signals and carrier waves.

From the above description of the embodiments, it is clear to those skilled in the art that the embodiments of the present disclosure can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the embodiments of the present specification may be essentially or partially implemented in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments of the present specification.

The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. A typical implementation device is a computer, which may take the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email messaging device, game console, tablet computer, wearable device, or a combination of any of these devices.

The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus embodiment, since it is substantially similar to the method embodiment, it is relatively simple to describe, and reference may be made to some descriptions of the method embodiment for relevant points. The above-described apparatus embodiments are merely illustrative, and the modules described as separate components may or may not be physically separate, and the functions of the modules may be implemented in one or more software and/or hardware when implementing the embodiments of the present disclosure. And part or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of the embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

The foregoing is only a detailed description of the embodiments of the present disclosure. It should be noted that those skilled in the art can make a number of improvements and refinements without departing from the principle of the embodiments of the present disclosure, and these improvements and refinements should also be regarded as falling within the protection scope of the embodiments of the present disclosure.

Claims

1. A grouping test method, wherein a user set to be divided is initialized into a graph structure in advance, each user corresponds to a node in the graph, and an association between users corresponds to an edge in the graph; the method comprises the following steps:

2. The method of claim 1, wherein determining a parameter value update policy according to the current partitioning result comprises:

increasing and decreasing the current parameter value within a fixed range, and determining the direction of the parameter value update according to the trend of the loss function value as the parameter value is increased or decreased.

3. The method of claim 2, wherein determining the direction of the parameter value update according to the trend of the loss function value as the parameter value is increased or decreased comprises:

determining, by numerical differentiation, the direction in which the loss function value decreases fastest at the current parameter value, and determining that direction as the direction of the parameter value update; or

increasing or decreasing the initial parameter value within a fixed range to obtain at least two alternative parameter values; determining, from the set of alternative parameter values, the alternative parameter value for which the difference obtained by subtracting its loss function value from the loss function value corresponding to the initial parameter value is the largest among all alternative parameter values in the set; and determining the direction from the initial parameter value toward the determined alternative parameter value as the direction of the parameter value update.

4. The method of any one of claims 1 to 3, wherein the preset loop stop condition comprises:

stopping the loop if the current loss function value is not greater than a preset threshold; or

stopping the loop if the loss function value corresponding to the parameter value obtained after updating according to the updating strategy is not less than the current loss function value.

5. The method of claim 1, wherein the degree of user distribution uniformity inside each sub-graph is determined according to the variance or variance estimate of the distribution statistics of the sub-graphs;

wherein the distribution statistic of a sub-graph is the mean of the test index of the users corresponding to all nodes in the sub-graph, the test index being predetermined according to a test requirement.

6. The method of claim 5, wherein the estimate of the variance of the distribution statistics of the sub-graphs is determined by:

calculating the sample variance among the distribution statistics of some of the sub-graphs in the current division result, and determining the sample variance as the estimate of the variance of the distribution statistics of the sub-graphs; or

inputting the number of users of each sub-graph in a part of the sub-graphs, the sum of the test index values of all users of each of those sub-graphs, and the number of sub-graphs in that part into a preset model, and determining the output of the preset model as the estimate of the variance of the distribution statistics of the sub-graphs.

7. The method of claim 1, wherein the degree of user association between subgraphs is determined by:

calculating the ratio of the number of edges between sub-graphs in the current partitioning result to the number of all edges in the graph structure, and determining the ratio as the user association degree between the sub-graphs;

or

calculating the ratio of the number of nodes at the two ends of edges between sub-graphs in the current partitioning result to the number of all nodes in the graph structure, and determining the ratio as the degree of user association between sub-graphs.

8. A grouping test device, wherein a user set to be divided is initialized into a graph structure in advance, each user corresponds to a node in the graph, and an association between users corresponds to an edge in the graph; the device comprises:

9. The apparatus of claim 8, the policy determination module to be specifically configured to:

10. The apparatus of claim 9, the policy determination module being specifically configured to:

11. The apparatus of any one of claims 8 to 10, the preset loop stop condition comprising:

12. The apparatus of claim 8, wherein the degree of uniformity of the distribution of users within each sub-graph is determined according to a variance or variance estimate of the distribution statistics of each sub-graph;

13. The apparatus of claim 12, the estimate of the variance of the distribution statistics for each sub-graph is determined by:

14. The apparatus of claim 8, wherein the degree of inter-sub-graph user association is determined by:

or

15. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any one of claims 1 to 7 when executing the program.