CN113535552A

CN113535552A - AB experiment shunting method based on multi-pipeline matching

Info

Publication number: CN113535552A
Application number: CN202110735712.8A
Authority: CN
Inventors: 史灵
Original assignee: Hangzhou Index Technology Co ltd
Current assignee: Hangzhou Index Technology Co ltd
Priority date: 2021-06-30
Filing date: 2021-06-30
Publication date: 2021-10-22

Abstract

An AB experiment shunting method based on multi-pipeline matching. A traffic distribution request of the current user for an AB experiment is first obtained, the number of traffic buckets is anchored according to the size of the user traffic, and the currently distributable user traffic is scattered across a preset number of traffic buckets. According to the user traffic proportion of each target experiment version, each user identifier is hashed and then assigned a traffic bucket code, which determines the traffic bucket coding interval of the users of each target experiment version. The bucket code of each user is then taken modulo the number of AB experiment versions, and the result is used as the AB experiment version to which the user is routed. The method solves the uneven distribution of AB experiment versions caused by simply hashing a UID or ID user identifier and taking the modulus, and evens out the distribution through multi-pipeline matching routing.

Description

AB experiment shunting method based on multi-pipeline matching

Technical Field

The invention relates to the technical field of computer data processing application, in particular to an AB experiment shunting method based on multi-pipeline matching.

Background

An AB experiment creates two (A/B) or more (A/B/N) versions of a product feature, page, or process. Within the same time window, groups of visitors with the same or similar composition are carved out of the overall user traffic and randomly assigned to the different experiment versions, each of which carries a different experimental scheme. User experience data and business data are collected for each group, and the experimental metrics are observed and analyzed in order to drive product iteration with data, verify algorithm effects, measure business output, and so on. Once an experimental conclusion is reached, the best version is identified and formally adopted. The method is widely used for the iterative optimization of internet products.

With a huge user base, a modern internet product cannot quickly determine whether a given feature is correct or which scheme is best, so a fast and effective AB experiment scheme plays a crucial role in the iterative optimization of the whole product. Typically, two schemes are prepared for the same optimization target; some users in the same user group are served scheme A and the rest are served scheme B. Metrics such as click-through rate and conversion rate are collected and compared across the schemes, and the final scheme is chosen once the observed data has passed a hypothesis test. AB experiments are used to decide between schemes in projects such as multi-version campaign landing pages and multi-version marketing coupons. Shunting the user group is the step that determines whether an AB experiment can verify the optimal metric: the user traffic allocated to each experiment version must match expectations and must satisfy the requirements of consistency, uniformity, and independence, so that the AB experiment is valid.

The existing AB experiment shunting method takes a hash value of each user identifier and then takes the modulus of that value. User identifiers include the UID assigned automatically by the system and the ID set by the user. A hash algorithm converts an input of arbitrary length into an output of fixed length. This shunting method is limited by the rule that generates the UID or ID: if the UIDs or IDs are not uniformly distributed, the version split obtained after hashing and taking the modulus is not uniform either, and the validity of the AB experiment cannot be guaranteed.
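The limitation described above can be illustrated with a minimal sketch. It assumes, purely for illustration, a hypothetical ID generator that issues only even UIDs and a splitter that uses the raw integer as a stand-in for a weak hash; neither detail comes from the source.

```python
from collections import Counter


def naive_split(uid: int, num_versions: int) -> int:
    """Assign a version by taking the raw UID modulo the version count."""
    return uid % num_versions


# Hypothetical, non-uniform ID scheme: the generator only issues even UIDs.
uids = range(0, 20_000, 2)  # 10,000 users
counts = Counter(naive_split(uid, 2) for uid in uids)

print(counts[0], counts[1])  # 10000 0
```

Under these assumptions every user lands in the same version, which is the kind of skew the generation rule of the identifier can introduce.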

Disclosure of Invention

The invention aims to overcome the shortcomings of the prior art by providing an AB experiment shunting method with uniform traffic, in which each user is first converted into a bucket number and the bucket number is then taken modulo the number of AB experiment versions.

The technical problem to be solved by the invention is realized by adopting the following technical scheme:

An AB experiment shunting method based on multi-pipeline matching comprises the following steps:

S1, obtaining a traffic distribution request of the current user for the AB experiment, wherein the request comprises the target experiment versions of the target experiment to which the current user traffic is to be distributed and the user traffic proportion of each target experiment version; anchoring the number of traffic buckets according to the size of the user traffic; and scattering the currently distributable user traffic across the preset number of traffic buckets, wherein the number of traffic buckets is anchored as follows:

when the user traffic is more than 100 million, the number of traffic buckets is 1000, and when the user traffic is less than 100 million, the number of traffic buckets is 100;

S2, according to the user traffic proportion of each target experiment version, hashing each user identifier and then assigning a traffic bucket code, so as to determine the traffic bucket coding interval of the users of each target experiment version, wherein the traffic bucket coding interval is determined as follows:

when the user traffic is more than 100 million, the traffic bucket coding interval is 1-1000, and when the user traffic is less than 100 million, the traffic bucket coding interval is 1-100;

S3, according to the traffic bucket coding interval determined for the users of each target experiment version, taking the traffic bucket code assigned to each user in step S2 modulo the number of AB experiment versions, and using the result as the AB experiment version to which the user is routed.
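Steps S1 to S3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the treatment of exactly 100 million users as the smaller case, the use of integer percentages for the traffic proportions, and the "+ 1" offsets that number buckets and versions from 1 are all hypothetical. The source names both a per-version coding interval (S2) and a modulus over the version count (S3) without spelling out how they combine, so each is exposed as a separate helper.

```python
import hashlib

THRESHOLD = 100_000_000  # 100 million users, per the anchoring rule in S1


def anchor_bucket_count(user_traffic: int) -> int:
    """S1: 1000 buckets above 100 million users, otherwise 100.

    The source does not define the case of exactly 100 million;
    this sketch assumes it falls into the smaller case.
    """
    return 1000 if user_traffic > THRESHOLD else 100


def bucket_code(user_id: str, bucket_count: int) -> int:
    """S2: hash the identifier with MD5 and map it to a code in 1..bucket_count."""
    digest = hashlib.md5(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % bucket_count + 1


def version_intervals(percentages: list[int], bucket_count: int) -> list[tuple[int, int]]:
    """S2: split codes 1..bucket_count into one contiguous interval per version.

    `percentages` are integer traffic proportions that sum to 100 (an assumption).
    """
    intervals = []
    start, cumulative = 1, 0
    for percentage in percentages:
        cumulative += percentage
        end = cumulative * bucket_count // 100
        intervals.append((start, end))
        start = end + 1
    return intervals


def route_version(code: int, num_versions: int) -> int:
    """S3: take the bucket code modulo the version count; result in 1..num_versions."""
    return code % num_versions + 1


buckets = anchor_bucket_count(5_000_000)
code = bucket_code("user-42", buckets)
print(buckets, version_intervals([30, 70], buckets), route_version(code, 2))
```

Because the bucket code depends only on the identifier and the bucket count, the same user is routed to the same version on every request.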

Preferably, the user identifier in step S2 includes a UID identifier set by the system and a user-defined ID identifier.

Preferably, in step S2 the hash value is obtained with the MD5 algorithm, which proceeds as follows:

first, for a plaintext of arbitrary length, MD5 pads the plaintext so that it can be grouped into blocks of 512 bits: padding bits are appended after the plaintext, the first padding bit being 1 and the rest 0; the length of the original plaintext is then expressed in 64 bits and appended after the padded plaintext, so that the total length is exactly a multiple of 512 bits; when the length of the plaintext is greater than 2 to the power of 64, only the low-order 64 bits of the length are appended at the end of the last block;

second, the plaintext blocks are processed in turn: each 512-bit plaintext block is divided into 16 sub-blocks of 32 bits each; four 32-bit chaining variables, denoted A, B, C, and D, are allocated; the sub-blocks and the chaining variables undergo 4 rounds of operations in sequence; the resulting chaining variables are then added to the values they held at the start of the block and used as the input for the next plaintext block, and the operations are repeated;

third, the four 32-bit words are output in concatenation; the contents of the four chaining variables form the 128-bit MD5 digest.
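The padding rule in the first step can be sketched as follows. The function name `md5_pad` is hypothetical, and the sketch assumes byte-aligned input; the little-endian encoding of the length field follows the MD5 specification (RFC 1321) rather than anything stated in the source.

```python
import struct


def md5_pad(message: bytes) -> bytes:
    """Pad a message as MD5 does: a 1 bit, then 0 bits, then the 64-bit length."""
    # Keep only the low-order 64 bits of the length in bits.
    bit_length = (len(message) * 8) & 0xFFFFFFFFFFFFFFFF
    # 0x80 is the first padding bit (1) followed by seven 0 bits.
    padded = message + b"\x80"
    # Zero-fill until exactly 8 bytes (64 bits) remain in the final 64-byte block.
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    # Append the original length as a 64-bit little-endian integer.
    return padded + struct.pack("<Q", bit_length)


padded = md5_pad(b"user-42")
print(len(padded) * 8)  # 512
```

A 56-byte message no longer leaves room for the length field in its first block, so padding spills into a second block and the result is 1024 bits long.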

Preferably, the modulus of the number of AB experiment versions in step S3 is taken as follows:

configuring the number M of AB experiment versions, and then taking the value obtained by hashing each user identifier modulo M, so that each user falls into exactly one of the AB experiment versions 1 to M, namely the corresponding project version.
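A minimal sketch of this modulus step follows. The function name, the sample identifier, and the "+ 1" offset that maps the remainder 0..M-1 onto versions 1..M are assumptions made for illustration.

```python
import hashlib


def assign_version(user_id: str, num_versions: int) -> int:
    """Return a version number in 1..num_versions for the given user identifier."""
    hash_value = int(hashlib.md5(user_id.encode("utf-8")).hexdigest(), 16)
    return hash_value % num_versions + 1


M = 3  # hypothetical number of configured AB experiment versions
first = assign_version("user-42", M)
second = assign_version("user-42", M)
print(first == second)  # True: the same identifier always maps to the same version
```

The determinism shown here is what gives the split its consistency: a returning user sees the same version without any assignment having to be stored.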

The invention has the advantages and positive effects that:

According to the invention, all user traffic requiring the AB experiment is bucket-coded according to a specific bucketing standard, and the bucket code of each user is taken modulo the number of AB experiment versions to give the AB experiment version to which the user is routed.

Drawings

Fig. 1 is a schematic flow chart of the steps of the flow splitting method of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It will be understood that when an element is referred to as being "secured to" another element, it can be directly on the other element or intervening elements may also be present. When a component is referred to as being "connected" to another component, it can be directly connected to the other component or intervening components may also be present. When a component is referred to as being "disposed on" another component, it can be directly on the other component or intervening components may also be present. The terms "vertical," "horizontal," "left," "right," and the like as used herein are for illustrative purposes only.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.

The embodiments of the invention will be described in further detail below with reference to the accompanying drawings:

As shown in Fig. 1, the AB experiment shunting method based on multi-pipeline matching according to the present invention includes the following steps:

S1, obtaining a traffic distribution request of the current user for the AB experiment, wherein the request comprises the target experiment versions of the target experiment to which the current user traffic is to be distributed and the user traffic proportion of each target experiment version; anchoring the number of traffic buckets according to the size of the user traffic; and scattering the currently distributable user traffic across the preset number of traffic buckets, wherein the number of traffic buckets is anchored as follows: when the user traffic is more than 100 million, the number of traffic buckets is 1000, and when the user traffic is less than 100 million, the number of traffic buckets is 100;

S2, according to the user traffic proportion of each target experiment version, hashing each user identifier and then assigning a traffic bucket code, wherein the user identifier comprises a UID identifier set by the system and a user-defined ID identifier, so as to determine the traffic bucket coding interval of the users of each target experiment version, wherein the traffic bucket coding interval is determined as follows: when the user traffic is more than 100 million, the traffic bucket coding interval is 1-1000, and when the user traffic is less than 100 million, the traffic bucket coding interval is 1-100;

S3, according to the traffic bucket coding interval determined for the users of each target experiment version, taking the traffic bucket code assigned to each user in step S2 modulo the number of AB experiment versions, and using the result as the AB experiment version to which the user is routed, wherein the modulus is taken as follows: the number M of AB experiment versions is configured, and the value obtained by hashing each user identifier is taken modulo M, so that each user falls into exactly one of the AB experiment versions 1 to M, namely the corresponding project version.

Hash refers to a hash algorithm. Its basic principle is to convert an input of arbitrary length into an output of fixed length; the binary string to which the original data is mapped is the hash value. The conversion is a compression mapping: the space of hash values is usually much smaller than the space of inputs, so different inputs may hash to the same output. In other words, a hash is a function that compresses a message of arbitrary length into a message digest of fixed length.
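Both properties can be observed with a short sketch, assuming Python's standard `hashlib` module and a hypothetical reduction of the digest to 100 values. The second part relies only on the pigeonhole principle: 101 identifiers mapped into 100 values must share at least one value.

```python
import hashlib


def md5_digest(text: str) -> bytes:
    """Return the raw MD5 digest of a text string."""
    return hashlib.md5(text.encode("utf-8")).digest()


# Inputs of very different lengths all produce a digest of the same size.
lengths = {len(md5_digest(text)) for text in ["a", "user-42", "x" * 10_000]}
print(lengths)  # {16}: every digest is 16 bytes, that is 128 bits

# Compressing into a smaller space forces different inputs onto the same output.
codes = {int.from_bytes(md5_digest(f"user-{i}"), "big") % 100 for i in range(101)}
print(len(codes) < 101)  # True
```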

In this embodiment, the hash value in step S2 is obtained with the MD5 algorithm, which is one kind of hash algorithm. For a plaintext of arbitrary length, MD5 first pads the plaintext so that it can be grouped into blocks of 512 bits: padding bits are appended after the plaintext, the first padding bit being 1 and the rest 0; the length of the original plaintext is then expressed in 64 bits and appended after the padded plaintext, so that the total length is exactly a multiple of 512 bits. When the length of the plaintext is greater than 2 to the power of 64, only the low-order 64 bits of the length are appended at the end of the last block. The plaintext blocks are then processed in turn: each 512-bit block is divided into 16 sub-blocks of 32 bits each, four 32-bit chaining variables denoted A, B, C, and D are allocated, and the sub-blocks and the chaining variables undergo 4 rounds of operations in sequence. The resulting chaining variables are added to the values they held at the start of the block and used as the input for the next plaintext block, and the operations are repeated. Finally, the four 32-bit words are output in concatenation; the contents of the four chaining variables form the 128-bit MD5 digest.
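The block processing described above can be sketched compactly. The rotation amounts, sine-derived constants, initial chaining values, and little-endian word order are taken from the MD5 specification (RFC 1321), not from the source; the name `md5_sketch` is hypothetical, and the final line compares the result with Python's `hashlib` as a check.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF
# Left-rotation amounts: 16 steps for each of the 4 rounds.
SHIFTS = [7, 12, 17, 22] * 4 + [5, 9, 14, 20] * 4 + [4, 11, 16, 23] * 4 + [6, 10, 15, 21] * 4
# Additive constants: floor(abs(sin(i + 1)) * 2**32) for each of the 64 steps.
CONSTANTS = [int(abs(math.sin(i + 1)) * 2**32) & MASK for i in range(64)]


def rotate_left(x: int, s: int) -> int:
    """Rotate a 32-bit value left by s bits."""
    return ((x << s) | (x >> (32 - s))) & MASK


def md5_sketch(message: bytes) -> bytes:
    # Padding: a 1 bit, 0 bits, then the low 64 bits of the bit length.
    bit_length = (len(message) * 8) & 0xFFFFFFFFFFFFFFFF
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64) + struct.pack("<Q", bit_length)

    # Initial values of the chaining variables A, B, C, D.
    a0, b0, c0, d0 = 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476
    for offset in range(0, len(padded), 64):
        # Divide the 512-bit block into 16 sub-blocks of 32 bits.
        words = struct.unpack("<16I", padded[offset:offset + 64])
        a, b, c, d = a0, b0, c0, d0
        for i in range(64):  # 4 rounds of 16 steps each
            if i < 16:
                f, g = (b & c) | (~b & d), i
            elif i < 32:
                f, g = (d & b) | (~d & c), (5 * i + 1) % 16
            elif i < 48:
                f, g = b ^ c ^ d, (3 * i + 5) % 16
            else:
                f, g = c ^ (b | ~d), (7 * i) % 16
            f = (f + a + CONSTANTS[i] + words[g]) & MASK
            a, d, c = d, c, b
            b = (b + rotate_left(f, SHIFTS[i])) & MASK
        # Add the block result to the chaining values the block started with.
        a0, b0, c0, d0 = (a0 + a) & MASK, (b0 + b) & MASK, (c0 + c) & MASK, (d0 + d) & MASK
    # The digest is the concatenation of the four 32-bit words.
    return struct.pack("<4I", a0, b0, c0, d0)


print(md5_sketch(b"user-42") == hashlib.md5(b"user-42").digest())
```

In practice a library implementation such as `hashlib.md5` would be used directly; the sketch only makes the grouping, the four rounds, and the chaining-variable summation visible.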

In a specific implementation of an AB experiment, such as an AB experiment on a multi-version campaign landing page or on multi-version marketing coupons, a target population is selected for the experiment and the multi-version campaign landing page or the multi-version marketing coupon amount is configured. Assuming there are two models, A and B, two disjoint samples are created, selected either by the identifier (ID or UID) of the users in the target population or per request. Model A is used for the first sample and model B for the second; each of the samples is called a bucket. The invention bucket-codes all user traffic requiring the AB experiment according to a specific bucketing standard, takes the bucket code of each user modulo the number of AB experiment versions to give the AB experiment version to which the user is routed, and evaluates which option among the multi-version campaign landing pages or multi-version marketing coupon amounts best serves the business target. The method solves the uneven distribution of AB experiment versions caused by simply hashing a UID or ID user identifier and taking the modulus, and evens out the distribution through multi-pipeline matching routing.
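The uniformity claim can be checked empirically with a small simulation. Everything concrete here is an assumption made for illustration: the even-only UID scheme, the population of 100,000 users, the 100 buckets, the two versions, and the helper names. No expected counts are given because they depend on the actual digests.

```python
import hashlib
from collections import Counter


def bucket_code(user_id: str, bucket_count: int = 100) -> int:
    """Map a user identifier to a bucket code in 1..bucket_count via MD5."""
    digest = hashlib.md5(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % bucket_count + 1


def route_version(code: int, num_versions: int) -> int:
    """Take the bucket code modulo the version count; versions are 1..num_versions."""
    return code % num_versions + 1


# Hypothetical skewed ID scheme: only even UIDs are ever issued.
uids = [str(uid) for uid in range(0, 200_000, 2)]  # 100,000 users

bucket_counts = Counter(bucket_code(uid) for uid in uids)
version_counts = Counter(route_version(bucket_code(uid), 2) for uid in uids)

# With a uniform split, each bucket holds roughly 1,000 users
# and each version roughly 50,000.
print(min(bucket_counts.values()), max(bucket_counts.values()))
print(dict(version_counts))
```

Even though the raw identifiers are skewed, the bucket codes spread across all 100 buckets, so the version counts derived from them stay close to an even split.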

It should be emphasized that the embodiments described herein are illustrative rather than restrictive, and thus the present invention is not limited to the embodiments described in the detailed description, but other embodiments derived from the technical solutions of the present invention by those skilled in the art are also within the scope of the present invention.

Claims

1. An AB experiment shunting method based on multi-pipeline matching, characterized in that the method comprises the following steps:

S1, obtaining a traffic distribution request of the current user for the AB experiment, wherein the request comprises the target experiment versions of the target experiment to which the current user traffic is to be distributed and the user traffic proportion of each target experiment version; anchoring the number of traffic buckets according to the size of the user traffic; and scattering the currently distributable user traffic across the preset number of traffic buckets;

S2, according to the user traffic proportion of each target experiment version, hashing each user identifier and then assigning a traffic bucket code, so as to determine the traffic bucket coding interval of the users of each target experiment version;

S3, according to the traffic bucket coding interval determined for the users of each target experiment version, taking the traffic bucket code assigned to each user in step S2 modulo the number of AB experiment versions, and using the result as the AB experiment version to which the user is routed.

2. The AB experiment shunting method based on multi-pipeline matching as claimed in claim 1, characterized in that the number of traffic buckets in step S1 is anchored as follows:

when the user traffic is more than 100 million, the number of traffic buckets is 1000, and when the user traffic is less than 100 million, the number of traffic buckets is 100.

3. The AB experiment shunting method based on multi-pipeline matching as claimed in claim 1, characterized in that the traffic bucket coding interval in step S2 is determined as follows:

when the user traffic is more than 100 million, the traffic bucket coding interval is 1-1000, and when the user traffic is less than 100 million, the traffic bucket coding interval is 1-100.

4. The AB experiment shunting method based on multi-pipeline matching as claimed in claim 3, characterized in that the user identifier in step S2 includes a UID identifier set by the system and a user-defined ID identifier.

5. The AB experiment shunting method based on multi-pipeline matching as claimed in claim 1, characterized in that in step S2 the hash value is obtained with the MD5 algorithm.

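6. The AB experiment shunting method based on multi-pipeline matching as claimed in claim 5, characterized in that the MD5 algorithm obtains the hash value through the following steps:

first, for a plaintext of arbitrary length, MD5 pads the plaintext so that it can be grouped into blocks of 512 bits: padding bits are appended after the plaintext, the first padding bit being 1 and the rest 0; the length of the original plaintext is then expressed in 64 bits and appended after the padded plaintext, so that the total length is exactly a multiple of 512 bits; when the length of the plaintext is greater than 2 to the power of 64, only the low-order 64 bits of the length are appended at the end of the last block;

second, the plaintext blocks are processed in turn: each 512-bit plaintext block is divided into 16 sub-blocks of 32 bits each; four 32-bit chaining variables, denoted A, B, C, and D, are allocated; the sub-blocks and the chaining variables undergo 4 rounds of operations in sequence; the resulting chaining variables are then added to the values they held at the start of the block and used as the input for the next plaintext block, and the operations are repeated;

third, the four 32-bit words are output in concatenation; the contents of the four chaining variables form the 128-bit MD5 digest.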

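7. The AB experiment shunting method based on multi-pipeline matching as claimed in claim 1, characterized in that the modulus of the number of AB experiment versions in step S3 is taken as follows:

configuring the number M of AB experiment versions, and then taking the value obtained by hashing each user identifier modulo M, so that each user falls into exactly one of the AB experiment versions 1 to M, namely the corresponding project version.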