CN113535552A - AB experiment shunting method based on multi-pipeline matching - Google Patents

AB experiment shunting method based on multi-pipeline matching

Publication number
CN113535552A
Authority
CN
China
Prior art keywords
user
experiment
flow
plaintext
version
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202110735712.8A
Other languages
Chinese (zh)
Inventor
史灵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Index Technology Co ltd
Original Assignee
Hangzhou Index Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Index Technology Co ltd
Priority to CN202110735712.8A
Publication of CN113535552A
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3668 Software testing
    • G06F11/3672 Test management
    • G06F11/3688 Test management for test execution, e.g. scheduling of test suites
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3668 Software testing
    • G06F11/3672 Test management
    • G06F11/3684 Test management for test design, e.g. generating new test cases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/70 Software maintenance or management
    • G06F8/71 Version control; Configuration management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Software Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The method first obtains the current user's traffic allocation request for an AB experiment. The number of traffic buckets is anchored according to the size of the user traffic, and the currently allocatable user traffic is scattered across that preset number of buckets. According to the user traffic proportion of each target experiment version, each user identifier is hashed and then assigned a traffic bucket code, which determines the bucket code interval of the users corresponding to each target experiment version; the bucket code is then taken modulo the number of AB experiment versions. The result of this modulo operation on each user's bucket code is the AB experiment version to which the user is routed. The method addresses the uneven distribution of AB experiment versions caused by simply hashing a UID or ID user identifier and taking the modulus, and evens out the distribution by means of multi-pipeline matching routing.

Description

AB experiment shunting method based on multi-pipeline matching
Technical Field
The invention relates to the technical field of computer data processing application, in particular to an AB experiment shunting method based on multi-pipeline matching.
Background
An AB experiment creates two (A/B) or more (A/B/N) versions of a product feature, page, or flow. Within the same time window, groups of visitors with the same or similar composition are drawn from the overall user traffic and randomly routed to the different experiment versions, each of which carries a different experimental scheme. User experience data and business data are collected for each group, and the experiment metrics are observed and analyzed in order to drive product iteration with data, verify algorithm effects, and obtain business results. From the experimental conclusion the best version is identified and formally adopted. The method is widely used for the iterative optimization of internet products.
With a very large user base, a modern internet product cannot quickly decide whether a given feature is correct or which scheme is best, so a fast and effective AB experiment scheme plays a crucial role in the iterative optimization of the whole product. Typically two schemes are built for the same optimization target: one part of a user group hits scheme A and the other part hits scheme B. Data metrics such as click-through rate and conversion rate are counted and compared under the different schemes, and the final scheme is chosen once the observed differences have passed a hypothesis test. Projects that use AB experiments to confirm a scheme include multi-version campaign landing pages and multi-version marketing coupons. Splitting (shunting) the user group is the key step that decides whether an AB experiment can verify the best metric: the split must ensure that the user traffic allocated to each experiment version matches expectations, and that the allocated traffic satisfies consistency, uniformity, and independence, so that the experiment is valid.
The existing AB experiment shunting method hashes each user identifier, takes the modulus of the result, and then runs the experiment. The user identifier is either a UID assigned automatically by the system or an ID set by the user. A hash algorithm converts an input of arbitrary length into an output of fixed length. This shunting method is constrained by the generation rule of the UID or ID: if the UIDs or IDs are not uniform, the version split obtained after hashing and taking the modulus is not uniform either, and the validity of the AB experiment cannot be guaranteed.
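The failure mode described above can be reproduced with a deliberately weak hash. The following is a minimal sketch under stated assumptions: Python's built-in `hash()` stands in for the prior-art hash (for small non-negative integers it returns the integer itself), the ID generator that issues only even UIDs is hypothetical, and `naive_version` is an illustrative name rather than anything defined in the source.

```python
from collections import Counter


def naive_version(uid: int, m: int) -> int:
    """Prior-art style split: hash the identifier, then take the modulus.

    Assumption: Python's built-in hash() plays the role of a weak hash;
    for small non-negative integers it returns the integer unchanged.
    """
    return hash(uid) % m


# Hypothetical ID generation rule that only ever issues even UIDs.
uids = range(0, 20_000, 2)
counts = Counter(naive_version(uid, 2) for uid in uids)
print(counts)  # Counter({0: 10000})
```

With two versions, every user lands in version 0 and version 1 receives no traffic, which is the kind of skew the invention sets out to avoid.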
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing an AB experiment shunting method with uniform traffic, in which each user is first converted into a bucket number and the bucket number is then taken modulo the number of AB experiment versions.
The technical problem to be solved by the invention is realized by adopting the following technical scheme:
An AB experiment shunting method based on multi-pipeline matching comprises the following steps:
S1. Obtain the current user's traffic allocation request for the AB experiment. The request comprises the target experiment versions of the target experiment to which the current user traffic is to be allocated and the user traffic proportion of each target experiment version. Anchor the number of traffic buckets according to the size of the user traffic, then scatter the currently allocatable user traffic across that preset number of buckets. The number of traffic buckets is anchored as follows:
when the user traffic is greater than 100 million, the number of traffic buckets is 1000; when the user traffic is less than 100 million, the number of traffic buckets is 100;
S2. According to the user traffic proportion of each target experiment version, hash each user identifier and then assign it a traffic bucket code, so as to determine the bucket code interval of the users corresponding to each target experiment version. The bucket code interval is determined as follows:
when the user traffic is greater than 100 million, the bucket code interval is 1-1000; when the user traffic is less than 100 million, the bucket code interval is 1-100;
S3. According to the bucket code interval determined for each target experiment version, take each user's traffic bucket code from step S2 modulo the number of AB experiment versions, and use the result as the AB experiment version to which the user is routed.
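Steps S1 and S2 can be sketched as follows. This is an illustration, not the patented implementation: the names `bucket_count` and `bucket_code`, the use of MD5 through Python's `hashlib`, reading the digest as an unsigned integer, and grouping exactly 100 million users with the smaller case are all assumptions, since the text leaves those details open.

```python
import hashlib

THRESHOLD = 100_000_000  # 100 million users, per the anchoring rule


def bucket_count(user_traffic: int) -> int:
    """Anchor the number of traffic buckets to the size of the user traffic.

    Assumption: the text does not cover exactly 100 million users,
    so this sketch groups that case with the smaller bucket count.
    """
    return 1000 if user_traffic > THRESHOLD else 100


def bucket_code(user_id: str, buckets: int) -> int:
    """Hash a user identifier and map it to a bucket code in 1..buckets.

    Assumption: the 128-bit MD5 digest is read as an unsigned integer
    and reduced modulo the bucket count.
    """
    digest = hashlib.md5(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % buckets + 1


n = bucket_count(250_000_000)
print(n, bucket_code("uid-12345", n))
```

Because the bucket code depends only on the identifier and the bucket count, the same user receives the same code on every request, which supports the consistency requirement mentioned in the background.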
Preferably, the user identifier in step S2 includes a UID set by the system and an ID defined by the user.
Preferably, in step S2 the hash is computed with the MD5 algorithm, whose steps are:
First, for a plaintext of arbitrary length, MD5 pads the plaintext and splits it into groups so that each input group is 512 bits long. Padding bits are appended after the plaintext, the first padding bit being 1 and the rest 0; the length of the original plaintext is then expressed in 64 bits and appended after the padded plaintext, so that the total length is exactly a multiple of 512 bits. If the plaintext length exceeds 2^64, only its low 64 bits are used, and they are appended at the end of the last group;
Second, the plaintext groups are processed one after another. Each 512-bit group is divided into 16 sub-groups of 32 bits each, and four 32-bit chaining variables, labeled A, B, C, and D, are set up. The sub-groups and the chaining variables go through 4 rounds of operations in sequence, after which the chaining variables are added to their initial values; the result serves as the input for the next plaintext group, and the operation is repeated;
Third, the four 32-bit words are output; the data of the four chaining variables forms the 128-bit MD5 digest.
Preferably, the modulo operation over the number of AB experiment versions in step S3 is as follows:
configure the number M of AB experiment versions, then take the value obtained by hashing each user identifier modulo M, so that each user falls into one of the AB experiment versions 1 to M, namely the corresponding project version.
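The padding rule of the first step can be illustrated as follows. This is a sketch under assumptions: `md5_pad` is a hypothetical helper that only builds the padded message, and the little-endian byte order of the 64-bit length field and of the 32-bit words follows RFC 1321, which the text above does not state. The digest itself is taken from Python's `hashlib` rather than computed by hand.

```python
import hashlib
import struct


def md5_pad(message: bytes) -> bytes:
    """Return the message padded to a multiple of 512 bits (64 bytes)."""
    bit_length = (len(message) * 8) % (1 << 64)     # keep only the low 64 bits
    padded = message + b"\x80"                      # a single 1 bit, then 0 bits
    padded += b"\x00" * ((56 - len(padded)) % 64)   # leave 8 bytes for the length
    return padded + struct.pack("<Q", bit_length)   # 64-bit length field


padded = md5_pad(b"uid-12345")
blocks = [padded[i:i + 64] for i in range(0, len(padded), 64)]
# Each 512-bit block splits into 16 words of 32 bits.
words = struct.unpack("<16I", blocks[0])
digest = hashlib.md5(b"uid-12345").digest()  # 16 bytes = four 32-bit words
print(len(padded) * 8, len(words), len(digest) * 8)  # 512 16 128
```

A message whose length already leaves fewer than 64 free bits in its last group (for example 56 bytes) is extended by a whole extra group, so the padded length is always a multiple of 512 bits.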
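A sketch of this modulo step, assuming that the hashed value is the MD5 digest read as an integer and that versions are numbered 1 to M (hence the `+ 1`). The name `assign_version` is hypothetical.

```python
import hashlib


def assign_version(user_id: str, m: int) -> int:
    """Map a user identifier to one of the AB experiment versions 1..m."""
    if m < 1:
        raise ValueError("the number of versions M must be at least 1")
    value = int(hashlib.md5(user_id.encode("utf-8")).hexdigest(), 16)
    return value % m + 1


# The same identifier always lands in the same version.
first = assign_version("uid-12345", 3)
second = assign_version("uid-12345", 3)
print(first == second)  # True
```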
The invention has the advantages and positive effects that:
According to the invention, all user traffic that requires AB experiments is bucket-coded according to a specific bucketing standard, and each user's bucket code is taken modulo the number of AB experiment versions to give the AB experiment version to which the user is routed.
Drawings
Fig. 1 is a schematic flow chart of the steps of the flow splitting method of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It will be understood that when an element is referred to as being "secured to" another element, it can be directly on the other element or intervening elements may also be present. When a component is referred to as being "connected" to another component, it can be directly connected to the other component or intervening components may also be present. When a component is referred to as being "disposed on" another component, it can be directly on the other component or intervening components may also be present. The terms "vertical," "horizontal," "left," "right," and the like as used herein are for illustrative purposes only.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
The embodiments of the invention will be described in further detail below with reference to the accompanying drawings:
As shown in Fig. 1, the AB experiment shunting method based on multi-pipeline matching according to the present invention includes the following steps:
S1. Obtain the current user's traffic allocation request for the AB experiment. The request comprises the target experiment versions of the target experiment to which the current user traffic is to be allocated and the user traffic proportion of each target experiment version. Anchor the number of traffic buckets according to the size of the user traffic, then scatter the currently allocatable user traffic across that preset number of buckets. The number of traffic buckets is anchored as follows:
when the user traffic is greater than 100 million, the number of traffic buckets is 1000; when the user traffic is less than 100 million, the number of traffic buckets is 100;
S2. According to the user traffic proportion of each target experiment version, hash each user identifier and then assign it a traffic bucket code, the user identifier including a UID set by the system and an ID defined by the user, so as to determine the bucket code interval of the users corresponding to each target experiment version. The bucket code interval is determined as follows:
when the user traffic is greater than 100 million, the bucket code interval is 1-1000; when the user traffic is less than 100 million, the bucket code interval is 1-100;
S3. According to the bucket code interval determined for each target experiment version, take each user's traffic bucket code from step S2 modulo the number of AB experiment versions, and use the result as the AB experiment version to which the user is routed. The modulo operation over the number of AB experiment versions is as follows:
configure the number M of AB experiment versions, then take the value obtained by hashing each user identifier modulo M, so that each user falls into one of the AB experiment versions 1 to M, namely the corresponding project version.
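One possible reading of the "bucket code interval of each target experiment version" is a contiguous range of bucket codes sized by that version's traffic proportion. The sketch below illustrates that reading only. The text does not spell out how the intervals and the modulo of step S3 combine, and the names `code_intervals` and `version_for_code`, as well as the rounding policy, are assumptions.

```python
from typing import Dict, Tuple


def code_intervals(proportions: Dict[str, float], buckets: int) -> Dict[str, Tuple[int, int]]:
    """Split bucket codes 1..buckets into contiguous per-version intervals.

    Assumption: proportions sum to 1; the last version absorbs rounding error.
    """
    if abs(sum(proportions.values()) - 1.0) > 1e-9:
        raise ValueError("traffic proportions must sum to 1")
    intervals: Dict[str, Tuple[int, int]] = {}
    names = list(proportions)
    start = 1
    for i, name in enumerate(names):
        if i == len(names) - 1:
            end = buckets
        else:
            end = start + round(proportions[name] * buckets) - 1
        intervals[name] = (start, end)
        start = end + 1
    return intervals


def version_for_code(code: int, intervals: Dict[str, Tuple[int, int]]) -> str:
    """Look up which version's interval contains a bucket code."""
    for name, (low, high) in intervals.items():
        if low <= code <= high:
            return name
    raise ValueError(f"bucket code {code} is outside every interval")


intervals = code_intervals({"A": 0.3, "B": 0.7}, 100)
print(intervals)  # {'A': (1, 30), 'B': (31, 100)}
```

Under this reading, a 30/70 split over 100 buckets sends bucket codes 1-30 to version A and 31-100 to version B, so unequal traffic proportions can be honored without changing how users are hashed into buckets.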
A hash (hashing) algorithm converts an input of arbitrary length into an output of fixed length; the binary string to which the original data is mapped is the hash value. The conversion is a compression mapping: the space of hash values is usually much smaller than the space of inputs, so different inputs may hash to the same output. In other words, a hash is a function that compresses a message of arbitrary length into a message digest of fixed length.
In this embodiment, the hash in step S2 is computed with the MD5 algorithm, which is one kind of hash algorithm. For a plaintext of arbitrary length, MD5 first pads the plaintext and splits it into groups so that each input group is 512 bits long. Padding bits are appended after the plaintext, the first padding bit being 1 and the rest 0; the length of the original plaintext is then expressed in 64 bits and appended after the padded plaintext, so that the total length is exactly a multiple of 512 bits. If the plaintext length exceeds 2^64, only its low 64 bits are used, and they are appended at the end of the last group. The plaintext groups are then processed one after another: each 512-bit group is divided into 16 sub-groups of 32 bits each, and four 32-bit chaining variables, labeled A, B, C, and D, are set up. The sub-groups and the chaining variables go through 4 rounds of operations in sequence, after which the chaining variables are added to their initial values; the result serves as the input for the next plaintext group, and the operation is repeated. Finally, the concatenation of the four 32-bit words is output; the data of the four chaining variables forms the 128-bit MD5 digest.
In a specific implementation of an AB experiment, such as an AB experiment on multi-version campaign landing pages or on multi-version marketing coupons, the target population of the experiment is selected and the number of landing-page versions or coupon versions is configured. Assuming there are two models, A and B, two disjoint samples are created, selected either by the identifier (ID or UID) of the users in the target population or per request. Model A is used for the first sample and model B for the second; each sample is called a bucket. The invention bucket-codes all user traffic that requires AB experiments according to a specific bucketing standard, takes each user's bucket code modulo the number of AB experiment versions to give the AB experiment version to which the user is routed, and then evaluates which of the multi-version campaign landing pages or multi-version marketing coupons best serves the business goal. The method addresses the uneven distribution of AB experiment versions caused by simply hashing a UID or ID user identifier and taking the modulus, and evens out the distribution by means of multi-pipeline matching routing.
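The evenness of the resulting split can be checked empirically. The following is a minimal sketch, assuming MD5-based bucket codes over 100 buckets, two versions, and a hypothetical population whose UIDs are all even numbers; `bucket_code` and `route` are illustrative names, not functions defined by the source.

```python
import hashlib
from collections import Counter


def bucket_code(user_id: str, buckets: int) -> int:
    """Hash the identifier with MD5 and map it to a bucket code in 1..buckets."""
    digest = hashlib.md5(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % buckets + 1


def route(user_id: str, buckets: int, versions: int) -> int:
    """Take the bucket code modulo the version count; versions are 1..versions."""
    return bucket_code(user_id, buckets) % versions + 1


# Hypothetical skewed population: only even UIDs exist.
uids = [str(uid) for uid in range(0, 200_000, 2)]
counts = Counter(route(uid, 100, 2) for uid in uids)
shares = {version: count / len(uids) for version, count in sorted(counts.items())}
print(shares)
```

Each version should receive close to half of the users even though the identifiers themselves are skewed. Note that when the bucket count is not a multiple of the version count (for example 100 buckets and 3 versions), a plain modulo gives one version slightly more buckets than the others, which is worth accounting for when analyzing results.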
It should be emphasized that the embodiments described herein are illustrative rather than restrictive, and thus the present invention is not limited to the embodiments described in the detailed description, but other embodiments derived from the technical solutions of the present invention by those skilled in the art are also within the scope of the present invention.

Claims (7)

1. An AB experiment shunting method based on multi-pipeline matching, characterized in that the method comprises the following steps:
S1. obtaining the current user's traffic allocation request for the AB experiment, the request comprising the target experiment versions of the target experiment to which the current user traffic is to be allocated and the user traffic proportion of each target experiment version; anchoring the number of traffic buckets according to the size of the user traffic; and scattering the currently allocatable user traffic across that preset number of traffic buckets;
S2. according to the user traffic proportion of each target experiment version, hashing each user identifier and then assigning it a traffic bucket code, so as to determine the bucket code interval of the users corresponding to each target experiment version;
S3. according to the bucket code interval determined for each target experiment version, taking each user's traffic bucket code from step S2 modulo the number of AB experiment versions, and using the result as the AB experiment version to which the user is routed.
2. The AB experiment shunting method based on multi-pipeline matching as claimed in claim 1, characterized in that the number of traffic buckets in step S1 is anchored as follows:
when the user traffic is greater than 100 million, the number of traffic buckets is 1000; when the user traffic is less than 100 million, the number of traffic buckets is 100.
3. The AB experiment shunting method based on multi-pipeline matching as claimed in claim 1, characterized in that the bucket code interval in step S2 is determined as follows:
when the user traffic is greater than 100 million, the bucket code interval is 1-1000; when the user traffic is less than 100 million, the bucket code interval is 1-100.
4. The AB experiment shunting method based on multi-pipeline matching as claimed in claim 3, characterized in that the user identifier in step S2 includes a UID set by the system and an ID defined by the user.
5. The AB experiment shunting method based on multi-pipeline matching as claimed in claim 1, characterized in that in step S2 the hash is computed with the MD5 algorithm.
6. The AB experiment shunting method based on multi-pipeline matching as claimed in claim 5, characterized in that the MD5 algorithm comprises the following steps:
first, for a plaintext of arbitrary length, MD5 pads the plaintext and splits it into groups so that each input group is 512 bits long; padding bits are appended after the plaintext, the first padding bit being 1 and the rest 0; the length of the original plaintext is then expressed in 64 bits and appended after the padded plaintext, so that the total length is exactly a multiple of 512 bits; if the plaintext length exceeds 2^64, only its low 64 bits are used, and they are appended at the end of the last group;
second, the plaintext groups are processed one after another; each 512-bit group is divided into 16 sub-groups, and four 32-bit chaining variables, labeled A, B, C, and D, are set up; the sub-groups and the chaining variables go through 4 rounds of operations in sequence, after which the chaining variables are added to their initial values; the result serves as the input for the next plaintext group, and the operation is repeated;
third, the four 32-bit words are output; the data of the four chaining variables forms the 128-bit MD5 digest.
7. The AB experiment shunting method based on multi-pipeline matching as claimed in claim 1, characterized in that the modulo operation over the number of AB experiment versions in step S3 is as follows:
configuring the number M of AB experiment versions, then taking the value obtained by hashing each user identifier modulo M, so that each user falls into one of the AB experiment versions 1 to M, namely the corresponding project version.
CN202110735712.8A 2021-06-30 2021-06-30 AB experiment shunting method based on multi-pipeline matching Withdrawn CN113535552A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110735712.8A CN113535552A (en) 2021-06-30 2021-06-30 AB experiment shunting method based on multi-pipeline matching

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110735712.8A CN113535552A (en) 2021-06-30 2021-06-30 AB experiment shunting method based on multi-pipeline matching

Publications (1)

Publication Number Publication Date
CN113535552A true CN113535552A (en) 2021-10-22

Family

ID=78097367

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110735712.8A Withdrawn CN113535552A (en) 2021-06-30 2021-06-30 AB experiment shunting method based on multi-pipeline matching

Country Status (1)

Country Link
CN (1) CN113535552A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115049327A (en) * 2022-08-17 2022-09-13 阿里巴巴(中国)有限公司 Data processing method and device, electronic equipment and storage medium
CN115049327B (en) * 2022-08-17 2022-11-15 阿里巴巴(中国)有限公司 Data processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20211022