CN118212033A - Data processing method, device, equipment and storage medium - Google Patents


Info

Publication number
CN118212033A
Authority
CN
China
Prior art keywords
product
experimental
control
index data
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410339365.0A
Other languages
Chinese (zh)
Inventor
王瑛
刘刚
朱弘哲
王轶凡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202410339365.0A priority Critical patent/CN118212033A/en
Publication of CN118212033A publication Critical patent/CN118212033A/en
Pending legal-status Critical Current

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The disclosure provides a data processing method, apparatus, device, and storage medium, and relates to the field of computer technology, in particular to the technical fields of the internet, communications, and data processing. The specific implementation is as follows: product users of a product to be inspected are grouped to obtain at least two product user groups; based on an observation index, each product user group's use of an experimental product and of a control product corresponding to the product to be inspected is observed over the time dimension to obtain experimental index data of the product user group for the experimental product and control index data of the product user group for the control product; a first test result and a second test result between the experimental product and the control product are respectively determined according to the experimental index data and the control index data; and whether the product to be inspected has a long-term effect is determined according to the first test result and the second test result. This technical solution can improve the efficiency of testing the long-term effect of online product experiments.

Description

Data processing method, device, equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to the technical fields of internet, communications, and data processing.
Background
Online experiments are the gold standard for evaluating product user experience and thereby drive rapid product iteration. In general, the duration of an online experiment is limited, and the observed index changes are not always stable; they sometimes increase or decrease over time. One example is the novelty effect: product users are strongly curious about a product change in the short term, and the index returns to its previous level in the long term. Another is the primacy effect, which describes the process by which product users gradually adapt to an experimental change, so that the experimental index rises gradually and stabilizes in the long term.
The long-term behavior of product users is difficult to predict and prediction errors are large, so user experience is most accurately evaluated from real long-term experimental observation data. However, online experiments are generally costly, and the rapid iteration typical of the internet means that not every experiment can be observed over the long term. An efficient and accurate method for testing the long-term effect of online experiments on product updates is therefore needed.
Disclosure of Invention
The present disclosure provides a data processing method, apparatus, device, and storage medium.
According to an aspect of the present disclosure, there is provided a data processing method, the method including:
grouping product users of a product to be inspected to obtain at least two product user groups;
observing, based on an observation index and over the time dimension, each product user group's use of an experimental product corresponding to the product to be inspected and of a control product corresponding to the product to be inspected, to obtain experimental index data of the product user group for the experimental product and control index data of the product user group for the control product;
determining, according to the experimental index data and the control index data, a first test result and a second test result between the experimental product and the control product, respectively; and
determining, according to the first test result and the second test result, whether the product to be inspected has a long-term effect.
According to another aspect of the present disclosure, there is provided a data processing apparatus comprising:
a user group determining module, configured to group product users of a product to be inspected to obtain at least two product user groups;
an index data determining module, configured to observe, based on an observation index and over the time dimension, each product user group's use of an experimental product corresponding to the product to be inspected and of a control product corresponding to the product to be inspected, to obtain experimental index data of the product user group for the experimental product and control index data of the product user group for the control product;
a test result determining module, configured to determine, according to the experimental index data and the control index data, a first test result and a second test result between the experimental product and the control product, respectively; and
a long-term effect testing module, configured to determine, according to the first test result and the second test result, whether the product to be inspected has a long-term effect.
According to another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
A memory communicatively coupled to the at least one processor; wherein,
The memory stores instructions executable by the at least one processor to enable the at least one processor to perform the data processing method of any one embodiment of the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the data processing method according to any of the embodiments of the present disclosure.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements a data processing method as described in any of the embodiments of the present disclosure.
According to the technology disclosed by the disclosure, the long-term effect test efficiency of the online product test can be improved.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a flow chart of a data processing method provided in accordance with an embodiment of the present disclosure;
FIG. 2 is a flow chart of another data processing method provided in accordance with an embodiment of the present disclosure;
FIG. 3 is a flow chart of yet another data processing method provided in accordance with an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a data processing apparatus provided according to an embodiment of the present disclosure;
FIG. 5 is a block diagram of an electronic device for implementing a data processing method of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In addition, it should be noted that, in the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, and disclosure of relevant data, such as data related to the product to be inspected and index data, all comply with the relevant laws and regulations and do not violate public order and good customs.
Solutions in the related art for testing the long-term effect of online experiments mainly fall into the following three types:
1) Complex experimental design: the long-term effect of an experiment is observed through a holdout design, a CCD (Cookie-Cookie-Day) design, a PP (Post-Period Learning) design, and similar methods. In a holdout design, after a short-term experiment has been rolled out to all users, a small share of users, for example 10%, is held back on the original control version and the experiment continues to run long-term, so that its long-term effect can be observed. A CCD design comprises three groups: a long-term experimental group, a control group, and a CCD experimental group. The CCD experimental group randomly divides the user sample into several parts; one part is drawn each day and exposed to the strategy, and is not exposed to the strategy for the rest of the time. In a PP design, the experiment is stopped after running for time t, and the difference between the users of the experimental group and the control group is measured during period t+1; if there is no long-term effect, the experimental effect during period t+1 quickly levels off.
2) Proxy index method: when the experiment duration is limited and the effect on a long-term North Star index (such as 14-day or 30-day retention) cannot be observed, a short-term proxy index that can represent the long-term effect may be selected. During strategy optimization and effect evaluation, attention can be paid directly to the change in the short-term proxy index, which serves as the basis for judging whether the experimental strategy significantly affects the North Star index.
3) Model prediction of long-term user behavior: a complex machine learning model, such as a lifetime value (LTV) prediction model, a survival analysis model, or a Markov model, is built from the user's historical behavior data and the behavior data observed during the short-term experiment, in order to predict the user's long-term behavior index.
However, the three solutions above have the following drawbacks:
1) Complex experimental design is costly, and long-term observation remains unavoidable. A holdout design can be carried out for only a few experiments; it is not possible to hold back users for long-term observation after every experiment is rolled out. A CCD design places high demands on the user sample, and its experimental scheme is complex because a new experimental group is needed every day. A PP design requires short-term observation followed by observation after the experiment goes offline; the scheme is complex, and it cannot be quantified how long after going offline the experimental difference must remain zero to indicate that there is no long-term effect.
2) The main drawback of the proxy index method is that a proxy index consistent with the behavior of the long-term index cannot always be found. The method relies mainly on correlations between indexes and does not address causality; causal modeling, if performed, is highly complex and data-demanding. Moreover, a proxy index obtained by modeling may not adequately represent the long-term effect index, so it easily happens that the proxy index improves significantly while the long-term index does not.
3) Directly predicting long-term user behavior with a model suffers from high prediction difficulty, high feature selection difficulty, and overly idealized model assumptions. Prediction difficulty: the effect index is predicted directly for each individual and the Average Treatment Effect (ATE) is then computed from those predictions, so the prediction model must be highly accurate, which depends on feature selection and model training; for example, some revenue indexes Y have severe long-tail and zero-inflation problems. Feature selection difficulty: confounding factors must be avoided while as many suitable proxy variables as possible are introduced, which is difficult. Overly idealized model assumptions: relying only on historical data and short-term experimental data requires the modeling data and the long-term experimental data to be homogeneous. This is a demanding requirement, because the strategy may change the users' mindset and thereby alter the correlation between the historical and short-term indexes and the long-term indexes.
FIG. 1 is a flowchart of a data processing method provided according to an embodiment of the present disclosure. The method is applicable to testing the long-term effect of online experiments on internet product updates. The method may be performed by a data processing apparatus, which may be implemented in software and/or hardware and may be integrated in an electronic device that carries data processing functions, such as a server. As shown in FIG. 1, the data processing method of this embodiment may include:
S101, grouping product users of a product to be inspected to obtain at least two product user groups.
In this embodiment, the product to be inspected is a product whose update effect is to be evaluated after the product is updated; optionally, the product to be inspected is an internet application for searching or browsing. A product user is a party that uses the product to be inspected. A product user group is composed of product users and comprises at least one product user.
Alternatively, the product users may be grouped based on attribute information of the product users of the product to be inspected, resulting in at least two product user groups. The attribute information may include information such as a region to which the product user belongs, and information of the user device.
S102, based on an observation index, observing over the time dimension each product user group's use of the experimental product corresponding to the product to be inspected and of the control product corresponding to the product to be inspected, to obtain experimental index data of the product user group for the experimental product and control index data of the product user group for the control product.
In this embodiment, the observation index is an index used to judge whether the product to be inspected has a long-term effect after being updated; optionally, the observation index may include page views (PV), unique visitors (UV), browsing duration, advertisement clicks, revenue, and the like. The experimental product is the product obtained by updating the product to be inspected in a given update mode. The control product is the product to be inspected without the update. The experimental index data is the index data corresponding to the observation index while the product user group uses the experimental product. The control index data is the index data corresponding to the observation index while the product user group uses the control product.
Specifically, for each product user group, the group's use of the experimental product and of the control product corresponding to the product to be inspected is observed over the time dimension, that is, an n-day online experiment is observed, so as to obtain the group's experimental index data for the experimental product for each day and the group's control index data for the control product for each day, where n is a natural number greater than 1.
S103, respectively determining a first test result and a second test result between the experimental product and the control product according to the experimental index data and the control index data.
In this embodiment, the first test result is the difference between the experimental product and the control product obtained directly from the experimental index data and the control index data, and is either significant or not significant. The second test result is the difference between the experimental product and the control product obtained from data derived by processing the experimental index data and the control index data.
Optionally, the relative index difference between the experimental index data and the control index data may be calculated for each product user group for each day, and the first test result between the experimental product and the control product is determined from these relative index differences. For example, if a set number of the relative index differences are greater than a set value, the difference between the experimental product and the control product is determined to be significant; otherwise, it is determined to be not significant. The set number and the set value can be chosen by a person skilled in the art according to actual service requirements.
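The threshold rule described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the disclosure: the function names are hypothetical, the relative difference is assumed to be (experimental - control) / control with a nonzero control value, and "a set number ... greater than a set value" is read as "at least set_number daily differences exceed set_value".

```python
def relative_differences(experimental, control):
    """Daily relative difference, assumed here to be (experimental - control) / control."""
    if len(experimental) != len(control):
        raise ValueError("both series must cover the same days")
    # Assumption: every control value is nonzero.
    return [(a - b) / b for a, b in zip(experimental, control)]


def first_test_significant(rel_diffs, set_value, set_number):
    """Return True if at least `set_number` differences exceed `set_value` (assumed reading)."""
    exceeding = sum(1 for d in rel_diffs if d > set_value)
    return exceeding >= set_number


# Hypothetical three-day index data for one product user group.
experimental_daily = [110.0, 120.0, 105.0]
control_daily = [100.0, 100.0, 100.0]
diffs = relative_differences(experimental_daily, control_daily)
```

With a set value of 0.08, two of the three daily differences exceed the threshold, so the result depends on whether the set number is 2 or 3.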
Optionally, the experimental index data and the control index data may be normalized at the daily level to obtain normalized experimental index data and normalized control index data. The relative index difference between the normalized experimental index data and the normalized control index data is then calculated for each product user group for each day, and the second test result between the experimental product and the control product is determined from these relative index differences. For example, if a set number of the relative index differences are greater than a set value, the difference between the experimental product and the control product is determined to be significant; otherwise, it is determined to be not significant. The set number and the set value can be chosen by a person skilled in the art according to actual service requirements.
S104, determining whether the product to be inspected has a long-term effect according to the first inspection result and the second inspection result.
In this embodiment, a long-term effect means that product users' use of the product to be inspected remains stable over the long term after the product is updated.
Optionally, the first test result and the second test result may be combined to determine whether the product to be inspected has a long-term effect; for example, if the first test result and the second test result are both significant, the product to be inspected is determined to have a long-term effect; otherwise, it is determined not to have a long-term effect.
Further, if the product to be inspected has a long-term effect, long-term observation is then carried out on the experimental product corresponding to the product to be inspected.
In the technical solution of this embodiment, product users of the product to be inspected are grouped to obtain at least two product user groups; based on the observation index, each product user group's use of the experimental product and of the control product corresponding to the product to be inspected is observed over the time dimension to obtain experimental index data and control index data; a first test result and a second test result between the experimental product and the control product are then determined from the experimental index data and the control index data; and whether the product to be inspected has a long-term effect is determined from the first test result and the second test result. Compared with complex experimental design, this solution accurately identifies the experiments that have a long-term effect by combining multiple test results, so that long-term observation is carried out only on those experiments and the cost is greatly reduced. Compared with the proxy index method, this solution observes and calculates the experimental effect, that is, the index data, from a real online experiment, which is consistent with the effect of a full rollout provided that sampling is random. Compared with model prediction of long-term user behavior, this solution measures the long-term effect from the real online experimental effect rather than predicting it, which avoids the loss of accuracy caused by model prediction. In summary, the technical solution of the present disclosure can improve the efficiency of testing the long-term effect of online experiments after a product is updated.
On the basis of the above embodiment, as an optional manner of the present disclosure, grouping product users of the product to be inspected to obtain at least two product user groups includes: randomly encoding the product users to obtain user encoding information; and performing bucket mapping on the user encoding information to obtain at least two product user groups.
The user encoding information is the encoding information obtained by encoding a product user.
Specifically, attribute information of a product user may be randomly encoded to obtain user encoding information, and bucket mapping is then performed on the product user based on the user encoding information; that is, a random number is generated from the user encoding information as a bucket identifier, and the product user is assigned to the bucket corresponding to that bucket identifier, which is a product user group.
It can be understood that assigning product users to buckets ensures that the product users are mutually independent, which satisfies the basic condition that samples are independent and lays a foundation for determining the subsequent first test result and second test result. In addition, observing the experiment at the level of product user groups makes the sample size more adequate and the test result more accurate.
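A minimal Python sketch of bucket mapping is shown below. The disclosure does not specify the encoding; a salted SHA-256 hash is swapped in here as one common way to obtain a pseudo-random but reproducible bucket identifier. The function names, the salt value, and the user ID format are all hypothetical.

```python
import hashlib


def assign_bucket(user_id: str, num_buckets: int, salt: str = "exp-001") -> int:
    """Map a user to a bucket identifier via a salted hash (assumed encoding)."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    # Interpret the hex digest as an integer and reduce it to a bucket index.
    return int(digest, 16) % num_buckets


def group_users(user_ids, num_buckets, salt="exp-001"):
    """Return a dict mapping each bucket identifier to its product user group."""
    groups = {bucket: [] for bucket in range(num_buckets)}
    for uid in user_ids:
        groups[assign_bucket(uid, num_buckets, salt)].append(uid)
    return groups


# Hypothetical user IDs.
users = [f"user-{i}" for i in range(1000)]
groups = group_users(users, 10)
```

Because the mapping is deterministic for a given salt, a user stays in the same group for the whole n-day observation; changing the salt reshuffles users for a different experiment.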
On the basis of the above embodiments, as an optional manner of the present disclosure, determining whether the product to be inspected has a long-term effect according to the first test result and the second test result includes: if the first test result is not significant, determining that the product to be inspected does not have a long-term effect; and if the first test result is significant, determining whether the product to be inspected has a long-term effect according to the second test result.
Specifically, if the first test result is not significant, it is determined that the product to be inspected does not have a long-term effect. If the first test result is significant, the second test result is used to further determine whether the product to be inspected has a long-term effect. For example, if the second test result indicates that the difference between the experimental product and the control product over the first n days is significant, the product to be inspected is determined to have a long-term effect. If the difference over the first n days is not significant, it is judged whether the second test result on the nth day, or over a set number of days up to the nth day (such as the last three days), is significant; if so, the product to be inspected is determined to have a long-term effect.
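The staged decision above can be written as a small Python function. This is a sketch of one reading of the passage; the function and parameter names are hypothetical, and each argument is assumed to be a Boolean significance flag computed elsewhere.

```python
def has_long_term_effect(first_significant: bool,
                         second_overall_significant: bool,
                         second_recent_significant: bool) -> bool:
    """Combine the test results in the staged order described in the text.

    first_significant: first test result.
    second_overall_significant: second test result over the first n days.
    second_recent_significant: second test result on day n, or over a set
        number of days up to day n (e.g. the last three days).
    """
    if not first_significant:
        return False  # No significant difference at all: no long-term effect.
    if second_overall_significant:
        return True   # Significant over the whole n-day window.
    # Otherwise fall back to the most recent day(s) of the window.
    return second_recent_significant
```

The recent-days fallback matters for effects that build up gradually, where the difference may be significant only toward the end of the observation window.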
It can be understood that determining whether the product to be inspected has a long-term effect by combining the first test result and the second test result improves the reliability of the test.
FIG. 2 is a flowchart of another data processing method provided according to an embodiment of the present disclosure. On the basis of the above embodiments, this embodiment provides an optional implementation that further refines "determining a first test result and a second test result between the experimental product and the control product, respectively, according to the experimental index data and the control index data". As shown in FIG. 2, the data processing method of this embodiment may include:
S201, grouping product users of the product to be inspected to obtain at least two product user groups.
S202, based on an observation index, observing over the time dimension each product user group's use of the experimental product corresponding to the product to be inspected and of the control product corresponding to the product to be inspected, to obtain experimental index data of the product user group for the experimental product and control index data of the product user group for the control product.
S203, determining a first test result between the experimental product and the control product according to the experimental index data and the control index data.
Alternatively, an index relative difference between the experimental index data and the control index data corresponding to each product user every day may be calculated, and a first test result between the experimental product and the control product may be determined according to the index relative difference.
S204, aggregating the experimental index data and the control index data over the time dimension, respectively, to obtain group experimental index data of the experimental product and group control index data of the control product.
In this embodiment, the group experimental index data is the index data obtained by aggregating the n days of experimental index data of a product user group over the time dimension. The group control index data is the index data obtained by aggregating the n days of control index data of a product user group over the time dimension.
Specifically, for each product user group, the n days of experimental index data of the group are summed to obtain the group experimental index data of the experimental product corresponding to that group, and the n days of control index data of the group are summed to obtain the group control index data of the control product corresponding to that group.
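The time-dimension aggregation amounts to a per-group sum over the n observation days. A minimal Python sketch follows; the function name, the dictionary layout, and the sample values are hypothetical.

```python
def aggregate_over_days(daily_index_by_group):
    """Sum each group's daily index values over the n observation days."""
    return {group: sum(values) for group, values in daily_index_by_group.items()}


# Hypothetical three-day index data for two product user groups.
experimental = {"bucket_0": [10, 12, 11], "bucket_1": [9, 13, 14]}
control = {"bucket_0": [10, 10, 10], "bucket_1": [9, 11, 12]}

group_experimental = aggregate_over_days(experimental)
group_control = aggregate_over_days(control)
```

Each group then contributes a single experimental value and a single control value to the second test.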
S205, determining a second test result between the test product and the control product according to the group test index data and the group control index data.
Optionally, the relative index difference between the group experimental index data and the group control index data may be calculated for each product user group, and the second test result between the experimental product and the control product is determined from these relative index differences. For example, if a set number of the relative index differences are all greater than a set value, the difference between the experimental product and the control product is determined to be significant; otherwise, it is determined to be not significant. The set number and the set value can be chosen by a person skilled in the art according to actual service requirements.
S206, determining whether the product to be inspected has a long-term effect according to the first inspection result and the second inspection result.
According to the technical scheme, at least two product user groups are obtained by grouping product users of the product to be tested, then based on observation indexes, time dimension observation is conducted on the experimental product corresponding to the product to be tested and the comparison product corresponding to the product to be tested by the product user groups respectively to obtain experimental index data of the experimental product by the product user groups and comparison index data of the comparison product by the product user groups, further, according to the experimental index data and the comparison index data, a first test result between the experimental product and the comparison product is determined, and time dimension aggregation is conducted on the experimental index data and the comparison index data respectively to obtain group experimental index data of the experimental product and group comparison index data of the comparison product, a second test result between the experimental product and the comparison product is determined according to the group experimental index data and the group comparison index data, and whether the product to be tested has a long-term effect is determined according to the first test result and the second test result. According to the technical scheme, whether the product to be inspected has a long-term effect or not is judged through the inspection results obtained through determination from different dimensions, and the reliability and the accuracy of inspection can be improved.
On the basis of the above embodiment, as an alternative manner of the present disclosure, determining a second test result between the experimental product and the control product according to the group experimental index data and the group control index data includes: determining second index difference data between the experimental product and the control product according to the group experimental index data of the product user group and the group control index data of the product user group; determining a second t statistic according to the second index difference data corresponding to the m product user groups; and carrying out statistical significance analysis on the second t statistic to obtain the second test result.
The second index difference data refers to difference data between the group experimental index data and the group control index data. Here m represents the number of product user groups, and m is a natural number greater than 1.
Specifically, for each product user group, the relative difference between the group experimental index data and the group control index data of that group is calculated and taken as the second index difference data between the experimental product and the control product for that group. The second index difference data may be determined by the following formula:

$$\Delta_i = \frac{a_i - b_i}{b_i}$$

where $\Delta_i$ represents the second index difference data corresponding to the i-th product user group; $b_i$ represents the group control index data corresponding to the i-th product user group; and $a_i$ represents the group experimental index data corresponding to the i-th product user group.
Further, a second t statistic is determined according to the second index difference data corresponding to the m product user groups, and statistical significance analysis is carried out on the second t statistic to obtain the second test result. Specifically, the mean and the standard deviation of the second index difference data corresponding to the m product user groups are calculated, and the second t statistic, with $m-1$ degrees of freedom, is determined from them, namely

$$t = \frac{\bar{\Delta}}{s / \sqrt{m}}$$

where $\bar{\Delta} = \frac{1}{m}\sum_{i=1}^{m}\Delta_i$ and $s = \sqrt{\frac{1}{m-1}\sum_{i=1}^{m}\left(\Delta_i - \bar{\Delta}\right)^2}$.
A p value can then be calculated from the second t statistic. A p value smaller than 0.05 indicates statistical significance, meaning that the difference between the experimental product and the control product is significant. Otherwise, the result is not statistically significant, meaning that the difference between the experimental product and the control product is not significant.
It can be understood that the t statistic allows the difference between the experimental product and the control product to be identified accurately, so that whether the product to be inspected has a long-term effect can be judged accurately.
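The second test described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: it assumes the relative difference is taken against the control value, uses the sample standard deviation (divisor m-1), and obtains the two-sided p value by numerically integrating the Student t density. The function names `t_pdf`, `two_sided_p` and `second_test` are hypothetical.

```python
import math
from statistics import mean, stdev


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=20000):
    """Two-sided p value via Simpson's rule on [0, |t|] (steps must be even)."""
    b = abs(t)
    if b == 0.0:
        return 1.0
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3.0)


def second_test(group_exp, group_ctl, alpha=0.05):
    """Return (t, p, significant) from the m per-group relative differences."""
    # Assumption: relative difference is (experimental - control) / control.
    deltas = [(a - b) / b for a, b in zip(group_exp, group_ctl)]
    m = len(deltas)
    s = stdev(deltas)  # sample standard deviation (divisor m - 1)
    if s == 0.0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean(deltas) / (s / math.sqrt(m))
    p = two_sided_p(t, m - 1)  # m - 1 degrees of freedom
    return t, p, p < alpha
```

For instance, `second_test([105, 110, 108, 103, 107], [100] * 5)` returns a triple whose last element indicates whether the p value falls below the 0.05 threshold mentioned in the text.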
Fig. 3 is a flow chart of yet another data processing method provided in accordance with an embodiment of the present disclosure. On the basis of the above embodiments, this embodiment provides an alternative implementation that further optimizes "determining a first test result between the experimental product and the control product according to the experimental index data and the control index data". As shown in Fig. 3, the data processing method of the present embodiment may include:
S301, grouping product users of the product to be inspected to obtain at least two product user groups.
S302, based on the observation indexes, respectively carrying out time dimension observation on the experimental product corresponding to the product to be inspected and the control product corresponding to the product to be inspected by the product user group to obtain experimental index data of the experimental product by the product user group and control index data of the control product by the product user group.
S303, determining first index difference data between the experimental product and the control product over n days according to the experimental index data of the m product user groups over the n days for the experimental product and the control index data of the m product user groups over the n days for the control product.
Wherein n is a natural number greater than 1; m is a natural number greater than 1. The first index difference data refers to difference data between experimental index data and control index data.
Specifically, for each product user group on each day, the relative difference between the experimental index data and the control index data of that group on that day is calculated and taken as the first index difference data. For example, it can be determined by the following formula:

$$\Delta_{ij} = \frac{a_{ij} - b_{ij}}{b_{ij}}$$

where i denotes the day, with i = 1, 2, ..., n; j denotes the product user group, with j = 1, 2, ..., m; and $a_{ij}$ and $b_{ij}$ denote the experimental index data and the control index data of the j-th product user group on the i-th day.
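As an illustration of step S303, the sketch below computes the first index difference data for every day and every product user group. It assumes the relative difference is taken against the control value and that the data are stored as nested lists indexed by day and then by group; the name `first_index_differences` is hypothetical.

```python
def first_index_differences(exp, ctl):
    """exp[i][j] and ctl[i][j] hold the index value of group j on day i (n x m)."""
    # Assumption: relative difference is (experimental - control) / control.
    return [
        [(a - b) / b for a, b in zip(exp_day, ctl_day)]
        for exp_day, ctl_day in zip(exp, ctl)
    ]


# Two days (n = 2) and two product user groups (m = 2), made-up values.
exp = [[110.0, 120.0], [101.0, 102.0]]
ctl = [[100.0, 100.0], [100.0, 100.0]]
deltas = first_index_differences(exp, ctl)
```

Here `deltas[0]` holds the day-1 differences of the two groups and `deltas[-1]` the day-n differences, which are the two inputs used by step S304.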
S304, determining a first test result according to the first index difference data of the 1 st day and the first index difference data of the n th day.
Alternatively, for each product user group, the difference between the first index difference data of that group on day 1 and the first index difference data of that group on day n may be calculated, giving m differences. If a set number of the m differences are greater than a set value, the difference between the experimental product and the control product is determined to be significant, that is, the first test result is significant. The set number and the set value can be chosen by a person skilled in the art according to actual requirements.
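A minimal sketch of this threshold rule follows. Two points are assumptions rather than statements of the disclosure: the difference is compared by absolute value, and "a set number" is read as "at least `set_number` groups". The function name is hypothetical.

```python
def first_test_by_threshold(delta_day1, delta_dayn, set_value, set_number):
    """Return True when enough per-group day-1 vs day-n differences exceed set_value."""
    # Assumption: the magnitude of the change matters, so absolute values are used.
    diffs = [abs(d1 - dn) for d1, dn in zip(delta_day1, delta_dayn)]
    exceed = sum(1 for d in diffs if d > set_value)
    # Assumption: "a set number" means at least set_number of the m differences.
    return exceed >= set_number
```

With day-1 differences `[0.10, 0.12, 0.11]` and day-n differences `[0.02, 0.11, 0.03]`, two of the three groups change by more than 0.05, so the rule reports a significant result when `set_number` is 2 and a non-significant one when it is 3.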
In yet another alternative, a first t statistic may be determined according to the first index difference data of the m product user groups on day 1 and the first index difference data of the m product user groups on day n, and statistical significance analysis is carried out on the first t statistic to obtain the first test result.
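Specifically, let the first index difference data of the m product user groups on day 1 be $\Delta_1 = \{\Delta_{11}, \Delta_{12}, \ldots, \Delta_{1m}\}$, and let the first index difference data of the m product user groups on day n be $\Delta_n = \{\Delta_{n1}, \Delta_{n2}, \ldots, \Delta_{nm}\}$. The first t statistic is calculated as

$$t = \frac{\bar{\Delta}_1 - \bar{\Delta}_n}{\sqrt{\dfrac{s_1^2}{m} + \dfrac{s_n^2}{m}}}$$

where $\bar{\Delta}_1$ is the mean of the m $\Delta_1$ data on day 1, i.e. $\bar{\Delta}_1 = \frac{1}{m}\sum_{j=1}^{m}\Delta_{1j}$; $\bar{\Delta}_n$ is the mean of the m $\Delta_n$ data on day n, i.e. $\bar{\Delta}_n = \frac{1}{m}\sum_{j=1}^{m}\Delta_{nj}$; $s_1$ is the standard deviation of the m $\Delta_1$ data on day 1, i.e. $s_1 = \sqrt{\frac{1}{m-1}\sum_{j=1}^{m}\left(\Delta_{1j} - \bar{\Delta}_1\right)^2}$; and $s_n$ is the standard deviation of the m $\Delta_n$ data on day n, i.e. $s_n = \sqrt{\frac{1}{m-1}\sum_{j=1}^{m}\left(\Delta_{nj} - \bar{\Delta}_n\right)^2}$.
When the sample size m > 45, the t statistic approximately follows a normal distribution.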
A p value can then be calculated from the first t statistic. A p value smaller than 0.05 indicates statistical significance, meaning that the relative difference of the experiment in the short term differs from that in the long term, that is, the difference between the experimental product and the control product is significant. Otherwise, the result is not statistically significant, meaning that the short-term and long-term relative differences of the experiment do not differ, that is, the difference between the experimental product and the control product is not significant.
It can be appreciated that the t statistic allows the difference between the experimental product and the control product to be identified accurately, so that whether the product to be inspected has a long-term effect can be judged accurately.
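The first test can be sketched as below. The exact form of the statistic is an assumption: the sketch uses an unpooled two-sample statistic built from the day-1 and day-n means and standard deviations, and it relies on the normal approximation the text mentions for large m, so it is only appropriate when m is well above the stated threshold. The sign of t does not affect the two-sided p value. The name `first_test` is hypothetical.

```python
import math
from statistics import NormalDist, mean, variance


def first_test(delta_day1, delta_dayn, alpha=0.05):
    """Return (t, p, significant) comparing day-1 and day-n difference data."""
    m = len(delta_day1)
    if len(delta_dayn) != m:
        raise ValueError("both days must cover the same m product user groups")
    # Assumption: unpooled standard error from the two sample variances.
    se = math.sqrt(variance(delta_day1) / m + variance(delta_dayn) / m)
    t = (mean(delta_day1) - mean(delta_dayn)) / se
    # Normal approximation to the t distribution, reasonable only for large m.
    p = 2.0 * (1.0 - NormalDist().cdf(abs(t)))
    return t, p, p < alpha
```

A call such as `first_test(day1, dayn)` with two lists of m per-group differences returns the statistic, the approximate p value, and whether the short-term and long-term relative differences differ at the 0.05 level.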
S305, respectively carrying out time dimension aggregation on the experimental index data and the control index data to obtain group experimental index data of the experimental product and group control index data of the control product.
S306, determining a second test result between the experimental product and the control product according to the group experimental index data and the group control index data.
S307, determining whether the product to be inspected has a long-term effect according to the first inspection result and the second inspection result.
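The time dimension aggregation of step S305 could look like the sketch below. The aggregation function is not specified in this passage, so averaging over days is an assumption, and the caller may pass another aggregator such as `sum`. The name `aggregate_over_time` is hypothetical.

```python
from statistics import fmean


def aggregate_over_time(daily, agg=fmean):
    """daily[i][j] is the index value of group j on day i; returns one value per group."""
    # zip(*daily) turns the n x m day-major layout into m per-group series.
    return [agg(series) for series in zip(*daily)]
```

Applied separately to the experimental and the control index data, this yields the group experimental index data and the group control index data used in step S306.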
According to the technical scheme provided by the embodiment of the present disclosure, the product users of the product to be inspected are grouped to obtain at least two product user groups. Based on the observation indexes, time dimension observation is then performed by the product user groups on the experimental product and the control product corresponding to the product to be inspected, yielding experimental index data of the experimental product and control index data of the control product. First index difference data between the experimental product and the control product over n days is determined from the experimental index data of the m product user groups over the n days and the corresponding control index data, and a first test result is determined from the first index difference data of day 1 and the first index difference data of day n. The experimental index data and the control index data are also each aggregated over the time dimension to obtain group experimental index data of the experimental product and group control index data of the control product, from which a second test result between the experimental product and the control product is determined. Finally, whether the product to be inspected has a long-term effect is determined according to the first test result and the second test result. Because the judgment relies on a first test result and a second test result obtained from different dimensions, the reliability and accuracy of the inspection can be improved.
Fig. 4 is a schematic structural view of a data processing apparatus according to an embodiment of the present disclosure. The embodiment of the disclosure is applicable to inspecting the long-term effect of online experiments on internet product updates. The apparatus may be implemented in software and/or hardware and may be integrated in an electronic device carrying data processing functions, such as a server. As shown in Fig. 4, the data processing apparatus 400 includes:
The user group determining module 401 is configured to group product users of a product to be inspected to obtain at least two product user groups;
The index data determining module 402 is configured to perform, based on the observation index, time dimension observation on the experimental product corresponding to the product to be inspected and the control product corresponding to the product to be inspected by the product user group, so as to obtain experimental index data of the experimental product by the product user group and control index data of the control product by the product user group;
The test result determining module 403 is configured to determine a first test result and a second test result between the experimental product and the control product according to the experimental index data and the control index data, respectively;
The long-term effect checking module 404 is configured to determine whether the product to be checked has a long-term effect according to the first checking result and the second checking result.
According to the technical scheme provided by the embodiment of the present disclosure, the product users of the product to be inspected are grouped to obtain at least two product user groups. Based on the observation indexes, time dimension observation is then performed by the product user groups on the experimental product and the control product corresponding to the product to be inspected, yielding experimental index data of the experimental product by the product user groups and control index data of the control product by the product user groups. A first test result and a second test result between the experimental product and the control product are respectively determined according to the experimental index data and the control index data, and whether the product to be inspected has a long-term effect is determined according to the first test result and the second test result. Compared with complex experimental design methods, this scheme can accurately identify experiments with long-term effects by combining multiple test results, so that long-term observation can be carried out on those experiments, greatly saving cost. Compared with index replacement methods, this scheme observes and calculates the experimental effect, namely the index data, from a real online experiment, which is consistent with the effect of a full rollout on the premise of random sampling. Compared with models that predict long-term user behavior, this scheme measures the long-term effect through the real online experimental effect rather than predicting it, and therefore avoids the loss of precision inherent in model prediction. In summary, the technical scheme of the present disclosure can improve the efficiency of inspecting the long-term effect of online experiments after a product is updated.
Further, the user group determining module 401 is specifically configured to:
Carrying out random coding on the product users to obtain user coding information;
And carrying out bucket mapping on the user coding information to obtain at least two product user groups.
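The grouping performed by the user group determining module can be illustrated as follows. The coding scheme is not detailed in this passage, so the sketch assumes a salted SHA-256 hash as a pseudo-random user code and a modulo operation as the bucket mapping; the salt value and the name `assign_group` are hypothetical.

```python
import hashlib


def assign_group(user_id, num_groups, salt="experiment-1"):
    """Map a user to one of num_groups buckets via a salted hash code."""
    # Assumption: a salted SHA-256 digest serves as the pseudo-random user code.
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    code = int(digest, 16)
    # Bucket mapping: the code modulo the number of groups gives the group index.
    return code % num_groups
```

Because the hash is deterministic, a given user always lands in the same product user group for a given salt, while changing the salt reshuffles users across groups.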
Further, the test result determining module 403 includes:
the first test result determining unit is used for determining a first test result between the experimental product and the control product according to the experimental index data and the control index data;
The group index data determining unit is used for respectively carrying out time dimension aggregation on the experimental index data and the control index data to obtain group experimental index data of the experimental product and group control index data of the control product;
and the second test result determining unit is used for determining a second test result between the experimental product and the control product according to the group experimental index data and the group control index data.
Further, the first test result determination unit includes:
The first difference data determining subunit is used for determining first index difference data between the experimental product and the control product over n days according to the experimental index data of the m product user groups over the n days for the experimental product and the control index data of the m product user groups over the n days for the control product; wherein n is a natural number greater than 1; m is a natural number greater than 1;
And the first test result determining subunit is used for determining a first test result according to the first index difference data of day 1 and the first index difference data of day n.
Further, the first test result determining subunit is specifically configured to:
Determining a first t statistic according to the first index difference data of the m product user groups on day 1 and the first index difference data of the m product user groups on day n;
And carrying out statistical significance analysis on the first t statistic to obtain a first test result.
Further, the second test result determining unit is specifically configured to:
Determining second index difference data between the experimental product and the control product according to the group experimental index data of the product user group and the group control index data of the product user group;
Determining a second t statistic according to second index difference data corresponding to the m product user groups;
And carrying out statistical significance analysis on the second t statistic to obtain a second test result.
Further, the long-term effect checking module 404 is specifically configured to:
If the first test result is not significant, determining that the product to be inspected does not have a long-term effect;
And if the first test result is significant, determining whether the product to be inspected has a long-term effect according to the second test result.
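The two-stage decision can be sketched as below. The first branch follows the text directly. How the second test result maps to the final decision is not spelled out in this passage, so treating a significant second result as indicating a long-term effect is an assumption, as is the name `has_long_term_effect`.

```python
def has_long_term_effect(first_significant, second_significant):
    """Combine the first and second test results into a final decision."""
    if not first_significant:
        # Stated in the text: a non-significant first result means no long-term effect.
        return False
    # Assumption: a significant second result is taken to indicate a long-term effect.
    return bool(second_significant)
```

The first test thus acts as a gate, and the second test is only consulted when the first result is significant.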
Further, the product to be inspected is an internet application for searching or browsing.
Further, the experimental product is the product to be inspected after being updated in a given update manner; the control product is the product to be inspected without the update.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 5 is a schematic block diagram of an example electronic device 500 that may be used to implement the data processing method of embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 5, the electronic device 500 includes a computing unit 501 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 502 or a computer program loaded from a storage unit 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data required for the operation of the electronic device 500 may also be stored. The computing unit 501, ROM 502, and RAM 503 are connected to each other by a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
A number of components in electronic device 500 are connected to I/O interface 505, including: an input unit 506 such as a keyboard, a mouse, etc.; an output unit 507 such as various types of displays, speakers, and the like; a storage unit 508 such as a magnetic disk, an optical disk, or the like; and a communication unit 509 such as a network card, modem, wireless communication transceiver, etc. The communication unit 509 allows the electronic device 500 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 501 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 501 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 501 performs the respective methods and processes described above, such as a data processing method. For example, in some embodiments, the data processing method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as storage unit 508. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 500 via the ROM 502 and/or the communication unit 509. When a computer program is loaded into RAM 503 and executed by computing unit 501, one or more steps of the data processing method described above may be performed. Alternatively, in other embodiments, the computing unit 501 may be configured to perform the data processing method by any other suitable means (e.g. by means of firmware).
Various implementations of the systems and techniques described above can be realized in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include Local Area Networks (LANs), Wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
Artificial intelligence is the discipline that studies how to make a computer simulate certain human thought processes and intelligent behaviors (e.g., learning, reasoning, thinking, planning), and it involves both hardware-level and software-level technologies. Artificial intelligence hardware technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, and the like; artificial intelligence software technologies mainly include computer vision, speech recognition, natural language processing, machine learning/deep learning, big data processing, knowledge graph technologies, and the like.
Cloud computing refers to a technical system in which an elastically scalable pool of shared physical or virtual resources is accessed through a network; the resources may include servers, operating systems, networks, software, applications, storage devices, and the like, and may be deployed and managed in an on-demand, self-service manner. Cloud computing technology can provide efficient and powerful data processing capability for technical applications such as artificial intelligence and blockchain, and for model training.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved, and no limitation is imposed herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (21)

1. A data processing method, comprising:
Grouping product users of the product to be inspected to obtain at least two product user groups;
based on the observation indexes, respectively carrying out time dimension observation on the experimental product corresponding to the product to be inspected and the control product corresponding to the product to be inspected by the product user group to obtain experimental index data of the experimental product by the product user group and control index data of the control product by the product user group;
According to the experimental index data and the control index data, respectively determining a first test result and a second test result between the experimental product and the control product;
And determining whether the product to be inspected has a long-term effect according to the first inspection result and the second inspection result.
2. The method of claim 1, wherein the grouping product users of the product to be inspected to obtain at least two product user groups comprises:
Carrying out random coding on the product users to obtain user coding information;
And performing bucket mapping on the user coding information to obtain at least two product user groups.
3. The method of claim 1, wherein the determining a first test result and a second test result between the test product and the control product, respectively, from the test index data and the control index data comprises:
Determining a first test result between the experimental product and the control product according to the experimental index data and the control index data;
Respectively carrying out time dimension aggregation on the experimental index data and the control index data to obtain group experimental index data of the experimental product and group control index data of the control product;
And determining a second test result between the experimental product and the control product according to the group experimental index data and the group control index data.
4. A method according to claim 3, wherein said determining a first test result between the test product and the control product from the test index data and the control index data comprises:
Determining first index difference data between the experimental product and the control product over n days according to the experimental index data of the m product user groups over the n days for the experimental product and the control index data of the m product user groups over the n days for the control product; wherein n is a natural number greater than 1; m is a natural number greater than 1;
And determining a first test result according to the first index difference data of the 1 st day and the first index difference data of the n th day.
5. The method of claim 4, wherein the determining the first test result from the first index difference data on day 1 and the first index difference data on day n comprises:
Determining a first t statistic according to the first index difference data of the m product user groups on day 1 and the first index difference data of the m product user groups on day n;
And carrying out statistical significance analysis on the first t statistic to obtain a first test result.
6. A method according to claim 3, wherein said determining a second test result between said experimental product and said control product from said group experimental index data and said group control index data comprises:
Determining second index difference data between the experimental product and a control product according to the group experimental index data of the product user group and the group control index data of the product user group;
Determining a second t statistic according to second index difference data corresponding to the m product user groups;
and carrying out statistical significance analysis on the second t statistic to obtain a second test result.
7. The method of claim 1, wherein the determining whether the product to be inspected has a long-term effect based on the first inspection result and the second inspection result comprises:
If the first test result is not significant, determining that the product to be inspected does not have a long-term effect;
And if the first test result is significant, determining whether the product to be inspected has a long-term effect according to the second test result.
8. The method according to any one of claims 1-7, wherein the product to be inspected is an internet application for searching or browsing.
9. The method according to any one of claims 1-7, wherein the experimental product is the product to be inspected after being updated in a given update manner; the control product is the product to be inspected without the update.
10. A data processing apparatus comprising:
The user group determining module is used for grouping the product users of the product to be inspected to obtain at least two product user groups;
The index data determining module is used for respectively carrying out, based on the observation index, time dimension observation on the experimental product corresponding to the product to be inspected and the control product corresponding to the product to be inspected by the product user group to obtain experimental index data of the experimental product by the product user group and control index data of the control product by the product user group;
The test result determining module is used for respectively determining a first test result and a second test result between the test product and the control product according to the test index data and the control index data;
and the long-term effect checking module is used for determining whether the product to be checked has a long-term effect according to the first checking result and the second checking result.
11. The apparatus of claim 10, wherein the user group determination module is specifically configured to:
randomly encode the product users to obtain user coding information; and
perform bucket mapping on the user coding information to obtain at least two product user groups.
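One common way to realize random coding followed by mapping users into buckets is a salted hash taken modulo the number of groups. The sketch below assumes that approach; the claims do not name a hash function, and `encode_user`, `assign_group` and the salt value are hypothetical.

```python
import hashlib


def encode_user(user_id: str, salt: str) -> int:
    """Derive a pseudo-random integer code for a user (salted SHA-256, an assumption)."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16)


def assign_group(user_id: str, salt: str, num_groups: int) -> int:
    """Map the user's code to one of num_groups buckets, numbered from 0."""
    if num_groups < 2:
        raise ValueError("at least two product user groups are required")
    return encode_user(user_id, salt) % num_groups


# Hypothetical usage: split 200 users into m = 4 groups.
groups = {}
for i in range(200):
    user = f"user-{i}"
    groups.setdefault(assign_group(user, "exp-salt", 4), []).append(user)
```

Because the code depends only on the user identifier and the salt, a user lands in the same group on every day of the observation window, which is what lets index data be compared across days.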
12. The apparatus of claim 10, wherein the test result determination module comprises:
a first test result determining unit, configured to determine the first test result between the experimental product and the control product according to the experimental index data and the control index data;
a group index data determining unit, configured to perform time-dimension aggregation on the experimental index data and the control index data, respectively, to obtain group experimental index data of the experimental product and group control index data of the control product; and
a second test result determining unit, configured to determine the second test result between the experimental product and the control product according to the group experimental index data and the group control index data.
13. The apparatus of claim 12, wherein the first test result determination unit comprises:
a first difference data determining subunit, configured to determine first index difference data between the experimental product and the control product over n days according to the experimental index data of the m product user groups over the n days for the experimental product and the control index data of the m product user groups over the n days for the control product, wherein n is a natural number greater than 1 and m is a natural number greater than 1; and
a first test result determining subunit, configured to determine the first test result according to the first index difference data of the 1st day and the first index difference data of the n-th day.
14. The apparatus of claim 13, wherein the first test result determination subunit is specifically configured to:
determine a first t statistic according to the first index difference data of the m product user groups on the 1st day and the first index difference data of the m product user groups on the n-th day; and
perform statistical significance analysis on the first t statistic to obtain the first test result.
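A possible reading of the first test is a paired comparison: for each of the m groups, take the change in the experimental-minus-control difference between day 1 and day n, then test whether the mean change differs from zero. The claims do not spell out the formula, so the paired form, the function name and the sample numbers below are assumptions.

```python
import math
import statistics


def first_t_statistic(diff_day_1, diff_day_n):
    """Paired t statistic for the per-group change from day 1 to day n (assumed form)."""
    if len(diff_day_1) != len(diff_day_n):
        raise ValueError("both days must cover the same m groups")
    changes = [dn - d1 for d1, dn in zip(diff_day_1, diff_day_n)]
    m = len(changes)
    if m < 2:
        raise ValueError("at least two product user groups are required")
    std_err = statistics.stdev(changes) / math.sqrt(m)
    if std_err == 0:
        raise ValueError("changes have zero variance; t is undefined")
    return statistics.mean(changes) / std_err


# Hypothetical first index difference data for m = 5 groups.
day_1 = [0.50, 0.60, 0.40, 0.55, 0.45]
day_n = [0.30, 0.35, 0.25, 0.30, 0.30]
t1 = first_t_statistic(day_1, day_n)
# 2.776 is the two-sided 5% critical value of Student's t with 4 degrees of freedom.
first_result = abs(t1) > 2.776
```

In this example the difference shrinks in every group between day 1 and day n, so the statistic is strongly negative and the first test result is significant.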
15. The apparatus of claim 12, wherein the second test result determination unit is specifically configured to:
determine second index difference data between the experimental product and the control product according to the group experimental index data of the product user group and the group control index data of the product user group;
determine a second t statistic according to the second index difference data corresponding to the m product user groups; and
perform statistical significance analysis on the second t statistic to obtain the second test result.
16. The apparatus of claim 10, wherein the long-term effect checking module is specifically configured to:
if the first test result is not significant, determine that the product to be inspected does not have a long-term effect; and
if the first test result is significant, determine whether the product to be inspected has a long-term effect according to the second test result.
17. The apparatus of any of claims 10-16, wherein the product to be inspected is a search-type or browsing-type Internet application.
18. The apparatus according to any one of claims 10-16, wherein the experimental product is a product obtained by updating the product to be inspected, and the control product is the product to be inspected without the update.
19. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the data processing method of any one of claims 1-9.
20. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the data processing method according to any one of claims 1-9.
21. A computer program product comprising a computer program which, when executed by a processor, implements the data processing method according to any of claims 1-9.
CN202410339365.0A 2024-03-22 2024-03-22 Data processing method, device, equipment and storage medium Pending CN118212033A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410339365.0A CN118212033A (en) 2024-03-22 2024-03-22 Data processing method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN118212033A (en) 2024-06-18

Family

ID=91450118

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410339365.0A Pending CN118212033A (en) 2024-03-22 2024-03-22 Data processing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN118212033A (en)

Legal Events

Date Code Title Description
PB01 Publication