CN113688124A - Data interference elimination method and related device - Google Patents

Info

Publication number
CN113688124A
Authority
CN
China
Prior art keywords
user
index
acquisition
difference information
acquisition index
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110961015.4A
Other languages
Chinese (zh)
Other versions
CN113688124B (en)
Inventor
陈友洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Huya Technology Co Ltd
Original Assignee
Guangzhou Huya Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Huya Technology Co Ltd filed Critical Guangzhou Huya Technology Co Ltd
Priority to CN202110961015.4A priority Critical patent/CN113688124B/en
Publication of CN113688124A publication Critical patent/CN113688124A/en
Application granted granted Critical
Publication of CN113688124B publication Critical patent/CN113688124B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21: Design, administration or maintenance of databases
    • G06F16/215: Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/36: Preventing errors by testing or debugging software
    • G06F11/3668: Software testing
    • G06F11/3672: Test management
    • G06F11/3692: Test management for test results analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Hardware Design (AREA)
  • Data Mining & Analysis (AREA)
  • Mobile Radio Communication Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

An embodiment of the invention provides a data interference elimination method and a related device. First acquisition index difference information and second acquisition index difference information are obtained respectively: the first acquisition index difference information represents the difference between a first user data set and a second user data set when a test strategy is not configured, and the second acquisition index difference information represents the difference between the first user data set and the second user data set when the test strategy is configured. Then, by calculating the difference value information between the second acquisition index difference information and the first acquisition index difference information, the data interference that arises because the first user acquisition index and the second user acquisition index cannot be guaranteed to be homogeneous when the test strategy is not configured is eliminated. The difference value information is used as the test measurement parameter, which improves the accuracy of data analysis.

Description

Data interference elimination method and related device
Technical Field
The invention relates to the field of data analysis, in particular to a data interference elimination method and a related device.
Background
For Internet software products, statistical analysis of the data produced while a product runs is often required, so that the usage of product functions and the behavior of users can be analyzed more accurately and the product can be optimized.
In the prior art, data is often collected and analyzed by setting up different experimental data groups. Data analysis is typically performed group by group, while data collection typically applies different experimental strategies in different stages. For example, in one stage no test strategy is issued to the terminals of the different groups; the purpose of this stage is to collect data from each terminal on the premise that the terminals of the different groups are as homogeneous as possible. In another stage, a test strategy is issued to the terminals of the different groups, the variable to be observed is introduced to one group of terminals according to the test strategy, and data is collected from each terminal on the premise that the terminals of the different groups are heterogeneous. Finally, the data of the heterogeneous stage is analyzed using the data of the homogeneous stage as the baseline to obtain the analysis conclusion of the test.
However, in such group-based analysis, the data differences between the groups cannot actually be eliminated in the homogeneous stage. The data differences of the homogeneous stage are therefore carried into the subsequent data analysis, which reduces the accuracy of the analysis result.
Disclosure of Invention
The present invention provides a data interference elimination method and a related device, which help ensure the accuracy of the final test result by removing the difference that already exists between the control group and the experimental group from the difference measured when the test strategy is configured.
Embodiments of the invention may be implemented as follows:
in a first aspect, an embodiment of the present invention provides a data interference cancellation method, where the method includes: respectively obtaining first acquisition index difference information and second acquisition index difference information; the first acquisition index difference information represents difference information of a first user acquisition index contained in the first user data set and a second user acquisition index contained in the second user data set when a test strategy is not configured; the second acquisition index difference information represents difference information of a third user acquisition index contained in the first user data set and a fourth user acquisition index contained in the second user data set when the test strategy is configured; calculating difference value information of the second acquisition index difference information and the first acquisition index difference information; and taking the difference information as a test measurement parameter.
Optionally, the first user collection index is a first user collection index in a first historical period, and the second user collection index is a second user collection index in the first historical period; the third user acquisition index is a third user acquisition index in a second historical period, and the fourth user acquisition index is a fourth user acquisition index in the second historical period;
the step of respectively obtaining the difference information of the first acquisition index and the difference information of the second acquisition index includes:
acquiring a first user acquisition mean value in the first historical time period according to the first user acquisition index;
acquiring a second user acquisition mean value in the first historical time period according to the second user acquisition index;
taking the difference value of the first user collection mean value and the second user collection mean value as the first collection index difference information;
acquiring a third user acquisition mean value in the second historical time period according to the third user acquisition index;
acquiring a fourth user acquisition mean value in the second historical time period according to the fourth user acquisition index;
and taking the difference value between the third user collection mean value and the fourth user collection mean value as the second collection index difference information.
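The sub-steps above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name `acquisition_index_difference`, the variable names, and the sample viewing durations are all hypothetical, and each user acquisition index is assumed to be a plain list of numeric values collected in one historical period.

```python
from statistics import mean
from typing import Sequence


def acquisition_index_difference(
    first_set_values: Sequence[float], second_set_values: Sequence[float]
) -> float:
    """Return mean(first data set) - mean(second data set) for one period."""
    return mean(first_set_values) - mean(second_set_values)


# Hypothetical viewing durations in minutes (illustrative values only).
# First historical period: test strategy not configured.
first_user_index = [30.0, 28.0, 32.0]    # from the first user data set
second_user_index = [31.0, 33.0, 32.0]   # from the second user data set
# Second historical period: test strategy configured.
third_user_index = [31.0, 29.0, 33.0]    # from the first user data set
fourth_user_index = [35.0, 37.0, 36.0]   # from the second user data set

# First and second acquisition index difference information.
first_difference = acquisition_index_difference(first_user_index, second_user_index)
second_difference = acquisition_index_difference(third_user_index, fourth_user_index)
```

Both differences are taken in the same direction (first data set minus second data set), so they can be compared directly in the later subtraction step.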
Optionally, before the step of obtaining the first acquisition indicator difference information and the second acquisition indicator difference information respectively, the method further includes:
respectively obtaining a first user prediction index corresponding to the first user acquisition index and a second user prediction index corresponding to the second user acquisition index through a prediction model;
determining whether the first user prediction index and the first user acquisition index, and the second user prediction index and the second user acquisition index both meet a stability condition;
if yes, executing the step of respectively obtaining first acquisition index difference information and second acquisition index difference information;
and if not, updating the parameters of the prediction model until the first user prediction index and the first user acquisition index obtained by the updated prediction model and the second user prediction index and the second user acquisition index obtained by the updated prediction model both meet the stable condition.
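The stability gate described above can be sketched as follows. The source does not specify the prediction model, the stability condition, or the parameter update rule, so everything concrete here is an assumption made for illustration: the model is a one-step exponential smoothing forecast with parameter `alpha`, the stability condition is a mean absolute percentage error no greater than `tolerance`, and "updating the parameters" is a search over a small candidate grid. All function names are hypothetical.

```python
from typing import Optional, Sequence


def smooth_forecast(series: Sequence[float], alpha: float) -> list:
    """One-step-ahead exponential smoothing forecasts for series[1:]."""
    level = series[0]
    forecasts = []
    for value in series[1:]:
        forecasts.append(level)                      # prediction made before seeing value
        level = alpha * value + (1 - alpha) * level  # update the smoothed level
    return forecasts


def is_stable(predicted: Sequence[float], collected: Sequence[float],
              tolerance: float) -> bool:
    """Assumed stability condition: mean absolute percentage error <= tolerance.

    Collected values are assumed to be non-zero.
    """
    errors = [abs(p - c) / abs(c) for p, c in zip(predicted, collected)]
    return sum(errors) / len(errors) <= tolerance


def fit_until_stable(first_index: Sequence[float], second_index: Sequence[float],
                     alphas: Sequence[float] = (0.1, 0.3, 0.5, 0.7, 0.9),
                     tolerance: float = 0.1) -> Optional[float]:
    """Try model parameters until both groups meet the stability condition.

    Returns the first alpha that works, or None if no candidate does.
    """
    for alpha in alphas:
        if all(is_stable(smooth_forecast(s, alpha), s[1:], tolerance)
               for s in (first_index, second_index)):
            return alpha
    return None
```

Only when a parameter is found for which both groups are stable would the method proceed to obtain the two pieces of acquisition index difference information. The bounded candidate grid is a design choice of this sketch; the source simply states that parameters are updated until the condition is met.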
In a second aspect, an embodiment of the present invention provides a data interference cancellation apparatus, including: the information acquisition module is used for respectively acquiring first acquisition index difference information and second acquisition index difference information;
the first acquisition index difference information represents difference information of a first user acquisition index contained in the first user data set and a second user acquisition index contained in the second user data set when a test strategy is not configured; the second acquisition index difference information represents difference information of a third user acquisition index contained in the first user data set and a fourth user acquisition index contained in the second user data set when the test strategy is configured;
the difference value calculation module is used for calculating difference value information of the second acquisition index difference information and the first acquisition index difference information;
and the parameter acquisition module is used for taking the difference information as a test measurement parameter.
Optionally, the first user collection index is a first user collection index in a first historical period, and the second user collection index is a second user collection index in the first historical period; the third user acquisition index is a third user acquisition index in a second historical period, the fourth user acquisition index is a fourth user acquisition index in the second historical period, and the information acquisition module comprises:
the average value acquisition unit is used for acquiring a first user acquisition average value in the first historical time period according to the first user acquisition index; acquiring a second user acquisition mean value in the first historical time period according to the second user acquisition index; acquiring a third user acquisition mean value in the second historical time period according to the third user acquisition index; acquiring a fourth user acquisition mean value in the second historical time period according to the fourth user acquisition index;
a difference information obtaining unit, configured to use a difference value between the first user collection mean value and the second user collection mean value as the first collection index difference information; and taking the difference value between the third user collection mean value and the fourth user collection mean value as the second collection index difference information.
Optionally, the apparatus further comprises: a stability judgment module;
the stability judgment module includes:
the prediction index acquisition unit is used for respectively acquiring a first user prediction index corresponding to the first user acquisition index and a second user prediction index corresponding to the second user acquisition index through a prediction model;
the stability judgment unit is used for operating the information acquisition module to respectively acquire first acquisition index difference information and second acquisition index difference information when the first user prediction index and the first user acquisition index as well as the second user prediction index and the second user acquisition index meet stability conditions;
the stability judging unit is further configured to update parameters of the prediction model when it is determined that the first user prediction index and the first user collection index, and the second user prediction index and the second user collection index do not satisfy a stability condition, until both the first user prediction index and the first user collection index obtained through the updated prediction model, and the second user prediction index and the second user collection index obtained meet the stability condition.
In a third aspect, an embodiment of the present invention provides a data interference cancellation system, including: data acquisition equipment, the apparatus of any one of the preceding embodiments;
the data acquisition equipment is used for acquiring the first user data set and the second user data set both when the test strategy is not configured and when the test strategy is configured; the data sets acquired when the test strategy is not configured are used to obtain the difference information of a first user acquisition index contained in the first user data set and a second user acquisition index contained in the second user data set; and the data sets acquired when the test strategy is configured are used to obtain the difference information of a third user acquisition index contained in the first user data set and a fourth user acquisition index contained in the second user data set.
In a fourth aspect, an embodiment of the present invention provides an electronic device, which includes a processor and a memory, where the memory stores a computer program, and the processor implements the method in any one of the foregoing embodiments when executing the computer program.
In a fifth aspect, the present invention provides a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the method of any one of the foregoing embodiments.
Compared with the prior art, the invention has the following beneficial effects. First acquisition index difference information and second acquisition index difference information are obtained respectively: the first acquisition index difference information represents the difference between a first user acquisition index contained in the first user data set and a second user acquisition index contained in the second user data set when the test strategy is not configured, and the second acquisition index difference information represents the difference between a third user acquisition index contained in the first user data set and a fourth user acquisition index contained in the second user data set when the test strategy is configured. Then, by calculating the difference value information between the second acquisition index difference information and the first acquisition index difference information, the data interference that arises because the first user acquisition index and the second user acquisition index cannot be guaranteed to be homogeneous when the test strategy is not configured is eliminated. The difference value information is used as the test measurement parameter, which improves the accuracy of data analysis.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
Fig. 1 is a schematic view of an application scenario of a data interference cancellation system according to an embodiment of the present invention;
Fig. 2 is a schematic block diagram of an electronic device according to an embodiment of the present invention;
Fig. 3 is a flowchart illustrating a data interference cancellation method according to an embodiment of the present invention;
Fig. 4 is a schematic flow chart illustrating the sub-steps of step S101 in Fig. 3;
Fig. 5 is a schematic diagram of data interference cancellation according to an embodiment of the present invention;
Fig. 6 is a schematic flowchart of a data interference cancellation method according to an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of a data interference cancellation apparatus according to an embodiment of the present invention;
Fig. 8 is a schematic structural diagram of a data interference cancellation apparatus according to an embodiment of the present invention;
Fig. 9 is a schematic structural diagram of a data interference cancellation apparatus according to an embodiment of the present invention.
Reference numerals: 10 - server; 20a - terminal device; 20b - terminal device; 120 - communication interface; 130 - processor; 110 - memory; 300 - data interference cancellation apparatus; 310 - stability judgment module; 311 - prediction index acquisition unit; 312 - stability judgment unit; 320 - information acquisition module; 321 - mean value acquisition unit; 322 - difference information obtaining unit; 340 - difference value calculation module; 360 - parameter acquisition module.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
In the description of the present invention, it should be noted that if the terms "upper", "lower", "inside", "outside", etc. indicate an orientation or a positional relationship based on that shown in the drawings or that the product of the present invention is used as it is, this is only for convenience of description and simplification of the description, and it does not indicate or imply that the device or the element referred to must have a specific orientation, be constructed in a specific orientation, and be operated, and thus should not be construed as limiting the present invention.
Furthermore, the appearances of the terms "first," "second," and the like, if any, are used solely to distinguish one from another and are not to be construed as indicating or implying relative importance.
It should be noted that the features of the embodiments of the present invention may be combined with each other without conflict.
Referring to fig. 1, a schematic diagram of a network system is provided for an embodiment of the present invention, in this embodiment, the network system may include a server 10, a first user group formed by at least one terminal device 20a, and a second user group formed by at least one terminal device 20 b;
the server 10 may be configured to maintain a test policy related to statistical and analytical data, and issue the test policy to the terminals of the first user group and the second user group based on different stages; meanwhile, the server 10 may be configured to collect the related user data belonging to the first user group and the second user group respectively, so as to form a first user data set corresponding to the first user group and a second user data set corresponding to the second user group;
the server 10 may further analyze the data in the first user data set and the second user data set to obtain a related analysis result.
It should be noted that the analysis function may also be implemented by other devices, for example, the server 10 is only used for acquiring the first user data set and the second user data set, and then sending the data of the data sets to the device with the analysis function for analysis processing.
Alternatively, the network system described above may be used to provide a variety of possible services, including but not limited to: multimedia streaming services, cloud gaming, distributed storage, and the like. Taking live video as an example, the server 10 in the network system may be a server providing live video stream, and the terminal device 20a and the terminal device 20b may be installed with live video related Applications (APP). The server 10 may collect and analyze data related to the live video application on the terminal device 20a and the terminal device 20b for different analysis purposes.
The terminal device 20a and the terminal device 20b may obtain related data of the user when using the live video application, and report the data to the server 10.
Continuing with the example of live video, the dividing rules of the first user group and the second user group may be various, for example, the first user group and the second user group may be divided into groups based on ages of different users; or, the division is carried out based on the region position of the user; or, the partitioning is based on user activity; alternatively, the experimental group and the control group may be divided by the dimension of the experimental test based on the specific test requirement. In this regard, the present invention is not limited to the rule for dividing the user group.
It should be noted that the terminal device may include, but is not limited to: personal computers, notebook computers, tablet computers, mobile phones, and the like.
Referring to fig. 2, a schematic diagram of an electronic device is shown, the electronic device includes a memory 110, a communication interface 120 and a processor 130;
the memory 110, the communication interface 120, and the processor 130 are electrically connected to each other directly or indirectly to enable data transmission or interaction. For example, the components may be electrically connected to each other via one or more communication buses or signal lines. The data interference cancellation apparatus 300 includes at least one software functional module which may be stored in the memory 110 in the form of software or firmware (firmware) or solidified in an Operating System (OS) of the server 10. The processor 130 is configured to execute executable modules stored in the memory 110, such as software functional modules or computer programs included in the data interference cancellation device 300.
The memory 110 may be, but is not limited to, a Random Access Memory (RAM), a Read Only Memory (ROM), a Programmable Read Only Memory (PROM), an Erasable Programmable Read Only Memory (EPROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), and the like. The memory 110 is used for storing a program, and the processor 130 executes the program after receiving an execution instruction. The method executed by the server 10, as defined by the flow disclosed in any of the embodiments of the present invention, may be applied to the processor 130 or implemented by the processor 130.
The processor 130 may be an integrated circuit chip having signal processing capabilities. The processor 130 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. The processor 130 may implement or perform the various methods, steps and logic blocks disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor or any conventional processor.
It should be noted that the electronic device shown in fig. 2 may be configured to implement the server 10 or the terminal device 20a or 20b in fig. 1. When it acts as a server, it can perform the corresponding steps to achieve the corresponding technical effects. When it acts as a terminal device, the electronic device may further include other modules in order to implement the corresponding functions of the terminal device, for example a radio frequency circuit, an I/O interface, a battery, a touch screen, a microphone, a speaker, and so on, which are not limited herein.
Further, continuing with the example of the network system shown in fig. 1: at the stage where no test policy is configured, the prior art collects data from each terminal on the premise that the terminals of different groups (for example, terminals of an experimental group and a control group) are as homogeneous as possible. In some application scenarios this stage is also called the AA stage. In an actual test, however, the experimental group and the control group are difficult to make completely homogeneous at this stage, and a data difference always exists. At the stage where the test strategy is configured (also called the AB stage), the data of the two groups are heterogeneous as well, so the unwanted difference already present in the AA stage is mixed into the AB-stage data and reduces the accuracy of the final analysis result.
Based on the above problems, in order to eliminate the above differences and ensure the accuracy of the final result of the test experiment, an embodiment of the present invention provides a data interference elimination method, and fig. 3 is a schematic flow diagram of the data interference elimination method provided in the embodiment of the present invention. The method comprises the following steps:
Step S101, respectively obtaining first acquisition index difference information and second acquisition index difference information.
Optionally, in order to eliminate the data difference, two pieces of difference information need to be obtained: the first acquisition index difference information, which represents the difference between a first user acquisition index contained in the first user data set and a second user acquisition index contained in the second user data set when the test policy is not configured; and the second acquisition index difference information, which represents the difference between a third user acquisition index contained in the first user data set and a fourth user acquisition index contained in the second user data set when the test policy is configured.
For example, the first user data set may be the data set of a control group and the second user data set may be the data set of an experimental group. The user acquisition index may be a user viewing duration index, a bullet-screen comment count index, a follow count index, and the like. Further, the test strategy may be as follows:
Taking the viewing duration of the user as the user acquisition index and referring to the server 10 shown in fig. 1, suppose the relevant operator sets the index to be acquired to "viewing duration of the user" through the server 10 and assigns different terminal devices to the first user group and the second user group, as shown in fig. 1. The server 10 may then issue a test policy to the terminal devices; for example, the test policy may instruct each terminal device to collect the viewing duration of its user. When a user uses the live video application on a terminal device, the time at which the user enters a live broadcast room and the time at which the user exits the live broadcast room are taken as counting points, from which the viewing duration of the user is obtained.
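As a rough illustration of deriving a viewing duration from the enter and exit counting points, the sketch below pairs each "enter" event with the next "exit" event and sums the elapsed time. The event representation (a list of `(timestamp, kind)` tuples), the function name, and the sample timestamps are assumptions for illustration; the source does not describe the reporting format.

```python
from datetime import datetime
from typing import Iterable, Tuple


def viewing_duration_seconds(events: Iterable[Tuple[datetime, str]]) -> float:
    """Sum the time between each 'enter' event and the following 'exit' event."""
    total = 0.0
    entered_at = None
    for timestamp, kind in sorted(events):  # process events in time order
        if kind == "enter":
            entered_at = timestamp
        elif kind == "exit" and entered_at is not None:
            total += (timestamp - entered_at).total_seconds()
            entered_at = None  # wait for the next 'enter'
    return total


# Hypothetical events reported by one terminal device.
sample_events = [
    (datetime(2021, 8, 1, 20, 0), "enter"),
    (datetime(2021, 8, 1, 20, 30), "exit"),
    (datetime(2021, 8, 1, 21, 0), "enter"),
    (datetime(2021, 8, 1, 21, 15), "exit"),
]
total_seconds = viewing_duration_seconds(sample_events)
```

An "exit" with no preceding "enter" is ignored, and an "enter" that is never closed contributes nothing; a real collector would need an explicit rule for such incomplete sessions.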
Step S102, calculating difference value information of the second acquisition index difference information and the first acquisition index difference information.
Continuing with the example of step S101, the difference value information is calculated from the second acquisition index difference information and the first acquisition index difference information. Subtracting the first acquisition index difference information from the second acquisition index difference information removes the difference that already existed between the control group and the experimental group before the test strategy was configured, so the resulting difference value information reflects the influence of the test strategy.
Step S103, taking the difference value information as a test measurement parameter.
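Steps S101 to S103 can be written compactly as a worked equation. The symbols are introduced here for illustration and do not appear in the source: \bar{x}_1 to \bar{x}_4 denote the mean values of the first to fourth user acquisition indexes, \Delta_{AA} and \Delta_{AB} denote the first and second acquisition index difference information, and D denotes the difference value information. The numbers are hypothetical viewing durations in minutes.

```latex
\Delta_{AA} = \bar{x}_1 - \bar{x}_2, \qquad
\Delta_{AB} = \bar{x}_3 - \bar{x}_4, \qquad
D = \Delta_{AB} - \Delta_{AA}

% Hypothetical example:
\Delta_{AA} = 30 - 32 = -2, \qquad
\Delta_{AB} = 31 - 36 = -5, \qquad
D = -5 - (-2) = -3
```

If the first user data set is the control group and the second is the experimental group, D = -3 in this example means the experimental group's lead over the control group grew by 3 minutes once the test strategy was configured; the 2-minute gap that already existed in the unconfigured stage is not attributed to the strategy.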
In the data interference elimination method provided by the embodiment of the present invention, first acquisition index difference information and second acquisition index difference information are obtained respectively: the first acquisition index difference information represents the difference between a first user acquisition index contained in the first user data set and a second user acquisition index contained in the second user data set when the test policy is not configured, and the second acquisition index difference information represents the difference between a third user acquisition index contained in the first user data set and a fourth user acquisition index contained in the second user data set when the test policy is configured. Then, by calculating the difference value information between the second acquisition index difference information and the first acquisition index difference information, the data interference that arises because the first user acquisition index and the second user acquisition index cannot be guaranteed to be homogeneous when the test strategy is not configured is eliminated. The difference value information is used as the test measurement parameter, which improves the accuracy of data analysis.
In order to obtain the first acquisition index difference information and the second acquisition index difference information respectively, a possible implementation manner is provided on the basis of fig. 3 to obtain the first acquisition index difference information and the second acquisition index difference information respectively, as shown in fig. 4, where fig. 4 is a schematic flow diagram of the substep of step S101 in fig. 3 according to an embodiment of the present invention. The step S101 includes:
Step S1011: obtain a first user acquisition mean value in the first historical period according to the first user acquisition index.
Optionally, taking the user's viewing duration as the user acquisition index, and assuming that, based on the calculation requirement, the first historical period is the 7 days before the statistical time in the stage where the test strategy is not configured, the first user acquisition mean value is the mean viewing duration of the users in the first user data set over those 7 days.
Step S1012: obtain a second user acquisition mean value in the first historical period according to the second user acquisition index.
Continuing the above example, the second user acquisition mean value is the mean viewing duration of the users in the second user data set over the 7 days before the statistical time in the stage where the test strategy is not configured.
Step S1013: take the difference between the first user acquisition mean value and the second user acquisition mean value as the first acquisition index difference information.
Step S1014: obtain a third user acquisition mean value in the second historical period according to the third user acquisition index.
Continuing the above example, the third user acquisition mean value is the mean viewing duration of the users in the first user data set over the 7 days before the statistical time in the stage where the test strategy is configured.
Step S1015: obtain a fourth user acquisition mean value in the second historical period according to the fourth user acquisition index.
Continuing the above example, the fourth user acquisition mean value is the mean viewing duration of the users in the second user data set over the 7 days before the statistical time in the stage where the test strategy is configured.
Step S1016: take the difference between the third user acquisition mean value and the fourth user acquisition mean value as the second acquisition index difference information.
Calculating the first acquisition index difference information and the second acquisition index difference information separately yields, on the one hand, the difference between the control group and the experimental group when the test strategy is not configured and, on the other hand, their difference when the test strategy is configured, which includes the difference caused by the test strategy.
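Steps S1011 to S1016 can be sketched as follows. This is only an illustration under stated assumptions: the per-user viewing durations are made up, the helper name `index_difference` is hypothetical, and the sign convention (experimental group minus control group) follows the fig. 5 description rather than anything fixed by the steps themselves.

```python
from statistics import mean


def index_difference(control: list[float], experimental: list[float]) -> float:
    """Mean of the experimental group minus mean of the control group."""
    return mean(experimental) - mean(control)


# Hypothetical 7-day mean viewing durations per user, in minutes.
control_before = [30.0, 40.0, 50.0]        # first user data set, strategy not configured
experimental_before = [35.0, 45.0, 55.0]   # second user data set, strategy not configured
control_after = [32.0, 42.0, 52.0]         # first user data set, strategy configured
experimental_after = [45.0, 55.0, 65.0]    # second user data set, strategy configured

# S1011 to S1013: first acquisition index difference information.
first_diff = index_difference(control_before, experimental_before)
# S1014 to S1016: second acquisition index difference information.
second_diff = index_difference(control_after, experimental_after)

print(first_diff, second_diff)  # 5.0 13.0
```

In this made-up data the two groups already differ by 5 minutes before the strategy is configured, which is exactly the kind of non-homogeneity the method is meant to remove.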
To facilitate understanding of how the second acquisition index difference information and the first acquisition index difference information are obtained and then used to calculate the difference value information, please refer to fig. 5, which is a schematic diagram of data interference elimination according to an embodiment of the present invention.
With reference to fig. 1, assume that at time T = 0 the server 10 in fig. 1 is not configured with a test strategy. At this time, the server 10 collects the first user acquisition index of the terminal device 20a in the first user group and the second user acquisition index of the terminal device 20b in the second user group.
For example, if the user acquisition index is the mean viewing duration over the last 7 days, the first user acquisition index is the 7-day mean viewing duration of the control group when the test strategy is not configured; correspondingly, the second user acquisition index is the 7-day mean viewing duration of the experimental group.
The first acquisition index difference information is then the second user acquisition index minus the first user acquisition index, which can be expressed by the following formula:
$$E(\Delta Y_i \mid D_i = 0)$$
where $D_i = 0$ characterizes time T = 0, i.e., the stage in which the test strategy is not configured, $\Delta Y_i$ represents the user acquisition index, i is the user corresponding to a specific terminal device, and $E(\Delta Y_i \mid D_i = 0)$ characterizes the first acquisition index difference information.
Further, assume that at time T = 1 the server 10 in fig. 1 has been configured with the test strategy. At this time, the server 10 collects the third user acquisition index of the terminal device 20a in the first user group and the fourth user acquisition index of the terminal device 20b in the second user group.
Similarly, if the user acquisition index is the mean viewing duration over the last 7 days, the third user acquisition index is the 7-day mean viewing duration of the control group when the test strategy is configured; correspondingly, the fourth user acquisition index is the 7-day mean viewing duration of the experimental group.
The second acquisition index difference information is then the fourth user acquisition index minus the third user acquisition index, which can be expressed by the following formula:
$$E(\Delta Y_i \mid D_i = 1)$$
where $D_i = 1$ characterizes time T = 1, i.e., the stage in which the test strategy has been configured, and $E(\Delta Y_i \mid D_i = 1)$ characterizes the second acquisition index difference information.
Furthermore, as can be seen from step S101 and fig. 4 above, the difference value information may be the second acquisition index difference information minus the first acquisition index difference information. This removes from the second acquisition index difference information the non-homogeneous data interference already present at stage T = 0, which improves the accuracy of subsequent test measurement using the difference value information.
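Written out, the calculation described for fig. 5 takes the following form. The symbols $\bar{Y}_1$ to $\bar{Y}_4$ are introduced here only for illustration and stand for the first to fourth user acquisition indexes (the group means); the numbers in the last line are hypothetical.

```latex
\begin{aligned}
\text{difference value information}
  &= E(\Delta Y_i \mid D_i = 1) - E(\Delta Y_i \mid D_i = 0) \\
  &= (\bar{Y}_4 - \bar{Y}_3) - (\bar{Y}_2 - \bar{Y}_1) \\
  &= (55 - 42) - (45 - 40) = 13 - 5 = 8
\end{aligned}
```

In this example the experimental group already watched 5 minutes more than the control group at T = 0, so only 8 of the 13 minutes observed at T = 1 are attributed to the test strategy.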
Optionally, because some data may change too sharply during acquisition to serve as valid data for analysis, data with a relatively stable trend needs to be selected as the user acquisition indexes in the first user data set and the second user data set. To screen the valid data within a given period, a possible implementation is provided below on the basis of fig. 4, as shown in fig. 6, which is a schematic flow chart of another data interference elimination method provided by the embodiment of the present invention. The method further includes:
Step S1001: obtain, through a prediction model, a first user prediction index corresponding to the first user acquisition index and a second user prediction index corresponding to the second user acquisition index.
In this embodiment, the prediction model may be a linear regression model:
$$Y = a + bX + c$$
where X represents the input user acquisition index, Y represents the user prediction index, a and c represent the intercept and the error term of the model respectively, and b represents the slope of the model. The first user acquisition index and the second user acquisition index are input into the prediction model to obtain the first user prediction index and the second user prediction index.
It should be noted that the prediction model may also be implemented with a regression function from the sklearn (scikit-learn) package; that is, the model is provided by an already packaged machine learning library, and the server 10 can call the corresponding package to perform the prediction.
The concrete implementation form of the prediction model is not limited by the invention.
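As one illustration of such a prediction model, the sketch below fits the line Y = a + bX by ordinary least squares using only the Python standard library. It is a sketch under assumptions, not the implementation of the invention: the function names `fit_line` and `predict` are hypothetical, X is assumed to be an acquisition index observed in an earlier window and Y the value to be predicted, and the error term c is treated as the residual rather than as a fitted parameter.

```python
from statistics import mean


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Fit Y = a + b*X by ordinary least squares and return (a, b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept
    return a, b


def predict(a: float, b: float, xs: list[float]) -> list[float]:
    """Return the user prediction index for each input acquisition index."""
    return [a + b * x for x in xs]


# Hypothetical data lying exactly on Y = 1 + 2X.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

a, b = fit_line(xs, ys)
print(a, b)                    # 1.0 2.0
print(predict(a, b, [5.0]))    # [11.0]
```

A library regression routine could be substituted for `fit_line` without changing the rest of the flow.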
Continuing the above example in which the user acquisition index is the mean viewing duration over the last 7 days: for the first user acquisition index, in combination with the embodiment of fig. 5, the viewing duration of the control group at time T = 0, with the test strategy not configured, may be obtained from the prediction model as the first user prediction index. Similarly, in combination with the embodiment of fig. 5, the second user prediction index corresponding to the second user acquisition index may be obtained.
Step S1002, determining whether the first user prediction index and the first user acquisition index, and the second user prediction index and the second user acquisition index both meet a stable condition.
Optionally, the error between the first user prediction index and the first user acquisition index and the error between the second user prediction index and the second user acquisition index are calculated. The indexes are considered stable when the mean of the errors between the first user prediction index and the first user acquisition index and the mean of the errors between the second user prediction index and the second user acquisition index satisfy a normal distribution.
Optionally, the first user prediction index and the first user acquisition index, and the second user prediction index and the second user acquisition index, are stable when the errors satisfy a normal distribution with a mean of zero.
If yes, step S101 above may be performed.
If not, step S1003 is executed: the parameters of the prediction model are updated until the first user prediction index and the first user acquisition index, and the second user prediction index and the second user acquisition index, obtained through the updated prediction model both meet the stability condition.
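The stability decision of step S1002 can be sketched as below. The text only requires that the prediction errors follow a normal distribution with zero mean and does not say how to test this, so the concrete checks here are assumptions made for illustration: the mean is accepted as zero if it lies within 1.96 standard errors, and normality is checked with a Jarque-Bera statistic against 5.99, the approximate 5% critical value of a chi-square distribution with two degrees of freedom. The function name `is_stable` is hypothetical, and the Jarque-Bera approximation is rough for small samples.

```python
from math import sqrt
from statistics import mean, pstdev


def is_stable(actual: list[float], predicted: list[float],
              z: float = 1.96, jb_limit: float = 5.99) -> bool:
    """Return True if prediction errors look like zero-mean normal noise."""
    errors = [p - a for a, p in zip(actual, predicted)]
    n = len(errors)
    mu = mean(errors)
    sigma = pstdev(errors)
    if sigma == 0:
        # All errors identical: stable only if they are all exactly zero.
        return mu == 0
    # Zero-mean check: the mean error must lie within z standard errors of 0.
    if abs(mu) > z * sigma / sqrt(n):
        return False
    # Normality check: Jarque-Bera statistic from skewness and kurtosis.
    skew = sum(((e - mu) / sigma) ** 3 for e in errors) / n
    kurt = sum(((e - mu) / sigma) ** 4 for e in errors) / n
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    return jb <= jb_limit


# Hypothetical viewing durations: unbiased predictions versus biased ones.
actual = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
unbiased = [9.0, 21.0, 28.0, 42.0, 49.5, 60.5]
biased = [14.0, 26.0, 35.0, 45.0, 54.5, 65.5]

print(is_stable(actual, unbiased))  # True
print(is_stable(actual, biased))    # False
```

The same check would be applied to both the first and the second user data set, and step S101 would proceed only if both pass.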
Optionally, one possible way of updating the parameters of the prediction model is to adjust the parameters a, b and c described above so as to minimize the mean square error. The user viewing duration data is substituted into the prediction model for fitting, and the fitted parameters a, b and c are calculated. Substituting multiple groups of user viewing duration data into the prediction model yields multiple groups of parameters a, b and c. The prediction model given by each group of parameters is used for prediction to obtain the predicted values $y_i'$ corresponding to that group, where each predicted value $y_i'$ has a corresponding true value $y_i$. The mean square error is then calculated for each group, and the parameters a, b and c with the minimum mean square error are selected as the parameters of the final prediction model.
Optionally, the mean square error is calculated as:
$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - y_i')^2$$
where n represents the number of users, $y_i$ represents the actual viewing duration of each user, and $y_i'$ represents the duration predicted by the fitted model.
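The parameter selection described above can be sketched as follows. The names `mean_squared_error` and `select_parameters` and the candidate values are hypothetical; each candidate is reduced to an (a, b) pair on the assumption that the error term c is the residual and not a separately tuned value.

```python
def mean_squared_error(actual: list[float], predicted: list[float]) -> float:
    """Mean of squared differences between true and predicted durations."""
    n = len(actual)
    return sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)) / n


def select_parameters(candidates: list[tuple[float, float]],
                      xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Return the (a, b) pair whose predictions a + b*x have the lowest MSE."""
    def mse_for(params: tuple[float, float]) -> float:
        a, b = params
        return mean_squared_error(ys, [a + b * x for x in xs])
    return min(candidates, key=mse_for)


# Hypothetical candidates, e.g. obtained by fitting different groups of data.
xs = [1.0, 2.0, 3.0]
ys = [3.0, 5.0, 7.0]
candidates = [(0.0, 2.0), (1.0, 2.0), (2.0, 1.5)]

best = select_parameters(candidates, xs, ys)
print(best)  # (1.0, 2.0)
```

Here the second candidate reproduces the true values exactly, so its mean square error is zero and it is selected.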
Referring to fig. 7, a schematic structural diagram of a data interference cancellation apparatus according to an embodiment of the present invention is shown. The data interference cancellation apparatus 300 includes: an information acquisition module 320, a difference calculation module 340, and a parameter acquisition module 360.
The information obtaining module 320 is configured to obtain first collection index difference information and second collection index difference information respectively;
The first acquisition index difference information represents the difference between a first user acquisition index contained in the first user data set and a second user acquisition index contained in the second user data set when a test strategy is not configured; the second acquisition index difference information represents the difference between a third user acquisition index contained in the first user data set and a fourth user acquisition index contained in the second user data set when the test strategy is configured.
The difference calculation module 340 is configured to calculate difference information between the second acquisition index difference information and the first acquisition index difference information.
The parameter obtaining module 360 is configured to use the difference information as a test measurement parameter.
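How the three modules might cooperate is sketched below. The class and method names are hypothetical and only mirror the module names used in the text; the data values are made up, and group differences are taken as experimental mean minus control mean by assumption.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class PeriodData:
    control: list[float]        # first user data set
    experimental: list[float]   # second user data set


class InformationObtainingModule:       # corresponds to module 320
    def obtain(self, before: PeriodData, after: PeriodData) -> tuple[float, float]:
        first = mean(before.experimental) - mean(before.control)
        second = mean(after.experimental) - mean(after.control)
        return first, second


class DifferenceCalculationModule:      # corresponds to module 340
    def calculate(self, first: float, second: float) -> float:
        return second - first


class ParameterObtainingModule:         # corresponds to module 360
    def as_parameter(self, diff: float) -> dict[str, float]:
        return {"test_measurement_parameter": diff}


def run(before: PeriodData, after: PeriodData) -> dict[str, float]:
    """Chain the three modules in the order described for the apparatus."""
    first, second = InformationObtainingModule().obtain(before, after)
    diff = DifferenceCalculationModule().calculate(first, second)
    return ParameterObtainingModule().as_parameter(diff)


result = run(PeriodData([40.0, 40.0], [45.0, 45.0]),
             PeriodData([42.0, 42.0], [55.0, 55.0]))
print(result)  # {'test_measurement_parameter': 8.0}
```

Keeping each module as a separate class mirrors the statement that modules may be integrated together or exist separately.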
The data interference elimination apparatus provided in the embodiment of the present invention obtains the first acquisition index difference information and the second acquisition index difference information through the information obtaining module. The first acquisition index difference information represents the difference between the first user acquisition index contained in the first user data set and the second user acquisition index contained in the second user data set when the test strategy is not configured; the second acquisition index difference information represents the difference between the third user acquisition index contained in the first user data set and the fourth user acquisition index contained in the second user data set when the test strategy is configured. The difference calculation module then calculates the difference value information between the second acquisition index difference information and the first acquisition index difference information. This removes the data interference that arises because the first user acquisition index and the second user acquisition index cannot be guaranteed to be homogeneous when the test strategy is not configured, so that using the difference value information as the test measurement parameter improves the accuracy of data analysis.
As an embodiment, referring to fig. 8 on the basis of fig. 7, fig. 8 is a schematic structural diagram illustrating an information obtaining module 320 of a data interference cancellation apparatus 300 according to an embodiment of the present invention, where in the embodiment of the present invention, the information obtaining module 320 includes a mean value obtaining unit 321 and a difference information obtaining unit 322.
The mean value obtaining unit 321 is configured to obtain a first user collection mean value in a first history time period according to the first user collection index; acquiring a second user acquisition mean value in a first historical time period according to a second user acquisition index; acquiring a third user acquisition mean value in a second historical time period according to a third user acquisition index; and obtaining a fourth user acquisition mean value in the second historical time period according to the fourth user acquisition index.
The difference information obtaining unit 322 is configured to use a difference value between the first user collection average value and the second user collection average value as first collection index difference information; and taking the difference value of the third user collection mean value and the fourth user collection mean value as second collection index difference information.
As an embodiment, referring to fig. 9 on the basis of fig. 7, fig. 9 shows another schematic structure of a data interference cancellation apparatus 300 according to an embodiment of the present invention, in which the data interference cancellation apparatus 300 further includes a stability determination module 310. The stability determination module 310 includes a prediction index obtaining unit 311 and a stability determination unit 312.
The prediction index obtaining unit 311 is configured to obtain a first user prediction index corresponding to the first user acquisition index and a second user prediction index corresponding to the second user acquisition index through a prediction model.
The stability determining unit 312 is configured to, upon determining that the first user prediction index and the first user acquisition index, and the second user prediction index and the second user acquisition index, both satisfy the stability condition, run the information obtaining module 320 to obtain the first acquisition index difference information and the second acquisition index difference information respectively.
The stability determining unit 312 is further configured to update parameters of the prediction model when it is determined that the first user prediction index and the first user collection index, and the second user prediction index and the second user collection index do not satisfy the stability condition, until both the first user prediction index and the first user collection index obtained through the updated prediction model, and the second user prediction index and the second user collection index obtained meet the stability condition.
The data interference elimination apparatus 300 takes the difference between the control group and the experimental group when the test strategy is configured (indicated by the second acquisition index difference information, and including the difference caused by the test strategy) and subtracts from it the difference between the control group and the experimental group when the test strategy is not configured (indicated by the first acquisition index difference information). The resulting difference value information is used as the test measurement parameter, so that the difference that already existed between the control group and the experimental group before the test strategy was configured is eliminated and the accuracy of the final result of the A/B test is ensured.
The data interference elimination system provided by the embodiment of the present invention may adopt the architecture shown in fig. 1 above. The server 10 may execute the steps shown in fig. 3 to fig. 7, so that the system calculates, through the difference calculation module of the server, the difference value information between the second acquisition index difference information and the first acquisition index difference information. This removes the data interference that arises because the first user acquisition index and the second user acquisition index cannot be guaranteed to be homogeneous when the test strategy is not configured, so that using the difference value information as the test measurement parameter improves the accuracy of data analysis.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method can be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, the functional modules in the embodiments of the present invention may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and various other media capable of storing program code.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes may be made to the present embodiment by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the embodiments of the present invention should be included in the protection scope of the embodiments of the present invention.

Claims (9)

1. A method for data interference cancellation, comprising:
respectively obtaining first acquisition index difference information and second acquisition index difference information;
the first acquisition index difference information represents difference information of a first user acquisition index contained in a first user data set and a second user acquisition index contained in a second user data set when a test strategy is not configured; the second acquisition index difference information represents difference information between a third user acquisition index contained in the first user data set and a fourth user acquisition index contained in the second user data set when the test strategy is configured;
calculating difference value information of the second acquisition index difference information and the first acquisition index difference information;
and taking the difference information as a test measurement parameter.
2. The method of claim 1, wherein the first user collection indicator is a first user collection indicator in a first historical period, and the second user collection indicator is a second user collection indicator in the first historical period; the third user acquisition index is a third user acquisition index in a second historical period, and the fourth user acquisition index is a fourth user acquisition index in the second historical period;
the step of respectively obtaining the difference information of the first acquisition index and the difference information of the second acquisition index includes:
acquiring a first user acquisition mean value in the first historical time period according to the first user acquisition index;
acquiring a second user acquisition mean value in the first historical time period according to the second user acquisition index;
taking the difference value of the first user collection mean value and the second user collection mean value as the first collection index difference information;
acquiring a third user acquisition mean value in the second historical time period according to the third user acquisition index;
acquiring a fourth user acquisition mean value in the second historical time period according to the fourth user acquisition index;
and taking the difference value between the third user collection mean value and the fourth user collection mean value as the second collection index difference information.
3. The method of claim 2, further comprising, before the step of obtaining the first collected indicator difference information and the second collected indicator difference information respectively:
respectively obtaining a first user prediction index corresponding to the first user acquisition index and a second user prediction index corresponding to the second user acquisition index through a prediction model;
determining whether the first user prediction index and the first user acquisition index, and the second user prediction index and the second user acquisition index both meet a stability condition;
if yes, executing the step of respectively obtaining first acquisition index difference information and second acquisition index difference information;
and if not, updating the parameters of the prediction model until the first user prediction index and the first user acquisition index obtained by the updated prediction model and the second user prediction index and the second user acquisition index obtained by the updated prediction model both meet the stable condition.
4. A data interference cancellation apparatus, comprising:
the information acquisition module is used for respectively acquiring first acquisition index difference information and second acquisition index difference information; the first acquisition index difference information represents difference information of a first user acquisition index contained in a first user data set and a second user acquisition index contained in a second user data set when a test strategy is not configured; the second acquisition index difference information represents difference information between a third user acquisition index contained in the first user data set and a fourth user acquisition index contained in the second user data set when the test strategy is configured;
the difference value calculation module is used for calculating difference value information of the second acquisition index difference information and the first acquisition index difference information;
and the parameter acquisition module is used for taking the difference information as a test measurement parameter.
5. The apparatus of claim 4, wherein the first user collection indicator is a first user collection indicator in a first historical period, and the second user collection indicator is a second user collection indicator in the first historical period; the third user acquisition index is a third user acquisition index in a second historical period, the fourth user acquisition index is a fourth user acquisition index in the second historical period, and the information acquisition module comprises:
the average value acquisition unit is used for acquiring a first user acquisition average value in the first historical time period according to the first user acquisition index; acquiring a second user acquisition mean value in the first historical time period according to the second user acquisition index; acquiring a third user acquisition mean value in the second historical time period according to the third user acquisition index; acquiring a fourth user acquisition mean value in the second historical time period according to the fourth user acquisition index;
a difference information obtaining unit, configured to use a difference value between the first user collection mean value and the second user collection mean value as the first collection index difference information; and taking the difference value between the third user collection mean value and the fourth user collection mean value as the second collection index difference information.
6. The apparatus of claim 5, further comprising: a stability judgment module;
the stability judgment module includes:
the prediction index acquisition unit is used for respectively acquiring a first user prediction index corresponding to the first user acquisition index and a second user prediction index corresponding to the second user acquisition index through a prediction model;
the stability judgment unit is used for operating the information acquisition module to respectively acquire first acquisition index difference information and second acquisition index difference information when the first user prediction index and the first user acquisition index as well as the second user prediction index and the second user acquisition index meet stability conditions;
the stability judging unit is further configured to update parameters of the prediction model when it is determined that the first user prediction index and the first user collection index, and the second user prediction index and the second user collection index do not satisfy a stability condition, until both the first user prediction index and the first user collection index obtained through the updated prediction model, and the second user prediction index and the second user collection index obtained meet the stability condition.
7. A data interference cancellation system, said system comprising: data acquisition equipment, the apparatus of any one of claims 4 to 6;
the data acquisition equipment is used for acquiring the first user data set and the second user data set when a test strategy is not configured and when a test strategy is configured; when the test strategy is not configured, the difference information of a first user acquisition index contained in the first user data set and a second user acquisition index contained in the second user data set is obtained; and when the test strategy is configured, the difference information of a third user acquisition index contained in the first user data set and a fourth user acquisition index contained in the second user data set.
8. An electronic device, comprising a processor and a memory, the memory storing a computer program, the processor implementing the method of any one of claims 1-3 when executing the computer program.
9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method of any one of claims 1-3.
CN202110961015.4A 2021-08-20 2021-08-20 Data interference elimination method and related device Active CN113688124B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110961015.4A CN113688124B (en) 2021-08-20 2021-08-20 Data interference elimination method and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110961015.4A CN113688124B (en) 2021-08-20 2021-08-20 Data interference elimination method and related device

Publications (2)

Publication Number Publication Date
CN113688124A true CN113688124A (en) 2021-11-23
CN113688124B CN113688124B (en) 2023-10-31

Family

ID=78581028

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110961015.4A Active CN113688124B (en) 2021-08-20 2021-08-20 Data interference elimination method and related device

Country Status (1)

Country Link
CN (1) CN113688124B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190370163A1 (en) * 2018-05-29 2019-12-05 Beijing Baidu Netcom Science And Technology Co., L Method and apparatus for outputting information
CN113159815A (en) * 2021-01-25 2021-07-23 腾讯科技(深圳)有限公司 Information delivery strategy testing method and device, storage medium and electronic equipment

Also Published As

Publication number Publication date
CN113688124B (en) 2023-10-31

Similar Documents

Publication Publication Date Title
CN111290924B (en) Monitoring method and device and electronic equipment
CN109831532B (en) Data sharing method, device, equipment and medium
CN110083475B (en) Abnormal data detection method and device
CN110828017A (en) Information processing method and information processing system for nuclear power plant
CN111739658A (en) Method and device for predicting infectious disease trend based on input case
CN106990989B (en) Method and device for controlling application program installation
CN109144862A (en) Test data statistics method, device, computer equipment and storage medium
CN112035320A (en) Service monitoring method and device, electronic equipment and readable storage medium
CN113094087A (en) Software configuration method, electronic device and storage medium
CN112860512A (en) Interface monitoring optimization method and device, computer equipment and storage medium
CN105224646A (en) Object relation analysis method and device and electronic equipment
CN106445715B (en) Pedometer message reporting method and device
CN109240916B (en) Information output control method, information output control device and computer readable storage medium
CN109240893B (en) Application running state query method and terminal equipment
CN113688124A (en) Data interference elimination method and related device
CN116756522A (en) Probability forecasting method and device, storage medium and electronic equipment
CN108418730B (en) Network flow testing method, device, equipment and computer readable storage medium
CN110569114A (en) Service processing method, device, equipment and storage medium
CN112380237B (en) Method, device, terminal and storage medium for predicting database hidden danger SQL
CN111222739B (en) Nuclear power station task allocation method and nuclear power station task allocation system
CN113688350A (en) Method, device, storage medium and terminal for predicting traffic flow based on Fourier function
CN112925804A (en) Database maintenance method and device
CN110838001A (en) Sample analysis method and sample analysis system for nuclear power plant
CN111258866A (en) Computer performance prediction method, device, equipment and readable storage medium
CN117061218B (en) Security capability assessment method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant