CN112862536B - Data processing method, device, equipment and storage medium

Info

Publication number: CN112862536B
Application number: CN202110213201.XA
Authority: CN
Inventors: 邓颖; 蔡政; 李成龙; 任宇堃; 朱志华; 蔡越; 李池洋; 林晓健
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2021-02-25
Filing date: 2021-02-25
Publication date: 2023-07-11
Anticipated expiration: 2041-02-25
Also published as: CN112862536A

Abstract

The application discloses a data processing method, a device, equipment and a storage medium, wherein the method comprises the steps of obtaining a first result data set of a target control experiment and heterogeneity analysis requirement information; configuring the characteristics to be analyzed of an original heterogeneity analysis model based on the heterogeneity analysis demand information to obtain a target heterogeneity analysis model; inputting the first result data set into the target heterogeneity analysis model, and carrying out result heterogeneity analysis on the feature to be analyzed to obtain a heterogeneity analysis result tree of the target control experiment, wherein the heterogeneity analysis result tree represents heterogeneity information of the target control experiment on a plurality of feature values of the feature to be analyzed. By means of the technical scheme, result heterogeneity analysis can be rapidly and accurately carried out by combining the heterogeneity analysis model, and efficiency and reliability of the result heterogeneity analysis are improved.

Description

Data processing method, device, equipment and storage medium

Technical Field

The present application relates to the field of internet technologies, and in particular, to a data processing method, apparatus, device, and storage medium.

Background

When an internet experiment is performed, the effect of the experimental strategy on the whole sample set is not the only quantity of interest. Because experimental objects differ in their characteristics, the same strategy can affect them differently (for example, if the experiment concerns an advertisement display mode and the objects are advertisements, the results can differ greatly between advertisements of different industries). Result heterogeneity analysis of the experiment is therefore also required, that is, determining for which values of a characteristic the effect of the experimental strategy is pronounced and for which values it is weak. The strategy can then be adjusted in a targeted manner, improving its adaptability and flexibility.

In the prior art, result heterogeneity analysis of an experiment is often performed by subgroup analysis: the sample set is divided into subgroups as required, and the change in an index is then determined by a simple t-test. This approach has a high probability of Type I errors (false rejections, where the null hypothesis is true but is rejected). When there are many analysis dimensions and groups, samples also become sparse, so the heterogeneity analysis result is not accurate. In addition, determining the groups subjectively introduces larger statistical errors, making the heterogeneity analysis insufficiently reliable. A more reliable and efficient scheme is therefore needed.

Disclosure of Invention

In order to solve the problems in the prior art, the application provides a data processing method, a device, equipment and a storage medium. The technical scheme is as follows:

in one aspect, the present application provides a data processing method, including:

acquiring a first result data set and heterogeneity analysis requirement information of a target control experiment;

configuring the characteristics to be analyzed of an original heterogeneity analysis model based on the heterogeneity analysis demand information to obtain a target heterogeneity analysis model;

inputting the first result data set into the target heterogeneity analysis model, and carrying out result heterogeneity analysis on the feature to be analyzed to obtain a heterogeneity analysis result tree of the target control experiment, wherein the heterogeneity analysis result tree represents heterogeneity information of the target control experiment on a plurality of feature values of the feature to be analyzed.

Another aspect of the present application provides a data processing apparatus, the apparatus comprising:

the data acquisition module is used for acquiring a first result data set and heterogeneity analysis requirement information of the target control experiment;

the characteristic parameter configuration module is used for configuring the characteristics to be analyzed of the original heterogeneity analysis model based on the heterogeneity analysis demand information to obtain a target heterogeneity analysis model;

the heterogeneity analysis module is used for inputting the first result data set into the target heterogeneity analysis model and carrying out result heterogeneity analysis on the feature to be analyzed to obtain a heterogeneity analysis result tree of the target control experiment, wherein the heterogeneity analysis result tree represents heterogeneity information of the target control experiment on a plurality of feature values of the feature to be analyzed.

In another aspect, the present application provides a data processing apparatus, where the apparatus includes a processor and a memory, where at least one instruction or at least one program is stored in the memory, where the at least one instruction or the at least one program is loaded and executed by the processor to implement a data processing method as described above.

In another aspect, the present application provides a computer readable storage medium having stored therein at least one instruction or at least one program, the at least one instruction or the at least one program being loaded and executed by a processor to implement a data processing method as described above.

The data processing method, the device, the equipment and the storage medium provided by the application have the following technical effects:

the method comprises the steps of obtaining a first result data set and heterogeneity analysis requirement information of a target control experiment; and then configuring the characteristics to be analyzed of the original heterogeneity analysis model based on the heterogeneity analysis demand information to obtain a target heterogeneity analysis model, improving the flexibility of data processing, and finally inputting the first result data set into the target heterogeneity analysis model to perform result heterogeneity analysis on the characteristics to be analyzed to obtain a heterogeneity analysis result tree of the target control experiment, wherein the heterogeneity analysis result tree represents the heterogeneity information of the target control experiment on a plurality of characteristic values of the characteristics to be analyzed. The result heterogeneity analysis can be rapidly and accurately carried out by combining with the heterogeneity analysis model, and the efficiency and reliability of the result heterogeneity analysis are improved.

Additional aspects and advantages of the application will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the application.

Drawings

In order to more clearly illustrate the technical solutions and advantages of embodiments of the present application or of the prior art, the following description will briefly introduce the drawings that are required to be used in the embodiments or the prior art descriptions, it is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic diagram of an application environment provided by an embodiment of the present application;

FIG. 2 is a schematic flow chart of a data processing method according to an embodiment of the present application;

FIG. 3 is a flow chart of another data processing method according to an embodiment of the present disclosure;

FIG. 4 is a schematic diagram of a heterogeneity analysis result of a target control experiment provided in an embodiment of the present application;

FIG. 5 is a schematic diagram of a result of heterogeneity analysis of another target control experiment provided in an embodiment of the present application;

FIG. 6 is a flowchart of another data processing method according to an embodiment of the present disclosure;

FIG. 7 is a flowchart of another data processing method according to an embodiment of the present disclosure;

FIG. 8 is a schematic diagram of a tree structure including result statistics corresponding to each leaf node according to an embodiment of the present application;

FIG. 9 is a schematic diagram of another tree structure including result statistics corresponding to each leaf node according to an embodiment of the present application;

FIG. 10 is a schematic diagram of a result statistics table generated according to a tree structure including result statistics data corresponding to each leaf node according to an embodiment of the present application;

FIG. 11 is a flowchart of another data processing method according to an embodiment of the present disclosure;

FIG. 12 is a flowchart of another data processing method according to an embodiment of the present disclosure;

FIG. 13 is a flowchart of another data processing method according to an embodiment of the present disclosure;

FIG. 14 is a schematic diagram of a data processing apparatus according to an embodiment of the present application;

FIG. 15 is a block diagram of a hardware structure of a server for implementing a data processing method according to an embodiment of the present application.

Detailed Description

The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present application based on the embodiments herein. Examples of the embodiments are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements throughout or elements having like or similar functionality.

It should be noted that the terms "first," "second," and the like in the description and claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that embodiments of the present application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or server that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

Blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated in association using cryptographic methods, each data block containing a batch of network transaction information used to verify the validity of the information (anti-counterfeiting) and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product services layer, and an application services layer.

Artificial intelligence (AI) is the theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, sense the environment, acquire knowledge, and use that knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of sensing, reasoning, and decision-making.

Artificial intelligence is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Artificial intelligence infrastructure technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.

Machine Learning (ML) is a multi-disciplinary field involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It studies how a computer can simulate or implement human learning behavior to acquire new knowledge or skills, and reorganize existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to give computers intelligence, and it is applied throughout the various fields of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.

The scheme provided by the embodiments of the present application relates to artificial intelligence technologies such as machine learning, and is described in detail through the following embodiments.

Referring to fig. 1, fig. 1 is a schematic view of an application environment provided in the present application, and as shown in fig. 1, the application environment may include a server 01 and a client 02.

In this embodiment of the present application, the server 01 may be configured to obtain a result data set and heterogeneity analysis requirement information of a target control experiment, and to perform result heterogeneity analysis in combination with a target heterogeneity analysis model to obtain a heterogeneity analysis result tree of the target control experiment, where the heterogeneity analysis result tree may characterize heterogeneity information of the target control experiment on a plurality of feature values of a feature to be analyzed. Optionally, the server 01 may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), big data, and artificial intelligence platforms.

In this embodiment of the present application, the client 02 may be configured to generate an experiment result comparison page according to the heterogeneity analysis result tree of the target control experiment, and to display the experiment result comparison page, so that the heterogeneity analysis result is shown intuitively and clearly. In practical applications, the client 02 may include, but is not limited to, a terminal device such as a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart wearable device (e.g., a smart watch), a network device, a firewall, and the like.

In the embodiment of the present application, the server 01 and the client 02 may be directly or indirectly connected through a wired or wireless communication manner, which is not limited herein.

FIG. 2 is a flow chart of a data processing method provided in an embodiment of the present application. The present description provides the method operation steps as described in the embodiments or flow charts, but more or fewer operation steps may be included based on conventional or non-inventive labor. The order of steps recited in the embodiments is merely one of many possible execution orders and does not represent the only one. When implemented in a real system or server product, the methods illustrated in the embodiments or figures may be performed sequentially or in parallel (e.g., in a parallel processor or multithreaded environment). As shown in FIG. 2, the method may include:

S201: acquiring a first result data set and heterogeneity analysis requirement information of the target control experiment.

Specifically, the target control experiment may include, but is not limited to, an A/B test of a target strategy: the experimental objects are divided into an experimental group and a control group by random sampling, the target strategy is applied to the experimental group while the control group keeps the original strategy unchanged, and the difference between the experimental results of the two groups is then compared to obtain the effect of the target strategy. The experimental result may represent index information affected by the strategy; in practical applications, this index may be set according to actual application requirements. For example, when the experimental object is an advertisement, the experimental result may be a GMV (Gross Merchandise Volume) value. In this embodiment of the present application, the GMV value may represent the total value generated by advertisement conversions, and may be obtained by multiplying the number of conversions by the benefit brought by a single conversion. In a specific embodiment, the experimental object may be an advertisement and the strategy may be an advertisement display mode; the target strategy may then be pop-up display of the advertisement, and the original strategy may be embedded display. Advertisements in the experimental group use pop-up display while advertisements in the control group keep the original embedded display, and the difference between the experimental results (GMV values) of the two groups is subsequently compared to obtain the effect of pop-up display relative to embedded display.
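As a rough illustration of the comparison described above, the following Python sketch computes GMV as the number of conversions multiplied by the benefit of a single conversion, and then takes the difference between the group means. The function names `gmv` and `mean_gmv` and all figures are hypothetical and are not taken from the application.

```python
from statistics import mean

def gmv(conversions: int, value_per_conversion: float) -> float:
    """GMV as described above: conversion count times the benefit of one conversion."""
    return conversions * value_per_conversion

def mean_gmv(records) -> float:
    """Mean GMV over (conversions, value_per_conversion) pairs."""
    return mean(gmv(c, v) for c, v in records)

# Hypothetical per-advertisement records: (conversions, value per conversion).
experimental_group = [(10, 5.0), (8, 5.0), (12, 5.0)]  # pop-up display (target strategy)
control_group = [(6, 5.0), (9, 5.0), (6, 5.0)]         # embedded display (original strategy)

# Effect of the target strategy relative to the original strategy.
effect = mean_gmv(experimental_group) - mean_gmv(control_group)
```

With these illustrative figures the experimental group averages 50.0 and the control group 35.0, so `effect` is 15.0.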

In this embodiment of the present application, the first result data set may include a plurality of pieces of experimental data of the target control experiment, including experimental data of the experimental group and experimental data of the control group, where the two groups contain the same number of pieces of experimental data. Each piece of experimental data may include attribute information of an experimental object and an experimental result. Specifically, the attribute information of the experimental object may include at least one item of feature information of the experimental object. In practical applications, for example, when the experimental object is an advertisement, the attribute information may include, but is not limited to, the industry and delivery site of the experimental object, and the experimental result may include the GMV value; when the experimental object is a person, the attribute information may include, but is not limited to, the gender and education level of the experimental object. Acquiring a large amount of rich experimental data improves the reliability of the result heterogeneity analysis.

In practical applications, the experimental effect of the same strategy may differ across experimental objects (for example, when the strategy is an advertisement display mode and the objects are advertisements, the experimental results may differ between advertisements of different industries). To analyze the heterogeneous causal effects (Heterogeneous Causal Effects, HTE) of the strategy, that is, the heterogeneity of the causal effect (the effect or influence of the strategy on the experimental objects), meaning that the same strategy has different causal effects for different feature values of the experimental objects, and so that the strategy can subsequently be adjusted adaptively and flexibly, result heterogeneity analysis needs to be performed using the result data set.

In this embodiment of the present application, the heterogeneity analysis requirement information may represent the heterogeneity analysis feature information of the target control experiment. Specifically, the heterogeneity analysis requirement information may include feature dimensions of the experimental object, which may be set according to actual application requirements, for example, gender, education level, age, industry, delivery site, and the like. In a specific embodiment, when the experimental object is an advertisement, the feature dimensions may include the industry or the delivery site. In some embodiments, when there is more than one feature dimension, the heterogeneity analysis requirement information may further include feature analysis order information. For example, when the feature dimensions include the industry and the delivery site, the feature analysis order information may be industry-delivery site (the first dimension is the industry and the second is the delivery site, that is, the industry is analyzed first and the delivery site second) or delivery site-industry (the first dimension is the delivery site and the second is the industry, that is, the delivery site is analyzed first and the industry second).

Acquiring a large amount of rich experimental data improves the reliability of the result heterogeneity analysis, and acquiring the heterogeneity analysis requirement information allows the heterogeneity analysis to be carried out flexibly according to that information.
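One possible in-memory shape for a piece of experimental data is sketched below in Python. The class `ExperimentRecord`, its field names, and the helper `validate_result_set` are assumptions made for illustration; the application does not prescribe a storage format.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass(frozen=True)
class ExperimentRecord:
    """One piece of experimental data (hypothetical layout)."""
    attributes: Dict[str, str]  # attribute information, e.g. industry, delivery site
    result: float               # experimental result, e.g. a GMV value
    treated: bool               # True: experimental group, False: control group

def validate_result_set(records: List[ExperimentRecord]) -> int:
    """Check that both groups have the same size and return that size."""
    n_treated = sum(1 for r in records if r.treated)
    n_control = len(records) - n_treated
    if n_treated != n_control:
        raise ValueError("experimental and control groups differ in size")
    return n_treated

# Illustrative data only.
first_result_set = [
    ExperimentRecord({"industry": "game", "delivery site": "mobile alliance"}, 30.0, True),
    ExperimentRecord({"industry": "game", "delivery site": "mobile alliance"}, 10.0, False),
]
group_size = validate_result_set(first_result_set)
```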

S203: configuring the feature to be analyzed of the original heterogeneity analysis model based on the heterogeneity analysis requirement information to obtain a target heterogeneity analysis model.

In practical applications, since the experimental object may have multiple kinds of features, the feature to be analyzed of the original heterogeneity analysis model may be configured according to the actual analysis requirement, so that result heterogeneity analysis can be performed on specific feature dimensions (for example, the feature to be analyzed may include the industry, to analyze in which industries the experimental effect is most significant; or the delivery site, to analyze at which delivery sites the experimental effect is most significant; or both at once).

In the embodiment of the present application, when the heterogeneity analysis requirement information contains only one feature dimension, that feature dimension may be configured as the feature to be analyzed of the original heterogeneity analysis model to obtain the target heterogeneity analysis model. When the heterogeneity analysis requirement information contains more than one feature dimension, it may further include feature analysis order information, in which case configuring the feature to be analyzed of the original heterogeneity analysis model based on the heterogeneity analysis requirement information to obtain the target heterogeneity analysis model may include: configuring the more than one feature dimensions as the features to be analyzed of the original heterogeneity analysis model according to the feature analysis order information, to obtain the target heterogeneity analysis model. For example, if the feature dimensions include the industry and the delivery site and the feature analysis order information is industry-delivery site, the industry may be configured as the feature to be analyzed in the first dimension, the delivery site as the feature to be analyzed in the second dimension, and so on.

The target heterogeneity analysis model is obtained by configuring the feature to be analyzed of the original heterogeneity analysis model based on the heterogeneity analysis demand information, which is equivalent to initializing and configuring the original heterogeneity analysis model according to the heterogeneity analysis demand information, and is beneficial to flexibly carrying out result heterogeneity analysis according to actual analysis demands.
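The configuration step can be sketched in Python as follows. The classes `HeterogeneityRequirement` and `HeterogeneityModel` and the function `configure` are hypothetical names introduced for illustration only; the sketch assumes that the order information, when present, is a reordering of the requested feature dimensions.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class HeterogeneityRequirement:
    """Heterogeneity analysis requirement information (hypothetical layout)."""
    feature_dimensions: List[str]
    analysis_order: Optional[List[str]] = None  # only used with more than one dimension

@dataclass
class HeterogeneityModel:
    """Stand-in for the heterogeneity analysis model; only its configuration is shown."""
    features_to_analyze: List[str] = field(default_factory=list)

def configure(original: HeterogeneityModel, req: HeterogeneityRequirement) -> HeterogeneityModel:
    """Return a target model whose features to analyze follow the requirement information."""
    if len(req.feature_dimensions) == 1:
        ordered = list(req.feature_dimensions)
    else:
        if req.analysis_order is None:
            raise ValueError("order information is required for more than one dimension")
        if sorted(req.analysis_order) != sorted(req.feature_dimensions):
            raise ValueError("order information must list exactly the requested dimensions")
        ordered = list(req.analysis_order)
    # The original model is left untouched; a new configured model is returned.
    return HeterogeneityModel(features_to_analyze=ordered)

requirement = HeterogeneityRequirement(
    feature_dimensions=["industry", "delivery site"],
    analysis_order=["delivery site", "industry"],
)
target_model = configure(HeterogeneityModel(), requirement)
```

Here the first dimension of `target_model` is the delivery site and the second is the industry, matching the delivery site-industry order.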

S205: inputting the first result data set into the target heterogeneity analysis model, and carrying out result heterogeneity analysis on the characteristics to be analyzed to obtain a heterogeneity analysis result tree of the target control experiment.

Specifically, the heterogeneity analysis result tree may characterize the heterogeneity information of the target control experiment on the plurality of feature values of the feature to be analyzed. In one embodiment, the heterogeneity analysis result tree may include a plurality of leaf nodes arranged by heterogeneity intensity, where the leaf nodes correspond one-to-one with the feature values of the feature to be analyzed and the heterogeneity intensity weakens from left to right. The tree structure makes the heterogeneity information for each feature value clear and intuitive, so that the key values (higher heterogeneity intensity, more pronounced experimental effect) and non-key values (lower heterogeneity intensity, less pronounced experimental effect) of the feature to be analyzed can be determined quickly and accurately. This avoids the low efficiency of the prior art, in which only individual sub-categories can be selected and viewed at a time, allows result heterogeneity analysis of the feature to be analyzed to be performed scientifically and accurately, and improves the efficiency and flexibility of the heterogeneity analysis.

In one embodiment, the target heterogeneity analysis model may be a causal tree model. In particular, the target heterogeneity analysis model may include a heterogeneity intensity calculation layer, a leaf node generation layer, a data classification layer, and a recursive calculation layer.
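The left-to-right layout of the leaf nodes can be pictured with a short Python sketch. It only illustrates the ordering and is not the tree-building procedure itself; the function `order_leaves` and the intensity values are hypothetical.

```python
from typing import Dict, List, Tuple

def order_leaves(intensity_by_value: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (feature value, intensity) pairs with the strongest heterogeneity first (leftmost)."""
    return sorted(intensity_by_value.items(), key=lambda item: item[1], reverse=True)

# Hypothetical heterogeneity intensity values for the feature "industry".
leaves = order_leaves({"finance": 122.0, "game": 402.25, "e-commerce": 114.25})
leaf_values = [value for value, _ in leaves]
```

In this illustration `leaf_values` is `["game", "finance", "e-commerce"]`, so game would be read as the key value of the industry feature.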

Referring to fig. 3, inputting the first result data set into the target heterogeneity analysis model, performing a result heterogeneity analysis on the feature to be analyzed, and obtaining a heterogeneity analysis result tree of the target control experiment may include:

S301: in the heterogeneity intensity calculation layer, traversing each feature value of the feature to be analyzed in the first result data set, and calculating the heterogeneity information of each feature value on the first result data set.

In the embodiment of the application, when the feature to be analyzed is an industry, the feature value may include, for example, a game, an electronic commerce, an automobile, finance, web clothing, and the like; when the feature to be analyzed is gender, the feature values may include: male and female; when the feature to be analyzed is a launch site, the feature value may include: mobile internal sites, mobile alliances, etc. The heterogeneity information of a feature value on the first result data set may include a heterogeneity intensity value of the feature value on the first result data set, and the heterogeneity intensity value of a feature value on the first result data set may be specifically calculated using the following formula:

$$S_l = \{(X_i, Y_i, T_i) \mid X_i \in X_l\}$$

$$\bar{Y}_{l,t} = \frac{1}{N_{l,t}} \sum_{i:\, X_i \in X_l,\ T_i = t} Y_i$$

$$\tau(S_l) = \bar{Y}_{l,1} - \bar{Y}_{l,0}$$

$$\sum_{l=1}^{L} \tau(S_l)^2$$

Wherein $S_l$ represents the $l$-th data partition corresponding to the feature value, and $L$ represents the total number of data partitions. In the embodiment of the present application, the number of data partitions corresponding to a feature value may be 2 ($L = 2$), that is, the first result data set is split into two according to the feature value; for example, when the feature to be analyzed is the industry and the feature value is game, the first result data set may be divided into the two data partitions "industry is game" and "industry is not game". $X_i$ represents the feature value of the experimental object (e.g., game or other); $Y_i$ represents the experimental result (e.g., the GMV value); $T_i \in \{0, 1\}$ indicates whether the target strategy is used, where $T_i = 1$ means that the experimental object used the target strategy and $T_i = 0$ means that it did not (the original strategy was kept unchanged).

$\bar{Y}_{l,t}$ represents the mean (e.g., the GMV mean) of the experimental results of the experimental group or the control group in partition $S_l$ ($\bar{Y}_{l,1}$ is the mean of the experimental results of the experimental group in this partition, and $\bar{Y}_{l,0}$ is the mean of the experimental results of the control group in this partition); $N_{l,t}$ represents the number of experimental-group or control-group samples in $S_l$ ($N_{l,1}$ is the number for the experimental group in this partition, and $N_{l,0}$ is the number for the control group in this partition); $\tau(S_l)$ represents the average causal effect (ACE, Average Causal Effect) within partition $S_l$, that is, the average effect of the target strategy relative to the original strategy in this partition.

The last expression, $\sum_{l=1}^{L} \tau(S_l)^2$, represents the heterogeneity intensity value described above. It may be mean-square-error data, which reflects the degree of difference between an estimator and the quantity being estimated, and may be represented by the sum of squares of the average causal effects of the two partitions. The larger this value, the stronger the result heterogeneity of the feature value, that is, the more pronounced the experimental effect and the more worthwhile further analysis of that feature value.

In this embodiment of the present application, each feature value of the feature to be analyzed in the first result data set may be traversed, and the heterogeneity intensity value of each feature value on the first result data set may be calculated. That is, each feature value is used to split the first result data set into two data partitions, the average causal effect is calculated on each partition, and the heterogeneity intensity value of the feature value on the first result data set is then calculated from the two average causal effects.

Through traversing each characteristic value of the to-be-analyzed characteristic in the first result data set in the heterogeneity intensity calculation layer, the heterogeneity information of each characteristic value on the first result data set can be calculated scientifically and reliably by combining the result data set, and the generation of reliable heterogeneity analysis result trees is facilitated.
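The calculation described above can be sketched as follows. This is a minimal illustration rather than the implementation of the application: the record layout `(x, y, t)`, the function names, and the choice to skip a partition that lacks an experimental or a control group are all assumptions made for the example.

```python
def average_causal_effect(rows):
    """Mean result of the experimental group (t == 1) minus mean result of the
    control group (t == 0). Returns None when either group is empty
    (an assumption of this sketch, not stated in the source)."""
    sums = {0: 0.0, 1: 0.0}
    counts = {0: 0, 1: 0}
    for y, t in rows:
        sums[t] += y
        counts[t] += 1
    if counts[0] == 0 or counts[1] == 0:
        return None
    return sums[1] / counts[1] - sums[0] / counts[0]


def heterogeneity_strength(data, value):
    """Split data into "x == value" and "x != value" and return the sum of
    squared average causal effects of the two partitions."""
    inside = [(y, t) for x, y, t in data if x == value]
    outside = [(y, t) for x, y, t in data if x != value]
    total = 0.0
    for part in (inside, outside):
        ace = average_causal_effect(part)
        if ace is not None:
            total += ace ** 2
    return total


# Hypothetical records: (feature value, experimental result, used target strategy?)
data = [
    ("game", 10.0, 1), ("game", 4.0, 0),
    ("e-commerce", 5.0, 1), ("e-commerce", 5.0, 0),
    ("web service", 7.0, 1), ("web service", 5.0, 0),
]
strengths = {v: heterogeneity_strength(data, v)
             for v in ("game", "e-commerce", "web service")}
```

With these made-up numbers, "game" receives the largest heterogeneity intensity value, so it would be the feature value selected first.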

S303: in the leaf node generation layer, a target feature value is determined according to the heterogeneity information of each feature value on the first result data set, and a target leaf node is generated according to the target feature value.

In an embodiment of the present application, the heterogeneity information may include a heterogeneity intensity value. In the leaf node generation layer, determining a target feature value according to the heterogeneity information of each feature value on the first result data set and generating a target leaf node according to the target feature value may include: determining, according to the heterogeneity intensity value of each feature value on the first result data set, the feature value with the largest heterogeneity intensity value as the target feature value, and configuring the target feature value as the target leaf node (at this point the target leaf node is the first leaf node of the heterogeneity analysis result tree).

S305: in the data classification layer, the first result data set is subjected to data classification according to the target feature value node to obtain a target result data subset.

In this embodiment of the present application, the first result data set may be split in two according to the target feature value node. For example, if the feature to be analyzed is industry and the feature value with the largest calculated heterogeneity intensity value is game, the target leaf node is game, and the first result data set may be divided into the two parts "industry is game" and "industry is not game". The "industry is not game" data may then be used as the target result data subset, on which the subsequent analysis continues.

S307: in the recursive computation layer, taking the target result data subset as the first result data set, and repeating steps S301 to S305 until the target result data subset is empty, so as to obtain the heterogeneity analysis result tree of the target control experiment.

In this embodiment of the present application, the target result data subset may be used as the first result data set, and each feature value of the feature to be analyzed in this first result data set may again be traversed (taking industry as the example, the first result data set at this point no longer contains the feature value game). The heterogeneity intensity value of each feature value on the first result data set is then calculated, the feature value with the largest heterogeneity intensity value is selected as the target feature value, and the target feature value is configured as a target leaf node; this target leaf node is the node adjacent to the first leaf node generated above. Data classification then continues according to this adjacent node to obtain a new target result data subset, which is again used as the first result data set, and so on. When the target result data subset is empty, every feature value of the feature to be analyzed in the first result data set has generated a corresponding leaf node, and the heterogeneity analysis result tree of the target control experiment is obtained.

FIG. 4 is a schematic diagram of a heterogeneity analysis result tree of a target control experiment according to an embodiment of the present application. Referring to FIG. 4, there is only one feature to be analyzed (industry), and it has 5 feature values in the first result data set (game, e-commerce, web service, automobile, finance). The 5 feature values are traversed to obtain their heterogeneity intensity values on the first result data set, and the feature value with the largest heterogeneity intensity value is determined to be game, so game is configured as the first leaf node. The first result data set is divided into the two parts "industry is game" and "industry is not game"; the "industry is not game" data is used as the target result data subset, which in turn is used as the first result data set. The remaining 4 feature values in this first result data set are traversed, and the feature value with the largest heterogeneity intensity value on it is determined to be e-commerce, so e-commerce is configured as the second leaf node; and so on, until the heterogeneity analysis result tree of the target control experiment shown in FIG. 4 is obtained.
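Steps S301 to S307 for a single feature to be analyzed can be sketched as the loop below. It is only an illustration under stated assumptions: records are `(x, y, t)` tuples, the function names are hypothetical, a partition without both groups contributes nothing to the intensity value, and ties are broken by the order in which feature values first appear (the source does not say how ties are handled).

```python
def average_causal_effect(rows):
    """Experimental-group mean minus control-group mean; None if a group is empty."""
    treated = [y for y, t in rows if t == 1]
    control = [y for y, t in rows if t == 0]
    if not treated or not control:
        return None
    return sum(treated) / len(treated) - sum(control) / len(control)


def heterogeneity_strength(data, value):
    """Sum of squared average causal effects over "x == value" and "x != value"."""
    parts = (
        [(y, t) for x, y, t in data if x == value],
        [(y, t) for x, y, t in data if x != value],
    )
    effects = (average_causal_effect(p) for p in parts)
    return sum(e ** 2 for e in effects if e is not None)


def build_result_tree(data):
    """Return the feature values in the order in which leaf nodes are generated."""
    leaf_order = []
    remaining = list(data)
    while remaining:  # S307: stop when the target result data subset is empty
        # S301: traverse every feature value still present in the data
        values = list(dict.fromkeys(x for x, _, _ in remaining))
        # S303: the feature value with the largest intensity becomes the leaf node
        best = max(values, key=lambda v: heterogeneity_strength(remaining, v))
        leaf_order.append(best)
        # S305: keep only the "feature is not best" part for the next round
        remaining = [row for row in remaining if row[0] != best]
    return leaf_order


data = [
    ("game", 10.0, 1), ("game", 4.0, 0),
    ("e-commerce", 5.0, 1), ("e-commerce", 5.0, 0),
    ("web service", 7.0, 1), ("web service", 5.0, 0),
]
order = build_result_tree(data)
```

The returned list corresponds to the left-to-right order of leaf nodes in a tree such as the one in FIG. 4.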

In the embodiment of the present application, when the heterogeneity analysis demand information contains more than one feature dimension, the feature dimensions may be configured as the features to be analyzed of the original heterogeneity analysis model according to the feature analysis order information, so as to obtain the target heterogeneity analysis model. For example, the feature dimensions may include industry and delivery site, and the feature analysis order information may be "industry, then delivery site"; industry is then configured as the feature to be analyzed in the first dimension, delivery site as the feature to be analyzed in the second dimension, and so on. When multi-dimensional features to be analyzed are included, the target heterogeneity analysis model may first perform result heterogeneity analysis on the feature to be analyzed in the first dimension. Each time a leaf node is generated, the first result data set is divided into two parts according to that leaf node (namely "A" and "not A", where A denotes one feature value of the feature to be analyzed in the first dimension). Each part may be used as a target result data subset and then as a first result data set on which the traversal calculation continues: when the "A" part is the first result data set, the feature values of the feature to be analyzed in the second dimension are traversed on it; when the "not A" part is the first result data set, the remaining feature values of the feature to be analyzed in the first dimension continue to be traversed on it; and so on.

FIG. 5 is a schematic diagram of a heterogeneity analysis result tree of a target control experiment that includes multi-dimensional features to be analyzed, according to an embodiment of the present application. Referring to FIG. 5, there are features to be analyzed in two dimensions (the first dimension is industry, the second is delivery site). The first-dimension feature to be analyzed has 3 feature values in the first result data set (game, e-commerce, web service). The 3 feature values are traversed to obtain their heterogeneity intensity values on the first result data set, and the feature value with the largest heterogeneity intensity value is determined to be game, so game is configured as the first leaf node. The first result data set is divided into the two parts "industry is game" and "industry is not game", and each part is used as a target result data subset and then as a first result data set. For the "industry is game" data, the feature values of the second-dimension feature to be analyzed (delivery site; assume there are 2 values, mobile internal site and mobile alliance) are traversed and their heterogeneity intensity values calculated; the mobile internal site is determined to have the largest heterogeneity intensity value within the "industry is game" data and is configured as the vertically adjacent node of the game leaf node, followed by the mobile alliance. Assuming that the data for "industry is e-commerce" and "industry is web service" contain only the mobile internal site in the second dimension, the heterogeneity analysis result tree of the target control experiment shown in FIG. 5 is obtained.
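The multi-dimensional case can be sketched recursively: the "A" part descends to the next dimension, while the "not A" part continues on the current dimension. The sketch below is one possible reading of the description, not the implementation of the application; the dictionary record layout, the key names `industry`, `site`, `y`, `t`, the node layout, and the first-appearance tie-break are assumptions.

```python
def average_causal_effect(rows):
    """Experimental-group mean minus control-group mean; None if a group is empty."""
    treated = [r["y"] for r in rows if r["t"] == 1]
    control = [r["y"] for r in rows if r["t"] == 0]
    if not treated or not control:
        return None
    return sum(treated) / len(treated) - sum(control) / len(control)


def heterogeneity_strength(rows, key, value):
    """Sum of squared average causal effects of the "== value" and "!= value" parts."""
    parts = ([r for r in rows if r[key] == value],
             [r for r in rows if r[key] != value])
    effects = (average_causal_effect(p) for p in parts)
    return sum(e ** 2 for e in effects if e is not None)


def build_tree(rows, dims):
    """Return the list of sibling nodes for dims[0]; each node carries the
    nodes of the next dimension in "children"."""
    if not rows or not dims:
        return []
    key = dims[0]
    values = list(dict.fromkeys(r[key] for r in rows))
    best = max(values, key=lambda v: heterogeneity_strength(rows, key, v))
    node = {
        "feature": key,
        "value": best,
        # "A" part: traverse the feature values of the next dimension
        "children": build_tree([r for r in rows if r[key] == best], dims[1:]),
    }
    # "not A" part: keep traversing the remaining values of this dimension
    return [node] + build_tree([r for r in rows if r[key] != best], dims)


def row(industry, site, y, t):
    return {"industry": industry, "site": site, "y": y, "t": t}


rows = [
    row("game", "mobile internal site", 10, 1), row("game", "mobile internal site", 2, 0),
    row("game", "mobile alliance", 6, 1), row("game", "mobile alliance", 4, 0),
    row("e-commerce", "mobile internal site", 5, 1), row("e-commerce", "mobile internal site", 5, 0),
    row("web service", "mobile internal site", 7, 1), row("web service", "mobile internal site", 6, 0),
]
tree = build_tree(rows, ["industry", "site"])
summary = [(n["value"], [c["value"] for c in n["children"]]) for n in tree]
```

For this made-up data, `summary` has the same shape as the tree described for FIG. 5: game with two delivery-site nodes below it, then e-commerce and web service with one each.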

By inputting the first result data set into the target heterogeneity analysis model and performing result heterogeneity analysis on the features to be analyzed, the heterogeneity analysis result tree of the target control experiment is obtained. This allows result heterogeneity analysis to be carried out scientifically and reliably by combining machine learning with a large amount of result data, improves the efficiency and accuracy of the analysis, and allows result heterogeneity analysis to be performed flexibly on multi-dimensional features to be analyzed.

In an embodiment of the present application, referring to fig. 6, the method may further include:

s601: a second result dataset of the target control experiment is obtained.

In particular, the second result data set of the target control experiment may include a plurality of experimental data of the target control experiment, which may include experimental data of an experimental group and experimental data of a control group, similar to the first result data set of the target control experiment, wherein each experimental data may include attribute information of an experimental subject and experimental results. In practical applications, the method may further include: obtaining a target result data set of the target control experiment, dividing the target result data set into two parts according to a first preset proportion, wherein one part is used as the first result data set, and the other part is used as the second result data set; in practical applications, the first preset ratio may be set according to practical application requirements, for example, the first preset ratio may be 5:5.

S603: inputting the second result data set into the target heterogeneity analysis model, and carrying out result heterogeneity analysis on the characteristics to be analyzed to obtain a heterogeneity analysis verification tree of the target control experiment.

In one embodiment, the specific process of inputting the second result data set into the target heterogeneity analysis model and performing the result heterogeneity analysis on the feature to be analyzed to obtain the heterogeneity analysis verification tree of the target control experiment is similar to S205, and reference may be made to the related description of S205, which is not repeated herein.

S605: determining invalid leaf nodes in the heterogeneity analysis result tree according to the heterogeneity analysis verification tree.

In this embodiment of the present application, the heterogeneity analysis verification tree may also include a plurality of leaf nodes. Nodes whose rank in the heterogeneity analysis result tree differs from their rank in the heterogeneity analysis verification tree may be identified, and these nodes are taken as the invalid leaf nodes of the heterogeneity analysis result tree.

S607: simplifying the heterogeneity analysis result tree according to the invalid leaf nodes to obtain a processed heterogeneity analysis result tree.

In the embodiment of the application, the invalid leaf nodes may be removed from the heterogeneity analysis result tree, thereby simplifying it; the processed heterogeneity analysis result tree may later be displayed on an experiment result comparison page. A heterogeneity analysis verification tree of the target control experiment is generated from the second result data set, and its intersection with the heterogeneity analysis result tree is taken. This makes the heterogeneity analysis result tree more concise and the resulting tree structure more stable, reduces the influence of outliers, and improves the reliability of the result heterogeneity analysis.
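Steps S605 and S607 can be sketched for the single-dimension case as follows. The sketch assumes that each tree is represented as the left-to-right list of its leaf-node feature values and that "rank" means the position in that list; both the representation and the function name are assumptions made for illustration.

```python
def prune_by_verification(result_order, verification_order):
    """Keep only the leaf nodes that occupy the same rank in both trees;
    the others are treated as invalid leaf nodes and removed."""
    kept = []
    for rank, value in enumerate(result_order):
        same_rank = (rank < len(verification_order)
                     and verification_order[rank] == value)
        if same_rank:
            kept.append(value)
    return kept


# Hypothetical leaf orders obtained from the first and second result data sets
result_tree = ["game", "e-commerce", "web service", "automobile", "finance"]
verification_tree = ["game", "e-commerce", "automobile", "web service", "finance"]
processed_tree = prune_by_verification(result_tree, verification_tree)
```

Here "web service" and "automobile" swap ranks between the two trees, so both are dropped and the processed tree keeps game, e-commerce, and finance.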

In an embodiment of the present application, referring to fig. 7, the method may further include:

s701: a third result dataset of the target control experiment is obtained.

Specifically, the third result data set of the target control experiment may also include a plurality of experimental data of the target control experiment, which may include experimental data of an experimental group and experimental data of a control group, similar to the first result data set and the second result data set of the target control experiment, wherein each experimental data may include attribute information of an experimental subject and experimental results. In practical applications, the method may further include: obtaining an original result data set of the target control experiment, dividing the original result data set into two parts according to a second preset proportion, wherein one part is used as the target result data set (dividing the target result data set into two parts according to a first preset proportion, one part is used as the first result data set, the other part is used as the second result data set), and the other part is used as the third result data set; in practical applications, the second preset ratio may be set according to practical application requirements, for example, the second preset ratio may be 5:5.
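The two-level division into first, second, and third result data sets can be sketched as below. The source only specifies the ratios; shuffling the records with a fixed seed before cutting is an assumption of this sketch, as are the function and variable names.

```python
import random


def split_by_ratio(rows, ratio=(5, 5), seed=0):
    """Divide rows into two parts whose sizes follow ratio[0]:ratio[1].
    The records are shuffled first (an assumption; the source does not say
    how records are assigned to each part)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * ratio[0] // (ratio[0] + ratio[1])
    return shuffled[:cut], shuffled[cut:]


original = list(range(100))  # stand-in for the original result data set
# Second preset ratio: original -> target result data set + third result data set
target, third = split_by_ratio(original, ratio=(5, 5), seed=1)
# First preset ratio: target -> first result data set + second result data set
first, second = split_by_ratio(target, ratio=(5, 5), seed=2)
```

With both ratios at 5:5, the first and second result data sets each hold a quarter of the original records and the third holds half.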

S703: extracting, from the third result data set, the result data set corresponding to each leaf node in the heterogeneity analysis result tree.

In the embodiment of the present application, taking industry as the feature to be analyzed, the leaf nodes in the heterogeneity analysis result tree may include, for example, game, e-commerce, web service, and the like. The result data set corresponding to "industry is game", the result data set corresponding to "industry is e-commerce", and the result data set corresponding to "industry is web service" may each be extracted from the third result data set. In one embodiment, the result data set corresponding to each leaf node may include two parts: one part is the result data whose feature value equals the value of the leaf node, and the other part is the result data whose feature value does not equal the value of the leaf node.

S705: calculating the result statistical data corresponding to each leaf node according to the result data set corresponding to each leaf node.

In practical applications, after result heterogeneity analysis is performed on the feature to be analyzed using the first result data set and the target heterogeneity analysis model, a heterogeneity analysis result tree is obtained that represents the heterogeneity information of the target control experiment on a plurality of feature values of the feature to be analyzed. In one embodiment, the heterogeneity analysis result tree may include a plurality of feature value nodes (that is, leaf nodes of the tree structure) arranged by heterogeneity intensity, with the intensity decreasing from left to right. The tree structure presents the heterogeneity information on the feature values clearly and intuitively, so that the key values (higher heterogeneity intensity) and non-key values (lower heterogeneity intensity) of the feature to be analyzed can be determined quickly and accurately. To further analyze the specific performance of the target control experiment on each value of the feature to be analyzed, a third result data set of the target control experiment may be obtained and the result statistical data corresponding to each leaf node calculated. For example, the sample ratio of each leaf node may be calculated from the third result data set, that is, the proportion of the third result data set accounted for by the result data corresponding to the leaf node; after the result data set corresponding to "industry is game" is extracted from the third result data set, its proportion of the third result data set gives the sample ratio of the game leaf node. The lifting index of each leaf node may also be calculated from the third result data set; for example, after the result data set corresponding to each leaf node is extracted from the third result data set, the gmv lift of the experimental group relative to the control group is calculated from the experimental-group and control-group result data in that result data set, and this gmv lift is used as the lifting index of the leaf node.

In the embodiment of the present application, the result statistical data may represent statistical index information of the target control experiment on the corresponding leaf node. In a specific embodiment, the result statistical data may include a sample ratio and a lifting index. The sample ratio may represent the proportion of the third result data set accounted for by the result data corresponding to the leaf node. The lifting index may represent the degree of influence of the target strategy, relative to the original strategy, on the result data set corresponding to the leaf node. In practical applications, the lifting index may be positive or negative: a positive lifting index indicates that the target strategy has a positive effect (e.g., a gain) relative to the original strategy on the result data corresponding to the leaf node, and a negative lifting index indicates a negative effect (e.g., a loss). In one embodiment, when the experimental result includes the gmv value, the lifting index may represent the gmv lift of the experimental group relative to the control group on the result data corresponding to the leaf node, which may be calculated using the following formula:

where Q_m denotes the lifting index corresponding to leaf node m, A_{m1} denotes the sum of the gmv values of the experimental group in the result data set corresponding to leaf node m, and A_{m2} denotes the sum of the gmv values of the control group in the result data set corresponding to leaf node m.

In one embodiment, when the result data set corresponding to each leaf node includes two parts, one part is the result data set corresponding to the feature value equal to the leaf node value, and the other part is the result data set corresponding to the feature value not equal to the leaf node value, the corresponding result statistical data corresponding to each leaf node may also include two parts, one part is the statistical data corresponding to the feature value equal to the leaf node value, and the other part is the statistical data corresponding to the feature value not equal to the leaf node value.
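The two statistics for the Y and N parts of a leaf node can be sketched as follows. Note the main assumption: the formula for Q_m is not reproduced in this text, so the sketch uses the common relative-lift definition (A_m1 - A_m2) / A_m2, which may differ from the formula of the application. The record layout `(x, gmv, t)` and the function names are likewise hypothetical.

```python
def branch_statistics(part, total_count):
    """Lifting index and sample ratio for one part of a leaf node.
    ASSUMPTION: lifting index = (A_m1 - A_m2) / A_m2, where A_m1 and A_m2 are
    the gmv sums of the experimental and control groups in this part."""
    a_m1 = sum(gmv for _, gmv, t in part if t == 1)
    a_m2 = sum(gmv for _, gmv, t in part if t == 0)
    lift = (a_m1 - a_m2) / a_m2 if a_m2 else None
    return {"lift": lift, "sample_ratio": len(part) / total_count}


def leaf_statistics(rows, value, total_count):
    """Statistics for "feature value == value" (Y) and "!= value" (N).
    total_count is the size of the whole third result data set, so that
    sample ratios of nested nodes stay relative to the full data set."""
    yes = [r for r in rows if r[0] == value]
    no = [r for r in rows if r[0] != value]
    return {"Y": branch_statistics(yes, total_count),
            "N": branch_statistics(no, total_count)}


# Hypothetical third result data set: (industry, gmv, used target strategy?)
third = [
    ("game", 86.0, 1), ("game", 100.0, 0),
    ("other", 110.0, 1), ("other", 100.0, 0),
    ("other", 55.0, 1), ("other", 50.0, 0),
]
stats = leaf_statistics(third, "game", total_count=len(third))
```

For a nested node such as e-commerce, the same function would be called on the "industry is not game" rows while still passing the size of the full third result data set as `total_count`.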

On the basis of the heterogeneity analysis result tree shown in FIG. 4, the embodiment of the present application provides a schematic diagram of a tree structure that includes the result statistical data corresponding to each leaf node. Referring to FIG. 8, the result statistical data corresponding to each leaf node may include two parts. One part is the statistical data for which the feature value equals the value of the leaf node, shown in the rectangular frame on the Y branch of each leaf node. For example, the rectangular frame on the Y branch of game contains a lifting index of -0.14 and a sample ratio of 18%, which indicates that within the "industry is game" data the gmv lift of the experimental group relative to the control group is -0.14, and that this data accounts for 18% of the third result data set. The other part is the statistical data for which the feature value does not equal the value of the leaf node, shown in the rectangular frame on the N branch of each leaf node. For example, the rectangular frame on the N branch of game contains a lifting index of -0.79 and a sample ratio of 82%, which indicates that within the "industry is not game" data the gmv lift of the experimental group relative to the control group is -0.79, and that this data accounts for 82% of the third result data set. Likewise, the rectangular frame on the N branch of e-commerce contains a lifting index of 0.064 and a sample ratio of 58%, which indicates that within the data whose industry is neither game nor e-commerce the gmv lift of the experimental group relative to the control group is 0.064, and that this data accounts for 58% of the third result data set.

On the basis of the heterogeneity analysis result tree shown in FIG. 5, the embodiment of the present application also provides a schematic diagram of a tree structure that includes the result statistical data corresponding to each leaf node. Referring to FIG. 9, the result statistical data corresponding to each leaf node may include two parts. One part is the statistical data for which the feature value equals the value of the leaf node, shown in the rectangular frame on the Y branch of each leaf node. For example, the rectangular frame on the Y branch of the leftmost mobile internal site contains a lifting index of -0.15 and a sample ratio of 8%, which indicates that within the data whose industry is game and whose delivery site is the mobile internal site, the gmv lift of the experimental group relative to the control group is -0.15, and that this data accounts for 8% of the third result data set. The other part is the statistical data for which the feature value does not equal the value of the leaf node, shown in the rectangular frame on the N branch of each leaf node. For example, the rectangular frame on the N branch of the leftmost mobile internal site contains a lifting index of -0.14 and a sample ratio of 10%, which indicates that within the data whose industry is game and whose delivery site is not the mobile internal site, the gmv lift of the experimental group relative to the control group is -0.14, and that this data accounts for 10% of the third result data set.

By calculating the result statistical data corresponding to each leaf node from the result data set corresponding to that leaf node, the specific performance of the target control experiment on each feature value of the feature to be analyzed can be determined flexibly according to the actual analysis requirement (for example, on the samples of which feature values the target strategy yields a large lift), so that the strategy can be promoted and improved in a targeted manner.

In an embodiment of the present application, referring to fig. 11, the method may further include:

S1101: screening out leaf nodes to be processed according to the result statistical data corresponding to each leaf node in the heterogeneity analysis result tree.

In this embodiment of the present application, screening out the leaf nodes to be processed according to the result statistical data corresponding to each leaf node in the heterogeneity analysis result tree may include, for example, screening out the leaf nodes whose lifting index is greater than a first preset threshold and whose sample ratio is less than a second preset threshold as the leaf nodes to be processed. Screening on the result statistical data allows the leaf nodes to be processed to be located accurately, which helps the subsequent targeted optimization of the strategy to be carried out accurately and efficiently.

S1103: extracting the result statistical data corresponding to the leaf nodes to be processed.

Specifically, after the leaf nodes to be processed are screened out according to the result statistical data corresponding to each leaf node in the heterogeneity analysis result tree, the result statistical data corresponding to the leaf nodes to be processed can be extracted, so that the processing strategy can then be determined.

S1105: determining the processing strategy corresponding to the leaf node to be processed according to the result statistical data corresponding to the leaf node to be processed and the result processing rule.

In this embodiment of the present application, the result processing rule may be set based on a large number of actual processing tests and on actual application requirements. For example, when leaf nodes whose lifting index is greater than the first preset threshold and whose sample ratio is less than the second preset threshold are screened out as the leaf nodes to be processed, the result statistical data corresponding to a leaf node to be processed may be extracted, and the processing strategy may be determined as follows: for the portion of data whose feature value is that of the leaf node to be processed, the volume is increased by ten percent of the total on top of the node's current sample ratio. When leaf nodes whose lifting index is less than a third preset threshold and whose sample ratio is less than a fourth preset threshold are screened out as the leaf nodes to be processed, use of the target strategy may be suspended for the feature values corresponding to those leaf nodes in order to reduce losses. The present application is not limited thereto.

Determining the processing strategy corresponding to the leaf node to be processed according to its result statistical data and the result processing rule makes it possible to adjust the strategy in a targeted manner based on actual statistical data, and improves the adaptability and reliability of the strategy adjustment.
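Steps S1101 to S1105 with the two example rules above can be sketched as a simple rule table. The threshold values, the dictionary layout of the statistics, and the strategy labels are all hypothetical; the source leaves the thresholds and the rule set to the application.

```python
def decide_strategies(stats_by_leaf, first_threshold, second_threshold,
                      third_threshold, fourth_threshold):
    """Screen leaf nodes by lifting index and sample ratio and attach a
    processing strategy to each leaf node that matches a rule."""
    decisions = {}
    for leaf, stats in stats_by_leaf.items():
        lift, ratio = stats["lift"], stats["sample_ratio"]
        if lift > first_threshold and ratio < second_threshold:
            # Large lift on a small share of samples: allocate more volume
            decisions[leaf] = "increase_volume_by_ten_percent_of_total"
        elif lift < third_threshold and ratio < fourth_threshold:
            # Clear loss on a small share of samples: stop using the target strategy
            decisions[leaf] = "suspend_target_strategy"
    return decisions


# Hypothetical result statistical data per leaf node
stats_by_leaf = {
    "game": {"lift": 0.30, "sample_ratio": 0.05},
    "e-commerce": {"lift": -0.40, "sample_ratio": 0.10},
    "web service": {"lift": 0.01, "sample_ratio": 0.60},
}
decisions = decide_strategies(stats_by_leaf,
                              first_threshold=0.2, second_threshold=0.1,
                              third_threshold=-0.2, fourth_threshold=0.2)
```

Leaf nodes that match neither rule, such as "web service" here, are simply left out of the result and keep their current treatment.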

In an embodiment of the present application, referring to fig. 12, the method may further include:

S1201: in response to an experiment result query instruction, generating an experiment result comparison page according to the heterogeneity analysis result tree of the target control experiment.

In practical applications, the heterogeneity analysis result tree of the target control experiment can be generated in advance by offline pre-calculation and stored in an experiment effect database. When an experiment result comparison page is generated in response to an experiment result query instruction, the data pre-stored in the experiment effect database can then be read directly, which improves the efficiency of generating the page and reduces lag. Specifically, the experiment result query instruction may include a result query instruction for the target control experiment, and it may be issued when the experiment result query control is detected to have been triggered. The experiment result comparison page may be generated according to the heterogeneity analysis result tree of the target control experiment. In some embodiments, the original result data set of the target control experiment may also be obtained, and the experiment result comparison page may be generated by combining the original result data set with the heterogeneity analysis result tree, with the heterogeneity analysis result tree displayed below the table corresponding to the original data. Combining the original result data set with the heterogeneity analysis result tree supports intuitive and flexible comparative analysis and improves the flexibility and efficiency of the result heterogeneity analysis.

In an embodiment, the experiment result query instruction may carry display layer number information, which may represent a limit on the number of layers of the heterogeneity analysis result tree to be displayed. In practical applications, the display layer number information may be set according to actual application requirements; in a specific embodiment, it may be 5 layers (that is, only the first 5 feature values of the first-dimension feature to be analyzed are shown). Referring to FIG. 13, generating the experiment result comparison page according to the heterogeneity analysis result tree of the target control experiment in response to the experiment result query instruction may include:

S1301: in response to the experiment result query instruction, truncating the heterogeneity analysis result tree of the target control experiment to the number of display layers given by the display layer number information, to obtain a truncated heterogeneity analysis result tree.

In the embodiment of the application, the heterogeneity analysis result tree may include a plurality of feature value nodes (that is, leaf nodes of the tree structure) arranged by heterogeneity intensity, with the intensity decreasing from left to right, so the key values (higher heterogeneity intensity) and non-key values (lower heterogeneity intensity) of the feature to be analyzed can be determined quickly and accurately. Truncating the heterogeneity analysis result tree of the target control experiment according to the display layer number information is therefore equivalent to extracting the key values, or removing interfering data. This helps to identify the key values quickly and intuitively for further analysis, removes the interference of irrelevant data, and improves the efficiency of the result heterogeneity analysis.

S1303: generating an experiment result comparison page according to the intercepted heterogeneity analysis result tree.

Once the heterogeneity analysis result tree of the target control experiment has been intercepted according to the display layer number information, the experiment result comparison page can be generated from the intercepted heterogeneity analysis result tree, which makes the page more intuitive and concise.
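Steps S1301 and S1303 can be illustrated with a minimal sketch. It assumes the result tree is held as a flat list of leaf nodes and that the display layer number is simply the count of strongest leaves to keep; the names `LeafNode`, `intercept_result_tree` and the sample values are hypothetical and do not come from the application.

```python
from dataclasses import dataclass


@dataclass
class LeafNode:
    feature_value: str  # hypothetical: e.g. an advertisement industry
    intensity: float    # hypothetical: heterogeneity intensity of that value


def intercept_result_tree(leaves, display_layers):
    """Keep only the `display_layers` leaves with the strongest heterogeneity."""
    if display_layers < 0:
        raise ValueError("display_layers must be non-negative")
    # Order leaves from strongest to weakest heterogeneity (left to right).
    ordered = sorted(leaves, key=lambda leaf: leaf.intensity, reverse=True)
    return ordered[:display_layers]


# Made-up leaves for illustration only.
leaves = [
    LeafNode("games", 0.9),
    LeafNode("finance", 0.2),
    LeafNode("education", 0.6),
]
shown = intercept_result_tree(leaves, display_layers=2)
print([leaf.feature_value for leaf in shown])
```

A page generator would then render only the leaves in `shown`, so weaker feature values never reach the comparison page.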

S1203: displaying the experiment result comparison page.

In some embodiments, the heterogeneity analysis result tree may further be simplified with reference to steps S601 to S607 to obtain a processed heterogeneity analysis result tree; then, in response to the experiment result query instruction, the experiment result comparison page may be generated according to the processed heterogeneity analysis result tree and displayed.

As can be seen from the technical solutions provided in the embodiments of the present application, a first result data set of a target control experiment and heterogeneity analysis requirement information are first obtained. The feature to be analyzed of the original heterogeneity analysis model is then configured based on the heterogeneity analysis requirement information to obtain a target heterogeneity analysis model, which improves the flexibility of data processing. Finally, the first result data set is input into the target heterogeneity analysis model, and result heterogeneity analysis is performed on the feature to be analyzed to obtain a heterogeneity analysis result tree of the target control experiment, where the heterogeneity analysis result tree represents the heterogeneity information of the target control experiment on a plurality of feature values of the feature to be analyzed. Combining machine learning with a large amount of result data makes the result heterogeneity analysis scientific and reliable, improves its efficiency and accuracy, and allows it to be performed flexibly on multidimensional features to be analyzed. By calculating the result statistical data corresponding to each leaf node from the result data set corresponding to that leaf node, the specific performance of the target control experiment on each feature value of the feature to be analyzed (for example, on samples of which feature values the target strategy yields a large improvement) can be determined scientifically and flexibly according to actual analysis requirements, so that the strategy can be promoted and improved in a targeted manner. In addition, leaf nodes to be processed are screened out according to the result statistical data corresponding to each leaf node in the heterogeneity analysis result tree, the result statistical data corresponding to the leaf nodes to be processed are extracted, and the processing strategy corresponding to each leaf node to be processed is determined according to that result statistical data and the result processing rule, which facilitates targeted strategy adjustment based on actual statistical data and improves the adaptability and reliability of strategy adjustment.

The embodiment of the application also provides a data processing device, as shown in fig. 14, where the device may include:

the data acquisition module 1410 is configured to acquire a first result data set and heterogeneity analysis requirement information of a target control experiment;

the characteristic parameter configuration module 1420 is configured to configure the to-be-analyzed characteristic of the original heterogeneity analysis model based on the heterogeneity analysis demand information, so as to obtain a target heterogeneity analysis model;

the heterogeneity analysis module 1430 is configured to input the first result data set into the target heterogeneity analysis model, perform result heterogeneity analysis on the feature to be analyzed, and obtain a heterogeneity analysis result tree of the target control experiment, where the heterogeneity analysis result tree characterizes heterogeneity information of the target control experiment on a plurality of feature values of the feature to be analyzed.

In one embodiment, the target heterogeneity analysis model includes a heterogeneity intensity calculation layer, a leaf node generation layer, a data classification layer, and a recursive computation layer, and the heterogeneity analysis module 1430 may include:

a heterogeneity intensity calculation unit, configured to traverse each feature value of the feature to be analyzed in the first result data set in the heterogeneity intensity calculation layer, and calculate heterogeneity information of each feature value on the first result data set respectively;

a tree node generating unit, configured to determine, in the leaf node generation layer, a target feature value according to the heterogeneity information of each feature value on the first result data set, and generate a target leaf node according to the target feature value;

the data classification unit is used for performing data classification on the first result data set according to the target leaf node in the data classification layer to obtain a target result data subset;

and a recursive calculation unit, configured to, in the recursive computation layer, take the target result data subset as the first result data set and repeat the steps from traversing, in the heterogeneity intensity calculation layer, each feature value of the feature to be analyzed in the first result data set and respectively calculating the heterogeneity information of each feature value on the first result data set, through performing, in the data classification layer, data classification on the first result data set according to the target leaf node to obtain a target result data subset, until the target result data subset is empty, thereby obtaining the heterogeneity analysis result tree of the target control experiment.
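The cooperation of the four units above can be sketched as follows. This is a simplified illustration under stated assumptions, not the model of the application: each record is a hypothetical `(feature_value, result, treated)` tuple, the absolute difference between the experimental-group mean and the control-group mean stands in for the heterogeneity intensity, and the recursion is written as a loop. The function names and sample data are invented for the example.

```python
from collections import defaultdict


def average_effect(partition):
    """Experimental-group mean minus control-group mean within one partition."""
    treated = [y for _, y, t in partition if t == 1]
    control = [y for _, y, t in partition if t == 0]
    if not treated or not control:
        return 0.0  # assumption: no estimate is possible without both groups
    return sum(treated) / len(treated) - sum(control) / len(control)


def build_result_tree(records):
    """Return leaves as (feature_value, effect), strongest heterogeneity first."""
    leaves = []
    remaining = list(records)
    while remaining:  # stop once the target result data subset is empty
        # Intensity calculation: partition the data by feature value.
        partitions = defaultdict(list)
        for record in remaining:
            partitions[record[0]].append(record)
        effects = {value: average_effect(part) for value, part in partitions.items()}
        # Leaf generation: pick the value with the largest stand-in intensity.
        target = max(effects, key=lambda value: abs(effects[value]))
        leaves.append((target, effects[target]))
        # Data classification: keep only records not covered by the new leaf.
        remaining = [record for record in remaining if record[0] != target]
    return leaves


# Made-up records: (industry, conversion value, 1 = target strategy / 0 = original).
records = [
    ("games", 10.0, 1), ("games", 4.0, 0),
    ("finance", 5.0, 1), ("finance", 4.5, 0),
    ("education", 3.0, 1), ("education", 6.0, 0),
]
tree = build_result_tree(records)
print(tree)
```

Because each pass removes one feature value from the remaining data, the loop terminates after as many passes as there are distinct feature values.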

In one embodiment, the apparatus may further include:

the second result data set acquisition module is used for acquiring a second result data set of the target control experiment;

the verification tree generation module is used for inputting the second result data set into the target heterogeneity analysis model, and carrying out result heterogeneity analysis on the feature to be analyzed to obtain a heterogeneity analysis verification tree of the target control experiment;

the invalid leaf node determining module is used for determining invalid leaf nodes in the heterogeneity analysis result tree according to the heterogeneity analysis verification tree;

and the simplifying module is used for simplifying the heterogeneity analysis result tree according to the invalid leaf node to obtain the processed heterogeneity analysis result tree.
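The following sketch shows one way the verification and simplification modules could interact. The rule for deciding that a leaf node is invalid is not stated in this part of the description, so the rule used here is purely an assumption: a leaf is kept only if the verification tree contains the same feature value with an effect of the same sign. The name `simplify_result_tree` and the sample values are hypothetical.

```python
def simplify_result_tree(result_leaves, verification_leaves):
    """Drop leaves that the verification tree does not confirm.

    Both arguments are lists of (feature_value, effect) pairs.
    Assumed rule: a leaf is valid only when the verification tree holds the
    same feature value and its effect has the same sign.
    """
    verified = dict(verification_leaves)
    kept = []
    for value, effect in result_leaves:
        check = verified.get(value)
        if check is not None and effect * check > 0:
            kept.append((value, effect))
    return kept


# Made-up leaves from the first and second result data sets.
result_leaves = [("games", 6.0), ("education", -3.0), ("finance", 0.5)]
verification_leaves = [("games", 5.0), ("education", 1.0)]
processed = simplify_result_tree(result_leaves, verification_leaves)
print(processed)
```

Here "education" is dropped because the sign of its effect flips on the second data set, and "finance" is dropped because the verification tree has no matching leaf.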

In one embodiment, the apparatus may further include:

a third result data set acquisition module, configured to acquire a third result data set of the target control experiment;

the result data set extraction module is used for respectively extracting a result data set corresponding to each leaf node in the heterogeneity analysis result tree from the third result data set;

and the statistical data calculation module is used for calculating the result statistical data corresponding to each leaf node according to the result data set corresponding to each leaf node.
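As a rough illustration of the extraction and statistical data calculation modules, the sketch below computes per-leaf statistics from a third result data set. The record layout `(feature_value, result, treated)`, the function name `leaf_statistics` and the chosen statistics (group sizes, group means, uplift) are assumptions made for the example; the application does not fix them here.

```python
from statistics import mean


def leaf_statistics(leaf_values, third_records):
    """Compute simple result statistics for every leaf's feature value."""
    stats = {}
    for value in leaf_values:
        # Extract the result data set that corresponds to this leaf node.
        treated = [y for x, y, t in third_records if x == value and t == 1]
        control = [y for x, y, t in third_records if x == value and t == 0]
        if not treated or not control:
            continue  # assumption: skip leaves lacking one of the two groups
        stats[value] = {
            "n_treated": len(treated),
            "n_control": len(control),
            "treated_mean": mean(treated),
            "control_mean": mean(control),
            "uplift": mean(treated) - mean(control),
        }
    return stats


# Made-up third result data set.
third_records = [
    ("games", 8.0, 1), ("games", 12.0, 1), ("games", 5.0, 0),
    ("finance", 1.0, 1),
]
stats = leaf_statistics(["games", "finance"], third_records)
print(stats)
```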

In an embodiment of the present application, the foregoing apparatus may further include:

the leaf node screening module is used for screening leaf nodes to be processed according to the result statistical data corresponding to each leaf node;

the statistical data extraction module is used for extracting result statistical data corresponding to the leaf nodes to be processed;

and the processing strategy determining module is used for determining the processing strategy corresponding to the leaf node to be processed according to the result statistical data and the processing rule corresponding to the leaf node to be processed.
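The screening and strategy determination described above might look like the following sketch. The screening threshold, the processing rule and the strategy labels are all hypothetical; the application leaves the concrete rule to the actual analysis requirement.

```python
def choose_strategies(stats, min_abs_uplift=1.0):
    """Screen leaves by uplift and map each screened leaf to a strategy label.

    `stats` maps a feature value to a dict holding at least an "uplift" entry.
    Assumed rule: leaves whose absolute uplift is below `min_abs_uplift` are
    not screened; a positive uplift favours the target strategy.
    """
    strategies = {}
    for value, leaf_stats in stats.items():
        uplift = leaf_stats["uplift"]
        if abs(uplift) < min_abs_uplift:
            continue  # leaf is not among the leaf nodes to be processed
        if uplift > 0:
            strategies[value] = "apply target strategy"
        else:
            strategies[value] = "keep original strategy"
    return strategies


# Made-up result statistics per leaf node.
stats = {
    "games": {"uplift": 5.0},
    "education": {"uplift": -3.0},
    "finance": {"uplift": 0.5},
}
plan = choose_strategies(stats)
print(plan)
```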

In one embodiment, the apparatus may further include:

the experimental result comparison page generation module is used for responding to an experimental result query instruction and generating an experimental result comparison page according to the heterogeneity analysis result tree of the target control experiment;

and the page display module is used for displaying the experimental result comparison page.

In an embodiment, the experiment result query instruction carries display layer number information, and the experiment result comparison page generating module may include:

the display layer number intercepting module is used for responding to the experiment result query instruction, intercepting the display layer number of the heterogeneity analysis result tree of the target control experiment according to the display layer number information, and obtaining the intercepted heterogeneity analysis result tree;

and the result page generation module is used for generating an experimental result comparison page according to the intercepted heterogeneity analysis result tree.

The device embodiments and the method embodiments described above are based on the same inventive concept.

The present application provides a computer device including a processor and a memory, where at least one instruction or at least one program is stored in the memory, where the at least one instruction or the at least one program is loaded and executed by the processor to implement a data processing method as provided in the method embodiments above.

The memory may be used to store software programs and modules, and the processor performs various functional applications and data processing by executing the software programs and modules stored in the memory. The memory may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, application programs required for functions, and the like; the data storage area may store data created according to the use of the device, and the like. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. Accordingly, the memory may also include a memory controller to provide the processor with access to the memory.

The method embodiments provided in the embodiments of the present application may be performed in a mobile terminal, a computer terminal, a server, or a similar computing device; that is, the above-mentioned computer device may include a mobile terminal, a computer terminal, a server, or a similar computing device. The server may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs, and big data and artificial intelligence platforms. The terminal may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, etc. Taking operation on a server as an example, fig. 15 is a block diagram of the hardware structure of a server for implementing the above data processing method according to an embodiment of the present application. As shown in fig. 15, the server 1500 may vary considerably in configuration or performance, and may include one or more central processing units (CPUs) 1510 (the processor 1510 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)), a memory 1530 for storing data, and one or more storage media 1520 (e.g., one or more mass storage devices) for storing applications 1523 or data 1522. The memory 1530 and the storage medium 1520 may be transitory or persistent storage. The program stored in the storage medium 1520 may include one or more modules, each of which may include a series of instruction operations on the server. Still further, the central processing unit 1510 may be configured to communicate with the storage medium 1520 and to execute, on the server 1500, the series of instruction operations in the storage medium 1520. The server 1500 may also include one or more power supplies 1560, one or more wired or wireless network interfaces 1550, one or more input/output interfaces 1540, and/or one or more operating systems 1521, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.

The processor 1510 may be an integrated circuit chip having signal processing capabilities, such as a general purpose processor (e.g., a microprocessor or any conventional processor), a digital signal processor (DSP), or another programmable logic device, discrete gate or transistor logic device, or discrete hardware component.

The input/output interface 1540 may be used to receive or transmit data via a network. A specific example of the above network may include a wireless network provided by a communication provider of the server 1500. In one example, the input/output interface 1540 includes a network adapter (Network Interface Controller, NIC) that can connect to other network devices through a base station so as to communicate with the internet. In another example, the input/output interface 1540 may be a Radio Frequency (RF) module for communicating with the internet wirelessly.

The operating system 1521 may include system programs, such as a framework layer, a core library layer, a driver layer, etc., for handling various basic system services and hardware-related tasks.

It will be appreciated by those of ordinary skill in the art that the configuration shown in fig. 15 is merely illustrative and is not intended to limit the configuration of the electronic device described above. For example, server 1500 may also include more or fewer components than shown in fig. 15, or have a different configuration than shown in fig. 15.

Embodiments of the present application also provide a computer readable storage medium that may be disposed in a server to store at least one instruction or at least one program for implementing a data processing method in a method embodiment, where the at least one instruction or the at least one program is loaded and executed by the processor to implement the data processing method provided in the method embodiment.

Optionally, in this embodiment, the storage medium may be located in at least one of a plurality of network servers of a computer network. Optionally, in this embodiment, the storage medium may include, but is not limited to, a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or any other medium capable of storing program code.

Embodiments of the present application also provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The computer instructions are read from the computer-readable storage medium by a processor of a computer device, and executed by the processor, cause the computer device to perform the methods provided in the various alternative implementations described above.

As can be seen from the above embodiments of the data processing method, apparatus, device, storage medium or computer program provided by the present application, a first result data set of a target control experiment and heterogeneity analysis requirement information are first obtained. The feature to be analyzed of the original heterogeneity analysis model is then configured based on the heterogeneity analysis requirement information to obtain a target heterogeneity analysis model, which improves the flexibility of data processing. Finally, the first result data set is input into the target heterogeneity analysis model, and result heterogeneity analysis is performed on the feature to be analyzed to obtain a heterogeneity analysis result tree of the target control experiment, where the heterogeneity analysis result tree represents the heterogeneity information of the target control experiment on a plurality of feature values of the feature to be analyzed. Combining machine learning with a large amount of result data makes the result heterogeneity analysis scientific and reliable, improves its efficiency and accuracy, and allows it to be performed flexibly on multidimensional features to be analyzed. By calculating the result statistical data corresponding to each leaf node from the result data set corresponding to that leaf node, the specific performance of the target control experiment on each feature value of the feature to be analyzed (for example, on samples of which feature values the target strategy yields a large improvement) can be determined scientifically and flexibly according to actual analysis requirements, so that the strategy can be promoted and improved in a targeted manner. In addition, leaf nodes to be processed are screened out according to the result statistical data corresponding to each leaf node in the heterogeneity analysis result tree, the result statistical data corresponding to the leaf nodes to be processed are extracted, and the processing strategy corresponding to each leaf node to be processed is determined according to that result statistical data and the result processing rule, which facilitates targeted strategy adjustment based on actual statistical data and improves the adaptability and reliability of strategy adjustment.

It should be noted that: the foregoing sequence of the embodiments of the present application is only for describing, and does not represent the advantages and disadvantages of the embodiments. And the foregoing description has been directed to specific embodiments of this specification. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.

In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for apparatus, devices and storage medium embodiments, the description is relatively simple as it is substantially similar to method embodiments, with reference to the description of method embodiments in part.

It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program for instructing relevant hardware, where the program may be stored in a computer readable storage medium, and the storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.

The foregoing describes only preferred embodiments of the present application and is not intended to limit the present application; any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall fall within its scope of protection.

Claims

1. A method of data processing, the method comprising:

acquiring a first result data set of a target control experiment and heterogeneity analysis demand information, wherein the target control experiment is a test of a target strategy with an advertisement as the experimental object, the target strategy is pop-up window display of the advertisement, the original strategy is embedded display of the advertisement, the first result data set comprises the industry, the delivery site and the experimental result of the advertisement, the experimental result indicates the total value generated by advertisement conversion, and heterogeneity means that the causal effect of the same strategy differs across different characteristic values of the experimental object;

configuring the to-be-analyzed characteristics of an original heterogeneity analysis model based on the heterogeneity analysis demand information to obtain a target heterogeneity analysis model, wherein the target heterogeneity analysis model comprises a heterogeneity intensity calculation layer, a leaf node generation layer, a data classification layer and a recursion calculation layer;

traversing each characteristic value of the characteristic to be analyzed in the first result data set in the heterogeneity intensity calculation layer, and respectively calculating the heterogeneity information of each characteristic value on the first result data set; the heterogeneity information comprises a heterogeneity intensity value of the corresponding characteristic value on the first result data set, and the heterogeneity intensity value is calculated by using the following formula:

$$S_l = \{(X_i, Y_i, T_i) \mid X_i \in X_l\}$$

wherein $S_l$ represents the data partition $l$ corresponding to the characteristic values, and $L$ represents the total number of the data partitions; $X_i$ represents the characteristic value of the experimental object; $Y_i$ represents the experimental result; $T_i$ indicates whether the target strategy is used; $N_{l,t}$ represents the number of samples of the experimental group or the control group within the partition $S_l$; and $\tau(S_l)$ represents the average causal effect within the partition $S_l$. The formula further defines the mean of the experimental results of the experimental group or the control group within $S_l$, the mean of the experimental results of the control group within $S_l$, and the heterogeneity intensity value [expressions for these three quantities not recoverable];

in the leaf node generation layer, determining a target feature value according to the heterogeneity information of each feature value on the first result data set, and generating a target leaf node according to the target feature value;

in the data classification layer, performing data classification on the first result data set according to the target leaf node to obtain a target result data subset;

and in the recursive computation layer, taking the target result data subset as the first result data set, and repeating the steps from traversing, in the heterogeneity intensity calculation layer, each characteristic value of the characteristic to be analyzed in the first result data set and respectively calculating the heterogeneity information of each characteristic value on the first result data set, through performing, in the data classification layer, data classification on the first result data set according to the target leaf node to obtain a target result data subset, until the target result data subset is empty, thereby obtaining a heterogeneity analysis result tree of the target control experiment, wherein the heterogeneity analysis result tree represents the heterogeneity information of the target control experiment on a plurality of characteristic values of the characteristic to be analyzed.

2. The method according to claim 1, wherein the method further comprises:

acquiring a second result data set of the target control experiment;

inputting the second result data set into the target heterogeneity analysis model, and carrying out result heterogeneity analysis on the feature to be analyzed to obtain a heterogeneity analysis verification tree of the target control experiment;

determining invalid leaf nodes in the heterogeneity analysis result tree according to the heterogeneity analysis verification tree;

and simplifying the heterogeneity analysis result tree according to the invalid leaf node to obtain the processed heterogeneity analysis result tree.

3. The method according to claim 1, wherein the method further comprises:

acquiring a third result data set of the target control experiment;

respectively extracting a result data set corresponding to each leaf node in the heterogeneity analysis result tree from the third result data set;

and calculating the result statistical data corresponding to each leaf node according to the result data set corresponding to each leaf node.

4. A method according to claim 3, characterized in that the method further comprises:

screening leaf nodes to be processed according to the result statistical data corresponding to each leaf node;

extracting result statistical data corresponding to the leaf nodes to be processed;

and determining a processing strategy corresponding to the leaf node to be processed according to the result statistical data and the processing rule corresponding to the leaf node to be processed.

5. The method according to claim 1, wherein the method further comprises:

responding to an experiment result query instruction, and generating an experiment result comparison page according to the heterogeneity analysis result tree of the target control experiment;

and displaying the experimental result comparison page.

6. The method of claim 5, wherein the experiment result query instruction carries display layer number information, and wherein generating the experiment result comparison page according to the heterogeneity analysis result tree of the target comparison experiment in response to the experiment result query instruction comprises:

responding to an experiment result query instruction, and intercepting the display layer number of the heterogeneity analysis result tree of the target control experiment according to the display layer number information to obtain an intercepted heterogeneity analysis result tree;

and generating an experimental result comparison page according to the intercepted heterogeneity analysis result tree.

7. A data processing apparatus, the apparatus comprising:

a data acquisition module, configured to acquire a first result data set of a target control experiment and heterogeneity analysis demand information, wherein the target control experiment is a test of a target strategy with an advertisement as the experimental object, the target strategy is pop-up window display of the advertisement, the original strategy is embedded display of the advertisement, the first result data set comprises the industry, the delivery site and the experimental result of the advertisement, the experimental result indicates the total value generated by advertisement conversion, and heterogeneity means that the causal effect of the same strategy differs across different characteristic values of the experimental object;

the characteristic parameter configuration module is used for configuring the characteristics to be analyzed of the original heterogeneity analysis model based on the heterogeneity analysis demand information to obtain a target heterogeneity analysis model, wherein the target heterogeneity analysis model comprises a heterogeneity intensity calculation layer, a leaf node generation layer, a data classification layer and a recursion calculation layer;

the heterogeneity analysis module is used for traversing each characteristic value of the feature to be analyzed in the first result data set in the heterogeneity intensity calculation layer and respectively calculating the heterogeneity information of each characteristic value on the first result data set; the heterogeneity information comprises a heterogeneity intensity value of the corresponding characteristic value on the first result data set, and the heterogeneity intensity value is calculated by using the following formula:

$$S_l = \{(X_i, Y_i, T_i) \mid X_i \in X_l\}$$

wherein the formula defines the mean of the experimental results of the experimental group within the partition $S_l$ and the heterogeneity intensity value [expressions not recoverable];

8. A data processing apparatus, characterized in that the apparatus comprises a processor and a memory, in which at least one instruction or at least one program is stored, which is loaded and executed by the processor to implement the data processing method according to any one of claims 1 to 6.

9. A computer readable storage medium having stored therein at least one instruction or at least one program, the at least one instruction or the at least one program being loaded and executed by a processor to implement the data processing method of any one of claims 1 to 6.