CN112862536A - Data processing method, device, equipment and storage medium - Google Patents

Data processing method, device, equipment and storage medium

Info

Publication number
CN112862536A
Authority
CN
China
Prior art keywords
result
heterogeneity
target
heterogeneity analysis
data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110213201.XA
Other languages
Chinese (zh)
Other versions
CN112862536B (en
Inventor
邓颖
蔡政
李成龙
任宇堃
朱志华
蔡越
李池洋
林晓健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202110213201.XA priority Critical patent/CN112862536B/en
Publication of CN112862536A publication Critical patent/CN112862536A/en
Application granted granted Critical
Publication of CN112862536B publication Critical patent/CN112862536B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06Q30/0242 Determining effectiveness of advertisements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06Q30/0251 Targeted advertisements
    • G06Q30/0269 Targeted advertisements based on user profile or attribute
    • G06Q30/0271 Personalized advertisement
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06Q30/0277 Online advertisement

Landscapes

  • Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Engineering & Computer Science (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a data processing method, device, equipment and storage medium. The method comprises: acquiring a first result data set and heterogeneity analysis demand information of a target control experiment; configuring the features to be analyzed of an original heterogeneity analysis model based on the heterogeneity analysis demand information to obtain a target heterogeneity analysis model; and inputting the first result data set into the target heterogeneity analysis model and performing result heterogeneity analysis on the features to be analyzed to obtain a heterogeneity analysis result tree of the target control experiment, wherein the heterogeneity analysis result tree represents heterogeneity information of the target control experiment on a plurality of feature values of the features to be analyzed. With this technical scheme, result heterogeneity can be analyzed quickly and accurately by combining a heterogeneity analysis model, which improves the efficiency and reliability of result heterogeneity analysis.

Description

Data processing method, device, equipment and storage medium
Technical Field
The present application relates to the field of internet technologies, and in particular, to a data processing method, apparatus, device, and storage medium.
Background
When an internet experiment is performed, it is not enough to measure the effect of the experiment strategy on the whole sample set. The same strategy may affect experimental objects differently depending on their characteristics (for example, when the experiment concerns the advertisement display mode and the objects are advertisements, the results may differ greatly between advertisements in different industries). Heterogeneity analysis of the experiment results is therefore also required, that is, determining for which feature values the experiment strategy has a significant effect and for which values the effect is weak, so that the strategy can subsequently be adjusted in a targeted manner and its adaptability and flexibility improved.
In the prior art, heterogeneity analysis of experimental results divides the sample set into groups as required and then determines the change in each index through a simple t-test. This may lead to a high probability of Type I errors (falsely rejecting a null hypothesis that is actually true). In addition, when there are many analysis dimensions and groups, the samples in each group become sparse, which may make the heterogeneity analysis results inaccurate.
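To illustrate why repeated group-wise t-tests can inflate the Type I error rate, the following minimal sketch computes the probability of at least one false rejection across several tests. It assumes the tests are independent and share the same significance level; the function name `familywise_error_rate` and the level 0.05 are illustrative choices, not taken from the application.

```python
def familywise_error_rate(alpha: float, num_tests: int) -> float:
    """Probability of at least one false rejection across independent tests.

    Assumes every null hypothesis is true and each test uses level `alpha`.
    """
    return 1.0 - (1.0 - alpha) ** num_tests


# The more groups are tested separately, the higher the chance of a false alarm.
for num_groups in (1, 5, 20):
    print(num_groups, familywise_error_rate(0.05, num_groups))
```

With a single test the rate equals the chosen level, and it grows as more groups are tested, which is the effect the paragraph above describes.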
Disclosure of Invention
In order to solve the problems in the prior art, the present application provides a data processing method, apparatus, device and storage medium. The technical scheme is as follows:
One aspect of the present application provides a data processing method, where the method includes:
acquiring a first result data set and heterogeneity analysis demand information of a target control experiment;
configuring the characteristics to be analyzed of the original heterogeneity analysis model based on the heterogeneity analysis demand information to obtain a target heterogeneity analysis model;
and inputting the first result data set into the target heterogeneity analysis model, and performing result heterogeneity analysis on the features to be analyzed to obtain a heterogeneity analysis result tree of the target control experiment, wherein the heterogeneity analysis result tree represents heterogeneity information of the target control experiment on a plurality of feature values of the features to be analyzed.
Another aspect of the present application provides a data processing apparatus, including:
the data acquisition module is used for acquiring a first result data set and heterogeneity analysis demand information of a target control experiment;
the characteristic parameter configuration module is used for configuring the characteristics to be analyzed of the original heterogeneity analysis model based on the heterogeneity analysis demand information to obtain a target heterogeneity analysis model;
and the heterogeneity analysis module is used for inputting the first result data set into the target heterogeneity analysis model, and performing result heterogeneity analysis on the features to be analyzed to obtain a heterogeneity analysis result tree of the target control experiment, wherein the heterogeneity analysis result tree represents heterogeneity information of the target control experiment on a plurality of feature values of the features to be analyzed.
Another aspect of the present application provides a data processing apparatus, which includes a processor and a memory, wherein at least one instruction or at least one program is stored in the memory, and the at least one instruction or the at least one program is loaded by the processor and executed to implement the data processing method as described above.
Another aspect of the present application provides a computer-readable storage medium, in which at least one instruction or at least one program is stored, and the at least one instruction or the at least one program is loaded and executed by a processor to implement the data processing method as described above.
The data processing method, the data processing device, the data processing equipment and the storage medium have the following technical effects:
The method acquires a first result data set and heterogeneity analysis demand information of a target control experiment; configures the features to be analyzed of the original heterogeneity analysis model based on the heterogeneity analysis demand information to obtain a target heterogeneity analysis model; and finally inputs the first result data set into the target heterogeneity analysis model and performs result heterogeneity analysis on the features to be analyzed to obtain a heterogeneity analysis result tree of the target control experiment, wherein the heterogeneity analysis result tree represents heterogeneity information of the target control experiment on a plurality of feature values of the features to be analyzed. By combining a heterogeneity analysis model, result heterogeneity can be analyzed quickly and accurately, which improves the efficiency and reliability of result heterogeneity analysis.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
In order to more clearly illustrate the technical solutions and advantages of the embodiments of the present application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a schematic diagram of an application environment provided by an embodiment of the present application;
FIG. 2 is a schematic flowchart of a data processing method according to an embodiment of the present application;
FIG. 3 is a schematic flow chart diagram of another data processing method provided in the embodiments of the present application;
FIG. 4 is a schematic diagram of a heterogeneity analysis result tree of a target control experiment provided in the embodiments of the present application;
FIG. 5 is a schematic diagram of a heterogeneity analysis result tree of another target control experiment provided in the examples of the present application;
FIG. 6 is a schematic flow chart diagram of another data processing method provided in the embodiments of the present application;
FIG. 7 is a schematic flow chart diagram of another data processing method provided in the embodiments of the present application;
FIG. 8 is a schematic diagram of a tree structure including result statistics corresponding to leaf nodes according to an embodiment of the present application;
FIG. 9 is a schematic diagram of another tree structure including result statistics corresponding to leaf nodes according to an embodiment of the present application;
FIG. 10 is a diagram illustrating a result statistics table generated according to a tree structure containing result statistics corresponding to leaf nodes according to an embodiment of the present application;
FIG. 11 is a schematic flow chart diagram illustrating another data processing method according to an embodiment of the present application;
FIG. 12 is a schematic flow chart diagram illustrating another data processing method according to an embodiment of the present application;
FIG. 13 is a schematic flow chart diagram of another data processing method provided in the embodiments of the present application;
FIG. 14 is a schematic diagram of a data processing apparatus according to an embodiment of the present application;
FIG. 15 is a block diagram of a hardware structure of a server for implementing a data processing method according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only a part of the embodiments of the present application, not all of them. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort shall fall within the protection scope of the present application. Examples of the embodiments are illustrated in the accompanying drawings, wherein like reference numerals refer to the same or similar elements, or to elements having the same or similar function, throughout.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or server that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, a consensus mechanism and an encryption algorithm. A block chain (Blockchain), which is essentially a decentralized database, is a series of data blocks associated by using a cryptographic method, and each data block contains information of a batch of network transactions, so as to verify the validity (anti-counterfeiting) of the information and generate a next block. The blockchain may include a blockchain underlying platform, a platform product services layer, and an application services layer.
Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
Artificial intelligence technology is a comprehensive discipline that covers a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, machine learning/deep learning, and the like.
Machine Learning (ML) is a multi-disciplinary field that draws on probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It studies how a computer can simulate or implement human learning behavior so as to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
The scheme provided by the embodiment of the application relates to the technologies such as machine learning of artificial intelligence, and the like, and is specifically explained by the following embodiment.
Referring to fig. 1, fig. 1 is a schematic diagram of an application environment provided by the present application, and as shown in fig. 1, the application environment may include a server 01 and a client 02.
In the embodiment of the present application, the server 01 may be configured to obtain a result data set of a target control experiment and heterogeneity analysis demand information, and perform result heterogeneity analysis by combining with a target heterogeneity analysis model to obtain a heterogeneity analysis result tree of the target control experiment, where the heterogeneity analysis result tree may represent heterogeneity information of the target control experiment on a plurality of feature values of features to be analyzed. Optionally, the server 01 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a Network service, cloud communication, a middleware service, a domain name service, a security service, a CDN (Content Delivery Network), a big data and artificial intelligence platform, and the like.
In the embodiment of the present application, the client 02 may be configured to generate an experiment result comparison page according to the heterogeneity analysis result tree of the target comparison experiment, display the experiment result comparison page, and visually and clearly display the heterogeneity analysis result. In practical applications, the client 02 may include, but is not limited to, terminal devices such as a smart phone, a tablet computer, a laptop computer, a desktop computer, a smart speaker, a smart wearable device (e.g., a smart watch), a network device, a firewall, and the like.
In the embodiment of the present application, the server 01 and the client 02 may be directly or indirectly connected through a wired or wireless communication manner, and the present application is not limited herein.
FIG. 2 is a schematic flowchart of a data processing method provided in an embodiment of the present application. This specification provides the method operation steps as described in the embodiment or the flowchart, but more or fewer operation steps may be included based on conventional or non-inventive labor. The order of steps recited in the embodiments is only one of many possible execution orders and does not represent the only order of execution. In practice, a system or server product may execute the steps sequentially or in parallel (for example, in a parallel-processor or multi-threaded environment) according to the embodiments or the methods shown in the figures. Specifically, as shown in FIG. 2, the method may include:
S201: Acquiring a first result data set and heterogeneity analysis demand information of the target control experiment.
Specifically, the target control experiment may include, but is not limited to, an A/B test of a target strategy. That is, the experimental objects are divided into an experimental group and a control group by random sampling; the target strategy is applied to the experimental group while the control group keeps the original strategy; and the difference between the experimental results of the two groups is then compared to obtain the effect of the target strategy. The experimental result may represent index information affected by the strategy, and in practical applications the index may be set according to actual application requirements. For example, when the experimental object is an advertisement, the experimental result may be GMV (Gross Merchandise Volume). In this embodiment of the present application, the GMV value may represent the total value generated by advertisement conversions and may be obtained by multiplying the number of conversions by the revenue generated by a single conversion. In a specific embodiment, the experimental object may be an advertisement, the strategy may be the display mode of the advertisement, the target strategy may be pop-up window display, and the original strategy may be embedded display. The advertisements in the experimental group are displayed in a pop-up window, while the advertisements in the control group keep the original embedded display; the difference between the experimental results (GMV values) of the two groups can then be compared to obtain the effect of pop-up window display relative to embedded display.
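The GMV calculation and the group comparison described above can be sketched as follows. This is a minimal illustration, assuming the effect is summarized as the difference between group means; the function names `gmv` and `ab_effect` and all numbers are hypothetical.

```python
from statistics import mean
from typing import Sequence


def gmv(conversions: int, revenue_per_conversion: float) -> float:
    """GMV as the number of conversions times the revenue of one conversion."""
    return conversions * revenue_per_conversion


def ab_effect(experimental: Sequence[float], control: Sequence[float]) -> float:
    """Difference between the mean result of the two groups (assumed summary)."""
    return mean(experimental) - mean(control)


# Hypothetical advertisements: pop-up display (experimental) vs embedded display (control).
popup_results = [gmv(10, 5.0), gmv(8, 5.0)]
embedded_results = [gmv(6, 5.0), gmv(7, 5.0)]
print(ab_effect(popup_results, embedded_results))
```

A positive value would suggest that pop-up display performs better than embedded display on this index; whether the difference is statistically significant is a separate question that this sketch does not address.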
In this embodiment of the application, the first result data set may include a plurality of pieces of experimental data of the target control experiment, comprising experimental data of the experimental group and experimental data of the control group in equal numbers. Each piece of experimental data may include attribute information of an experimental object and an experimental result. Specifically, the attribute information may include at least one piece of feature information of the experimental object. In practical applications, for example, when the experimental object is an advertisement, the attribute information may include, but is not limited to, the industry and delivery site of the advertisement, and the experimental result may include the GMV value; when the experimental object is a person, the attribute information may include, but is not limited to, the gender and education level of the person. Acquiring a large amount of rich experimental data helps improve the reliability of the result heterogeneity analysis.
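One possible in-memory layout for such a result data set is sketched below. The class `ExperimentRecord`, its field names, and the sample values are assumptions made for illustration; the application does not prescribe a concrete data structure.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ExperimentRecord:
    """One piece of experimental data (hypothetical layout)."""

    attributes: Dict[str, str]   # feature information, e.g. industry and delivery site
    in_experimental_group: bool  # True: target strategy; False: original strategy
    result: float                # experimental result, e.g. a GMV value


def groups_are_balanced(records: List[ExperimentRecord]) -> bool:
    """Check that both groups contribute the same number of records."""
    treated = sum(1 for record in records if record.in_experimental_group)
    return treated == len(records) - treated


first_result_data_set = [
    ExperimentRecord({"industry": "games", "delivery_site": "site_a"}, True, 50.0),
    ExperimentRecord({"industry": "games", "delivery_site": "site_a"}, False, 30.0),
]
print(groups_are_balanced(first_result_data_set))
```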
In practical applications, the experimental effect of the same strategy may vary from one experimental object to another (for example, when the strategy is the display mode of advertisements and the experimental objects are advertisements, the experimental results may differ for advertisements in different industries). To analyze the heterogeneous causal effect of the strategy (HTE, Heterogeneous Treatment Effect), that is, the heterogeneity of the causal effect (the action or influence of a strategy on an experimental object), whereby the causal effect of the same strategy differs across feature values of the experimental objects, and to allow the strategy to be flexibly adjusted afterwards, the result data set needs to be used for result heterogeneity analysis. In this embodiment of the present application, heterogeneity analysis may mean analyzing the significance level of the strategy's effect on different experimental objects, that is, determining for which classes of experimental objects the strategy has a significant effect and for which classes its effect is weak.
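For reference, this kind of heterogeneity is commonly formalized in the causal inference literature as a conditional average treatment effect. The block below is a sketch in that standard notation, not a formula taken from this application; the potential-outcome symbols $Y_i(1)$ and $Y_i(0)$ and the function $\tau$ are assumptions introduced here for illustration.

```latex
% X_i: feature value of experimental object i
% Y_i(1), Y_i(0): result of object i with / without the target strategy
\tau(x) = \mathbb{E}\left[\, Y_i(1) - Y_i(0) \mid X_i = x \,\right]
% Heterogeneity means that \tau(x) is not constant in x,
% for example \tau(\text{games}) \neq \tau(\text{finance}).
```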
In the embodiment of the present application, the heterogeneity analysis requirement information may represent the feature information on which heterogeneity analysis of the target control experiment is to be performed. Specifically, it may include feature dimensions of the experimental object, which may be set according to actual application requirements, for example, gender, education level, age, industry, delivery site, and the like. In a specific embodiment, when the experimental object is an advertisement, the feature dimension may include the industry or the delivery site. In some embodiments, when there is more than one feature dimension, the heterogeneity analysis requirement information may further include feature analysis sequence information. For example, when the feature dimensions include industry and delivery site, the feature analysis sequence information may be industry-delivery site (the first dimension is industry and the second is delivery site, that is, the industry is analyzed first and the delivery site second) or delivery site-industry (the first dimension is delivery site and the second is industry, that is, the delivery site is analyzed first and the industry second).
Acquiring a large amount of rich experimental data helps improve the reliability of the result heterogeneity analysis, and acquiring the heterogeneity analysis demand information allows the heterogeneity analysis to be performed flexibly according to that information.
S203: Configuring the features to be analyzed of the original heterogeneity analysis model based on the heterogeneity analysis demand information to obtain a target heterogeneity analysis model.
In practical applications, because the features of the experimental object may include multiple types, the feature to be analyzed of the original heterogeneity analysis model may be configured in combination with actual analysis requirements, and then the result heterogeneity analysis may be subsequently performed on a specific feature dimension (for example, the feature to be analyzed may include industries to analyze which industries have the most significant experimental effect, or the feature to be analyzed may include delivery sites to analyze which delivery sites have the most significant experimental effect, or both may be simultaneously analyzed).
In the embodiment of the present application, when the heterogeneity analysis requirement information contains only one feature dimension, that feature dimension may be configured as the feature to be analyzed of the original heterogeneity analysis model to obtain the target heterogeneity analysis model. When the heterogeneity analysis requirement information contains more than one feature dimension, it may further include feature analysis sequence information, and configuring the features to be analyzed of the original heterogeneity analysis model based on the heterogeneity analysis requirement information to obtain the target heterogeneity analysis model may include: configuring the more than one feature dimension as the features to be analyzed of the original heterogeneity analysis model according to the feature analysis sequence information, to obtain the target heterogeneity analysis model. For example, if the feature dimensions include industry and delivery site and the feature analysis sequence information is industry-delivery site, the industry may be configured as the feature to be analyzed in the first dimension, the delivery site as the feature to be analyzed in the second dimension, and so on.
The target heterogeneity analysis model is obtained by configuring the features to be analyzed of the original heterogeneity analysis model based on the heterogeneity analysis demand information, which is equivalent to performing initialization configuration on the original heterogeneity analysis model according to the heterogeneity analysis demand information, and is beneficial to flexibly performing result heterogeneity analysis according to actual analysis demands.
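The configuration step can be pictured with the minimal sketch below. The class `HeterogeneityModel`, the function `configure_model`, and the error handling are hypothetical; the sketch only shows how a single feature dimension, or several dimensions plus an analysis order, could be turned into an ordered list of features to be analyzed.

```python
from dataclasses import dataclass, field, replace
from typing import List, Optional, Sequence


@dataclass
class HeterogeneityModel:
    """Stand-in for the heterogeneity analysis model (hypothetical)."""

    features_to_analyze: List[str] = field(default_factory=list)


def configure_model(
    original: HeterogeneityModel,
    feature_dimensions: Sequence[str],
    analysis_order: Optional[Sequence[str]] = None,
) -> HeterogeneityModel:
    """Return a configured copy of `original`; the original is left unchanged."""
    if not feature_dimensions:
        raise ValueError("at least one feature dimension is required")
    if len(feature_dimensions) == 1:
        ordered = list(feature_dimensions)
    else:
        # With several dimensions, the analysis order decides first, second, ...
        if analysis_order is None or set(analysis_order) != set(feature_dimensions):
            raise ValueError("analysis order must cover exactly the feature dimensions")
        ordered = list(analysis_order)
    return replace(original, features_to_analyze=ordered)


target_model = configure_model(
    HeterogeneityModel(),
    feature_dimensions=["industry", "delivery_site"],
    analysis_order=["industry", "delivery_site"],
)
print(target_model.features_to_analyze)
```

Returning a copy rather than mutating the original keeps the unconfigured model reusable for other analysis requests, which is one way to read the "initialization configuration" described above.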
S205: Inputting the first result data set into the target heterogeneity analysis model, and performing result heterogeneity analysis on the features to be analyzed to obtain a heterogeneity analysis result tree of the target control experiment.
Specifically, the heterogeneity analysis result tree may represent heterogeneity information of the target control experiment on a plurality of feature values of the features to be analyzed. In one embodiment, the heterogeneity analysis result tree may include a plurality of leaf nodes arranged by heterogeneity intensity, with the leaf nodes corresponding one-to-one to the feature values of the feature to be analyzed and the heterogeneity intensity decreasing from left to right. The tree structure shows the heterogeneity information on each feature value clearly and intuitively, so that the key values (higher heterogeneity intensity, more significant experimental effect) and non-key values (lower heterogeneity intensity, less significant experimental effect) of the feature to be analyzed can be determined quickly and accurately. This avoids the low efficiency and poor visualization of group analysis in the prior art, in which the effect of only a single sub-category can be viewed at a time; it allows result heterogeneity analysis to be performed accurately on the feature to be analyzed and helps improve the efficiency and flexibility of heterogeneity analysis.
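The left-to-right ordering of leaf nodes can be sketched as a simple sort. In this illustration the intensity values are made up, and the threshold that separates key from non-key values is an assumption; the application does not state how that boundary is chosen.

```python
from typing import Dict, List, Tuple


def order_leaves(intensity_by_value: Dict[str, float]) -> List[Tuple[str, float]]:
    """Order feature values from strongest to weakest heterogeneity intensity."""
    return sorted(intensity_by_value.items(), key=lambda item: item[1], reverse=True)


def split_key_values(
    ordered: List[Tuple[str, float]], threshold: float
) -> Tuple[List[str], List[str]]:
    """Split ordered leaves into key and non-key values (hypothetical threshold)."""
    key_values = [value for value, intensity in ordered if intensity >= threshold]
    non_key_values = [value for value, intensity in ordered if intensity < threshold]
    return key_values, non_key_values


# Made-up intensities for the feature "industry".
leaves = order_leaves({"finance": 0.2, "games": 0.9, "e-commerce": 0.5})
print(leaves)
print(split_key_values(leaves, threshold=0.5))
```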
In one embodiment, the target heterogeneity analysis model may be a causal tree model, and in particular, the target heterogeneity analysis model may include a heterogeneity intensity calculation layer, a leaf node generation layer, a data classification layer, and a recursive calculation layer.
Referring to fig. 3, the inputting the first result data set into the target heterogeneity analysis model, and performing result heterogeneity analysis on the features to be analyzed to obtain the heterogeneity analysis result tree of the target control experiment may include:
S301: In the heterogeneity intensity calculation layer, traversing each feature value of the feature to be analyzed in the first result data set, and calculating heterogeneity information of each feature value on the first result data set.
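The traversal in S301 can be sketched as follows, assuming that each feature value splits the data set into the records that have the value and the records that do not. The quantity computed per partition here, the mean result of the experimental group minus that of the control group, is a stand-in chosen for illustration; it is not the heterogeneity intensity formula of the application, and the names `partition_effect` and `effects_by_value` are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Optional, Tuple

# (feature value, experimental result, 1 if target strategy used else 0)
Record = Tuple[str, float, int]


def partition_effect(records: List[Record]) -> Optional[float]:
    """Mean result of treated records minus mean result of control records."""
    treated = [result for _, result, flag in records if flag == 1]
    control = [result for _, result, flag in records if flag == 0]
    if not treated or not control:
        return None  # the effect cannot be estimated without both groups
    return mean(treated) - mean(control)


def effects_by_value(
    records: List[Record],
) -> Dict[str, Tuple[Optional[float], Optional[float]]]:
    """For every feature value, estimate the effect inside and outside its partition."""
    effects = {}
    for value in sorted({feature for feature, _, _ in records}):
        inside = [record for record in records if record[0] == value]
        outside = [record for record in records if record[0] != value]
        effects[value] = (partition_effect(inside), partition_effect(outside))
    return effects


data = [("games", 10.0, 1), ("games", 4.0, 0), ("finance", 5.0, 1), ("finance", 5.0, 0)]
print(effects_by_value(data))
```

A large gap between the inside and outside estimates for a value would hint that the strategy behaves differently on that value, which is the kind of signal a heterogeneity intensity is meant to capture.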
In the embodiment of the present application, when the feature to be analyzed is industry, the feature values may include, for example, games, e-commerce, automobiles, finance, network services, and the like; when the feature to be analyzed is gender, the feature values may include male and female; when the feature to be analyzed is release site, the feature values may include mobile internal sites, mobile alliances, and the like. The heterogeneity information of a feature value on the first result data set may include a heterogeneity intensity value of the feature value on the first result data set, and the heterogeneity intensity value of a feature value on the first result data set may be calculated by using the following formulas:
S_l = {(X_i, Y_i, T_i) | X_i ∈ X_l}
Figure BDA0002952184210000091
Figure BDA0002952184210000092
Figure BDA0002952184210000101
where S_l denotes the l-th data partition corresponding to the feature value, and L denotes the total number of data partitions. In this embodiment, each feature value may correspond to 2 data partitions (L = 2), that is, the first result data set is split into two classes according to the feature value; for example, when the feature to be analyzed is industry and the feature value is games, the first result data set may be divided into the two data partitions "the industry is games" and "the industry is not games". X_i denotes the feature value of the experimental subject (e.g., games or other), Y_i denotes the experimental result (e.g., the gmv value), and T_i ∈ {0,1} denotes whether the target strategy is used: T_i = 1 indicates that the experimental subject uses the target strategy, and T_i = 0 indicates that the experimental subject does not use the target strategy (the original strategy is kept unchanged).
The symbol shown in Figure BDA0002952184210000102 denotes the mean of the experimental results (e.g., the gmv mean) of the experimental group or the control group in partition S_l (the symbol in Figure BDA0002952184210000103 denotes the mean of the experimental results of the experimental group in this partition, and the symbol in Figure BDA0002952184210000104 denotes the mean of the experimental results of the control group in this partition). N_{l,t} denotes the number of samples of the experimental group or the control group in partition S_l (N_{l,1} denotes the number in the experimental group, and N_{l,0} denotes the number in the control group). τ(S_l) denotes the Average Causal Effect (ACE) within partition S_l, that is, the average effect of the target strategy relative to the original strategy within this partition.
The symbol shown in Figure BDA0002952184210000105 denotes the heterogeneity intensity, which may be represented by the sum of squares of the average causal effects of the two partitions. It may be a mean-square-error-type quantity, which reflects the degree of difference between an estimator and the quantity being estimated. The larger the value shown in Figure BDA0002952184210000106, the stronger the heterogeneity of the result on the feature value, that is, the more significant the experimental effect, and the more worthwhile further analysis of that feature value is.
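The formula images referenced here are not reproduced in this text. One reconstruction that is consistent with the symbol definitions given in this passage is sketched below. The symbols \hat{\mu}_t and F are assumed notation, and the exact form in the original figures may differ, for example by a weighting or normalization factor.

```latex
\begin{align}
S_l &= \{(X_i, Y_i, T_i) \mid X_i \in X_l\}, \qquad l = 1, \dots, L \\
\hat{\mu}_t(S_l) &= \frac{1}{N_{l,t}} \sum_{i \in S_l,\; T_i = t} Y_i, \qquad t \in \{0, 1\} \\
\tau(S_l) &= \hat{\mu}_1(S_l) - \hat{\mu}_0(S_l) \\
F &= \sum_{l=1}^{L} \tau(S_l)^2 \qquad (L = 2)
\end{align}
```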
In this embodiment, each feature value of the feature to be analyzed in the first result data set may be traversed, and the heterogeneity intensity value of each feature value on the first result data set may be calculated. That is, each feature value is used in turn to split the first result data set into two classes, the average causal effect on each of the two data partitions is calculated, and the heterogeneity intensity value of the feature value on the first result data set (the quantity shown in Figure BDA0002952184210000107) is then calculated.
In the heterogeneity intensity calculation layer, each feature value of the feature to be analyzed in the first result data set is traversed, so that the heterogeneity information of each feature value on the first result data set can be scientifically and reliably calculated by combining the result data sets, and further, the generation of reliable heterogeneity analysis result trees is facilitated.
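The calculation in S301 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each record is a hypothetical tuple `(feature_value, outcome, treated)`, the heterogeneity intensity is taken to be the plain sum of squared average causal effects over the two partitions, and a partition in which one group is empty is assumed to contribute zero. The function names and the sample values are not from the source.

```python
def average_causal_effect(rows):
    """Treated mean minus control mean; None if either group is empty."""
    treated = [y for _, y, t in rows if t == 1]
    control = [y for _, y, t in rows if t == 0]
    if not treated or not control:
        return None
    return sum(treated) / len(treated) - sum(control) / len(control)


def heterogeneity_intensity(rows, value):
    """Sum of squared average causal effects over the two partitions
    'feature == value' and 'feature != value' (L = 2)."""
    inside = [r for r in rows if r[0] == value]
    outside = [r for r in rows if r[0] != value]
    effects = [average_causal_effect(part) for part in (inside, outside)]
    # Assumption: a partition with an undefined effect contributes 0.
    return sum(e * e for e in effects if e is not None)


def intensity_by_value(rows):
    """Traverse every feature value present in the data set."""
    return {v: heterogeneity_intensity(rows, v) for v in sorted({r[0] for r in rows})}


# Hypothetical records: (industry, gmv, uses_target_strategy)
rows = [
    ("games", 10.0, 1), ("games", 4.0, 0),
    ("e-commerce", 6.0, 1), ("e-commerce", 5.0, 0),
    ("finance", 5.0, 1), ("finance", 5.0, 0),
]
scores = intensity_by_value(rows)
```

With these illustrative records, "games" receives the largest intensity, so it would be the candidate for the first leaf node.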
S303: in the leaf node generation layer, a target feature value is determined according to the heterogeneity information of each feature value on the first result data set, and a target leaf node is generated according to the target feature value.
In an embodiment of the present application, the heterogeneity information may include a heterogeneity intensity value, and the determining, in the leaf node generation layer, a target feature value according to the heterogeneity information of each feature value on the first result data set may include: according to the heterogeneity intensity value of each feature value on the first result data set, determining the feature value with the largest heterogeneity intensity value as the target feature value, and configuring the target feature value as a target leaf node (the target leaf node at this time is the first leaf node of the heterogeneity analysis result tree).
S305: in the data classification layer, data classification is performed on the first result data set according to the target feature value node, so that a target result data subset is obtained.
In this embodiment of the application, the first result data set may be split into two classes according to the target feature value node. For example, if the feature to be analyzed is industry and the feature value with the largest calculated heterogeneity intensity value is games, then the target leaf node is games, and the first result data set may be divided according to games into two parts, "the industry is games" and "the industry is not games". The "industry is not games" data may then be used as the target result data subset, on which the subsequent analysis continues.
S307: in the recursive computation layer, the steps S301 to S305 are repeated with the target result data subset as the first result data set until the target result data subset is empty, thereby obtaining the heterogeneity analysis result tree of the target control experiment.
In this embodiment, the target result data subset may be used as the first result data set, and each feature value of the feature to be analyzed in this first result data set is traversed again (taking industry as the feature to be analyzed, the feature values in the first result data set at this point no longer include games). The heterogeneity intensity value of each feature value on the first result data set is calculated, the feature value with the highest heterogeneity intensity value is selected as the target feature value, and the target feature value is configured as a target leaf node; this target leaf node is the node adjacent to the first leaf node generated above. Data classification then continues according to this adjacent node to obtain a new target result data subset, which is again used as the first result data set, and so on. When the target result data subset is empty, every feature value of the feature to be analyzed in the first result data set has generated a corresponding leaf node, and the heterogeneity analysis result tree of the target control experiment is obtained.
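Steps S301 to S307 for a single feature can be sketched as a loop that repeatedly picks the strongest remaining feature value and continues on the "is not that value" part. This is a minimal sketch, not the patented implementation: the record layout `(feature_value, outcome, treated)`, the function names and the sample values are assumptions, the recursion is written as an equivalent `while` loop, and ties are broken by alphabetical order of the feature values (with only two values left, the two binary splits coincide, so a tie is expected).

```python
def ace(rows):
    """Average causal effect: treated mean minus control mean (None if undefined)."""
    treated = [y for _, y, t in rows if t == 1]
    control = [y for _, y, t in rows if t == 0]
    if not treated or not control:
        return None
    return sum(treated) / len(treated) - sum(control) / len(control)


def intensity(rows, value):
    """Sum of squared effects over 'is value' and 'is not value'."""
    parts = ([r for r in rows if r[0] == value], [r for r in rows if r[0] != value])
    return sum(e * e for e in map(ace, parts) if e is not None)


def build_leaf_order(rows):
    """Return feature values in leaf-node order, strongest heterogeneity first."""
    leaves = []
    while rows:                                      # S307: stop when the subset is empty
        values = sorted({r[0] for r in rows})        # S301: traverse remaining values
        target = max(values, key=lambda v: intensity(rows, v))  # S303: pick the largest
        leaves.append(target)
        rows = [r for r in rows if r[0] != target]   # S305: keep the "is not target" part
    return leaves


# Hypothetical records: (industry, gmv, uses_target_strategy)
rows = [
    ("games", 10.0, 1), ("games", 4.0, 0),
    ("e-commerce", 6.0, 1), ("e-commerce", 5.0, 0),
    ("finance", 5.0, 1), ("finance", 5.0, 0),
]
leaf_order = build_leaf_order(rows)
```

Because each round removes all records of the chosen value, the loop ends after as many rounds as there are distinct feature values.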
Fig. 4 is a schematic diagram of a heterogeneity analysis result tree of a target control experiment provided in an embodiment of the present application. Referring to Fig. 4, there is only one feature to be analyzed (industry), and it has 5 feature values in the first result data set (games, e-commerce, network services, automobiles, finance). The 5 feature values are traversed to obtain the heterogeneity intensity value of each of them on the first result data set, and the feature value with the largest heterogeneity intensity value (Figure BDA0002952184210000121) is determined to be games. Games is configured as the first leaf node, and the first result data set is divided into two parts, "the industry is games" and "the industry is not games". The "industry is not games" data is used as the target result data subset, which in turn is used as the first result data set. The remaining 4 feature values in the first result data set at this point are traversed, and the feature value with the largest heterogeneity intensity value (Figure BDA0002952184210000122) on the first result data set at this point is determined to be e-commerce. E-commerce is taken as the second leaf node, and so on, until the heterogeneity analysis result tree of the target control experiment shown in Fig. 4 is obtained.
In this embodiment of the present application, when the heterogeneity analysis requirement information contains more than one feature dimension, the feature dimensions may be configured as the features to be analyzed of the original heterogeneity analysis model according to the feature analysis sequence information, so as to obtain the target heterogeneity analysis model. For example, the feature dimensions may include industry and release site, and the feature analysis sequence information may be "industry, then release site"; industry is then configured as the feature to be analyzed in the first dimension, release site as the feature to be analyzed in the second dimension, and so on. When multi-dimensional features to be analyzed are included, result heterogeneity analysis may first be performed in the target heterogeneity analysis model on the feature to be analyzed in the first dimension. Each time a leaf node is generated, the first result data set is divided into two parts according to that leaf node ("is A" and "is not A", where A denotes a feature value of the feature to be analyzed in the first dimension). Each of the two parts may be used as a target result data subset, which in turn is used as the first result data set, and traversal calculation continues on it. When the "is A" data is used as the first result data set, the feature values of the feature to be analyzed in the second dimension are traversed on it; when the "is not A" data is used as the first result data set, the remaining feature values of the feature to be analyzed in the first dimension continue to be traversed on it, and so on.
Fig. 5 is a schematic diagram of a heterogeneity analysis result tree of a target control experiment that includes multi-dimensional features to be analyzed, according to an embodiment of the present application. In Fig. 5, the first dimension is industry and the second dimension is release site. The feature to be analyzed in the first dimension has 3 feature values in the first result data set (games, e-commerce and network services). The 3 feature values are traversed to obtain the heterogeneity intensity value of each of them on the first result data set, and the feature value with the largest heterogeneity intensity value (Figure BDA0002952184210000131) is determined to be games. Games is taken as the first leaf node, and the first result data set is divided into two parts, "the industry is games" and "the industry is not games". Each of the two parts is used as a target result data subset, which in turn is used as the first result data set. For the "industry is games" data, the feature values of the second-dimension feature to be analyzed (release site) are traversed (assumed here to be 2: mobile internal site and mobile alliance) and their heterogeneity intensity values are calculated; the value with the largest heterogeneity intensity is determined to be mobile internal site, which is made the longitudinally adjacent node of the games leaf node, followed by mobile alliance. Assuming that the second-dimension feature to be analyzed has only the value mobile internal site in the "industry is e-commerce" data and in the "industry is network services" data, the heterogeneity analysis result tree of the target control experiment shown in Fig. 5 is obtained.
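The multi-dimensional traversal can be sketched as a recursive function: the "is A" part descends into the next feature dimension, while the "is not A" part continues with the remaining values of the current dimension. This is only an illustrative sketch under assumptions: records are hypothetical tuples `(features_dict, outcome, treated)`, the node layout (`feature`, `value`, `children`) is invented for the example, the intensity is the plain sum of squared average causal effects, and ties are broken alphabetically.

```python
def ace(rows):
    """Average causal effect: treated mean minus control mean (None if undefined)."""
    treated = [y for _, y, t in rows if t == 1]
    control = [y for _, y, t in rows if t == 0]
    if not treated or not control:
        return None
    return sum(treated) / len(treated) - sum(control) / len(control)


def intensity(rows, dim, value):
    """Sum of squared effects over 'dim == value' and 'dim != value'."""
    parts = ([r for r in rows if r[0][dim] == value],
             [r for r in rows if r[0][dim] != value])
    return sum(e * e for e in map(ace, parts) if e is not None)


def build_tree(rows, dims):
    """dims is the ordered list of feature dimensions (feature analysis sequence)."""
    if not rows or not dims:
        return []
    dim, nodes, remaining = dims[0], [], rows
    while remaining:
        values = sorted({r[0][dim] for r in remaining})
        target = max(values, key=lambda v: intensity(remaining, dim, v))
        matched = [r for r in remaining if r[0][dim] == target]
        # "is A": continue with the next dimension below this node.
        nodes.append({"feature": dim, "value": target,
                      "children": build_tree(matched, dims[1:])})
        # "is not A": continue with the remaining values of this dimension.
        remaining = [r for r in remaining if r[0][dim] != target]
    return nodes


def rec(industry, site, gmv, treated):
    return ({"industry": industry, "site": site}, gmv, treated)


# Hypothetical records
rows = [
    rec("games", "internal", 12.0, 1), rec("games", "internal", 4.0, 0),
    rec("games", "alliance", 6.0, 1), rec("games", "alliance", 5.0, 0),
    rec("e-commerce", "internal", 5.0, 1), rec("e-commerce", "internal", 5.0, 0),
    rec("finance", "internal", 7.0, 1), rec("finance", "internal", 6.0, 0),
]
tree = build_tree(rows, ["industry", "site"])
```

In this toy data, games becomes the first industry node with two release-site children, while the other industries each have a single release-site child.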
The first result data set is input into the target heterogeneity analysis model, and result heterogeneity analysis is performed on the features to be analyzed to obtain a heterogeneity analysis result of the target control experiment, so that the method is beneficial to scientifically and reliably performing result heterogeneity analysis by combining machine learning and a large amount of result data, the efficiency and the accuracy of result heterogeneity analysis are improved, and result heterogeneity analysis on multi-dimensional features to be analyzed can be flexibly performed.
In an embodiment of the present application, referring to fig. 6, the method may further include:
S601: a second result dataset is obtained for the target control experiment.
Specifically, similar to the first result dataset of the target control experiment, the second result dataset of the target control experiment may include a plurality of pieces of experimental data of the target control experiment, which may include experimental data of an experimental group and experimental data of a control group, wherein each piece of experimental data may include attribute information and an experimental result of an experimental subject. In practical applications, the method may further include: acquiring a target result data set of the target control experiment, and dividing the target result data set into two parts according to a first preset ratio, wherein one part is used as the first result data set, and the other part is used as the second result data set. The first preset ratio may be set according to practical application requirements; for example, the first preset ratio may be 5:5.
S603: inputting the second result data set into the target heterogeneity analysis model, and performing result heterogeneity analysis on the feature to be analyzed to obtain a heterogeneity analysis verification tree of the target control experiment.
In an embodiment, the specific process of inputting the second result data set into the target heterogeneity analysis model and performing result heterogeneity analysis on the feature to be analyzed to obtain the heterogeneity analysis verification tree of the target control experiment is similar to S205; reference may be made to the related description of S205, which is not repeated here.
S605: determining invalid leaf nodes in the heterogeneity analysis result tree according to the heterogeneity analysis verification tree.
In this embodiment, the heterogeneity analysis verification tree may also include a plurality of leaf nodes. The leaf nodes in the heterogeneity analysis result tree whose ordering differs from that in the heterogeneity analysis verification tree may be identified and taken as invalid leaf nodes of the heterogeneity analysis result tree.
S607: simplifying the heterogeneity analysis result tree according to the invalid leaf nodes to obtain the processed heterogeneity analysis result tree.
In the embodiment of the application, the invalid leaf nodes can be removed from the heterogeneity analysis result tree, so that the heterogeneity analysis result tree can be simplified, and the processed heterogeneity analysis result tree can be displayed on an experimental result comparison page; the heterogeneity analysis verification tree of the target control experiment is generated by obtaining the second result data set of the target control experiment, and the heterogeneity analysis verification tree and the heterogeneity analysis result tree are combined to obtain an intersection, so that the heterogeneity analysis result tree is simpler, the obtained tree structure is more stable, the influence of abnormal values is avoided, and the reliability of result heterogeneity analysis is favorably improved.
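The verification step in S605 and S607 can be illustrated with a short sketch. It assumes, for simplicity, that each single-feature tree is represented as a list of leaf values in left-to-right order, and it interprets "ordered differently" as "occupies a different position, or is missing, in the verification tree". Both the representation and this interpretation are assumptions, and the sample leaf orders are invented.

```python
def find_invalid_leaves(result_leaves, verification_leaves):
    """Leaves of the result tree whose position differs from the verification tree."""
    rank = {value: i for i, value in enumerate(verification_leaves)}
    return [v for i, v in enumerate(result_leaves) if rank.get(v) != i]


def simplify_tree(result_leaves, verification_leaves):
    """Remove the invalid leaves, keeping the remaining leaves in their order."""
    invalid = set(find_invalid_leaves(result_leaves, verification_leaves))
    return [v for v in result_leaves if v not in invalid]


# Hypothetical leaf orders obtained from the first and second result data sets
result_tree = ["games", "e-commerce", "network services", "automobiles", "finance"]
verification_tree = ["games", "e-commerce", "automobiles", "network services", "finance"]

invalid = find_invalid_leaves(result_tree, verification_tree)
processed = simplify_tree(result_tree, verification_tree)
```

Here the two middle leaves swap places between the two trees, so only the leaves whose rank is reproduced on the second data set are kept.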
In an embodiment of the present application, referring to fig. 7, the method may further include:
S701: a third result dataset is obtained for the target control experiment.
Specifically, similar to the first result data set and the second result data set of the target control experiment, the third result data set of the target control experiment may also include a plurality of pieces of experimental data of the target control experiment, which may include experimental data of an experimental group and experimental data of a control group, wherein each piece of experimental data may include attribute information and an experimental result of the experimental subject. In practical applications, the method may further include: acquiring an original result data set of the target control experiment, and dividing the original result data set into two parts according to a second preset ratio, wherein one part is used as the target result data set and the other part is used as the third result data set. The target result data set is then divided into two parts according to the first preset ratio, one part being used as the first result data set and the other part as the second result data set. The second preset ratio may be set according to practical application requirements; for example, the second preset ratio may be 5:5.
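The two-stage division into first, second and third result data sets can be sketched as follows. The source only specifies the preset ratios; shuffling the records with a fixed seed before cutting is an assumption made here so that the parts are comparable, and the function name and the integer stand-in records are illustrative.

```python
import random


def split_by_ratio(records, ratio=(5, 5), seed=0):
    """Split records into two parts whose sizes follow ratio[0]:ratio[1]."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # assumption: random assignment
    cut = len(shuffled) * ratio[0] // (ratio[0] + ratio[1])
    return shuffled[:cut], shuffled[cut:]


# Hypothetical original result data set (integers stand in for experiment records)
original = list(range(100))

# Second preset ratio: original -> target result data set + third result data set
target, third = split_by_ratio(original, (5, 5), seed=1)
# First preset ratio: target -> first result data set + second result data set
first, second = split_by_ratio(target, (5, 5), seed=2)
```

With both ratios at 5:5, the first and second result data sets each hold a quarter of the original records and the third holds half.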
S703: extracting, from the third result data set, the result data set corresponding to each leaf node in the heterogeneity analysis result tree.
In the embodiment of the present application, taking industry as the feature to be analyzed, the leaf nodes in the heterogeneity analysis result tree may include, for example, games, e-commerce, network services, and the like; the result data set corresponding to "the industry is games", the result data set corresponding to "the industry is e-commerce" and the result data set corresponding to "the industry is network services" can then each be extracted from the third result data set. In one embodiment, the result data set corresponding to each leaf node may include two parts: one part is the result data whose feature value is equal to the leaf node value, and the other part is the result data whose feature value is not equal to the leaf node value.
S705: calculating result statistical data corresponding to each leaf node according to the result data set corresponding to each leaf node.
In practical applications, after result heterogeneity analysis is performed on the feature to be analyzed using the first result data set and the target heterogeneity analysis model, a heterogeneity analysis result tree representing heterogeneity information of the target control experiment on a plurality of feature values of the feature to be analyzed is obtained. In one embodiment, the heterogeneity analysis result tree may include a plurality of feature value nodes (i.e., leaf nodes of the tree structure) arranged by heterogeneity intensity, with the heterogeneity intensity decreasing from left to right. The tree structure makes the heterogeneity information on the feature values of the feature to be analyzed clear and intuitive, so that the key values (higher heterogeneity intensity) and non-key values (lower heterogeneity intensity) of the feature to be analyzed can be determined quickly and accurately. To further analyze the specific performance of the target control experiment on each feature value of the feature to be analyzed, a third result data set of the target control experiment may be obtained, and the result statistical data corresponding to each leaf node may be calculated. For example, the sample proportion of each leaf node may be calculated from the third result data set, that is, the proportion of the amount of data in the result data set corresponding to each leaf node to the amount of data in the third result data set: after the result data set corresponding to "the industry is games" is extracted from the third result data set, its proportion of the amount of data in the third result data set can be calculated, giving the sample proportion corresponding to the games leaf node. As another example, after the result data set corresponding to each leaf node is extracted from the third result data set, the gmv lift of the experimental group relative to the control group may be calculated from the result data of the experimental group and the result data of the control group in that result data set, and this gmv lift is taken as the lifting index of the leaf node.
In this embodiment of the present application, the result statistical data may represent statistical index information of the target control experiment on the corresponding leaf node. In a specific embodiment, the result statistical data may include a sample proportion and a lifting index. The sample proportion may represent the proportion of the amount of data in the result data set corresponding to the leaf node to the amount of data in the third result data set. The lifting index may represent the degree of influence of the target strategy, relative to the original strategy, on the result data set corresponding to the leaf node. In practical applications, the lifting index may be positive or negative: a positive value indicates that the target strategy has a positive influence (e.g., a gain) relative to the original strategy on the result data corresponding to the leaf node, and a negative value indicates a negative influence (e.g., a loss). In one embodiment, when the experimental result includes gmv values, the lifting index may represent the gmv lift of the experimental group relative to the control group on the result data corresponding to the leaf node, which may be calculated by using the following formula:
Figure BDA0002952184210000161
where Q_m denotes the lifting index corresponding to leaf node m, A_m1 denotes the sum of the gmv values of the experimental group in the result data set corresponding to leaf node m, and A_m2 denotes the sum of the gmv values of the control group in the result data set corresponding to leaf node m.
In one embodiment, when the result data set corresponding to each leaf node includes two parts, one part is the result data set corresponding to the leaf node whose characteristic value is equal to the leaf node value, and the other part is the result data set corresponding to the leaf node whose characteristic value is not equal to the leaf node value, the corresponding result statistical data corresponding to each leaf node may also include two parts, one part is the statistical data corresponding to the leaf node whose characteristic value is equal to the leaf node value, and the other part is the statistical data corresponding to the leaf node whose characteristic value is not equal to the leaf node value.
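The per-leaf statistics can be sketched as below. The formula image for Q_m is not reproduced in this text, so the sketch assumes the relative form Q_m = (A_m1 - A_m2) / A_m2; the original formula may differ. The record layout `(feature_value, gmv, treated)`, the function names and the sample values are likewise assumptions. Following the description of the Y and N branches, the N branch of each leaf is the data that matches none of the leaves seen so far, and every sample proportion is taken relative to the whole third result data set.

```python
def summarize(part, total):
    """Lifting index and sample proportion for one branch of a leaf node."""
    a1 = sum(g for _, g, t in part if t == 1)  # A_m1: gmv sum, experimental group
    a2 = sum(g for _, g, t in part if t == 0)  # A_m2: gmv sum, control group
    lift = (a1 - a2) / a2 if a2 else None      # assumed relative-lift form
    return {"lift": lift, "sample_share": len(part) / total}


def tree_statistics(third_rows, leaves):
    """Statistics for the Y and N branches of each leaf, in tree order."""
    total, remaining, stats = len(third_rows), third_rows, {}
    for value in leaves:
        yes = [r for r in remaining if r[0] == value]
        no = [r for r in remaining if r[0] != value]
        stats[value] = {"Y": summarize(yes, total), "N": summarize(no, total)}
        remaining = no  # the next leaf is evaluated on the "is not value" data
    return stats


# Hypothetical third result data set: (industry, gmv, uses_target_strategy)
third = [
    ("games", 90, 1), ("games", 100, 0),
    ("e-commerce", 120, 1), ("e-commerce", 100, 0),
    ("e-commerce", 110, 1), ("e-commerce", 100, 0),
    ("finance", 50, 1), ("finance", 50, 0),
    ("finance", 55, 1), ("finance", 50, 0),
]
stats = tree_statistics(third, ["games", "e-commerce", "finance"])
```

Note that a sum-based lift is only meaningful when the experimental and control groups are of comparable size within each branch, as they are in this toy data.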
On the basis of the heterogeneity analysis result tree shown in Fig. 4, an embodiment of the present application provides a tree structure diagram that includes the result statistical data corresponding to each leaf node; please refer to Fig. 8. The result statistical data corresponding to each leaf node may include two parts. One part is the statistical data for the feature value equal to the leaf node value, shown in the rectangular box on the Y branch of each leaf node. For example, the statistical data in the rectangular box on the Y branch of games includes a lifting index of -0.14 and a sample proportion of 18%, indicating that in the "industry is games" data the gmv lift of the experimental group relative to the control group is -0.14, and that this data accounts for 18% of the amount of data in the third result data set. The other part is the statistical data for the feature value not equal to the leaf node value, shown in the rectangular box on the N branch of each leaf node. For example, the statistical data in the rectangular box on the N branch of games includes a lifting index of -0.79 and a sample proportion of 82%, indicating that in the "industry is not games" data the gmv lift of the experimental group relative to the control group is -0.79, and that this data accounts for 82% of the amount of data in the third result data set. The statistical data in the rectangular box on the N branch of e-commerce includes a lifting index of 0.064 and a sample proportion of 58%, indicating that in the data whose industry is neither games nor e-commerce the gmv lift of the experimental group relative to the control group is 0.064, and that this data accounts for 58% of the amount of data in the third result data set.
Fig. 5 is a schematic diagram of a heterogeneity analysis result tree of a target control experiment that includes multi-dimensional features to be analyzed, according to an embodiment of the present application. Referring to Fig. 5, there are features to be analyzed in two dimensions (the first dimension is industry, and the second dimension is release site). The first-dimension feature to be analyzed has 3 feature values in the first result data set (games, e-commerce and network services). The 3 feature values are traversed to obtain the heterogeneity intensity value of each of them on the first result data set, and the feature value with the largest heterogeneity intensity value (Figure BDA0002952184210000171) is determined to be games. Games is taken as the first leaf node, and the first result data set is divided into two parts, "the industry is games" and "the industry is not games". Each of the two parts is used as a target result data subset, which in turn is used as the first result data set. For the "industry is games" data, the feature values of the second-dimension feature to be analyzed (release site) are traversed (assumed here to be 2: mobile internal site and mobile alliance) and their heterogeneity intensity values are calculated; the value with the largest heterogeneity intensity is determined to be mobile internal site, which is made the longitudinally adjacent node of the games leaf node, followed by mobile alliance. Assuming that the second-dimension feature to be analyzed has only the value mobile internal site in the "industry is e-commerce" data and in the "industry is network services" data, the heterogeneity analysis result tree of the target control experiment shown in Fig. 5 is obtained.
On the basis of the heterogeneity analysis result tree shown in Fig. 5, an embodiment of the present application also provides a tree structure diagram that includes the result statistical data corresponding to each leaf node; please refer to Fig. 9. The result statistical data corresponding to each leaf node may include two parts. One part is the statistical data for the feature value equal to the leaf node value, shown in the rectangular box on the Y branch of each leaf node. For example, the statistical data in the rectangular box on the Y branch of the leftmost mobile internal site node includes a lifting index of -0.15 and a sample proportion of 8%, indicating that in the data whose industry is games and whose release site is the mobile internal site, the gmv lift of the experimental group relative to the control group is -0.15, and that this data accounts for 8% of the amount of data in the third result data set. The other part is the statistical data for the feature value not equal to the leaf node value, shown in the rectangular box on the N branch of each leaf node. For example, the statistical data in the rectangular box on the N branch of the leftmost mobile internal site node includes a lifting index of -0.14 and a sample proportion of 10%, indicating that in the data whose industry is games and whose release site is not the mobile internal site, the gmv lift of the experimental group relative to the control group is -0.14, and that this data accounts for 10% of the amount of data in the third result data set.
By calculating the result statistical data corresponding to each leaf node from the result data set corresponding to each leaf node, the specific performance of the target control experiment on each feature value of the feature to be analyzed can be determined scientifically and flexibly in light of the actual analysis requirements (for example, determining on samples of which feature value the target strategy yields a large improvement), so that the strategy can be promoted and improved in a targeted manner.
In an embodiment of the present application, referring to fig. 11, the method may further include:
S1101: screening out leaf nodes to be processed according to the result statistical data corresponding to each leaf node in the heterogeneity analysis result tree.
In this embodiment, the leaf nodes to be processed may be screened out according to actual application requirements. For example, screening out the leaf nodes to be processed according to the result statistical data corresponding to each leaf node in the heterogeneity analysis result tree may include: taking the leaf nodes whose lifting index is greater than a first preset threshold and whose sample proportion is smaller than a second preset threshold as the leaf nodes to be processed. Screening on the result statistical data of each leaf node in the heterogeneity analysis result tree allows the leaf nodes to be processed to be located accurately, so that targeted optimization of strategies and the like can subsequently be carried out accurately and efficiently.
S1103: extracting the result statistical data corresponding to the leaf nodes to be processed.
Specifically, after the leaf nodes to be processed are screened out according to the result statistical data corresponding to each leaf node in the heterogeneity analysis result tree, the result statistical data corresponding to the leaf nodes to be processed can be extracted, and the processing strategy can be further determined conveniently.
S1105: determining a processing strategy corresponding to the leaf node to be processed according to the result statistical data corresponding to the leaf node to be processed and the result processing rule.
In this embodiment, the result processing rule may be set based on a large number of actual processing tests and on actual application requirements. For example, when leaf nodes whose lifting index is greater than the first preset threshold and whose sample proportion is smaller than the second preset threshold are screened out as the leaf nodes to be processed, the result statistical data corresponding to such a leaf node may be extracted, the portion whose feature value equals that leaf node may be identified, and the release volume for that portion may be increased by ten percent of the total volume on top of its current sample proportion. When leaf nodes whose lifting index is smaller than a third preset threshold and whose sample proportion is smaller than a fourth preset threshold are screened out as the leaf nodes to be processed, use of the target strategy may be suspended on the feature values corresponding to those leaf nodes to reduce the loss. The present application does not limit this.
The leaf nodes to be processed are screened out according to the result statistical data corresponding to each leaf node in the heterogeneity analysis result tree, the result statistical data corresponding to the leaf nodes to be processed are extracted, the processing strategies corresponding to the leaf nodes to be processed are determined according to the result statistical data corresponding to the leaf nodes to be processed and the result processing rules, the targeted adjustment of the strategies can be favorably carried out by combining actual statistical data, and the adaptability and the reliability of the strategy adjustment are improved.
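Steps S1101 to S1105 can be illustrated with a minimal Python sketch. The names `LeafStats`, `choose_strategy` and `screen_leaves`, the strategy labels, and all threshold values are hypothetical and chosen only for illustration; the embodiment leaves the four preset thresholds and the concrete result processing rule open.

```python
from dataclasses import dataclass


@dataclass
class LeafStats:
    """Result statistical data of one leaf node (hypothetical layout)."""
    feature_value: str
    lift: float          # lifting index of the target strategy on this leaf
    sample_share: float  # proportion of all samples that fall in this leaf


def choose_strategy(stats, first=0.10, second=0.20, third=0.0, fourth=0.20):
    """Apply an illustrative result processing rule to one leaf node.

    Returns a strategy label, or None if the leaf is not screened out.
    The threshold defaults are assumptions, not values from the source.
    """
    if stats.lift > first and stats.sample_share < second:
        # Large improvement on a small share of samples: deliver more there.
        return "increase_delivery"
    if stats.lift < third and stats.sample_share < fourth:
        # Poor result on a small share of samples: suspend the target strategy.
        return "suspend_strategy"
    return None


def screen_leaves(leaves, **thresholds):
    """Screen out leaves to be processed and map each to its strategy."""
    decisions = {}
    for leaf in leaves:
        strategy = choose_strategy(leaf, **thresholds)
        if strategy is not None:
            decisions[leaf.feature_value] = strategy
    return decisions


example = [
    LeafStats("A", lift=0.25, sample_share=0.05),
    LeafStats("B", lift=-0.10, sample_share=0.10),
    LeafStats("C", lift=0.05, sample_share=0.50),
]
decisions = screen_leaves(example)
```

With these assumed thresholds, leaf "A" is selected for increased delivery, leaf "B" has the strategy suspended, and leaf "C" is left unchanged.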
In an embodiment of the present application, referring to fig. 12, the method may further include:
S1201: in response to an experiment result query instruction, generating an experiment result comparison page according to the heterogeneity analysis result tree of the target control experiment.
In practical applications, the heterogeneity analysis result of the target control experiment may be generated in advance by offline precomputation and stored in an experiment effect database. When the experiment result comparison page is generated in response to the experiment result query instruction, the data prestored in the experiment effect database can then be read directly, which improves the generation efficiency of the experiment result comparison page and reduces lag. Specifically, the experiment result query instruction may include a result query instruction for the target control experiment, and may be issued when it is detected that an experiment result query control is triggered. The experiment result comparison page may be generated according to the heterogeneity analysis result tree of the target control experiment. In some embodiments, an original result data set of the target control experiment may also be obtained, and the experiment result comparison page may be generated by combining the original result data set with the heterogeneity analysis result tree, with the heterogeneity analysis result tree displayed below the table corresponding to the original data. Comparative analysis can thus be performed intuitively and flexibly by combining the original result data set and the heterogeneity analysis result tree, which improves the flexibility and efficiency of result heterogeneity analysis.
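The offline precomputation described above can be sketched as follows. This is only an illustration: `ExperimentEffectStore` is a hypothetical in-memory stand-in for the experiment effect database, and `precompute`, `build_comparison_page` and the page layout are assumed names, not interfaces defined by the application.

```python
import json


class ExperimentEffectStore:
    """In-memory stand-in for the experiment effect database (assumption)."""

    def __init__(self):
        self._rows = {}

    def save(self, experiment_id, result_tree):
        # Serialize so that stored data is decoupled from the live objects.
        self._rows[experiment_id] = json.dumps(result_tree)

    def load(self, experiment_id):
        raw = self._rows.get(experiment_id)
        return None if raw is None else json.loads(raw)


def precompute(store, experiment_id, analyze, result_data):
    """Offline step: run the (possibly slow) analysis once and store it."""
    store.save(experiment_id, analyze(result_data))


def build_comparison_page(store, experiment_id, raw_table):
    """Online step: read the prestored tree instead of recomputing it."""
    tree = store.load(experiment_id)
    if tree is None:
        raise KeyError(f"no precomputed result for {experiment_id!r}")
    # The result tree is placed below the table of original data.
    return {"raw_table": raw_table, "result_tree": tree}
```

The query path only performs a lookup, so the cost of the heterogeneity analysis is paid once, offline, rather than on every query.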
In an embodiment, the experiment result query instruction may carry display layer number information, which represents a limit on the number of layers of the heterogeneity analysis result tree to be displayed and may be set according to actual application requirements. In a specific embodiment, the display layer number information may be 5 layers (that is, only the first 5 feature values of the feature to be analyzed in the first dimension are shown). Referring to fig. 13, generating the experiment result comparison page according to the heterogeneity analysis result tree of the target control experiment in response to the experiment result query instruction may include:
S1301: in response to the experiment result query instruction, truncating the heterogeneity analysis result tree of the target control experiment to the number of display layers indicated by the display layer number information, to obtain a truncated heterogeneity analysis result tree.
In the embodiment of the present application, the heterogeneity analysis result tree may include a plurality of feature value nodes (i.e., leaf nodes of the tree structure) arranged according to heterogeneity intensity, with the heterogeneity intensity decreasing from left to right, so that the key values (higher heterogeneity intensity) and non-key values (lower heterogeneity intensity) of the feature to be analyzed can be determined quickly and accurately. Truncating the heterogeneity analysis result tree of the target control experiment according to the display layer number information is equivalent to extracting the key values or eliminating interfering data, so that the key values can be identified quickly and intuitively for further analysis, the interference of irrelevant data is removed, and the efficiency of result heterogeneity analysis is improved.
S1303: generating the experiment result comparison page according to the truncated heterogeneity analysis result tree.
After the truncated heterogeneity analysis result tree is obtained according to the display layer number information, the experiment result comparison page can be generated according to the truncated heterogeneity analysis result tree, so that the experiment result comparison page is more intuitive and concise.
S1203: displaying the experiment result comparison page.
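The display layer truncation in S1301 and S1303 can be sketched as a simple slice, assuming the result tree is held as a list of leaf nodes already ordered from strongest to weakest heterogeneity intensity. The list representation and the names `truncate_result_tree` and `build_page` are assumptions made for illustration.

```python
def truncate_result_tree(leaves, display_layers):
    """Keep only the first `display_layers` leaf nodes of an ordered tree.

    `leaves` is assumed to be sorted by decreasing heterogeneity intensity,
    so the slice keeps the key values and drops the weakest ones.
    """
    if display_layers < 0:
        raise ValueError("display_layers must be non-negative")
    return leaves[:display_layers]


def build_page(leaves, display_layers):
    """Build a minimal page model from the truncated tree."""
    shown = truncate_result_tree(leaves, display_layers)
    return {"shown": shown, "hidden_count": len(leaves) - len(shown)}


page = build_page(["v1", "v2", "v3", "v4", "v5", "v6", "v7"], display_layers=5)
```

With 5 display layers and 7 feature values, the page shows the first five values and records that two were hidden.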
In some embodiments, the heterogeneity analysis result tree may further be simplified with reference to steps S601 to S607 to obtain a processed heterogeneity analysis result tree; then, in response to the experiment result query instruction, the experiment result comparison page may be generated according to the processed heterogeneity analysis result tree and displayed.
According to the technical solution provided by the embodiments of the present application, the first result data set and the heterogeneity analysis requirement information of the target control experiment are obtained; the feature to be analyzed of the original heterogeneity analysis model is configured based on the heterogeneity analysis requirement information to obtain a target heterogeneity analysis model; and finally, the first result data set is input into the target heterogeneity analysis model and result heterogeneity analysis is performed on the feature to be analyzed to obtain a heterogeneity analysis result tree of the target control experiment, where the heterogeneity analysis result tree represents heterogeneity information of the target control experiment on a plurality of feature values of the feature to be analyzed. Combining machine learning with a large amount of result data in this way supports scientific and reliable result heterogeneity analysis, improves its efficiency and accuracy, and allows result heterogeneity to be analyzed flexibly on multi-dimensional features to be analyzed. Calculating the result statistical data corresponding to each leaf node from the result data set corresponding to that leaf node makes it possible to determine, scientifically and flexibly and in combination with actual analysis requirements, the specific performance of the target control experiment on each feature value of the feature to be analyzed (for example, to determine on samples with which feature values the target strategy brings a large improvement), so that the strategy can then be rolled out and improved in a targeted manner.
Screening out the leaf nodes to be processed according to the result statistical data corresponding to each leaf node in the heterogeneity analysis result tree, extracting the result statistical data corresponding to the leaf nodes to be processed, and determining the corresponding processing strategy according to that result statistical data and the result processing rule makes it possible to adjust the strategy in a targeted manner on the basis of actual statistical data, which improves the adaptability and reliability of the strategy adjustment.
An embodiment of the present application further provides a data processing apparatus, as shown in fig. 14, the apparatus may include:
a data obtaining module 1410, configured to obtain a first result data set and heterogeneity analysis requirement information of the target control experiment;
a characteristic parameter configuration module 1420, configured to configure the feature to be analyzed of the original heterogeneity analysis model based on the heterogeneity analysis requirement information, so as to obtain a target heterogeneity analysis model;
a heterogeneity analyzing module 1430, configured to input the first result data set into the target heterogeneity analyzing model, and perform result heterogeneity analysis on the features to be analyzed to obtain a heterogeneity analyzing result tree of the target control experiment, where the heterogeneity analyzing result tree represents heterogeneity information of the target control experiment on multiple feature values of the features to be analyzed.
In one embodiment, the target heterogeneity analysis model includes a heterogeneity intensity calculation layer, a leaf node generation layer, a data classification layer, and a recursive calculation layer, and the heterogeneity analysis module 1430 may include:
a heterogeneity intensity calculation unit, configured to traverse, in the heterogeneity intensity calculation layer, each feature value of the feature to be analyzed in the first result data set, and to calculate the heterogeneity information of each feature value on the first result data set;
a leaf node generation unit, configured to determine, in the leaf node generation layer, a target feature value according to the heterogeneity information of each feature value on the first result data set, and to generate a target leaf node according to the target feature value;
a data classification unit, configured to perform, in the data classification layer, data classification on the first result data set according to the target leaf node to obtain a target result data subset;
a recursive calculation unit, configured to take, in the recursive calculation layer, the target result data subset as the first result data set, and to repeat the steps from traversing, in the heterogeneity intensity calculation layer, each feature value of the feature to be analyzed in the first result data set and calculating the heterogeneity information of each feature value on the first result data set, through performing, in the data classification layer, data classification on the first result data set according to the target leaf node to obtain a target result data subset, until the target result data subset is empty, so as to obtain the heterogeneity analysis result tree of the target control experiment.
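The four layers listed above can be sketched in Python as follows. This is a sketch under stated assumptions rather than the model of the application: this passage does not define how the heterogeneity information is computed, so the score used here (absolute difference between the treatment-minus-control effect inside a feature value and outside it) is a hypothetical stand-in, as are the record layout (`group`, `outcome`) and the function names. The recursion is written as an equivalent loop.

```python
from statistics import mean


def effect(rows):
    """Treatment mean minus control mean; None if either group is empty."""
    treated = [r["outcome"] for r in rows if r["group"] == "treatment"]
    control = [r["outcome"] for r in rows if r["group"] == "control"]
    if not treated or not control:
        return None
    return mean(treated) - mean(control)


def heterogeneity(rows, feature, value):
    """Hypothetical heterogeneity score of one feature value on `rows`."""
    inside = effect([r for r in rows if r[feature] == value])
    outside = effect([r for r in rows if r[feature] != value])
    if inside is None or outside is None:
        return 0.0
    return abs(inside - outside)


def build_result_tree(rows, feature):
    """Return leaf nodes ordered from strongest to weakest heterogeneity."""
    leaves = []
    remaining = list(rows)
    while remaining:  # stop once the target result data subset is empty
        # Heterogeneity intensity calculation layer: score every value.
        values = sorted({r[feature] for r in remaining})
        scores = {v: heterogeneity(remaining, feature, v) for v in values}
        # Leaf node generation layer: strongest value becomes the next leaf
        # (ties go to the first value in sorted order).
        target = max(values, key=lambda v: scores[v])
        leaves.append({"value": target, "score": scores[target]})
        # Data classification layer: keep only rows outside the new leaf.
        remaining = [r for r in remaining if r[feature] != target]
    return leaves


def _row(city, group, outcome):
    return {"city": city, "group": group, "outcome": outcome}


rows = [
    _row("A", "treatment", 10.0), _row("A", "control", 0.0),
    _row("B", "treatment", 1.0), _row("B", "control", 0.0),
    _row("C", "treatment", 4.0), _row("C", "control", 0.0),
]
tree = build_result_tree(rows, "city")
```

Each pass removes at least the rows of the chosen feature value, so the loop terminates after at most as many passes as there are distinct feature values.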
In one embodiment, the apparatus may further include:
a second result data set acquisition module for acquiring a second result data set of the target control experiment;
a verification tree generation module, configured to input the second result data set into the target heterogeneity analysis model and perform result heterogeneity analysis on the feature to be analyzed, to obtain a heterogeneity analysis verification tree of the target control experiment;
an invalid leaf node determining module, configured to determine invalid leaf nodes in the heterogeneity analysis result tree according to the heterogeneity analysis verification tree;
and a simplification module, configured to simplify the heterogeneity analysis result tree according to the invalid leaf nodes to obtain a processed heterogeneity analysis result tree.
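The verification and simplification modules can be illustrated with the sketch below. This passage does not state the criterion by which a leaf node is judged invalid, so the rule used here is purely an assumption for illustration: a leaf is treated as invalid when its feature value is missing from the verification tree or its score there falls below `min_score`. The list-of-dicts tree layout and the function names are likewise hypothetical.

```python
def find_invalid_leaves(result_tree, verification_tree, min_score):
    """Return feature values of leaves not confirmed by the verification tree.

    Assumed rule: invalid if absent from the verification tree or if the
    verification score is below `min_score`.
    """
    verified = {leaf["value"]: leaf["score"] for leaf in verification_tree}
    invalid = set()
    for leaf in result_tree:
        score = verified.get(leaf["value"])
        if score is None or score < min_score:
            invalid.add(leaf["value"])
    return invalid


def simplify_tree(result_tree, invalid_values):
    """Drop invalid leaves while keeping the original left-to-right order."""
    return [leaf for leaf in result_tree if leaf["value"] not in invalid_values]


result_tree = [
    {"value": "A", "score": 7.5},
    {"value": "B", "score": 3.0},
    {"value": "C", "score": 0.0},
]
verification_tree = [
    {"value": "A", "score": 6.0},
    {"value": "B", "score": 0.2},
]
invalid = find_invalid_leaves(result_tree, verification_tree, min_score=1.0)
processed_tree = simplify_tree(result_tree, invalid)
```

Because the verification tree is built from a second, separate result data set, leaves that survive this check are less likely to reflect noise in the first data set.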
In one embodiment, the apparatus may further include:
a third result data set obtaining module, configured to obtain a third result data set of the target control experiment;
a result data set extraction module, configured to extract, from the third result data set, a result data set corresponding to each leaf node in the heterogeneity analysis result tree;
and the statistical data calculation module is used for calculating the result statistical data corresponding to each leaf node according to the result data set corresponding to each leaf node.
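The per-leaf statistics can be sketched as follows. The application mentions a lifting index and a sample proportion but this passage gives no formulas, so the definitions below (relative improvement of the treatment mean over the control mean, and leaf size divided by total sample count) are assumptions, as are the record layout and the name `leaf_statistics`.

```python
from statistics import mean


def leaf_statistics(rows, feature, leaf_values):
    """Compute illustrative result statistics for each leaf node.

    `rows` plays the role of the third result data set; each row is a dict
    with a "group" ("treatment" or "control"), an "outcome", and the feature.
    """
    stats = {}
    for value in leaf_values:
        # Result data set corresponding to this leaf node.
        subset = [r for r in rows if r[feature] == value]
        treated = [r["outcome"] for r in subset if r["group"] == "treatment"]
        control = [r["outcome"] for r in subset if r["group"] == "control"]
        lift = None  # undefined without both groups or with a zero baseline
        if treated and control and mean(control) != 0:
            lift = (mean(treated) - mean(control)) / mean(control)
        share = len(subset) / len(rows) if rows else 0.0
        stats[value] = {"lift": lift, "sample_share": share}
    return stats


third_result_data = [
    {"city": "A", "group": "treatment", "outcome": 12.0},
    {"city": "A", "group": "control", "outcome": 10.0},
    {"city": "B", "group": "treatment", "outcome": 9.0},
    {"city": "B", "group": "control", "outcome": 10.0},
]
stats = leaf_statistics(third_result_data, "city", ["A", "B"])
```

In this example the strategy improves the outcome by about 20% on value "A" and lowers it by about 10% on value "B", with each leaf covering half of the samples.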
In this embodiment, the apparatus may further include:
the leaf node screening module to be processed is used for screening out leaf nodes to be processed according to the result statistical data corresponding to each leaf node;
the statistical data extraction module is used for extracting result statistical data corresponding to the leaf node to be processed;
and the processing strategy determining module is used for determining the processing strategy corresponding to the leaf node to be processed according to the result statistical data and the processing rule corresponding to the leaf node to be processed.
In this embodiment, the apparatus may further include:
an experiment result comparison page generating module, configured to generate, in response to an experiment result query instruction, an experiment result comparison page according to the heterogeneity analysis result tree of the target control experiment;
and the page display module is used for displaying the experimental result comparison page.
In one embodiment, the experimental result query instruction carries information on the number of display layers, and the experimental result comparison page generating module may include:
a display layer number truncation module, configured to truncate, in response to the experiment result query instruction, the heterogeneity analysis result tree of the target control experiment to the number of display layers indicated by the display layer number information, to obtain a truncated heterogeneity analysis result tree;
and the result page generation module is used for generating an experiment result comparison page according to the intercepted heterogeneity analysis result tree.
The apparatus embodiments and the method embodiments are based on the same inventive concept.
The embodiment of the present application provides a computer device, which includes a processor and a memory, where at least one instruction or at least one program is stored in the memory, and the at least one instruction or the at least one program is loaded and executed by the processor to implement the data processing method provided by the above method embodiment.
The memory may be used to store software programs and modules, and the processor executes various functional applications and performs data processing by running the software programs and modules stored in the memory. The memory may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, application programs required for functions, and the like, and the data storage area may store data created according to the use of the device, and the like. Further, the memory may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device. Accordingly, the memory may also include a memory controller to provide the processor with access to the memory.
The method embodiments provided in the embodiments of the present application may be executed in a mobile terminal, a computer terminal, a server, or a similar computing device; that is, the computer device may include a mobile terminal, a computer terminal, a server, or a similar computing device. The server may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a network service, cloud communication, a middleware service, a domain name service, a security service, a CDN, and a big data and artificial intelligence platform. The terminal may be, but is not limited to, a smart phone, a tablet computer, a laptop computer, a desktop computer, a smart speaker, a smart watch, and the like. Taking running on a server as an example, fig. 15 is a hardware structure block diagram of a server for implementing the data processing method according to the embodiment of the present application. As shown in fig. 15, the server 1500 may vary considerably depending on its configuration or performance, and may include one or more Central Processing Units (CPUs) 1510 (the processor 1510 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)), a memory 1530 for storing data, and one or more storage media 1520 (e.g., one or more mass storage devices) for storing applications 1523 or data 1522. The memory 1530 and the storage media 1520 may be transient storage or persistent storage. The program stored in the storage medium 1520 may include one or more modules, each of which may include a series of instruction operations for the server.
Still further, the central processor 1510 may be configured to communicate with the storage medium 1520 and to execute, on the server 1500, the series of instruction operations in the storage medium 1520. The server 1500 may also include one or more power supplies 1560, one or more wired or wireless network interfaces 1550, one or more input/output interfaces 1540, and/or one or more operating systems 1521, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so on.
The Processor 1510 may be an integrated circuit chip having Signal processing capabilities, such as a general purpose Processor, a Digital Signal Processor (DSP), or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like, wherein the general purpose Processor may be a microprocessor or any conventional Processor, or the like.
The input/output interface 1540 can be used to receive or transmit data over a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the server 1500. In one example, the input/output interface 1540 includes a network adapter (Network Interface Controller, NIC) that can be connected to other network devices through a base station so as to communicate with the internet. In one example, the input/output interface 1540 can be a Radio Frequency (RF) module, which is used to communicate with the internet wirelessly.
The operating system 1521 may include system programs, such as a framework layer, a core library layer, and a driver layer, for implementing various basic system services and handling hardware-based tasks.
It will be understood by those skilled in the art that the structure shown in fig. 15 is only an illustration and is not intended to limit the structure of the electronic device. For example, server 1500 may also include more or fewer components than shown in FIG. 15, or have a different configuration than shown in FIG. 15.
Embodiments of the present application further provide a computer-readable storage medium, where the storage medium may be disposed in a server to store at least one instruction or at least one program for implementing a data processing method in the method embodiments, and the at least one instruction or the at least one program is loaded and executed by the processor to implement the data processing method provided in the method embodiments.
Optionally, in this embodiment, the storage medium may be located in at least one network server of a plurality of network servers of a computer network. Optionally, in this embodiment, the storage medium may include, but is not limited to: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disk, and other various media capable of storing program code.
Embodiments of the present application also provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the method provided in the various alternative implementations described above.
As can be seen from the embodiments of the data processing method, apparatus, device, storage medium, or computer program provided in the present application, the first result data set and the heterogeneity analysis requirement information of the target control experiment are obtained; the feature to be analyzed of the original heterogeneity analysis model is configured based on the heterogeneity analysis requirement information to obtain a target heterogeneity analysis model; and finally, the first result data set is input into the target heterogeneity analysis model and result heterogeneity analysis is performed on the feature to be analyzed to obtain a heterogeneity analysis result tree of the target control experiment, where the heterogeneity analysis result tree represents heterogeneity information of the target control experiment on a plurality of feature values of the feature to be analyzed. Combining machine learning with a large amount of result data in this way supports scientific and reliable result heterogeneity analysis, improves its efficiency and accuracy, and allows result heterogeneity to be analyzed flexibly on multi-dimensional features to be analyzed. Calculating the result statistical data corresponding to each leaf node from the result data set corresponding to that leaf node makes it possible to determine, scientifically and flexibly and in combination with actual analysis requirements, the specific performance of the target control experiment on each feature value of the feature to be analyzed (for example, to determine on samples with which feature values the target strategy brings a large improvement), so that the strategy can then be rolled out and improved in a targeted manner.
Screening out the leaf nodes to be processed according to the result statistical data corresponding to each leaf node in the heterogeneity analysis result tree, extracting the result statistical data corresponding to the leaf nodes to be processed, and determining the corresponding processing strategy according to that result statistical data and the result processing rule makes it possible to adjust the strategy in a targeted manner on the basis of actual statistical data, which improves the adaptability and reliability of the strategy adjustment.
It should be noted that the order of the embodiments of the present application is only for description and does not represent the relative merits of the embodiments. Specific embodiments of this specification have been described above. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus, device and storage medium embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference may be made to some descriptions of the method embodiments for relevant points.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (10)

1. A method of data processing, the method comprising:
acquiring a first result data set and heterogeneity analysis demand information of a target control experiment;
configuring the characteristics to be analyzed of the original heterogeneity analysis model based on the heterogeneity analysis demand information to obtain a target heterogeneity analysis model;
and inputting the first result data set into the target heterogeneity analysis model, and performing result heterogeneity analysis on the features to be analyzed to obtain a heterogeneity analysis result tree of the target control experiment, wherein the heterogeneity analysis result tree represents heterogeneity information of the target control experiment on a plurality of feature values of the features to be analyzed.
2. The method as claimed in claim 1, wherein the target heterogeneity analysis model comprises a heterogeneity intensity calculation layer, a leaf node generation layer, a data classification layer, and a recursive calculation layer; and wherein the inputting the first result data set into the target heterogeneity analysis model and performing result heterogeneity analysis on the features to be analyzed to obtain a heterogeneity analysis result tree of the target control experiment comprises:
in the heterogeneity intensity calculation layer, traversing each feature value of the feature to be analyzed in the first result data set, and respectively calculating heterogeneity information of each feature value on the first result data set;
in the leaf node generation layer, determining a target feature value according to the heterogeneity information of each feature value on the first result data set, and generating a target leaf node according to the target feature value;
in the data classification layer, performing data classification on the first result data set according to the target leaf node to obtain a target result data subset;
in the recursive calculation layer, taking the target result data subset as the first result data set, and repeating the steps from traversing, in the heterogeneity intensity calculation layer, each feature value of the feature to be analyzed in the first result data set and respectively calculating heterogeneity information of each feature value on the first result data set, through performing, in the data classification layer, data classification on the first result data set according to the target leaf node to obtain a target result data subset, until the target result data subset is empty, so as to obtain the heterogeneity analysis result tree of the target control experiment.
3. The method of claim 1, further comprising:
obtaining a second result dataset of the target control experiment;
inputting the second result data set into the target heterogeneity analysis model, and performing result heterogeneity analysis on the features to be analyzed to obtain a heterogeneity analysis verification tree of the target control experiment;
determining invalid leaf nodes in the heterogeneity analysis result tree according to the heterogeneity analysis verification tree;
and simplifying the heterogeneity analysis result tree according to the invalid leaf nodes to obtain a processed heterogeneity analysis result tree.
4. The method of claim 1, further comprising:
obtaining a third result data set of the target control experiment;
extracting a result data set corresponding to each leaf node in the heterogeneity analysis result tree from the third result data set respectively;
and calculating result statistical data corresponding to each leaf node according to the result data set corresponding to each leaf node.
5. The method of claim 4, further comprising:
screening out leaf nodes to be processed according to the result statistical data corresponding to each leaf node;
extracting result statistical data corresponding to the leaf node to be processed;
and determining a processing strategy corresponding to the leaf node to be processed according to the result statistical data and the processing rule corresponding to the leaf node to be processed.
6. The method of claim 1, further comprising:
in response to an experiment result query instruction, generating an experiment result comparison page according to the heterogeneity analysis result tree of the target control experiment;
and displaying the experiment result comparison page.
7. The method as claimed in claim 6, wherein the experiment result query instruction carries display layer number information, and the generating an experiment result comparison page according to the heterogeneity analysis result tree of the target control experiment in response to the experiment result query instruction comprises:
in response to the experiment result query instruction, truncating the heterogeneity analysis result tree of the target control experiment to the number of display layers indicated by the display layer number information, to obtain a truncated heterogeneity analysis result tree;
and generating the experiment result comparison page according to the truncated heterogeneity analysis result tree.
8. A data processing apparatus, characterized in that the apparatus comprises:
the data acquisition module is used for acquiring a first result data set and heterogeneity analysis demand information of a target control experiment;
the characteristic parameter configuration module is used for configuring the characteristics to be analyzed of the original heterogeneity analysis model based on the heterogeneity analysis demand information to obtain a target heterogeneity analysis model;
and the heterogeneity analysis module is used for inputting the first result data set into the target heterogeneity analysis model, and performing result heterogeneity analysis on the features to be analyzed to obtain a heterogeneity analysis result tree of the target control experiment, wherein the heterogeneity analysis result tree represents heterogeneity information of the target control experiment on a plurality of feature values of the features to be analyzed.
9. A data processing device, characterized in that the device comprises a processor and a memory, in which at least one instruction or at least one program is stored, which is loaded and executed by the processor to implement the data processing method according to any one of claims 1 to 7.
10. A computer-readable storage medium, in which at least one instruction or at least one program is stored, which is loaded and executed by a processor to implement the data processing method according to any one of claims 1 to 7.
CN202110213201.XA 2021-02-25 2021-02-25 Data processing method, device, equipment and storage medium Active CN112862536B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110213201.XA CN112862536B (en) 2021-02-25 2021-02-25 Data processing method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112862536A (en) 2021-05-28
CN112862536B (en) 2023-07-11

Family

ID=75991530

Family Applications (1)

Application Number Status Publication Number Priority Date Filing Date Title
CN202110213201.XA Active CN112862536B (en) 2021-02-25 2021-02-25 Data processing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112862536B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015172567A1 (en) * 2014-05-12 2015-11-19 中国科学院计算机网络信息中心 Internet information searching, aggregating and presentation method
CN109493325A (en) * 2018-10-23 2019-03-19 清华大学 Tumor Heterogeneity analysis system based on CT images
CN111597247A (en) * 2020-06-05 2020-08-28 腾讯科技(深圳)有限公司 Data anomaly analysis method and device and storage medium
CN112199854A (en) * 2020-10-23 2021-01-08 国网青海省电力公司清洁能源发展研究院 Method for constructing efficiency analysis model of power industry

Also Published As

Publication number Publication date
CN112862536B (en) 2023-07-11

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant