CN113592557A - Attribution method and device of advertisement putting result, storage medium and electronic equipment

Publication number
CN113592557A
Authority
CN
China
Prior art keywords
candidate
prediction
influence
factors
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110886914.2A
Other languages
Chinese (zh)
Inventor
高少文
余鲲涛
朱家华
王婷婷
于溟鲲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Youzhuju Network Technology Co Ltd
Original Assignee
Beijing Youzhuju Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Youzhuju Network Technology Co Ltd
Priority to CN202110886914.2A
Publication of CN113592557A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06Q30/0242 Determining effectiveness of advertisements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Strategic Management (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Accounting & Taxation (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Engineering & Computer Science (AREA)
  • Game Theory and Decision Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The disclosure relates to an attribution method and device for advertisement delivery results, a storage medium, and an electronic device. The method comprises: acquiring a data set to be predicted, wherein the data set to be predicted comprises a plurality of pieces of prediction data generated in an advertisement delivery process, and each piece of prediction data comprises feature data of a plurality of influence factors; computing, according to a trained random forest prediction model, the SHAP value of each influence factor by using a SHAP value algorithm, wherein the random forest prediction model is used for obtaining a prediction result based on the data set to be predicted, and the SHAP value of each influence factor characterizes the importance of that influence factor to the prediction result; and determining target influence factors according to the SHAP values of the influence factors. With this scheme, the core factors influencing the advertisement delivery effect can be obtained, and the importance of each influence factor to the prediction result, as determined based on the random forest prediction model, is highly stable.

Description

Attribution method and device of advertisement putting result, storage medium and electronic equipment
Technical Field
The present disclosure relates to the field of electronic information technologies, and in particular, to an advertisement delivery result attribution method, an advertisement delivery result attribution device, a storage medium, and an electronic device.
Background
An advertisement system is a complex bidding system in which a large number of advertisement plans compete through bidding for display opportunities and delivery volume on the same media platform. Due to the intensity of the bidding environment, the instability of the delivery system, and similar causes, delivery problems such as cold start, loss of volume, volume explosion, and cost overruns inevitably arise, and the expected delivery results are difficult to obtain.
In order to obtain the expected delivery result, data generated in the advertisement delivery process needs to be analyzed to obtain factors influencing the advertisement delivery effect, and the delivery plan is modified based on the influencing factors to obtain the expected delivery result. Therefore, how to obtain the core factors influencing the advertising effect is crucial.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
In a first aspect, the present disclosure provides an attribution method for advertisement placement results, comprising:
acquiring a data set to be predicted, wherein the data set to be predicted comprises a plurality of pieces of prediction data generated in an advertisement putting process, and each piece of prediction data comprises feature data of a plurality of influence factors;
computing, according to a trained random forest prediction model, the SHAP value of each influence factor by using a SHAP value algorithm, wherein the random forest prediction model is used for obtaining a prediction result based on the data set to be predicted, and the SHAP value of each influence factor characterizes the importance of that influence factor to the prediction result;
and determining target influence factors according to the SHAP values of the influence factors.
In a second aspect, the present disclosure provides an apparatus for attributing advertisement placement results, comprising:
an acquisition module, configured to acquire a data set to be predicted, wherein the data set to be predicted comprises a plurality of pieces of prediction data generated in an advertisement delivery process, and each piece of prediction data comprises feature data of a plurality of influence factors;
a statistical module, configured to compute, according to a trained random forest prediction model, the SHAP value of each influence factor by using a SHAP value algorithm, wherein the random forest prediction model is used for obtaining a prediction result based on the data set to be predicted, and the SHAP value of each influence factor characterizes the importance of that influence factor to the prediction result;
and a determining module, configured to determine target influence factors according to the SHAP values of the influence factors.
In a third aspect, the present disclosure provides a computer-readable medium, on which a computer program is stored, which when executed by a processing device, implements the steps of the method for attributing an advertisement delivery result as described in the first aspect above.
In a fourth aspect, the present disclosure provides an electronic device comprising:
a storage device having a computer program stored thereon;
processing means for executing the computer program in the storage means to implement the steps of the method for attributing advertisement placement results in the first aspect.
According to this technical scheme, the SHAP value of each influence factor is computed by using a SHAP value algorithm according to a trained random forest prediction model, and the SHAP value of each influence factor characterizes the importance of that influence factor to the prediction result, so that the target influence factors influencing the advertisement delivery result can be determined according to the SHAP values of the influence factors. In addition, when the random forest prediction model is trained, each decision tree included in the model is trained on randomly selected feature data, so the importance of each influence factor to the prediction result, as determined based on the random forest prediction model, is highly stable.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and features are not necessarily drawn to scale. In the drawings:
Fig. 1 is a flow chart illustrating a method of attributing advertisement placement results according to an exemplary embodiment of the present disclosure.
Fig. 2 is a schematic diagram illustrating the selection of feature data when training a decision tree according to an exemplary embodiment of the present disclosure.
Fig. 3 is another schematic diagram illustrating the selection of feature data when training a decision tree according to an exemplary embodiment of the present disclosure.
Fig. 4 is a block diagram illustrating an advertisement placement result attribution apparatus according to an exemplary embodiment of the present disclosure.
Fig. 5 is a schematic structural diagram of an electronic device according to an exemplary embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It is noted that the modifiers "a", "an", and "the" in this disclosure are illustrative rather than limiting, and those skilled in the art will understand that they mean "one or more" unless the context clearly dictates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
In the related art, existing advertisement attribution methods use boosting-based ensemble methods to improve the attribution effect; although their accuracy is high, their stability is poor. In an advertisement attribution scenario, the attribution algorithm must meet requirements beyond the accuracy of the model: if the explanations given for similar problems differ greatly, the attribution conclusion is not reliable.
In view of the above, the present disclosure provides an attribution method, an attribution device, a storage medium and an electronic device for advertisement placement results, so as to improve the stability of attributing the factors that affect advertisement placement.
Fig. 1 is a flow chart illustrating a method of attributing advertisement placement results according to an exemplary embodiment of the present disclosure. Referring to fig. 1, the attribution method of advertisement delivery results includes the following steps.
Step 101, a data set to be predicted is obtained, wherein the data set to be predicted comprises a plurality of pieces of prediction data generated in an advertisement putting process, and each piece of prediction data comprises feature data of a plurality of influence factors.
Illustratively, in the advertisement delivery process, advertisement data including an influence factor 1, an influence factor 2, an influence factor 3, and an influence factor 4 may be generated, and feature extraction processing may be performed on the advertisement data of the influence factor 1, the influence factor 2, the influence factor 3, and the influence factor 4, respectively, to obtain feature data corresponding to each influence factor, and the feature data of the influence factor 1, the influence factor 2, the influence factor 3, and the influence factor 4 may constitute a piece of prediction data.
For example, prediction data generated in a preset time period in an advertisement delivery process may be obtained, where the preset time period may be set according to an actual situation, and details of this embodiment are not described herein.
For example, prediction data generated in a plurality of discontinuous delivery periods in the advertisement delivery process can be obtained, and attribution analysis can be carried out based on the obtained prediction data.
Step 102, according to the trained random forest prediction model, a SHAP value algorithm is used to compute the SHAP value of each influence factor, wherein the random forest prediction model is used for obtaining a prediction result based on the data set to be predicted, and the SHAP value of each influence factor characterizes the importance of that influence factor to the prediction result.
Specifically, the SHAP value reflects the contribution of the feature data of each influence factor to the learning target of the model, so the SHAP value can be estimated by calculating the contribution of each feature to the learning target.
Illustratively, the features are ranked in each decision tree according to the information gain of the feature for classification (whether the feature is one that affects advertisement placement), so that features with higher relevance and a larger contribution to the learning target have a higher weight in the decision tree, while unimportant features may not appear in the decision tree at all. Since the SHAP value represents the contribution of each feature to the learning target, it can be estimated from the gain obtained when the feature appears in a sequence compared with when it does not. For example, let x1, x2, and x3 represent features of different influence factors, and let the sequences S1 = [x1, x2, x3] = 10 and S2 = [x2, x3] = 8 indicate that x1, x2, and x3 in combination contribute 10 to the target, while x2 and x3 in combination contribute 8. The contribution of x1 to the target is then 10 - 8 = 2, and 2 can be taken as the estimated SHAP value.
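The marginal-gain idea in the x1, x2, x3 example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the value table `value` is hypothetical except for the two entries 10 and 8 taken from the text, and averaging the marginal gain over all feature orderings is the standard Shapley definition, which the text does not spell out.

```python
from itertools import permutations

def marginal_contribution(value, coalition, feature):
    """Gain from adding `feature` to `coalition` under the value table."""
    with_feature = frozenset(coalition) | {feature}
    return value[with_feature] - value[frozenset(coalition)]

def shapley_values(value, features):
    """Average each feature's marginal gain over every ordering of the features."""
    totals = {f: 0.0 for f in features}
    orderings = list(permutations(features))
    for order in orderings:
        seen = []
        for f in order:
            totals[f] += marginal_contribution(value, seen, f)
            seen.append(f)
    return {f: t / len(orderings) for f, t in totals.items()}

# Hypothetical contributions of each feature combination to the target.
# Only the entries 10 and 8 come from the example in the text.
value = {
    frozenset(): 0,
    frozenset({"x1"}): 1,
    frozenset({"x2"}): 4,
    frozenset({"x3"}): 3,
    frozenset({"x1", "x2"}): 6,
    frozenset({"x1", "x3"}): 5,
    frozenset({"x2", "x3"}): 8,
    frozenset({"x1", "x2", "x3"}): 10,
}

# Gain of x1 given that x2 and x3 are already present: S1 - S2.
single_gain = marginal_contribution(value, {"x2", "x3"}, "x1")
phi = shapley_values(value, ["x1", "x2", "x3"])
```

The single difference `single_gain` reproduces the estimate in the text, while `phi` shows how the same gain would be averaged over orderings if all combinations were available.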
As another example, the SHAP value can be estimated from the gain in a model metric, such as accuracy, that results from the feature appearing in the decision tree. Suppose the covered sample size is 100 and the decision tree has a path [a > 10 and b < 5 and c > 20]. To estimate the SHAP value for the feature value a = 12 (corresponding to some influence factor), suppose the accuracy of the path [a > 10 and b < 5 and c > 20] is 0.9 and the accuracy of the path [b < 5 and c > 20] is 0.8. Since the feature value a = 12 appears in the former path, its SHAP value can be represented by (0.9 - 0.8) × 100.
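The arithmetic of the path example can be written out as follows. The function name `path_gain_estimate` is a hypothetical helper introduced here for illustration; the numbers are the ones given in the text.

```python
def path_gain_estimate(accuracy_with, accuracy_without, covered_samples):
    """Accuracy gain of the path that contains the feature, scaled by coverage."""
    return (accuracy_with - accuracy_without) * covered_samples

# Path [a > 10 and b < 5 and c > 20] has accuracy 0.9; path [b < 5 and c > 20]
# has accuracy 0.8; both cover 100 samples.
estimate = path_gain_estimate(0.9, 0.8, 100)
```

Because of floating-point rounding, `estimate` is very close to, but not necessarily exactly, 10.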
It should be noted that the random forest prediction model includes a plurality of decision trees; each decision tree makes a prediction based on the data set to be predicted to obtain a sub-prediction result, and the prediction result is determined based on all of the sub-prediction results.
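The text does not state how the per-tree results are combined. A minimal sketch, assuming the two most common choices (a majority vote for class labels and a mean for numeric outputs), might look like this; the label strings are hypothetical.

```python
from collections import Counter
from statistics import mean

def aggregate_votes(sub_predictions):
    """Majority vote over the class labels predicted by the individual trees."""
    return Counter(sub_predictions).most_common(1)[0][0]

def aggregate_mean(sub_predictions):
    """Average of the numeric outputs predicted by the individual trees."""
    return mean(sub_predictions)

# Hypothetical outputs of three decision trees for one piece of prediction data.
label = aggregate_votes(["over_cost", "normal", "over_cost"])
score = aggregate_mean([1.0, 2.0, 3.0])
```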
The random forest prediction model can be obtained by training in the following mode:
firstly, obtaining a model training data set, wherein the model training data set comprises a plurality of pieces of model training data;
secondly, sampling K times with replacement, randomly selecting n pieces of model training data in each sampling, to obtain K sub-training data sets;
thirdly, based on each sub-training data set, a decision tree is obtained through training. When each decision tree is trained, m features are randomly extracted from the sub-training data set, the decision trees are trained by using the m features, and in the process of training the decision trees, the splitting feature of each node needs to be determined from the m features until the splitting feature of a certain node cannot be selected.
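The sampling part of the three training steps above can be sketched with the standard library alone. The helper names, the toy data set, and the values of K, n, and m are assumptions chosen for illustration; fitting the decision trees themselves is left out.

```python
import random

def bootstrap_subsets(dataset, k, n, rng):
    """Draw k sub-training sets of n rows each, sampling with replacement."""
    return [[rng.choice(dataset) for _ in range(n)] for _ in range(k)]

def pick_features(feature_names, m, rng):
    """Randomly pick m distinct features for one decision tree."""
    return rng.sample(feature_names, m)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
feature_names = ["feature_1", "feature_2", "feature_3", "feature_4"]
dataset = [{"sample_id": i} for i in range(1, 9)]  # toy model training data

# K = 3 sub-training sets of n = 4 rows, each paired with m = 2 random features.
subsets = bootstrap_subsets(dataset, k=3, n=4, rng=rng)
tree_features = [pick_features(feature_names, m=2, rng=rng) for _ in subsets]
```

Because rows are drawn with replacement, the same sample may appear more than once in a sub-training set, while the m features chosen for a tree are distinct.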
The feature selection of the decision tree is further explained with the feature data shown in fig. 2 and fig. 3, which show the sample features selected in a first run and a second run when training the same decision tree. As can be seen from fig. 2 and fig. 3, the sub-training data set of the decision tree consists of samples 5 to 8; the m features selected in fig. 2 are features 3 and 4 of samples 5 to 8, and the m features selected in fig. 3 are features 2 and 3 of samples 5 to 8. Each run trains the decision tree based on the features in partition 1, and the randomly selected features differ between the two runs. For a certain node, in the first run partition 1 in fig. 2 votes for feature 3 as the splitting feature of the node, and in the second run partition 1 in fig. 3 votes for feature 2 as the splitting feature of the node. The difference in the partition data in each run leads to a difference in the global voting result, so that feature 3 may become the split node in the first run while feature 2 may become the split node in the second run; this difference in the global voting result, caused by the difference in the partition data, can improve the stability of the random forest prediction model.
It can be understood that K decision trees are generated in K rounds, and since the K decision trees are random in both the selection of the training set and the selection of the features, the K decision trees are independent of each other and can be executed in parallel. Since advertisement attribution analysis may involve a large amount of advertisement data, a scheme that may be performed in parallel independently of each other may increase the efficiency of attribution analysis as compared to a conventional standalone attribution approach.
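Since the K trees do not depend on one another, their training can be dispatched to a worker pool. The sketch below uses a placeholder `train_tree` function (hypothetical, it does not fit a real tree) and a thread pool purely to show the dispatch pattern; for CPU-bound training in CPython a process pool or a distributed framework would be the more likely choice.

```python
from concurrent.futures import ThreadPoolExecutor

def train_tree(task):
    """Placeholder for fitting one decision tree on its own sub-training set."""
    subset, features = task
    return {"n_rows": len(subset), "features": tuple(features)}

# Hypothetical tasks: one (sub-training set, selected features) pair per tree.
tasks = [
    ([1, 2, 3, 4], ["feature_3", "feature_4"]),
    ([5, 6, 7, 8], ["feature_2", "feature_3"]),
]

# Each task is independent, so the pool can run them concurrently.
with ThreadPoolExecutor(max_workers=2) as pool:
    trees = list(pool.map(train_tree, tasks))
```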
Step 103, determining target influence factors according to the SHAP values of the influence factors.
It can be understood that, since the SHAP value of each influence factor characterizes the importance of that influence factor to the prediction result, the core factors influencing the advertisement delivery result can be determined according to the SHAP values of the influence factors.
It should be noted that the higher the SHAP value of an influence factor, the greater its importance to the prediction result.
In this manner, the SHAP value of each influence factor is computed by using a SHAP value algorithm according to a trained random forest prediction model, and the SHAP value of each influence factor characterizes the importance of that influence factor to the prediction result, so that the target influence factors influencing the advertisement delivery result can be determined according to the SHAP values of the influence factors. In addition, when the random forest prediction model is trained, each decision tree is trained on randomly selected feature data, so the importance of each influence factor to the prediction result, as determined by the random forest prediction model, is highly stable.
In a possible manner, step 103 shown in fig. 1 may include: determining a candidate influence factor set according to the SHAP value of each influence factor; determining the candidate influence factors in the set that are linearly related to the prediction result as candidate influence factors to be tested; determining a hypothesis test result for the candidate influence factors to be tested; and determining, according to that hypothesis test result, whether the candidate influence factors to be tested are target influence factors.
The target influencing factor is a factor significantly related to the predicted result.
For example, the candidate influence factor set may be determined according to how high or low the SHAP value of each influence factor is. It is understood that the candidate influence factor set contains both linearly related and non-linearly related factors; that is, a dimension is retained if it is strongly linearly related to the target, or if it is strongly correlated with the target after being combined with other dimensions.
In a possible manner, the influence factors whose SHAP value is higher than a preset value can be determined as candidate influence factors, thereby obtaining the candidate influence factor set.
For example, the preset value may be zero or other values, and the embodiment is not limited herein.
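The threshold filter described above is straightforward to express in code. The factor names and SHAP values below are hypothetical, and sorting the survivors from highest to lowest is an added convenience rather than something the text requires.

```python
def select_candidates(shap_by_factor, threshold=0.0):
    """Keep factors whose SHAP value exceeds the preset value, highest first."""
    kept = {name: v for name, v in shap_by_factor.items() if v > threshold}
    return sorted(kept, key=kept.get, reverse=True)

# Hypothetical per-factor SHAP values.
shap_by_factor = {
    "bid": 0.42,
    "budget": 0.17,
    "creative_count": 0.0,
    "region": -0.03,
}
candidates = select_candidates(shap_by_factor)  # preset value of zero
```

With a preset value of zero, factors whose SHAP value is zero or negative are dropped before any hypothesis testing is done.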
It should be noted that, for distinguishing the linear correlation factor from the nonlinear correlation factor, reference may be made to related technologies, and this embodiment is not described herein again.
In the related art, in schemes that use an importance score to determine the significant influence factors among a plurality of influence factors, the importance threshold is generally set based on manual experience, so effective guidance for distinguishing important from unimportant factors is still lacking. The advantage of a hypothesis test is that it decides whether to reject the null hypothesis (the original hypothesis) by calculating whether, assuming the null hypothesis holds, the probability of observing the current sample data is small enough. Generally, the Type I error rate threshold α of a two-sided test is 0.05; if the calculated p-value is less than 0.05, the effect can be considered statistically significant. Performing a hypothesis test on the candidate influence factors to be tested, and performing attribution analysis according to statistically significant test results, can therefore improve the confidence of the attribution analysis. In addition, using the SHAP values to filter out some irrelevant influence factors saves part of the computational overhead and further improves the efficiency of the attribution analysis.
In a possible way, the hypothesis testing result of the candidate influencing factors to be tested can be determined by: performing hypothesis testing on model parameters of the linear regression model corresponding to the candidate influence factors to be tested; and determining the hypothesis test result of the candidate influencing factors to be tested according to the hypothesis test result of the model parameters of the linear regression model.
It should be noted that the hypothesis test result of a candidate influence factor to be tested is the hypothesis test result of the model parameters of the linear regression model corresponding to that candidate influence factor.
The linear regression model corresponding to a candidate influence factor to be tested is obtained by training on the feature data corresponding to that candidate influence factor and the label data corresponding to each piece of feature data. Because features are often correlated with one another, fitting a single regression model to a plurality of influence factors makes the fitting result fluctuate, so that originally significant factors may appear insignificant. Therefore, to avoid the influence of multicollinearity, a logistic regression model is fitted independently for each influence factor, and a hypothesis test is performed on whether the model parameters of the regression model indicate a correlation. This ensures that highly correlated factors are not made to appear unreliable by the influence of other correlated factors, so that highly correlated factors can be screened out with high accuracy and recall.
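Fitting one model per factor can be illustrated with a single-feature ordinary least squares fit, which yields both the model parameter and its standard error. This is a sketch under assumptions: the text does not name the fitting procedure, the data are made up, and `fit_single_factor` is a hypothetical helper.

```python
import math

def fit_single_factor(x, y):
    """Fit y = intercept + beta * x by least squares; return (beta, SE(beta))."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_x
    # Residual variance with n - 2 degrees of freedom (slope and intercept).
    rss = sum((yi - intercept - beta * xi) ** 2 for xi, yi in zip(x, y))
    se_beta = math.sqrt(rss / (n - 2) / sxx)
    return beta, se_beta

# Hypothetical feature data of one candidate factor and its label data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
beta, se_beta = fit_single_factor(x, y)
```

Repeating this fit separately for every candidate factor keeps each estimate free of the other factors' influence, which is the point of the per-factor design described above.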
First, it is understood that the candidate influencing factors linearly related to the prediction result may be plural. Therefore, the following further explains the implementation process of performing hypothesis testing on the model parameters of the linear regression model corresponding to the candidate influencing factors to be tested, taking the jth candidate influencing factor to be tested as an example. Specifically, the method comprises the following steps:
Firstly, setting the null hypothesis and the alternative hypothesis for the model parameters of the linear regression model, wherein the null hypothesis is $H_0: \beta_j = 0$, the alternative hypothesis is $H_1: \beta_j \neq 0$, and $\beta_j$ represents the model parameter of the jth candidate influence factor to be tested.
Secondly, obtaining the statistic of the linear regression model according to the model parameters of the linear regression model and a preset formula. The preset formula may be $W^2 = (\delta_j / SE(\delta_j))^2$; replacing $\delta_j$ with the model parameter of the linear regression model gives the statistic of the linear regression model. For example, $W^2 = (\beta_j / SE(\beta_j))^2$ represents the statistic of the model parameter of the jth candidate influence factor to be tested, where $SE(\beta_j)$ represents the standard error (the standard deviation of the estimate) of the model parameter of the jth candidate influence factor to be tested.
Thirdly, judging whether the statistic meets a preset condition. The preset condition may be that the statistic is greater than the quantile of the chi-square distribution at a preset value. Specifically, the statistic is determined to meet the preset condition if it is greater than the quantile of the chi-square distribution at the preset value, and is determined not to meet the preset condition if it is less than or equal to that quantile. For example, the chi-square distribution may have 1 degree of freedom, and the preset value may be 0.05.
Fourthly, determining the hypothesis test result of the model parameters of the linear regression model according to the judgment result. The hypothesis test result is either that the null hypothesis is not established or that the null hypothesis is established. If the statistic meets the preset condition, the null hypothesis is determined not to be established (the alternative hypothesis is accepted); if the statistic does not meet the preset condition, the null hypothesis is determined to be established (the alternative hypothesis is rejected). The null hypothesis not being established indicates that the candidate influence factor to be tested is a target influence factor.
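The four steps above amount to a Wald-type test on a single parameter, which can be sketched as follows. The parameter values passed in are hypothetical; the critical value is the upper 5% point of the chi-square distribution with 1 degree of freedom, and the p-value uses the identity that a chi-square variable with 1 degree of freedom is the square of a standard normal variable.

```python
import math

# Upper 5% critical value of the chi-square distribution with 1 degree of freedom.
CHI2_DF1_CRITICAL_005 = 3.841458820694124

def wald_test(beta_j, se_beta_j, critical=CHI2_DF1_CRITICAL_005):
    """Return (W^2, p-value, reject_null) for the null hypothesis beta_j = 0."""
    w2 = (beta_j / se_beta_j) ** 2
    # Survival function of chi-square with 1 d.o.f.: P(X > w2) = erfc(sqrt(w2 / 2)).
    p_value = math.erfc(math.sqrt(w2 / 2.0))
    reject_null = w2 > critical
    return w2, p_value, reject_null

# Hypothetical estimates for two candidate factors to be tested.
w2_a, p_a, target_a = wald_test(1.99, 0.05)   # large statistic
w2_b, p_b, target_b = wald_test(0.10, 0.20)   # small statistic
```

A factor whose statistic exceeds the critical value has its null hypothesis rejected and is treated as a target influence factor; comparing the p-value with 0.05 leads to the same decision.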
Based on the same inventive concept, the present disclosure provides an advertisement delivery result attribution device, referring to fig. 4, including:
an acquisition module, configured to acquire a data set to be predicted, wherein the data set to be predicted comprises a plurality of pieces of prediction data generated in an advertisement delivery process, and each piece of prediction data comprises feature data of a plurality of influence factors;
a statistical module, configured to compute, according to a trained random forest prediction model, the SHAP value of each influence factor by using a SHAP value algorithm, wherein the random forest prediction model is used for obtaining a prediction result based on the data set to be predicted, and the SHAP value of each influence factor characterizes the importance of that influence factor to the prediction result;
and a determining module, configured to determine target influence factors according to the SHAP values of the influence factors.
Optionally, the determining module includes:
a first determining submodule, configured to determine a candidate influence factor set according to the SHAP value of each influence factor;
a second determining submodule, configured to determine the candidate influence factors in the candidate influence factor set that are linearly related to the prediction result as candidate influence factors to be tested;
a third determining submodule, configured to determine a hypothesis test result of the candidate influence factors to be tested;
and a fourth determining submodule, configured to determine, according to the hypothesis test result of the candidate influence factors to be tested, whether the candidate influence factors to be tested are target influence factors.
Optionally, the apparatus comprises:
a hypothesis testing module for performing hypothesis testing on the model parameters of the linear regression model corresponding to the candidate influencing factors to be tested;
and the result determining module is used for determining the hypothesis test result of the candidate influencing factors to be tested according to the hypothesis test result of the model parameters of the linear regression model.
Optionally, the hypothesis testing module comprises:
a setting submodule, configured to set a null hypothesis and an alternative hypothesis for the model parameters of the linear regression model;
a calculation submodule, configured to obtain a statistic of the linear regression model according to the model parameters of the linear regression model and a preset formula;
a judgment submodule, configured to judge whether the statistic satisfies a preset condition;
and a result determining submodule, configured to determine a hypothesis test result for the model parameters of the linear regression model according to the judgment result, wherein the hypothesis test result is either that the null hypothesis is rejected or that the null hypothesis is not rejected, and the null hypothesis for the model parameters of the linear regression model represents that the candidate influence factor to be tested is not a target influence factor.
Optionally, the judgment submodule is specifically configured to judge whether the statistic is greater than the quantile of the chi-square distribution at a preset value, and to determine that the statistic satisfies the preset condition when the statistic is greater than that quantile.
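The comparison of a statistic against a chi-square quantile can be sketched as follows. The disclosure does not spell out the preset formula here, so this sketch assumes a Wald statistic for a single regression coefficient, (estimate / standard error) squared, compared against the chi-square distribution with one degree of freedom. The function names and the 0.95 level are illustrative assumptions.

```python
from statistics import NormalDist


def chi2_1df_quantile(p):
    """Quantile of the chi-square distribution with 1 degree of freedom.

    Uses the identity chi2(1) = Z**2 for a standard normal Z, so that
    P(Z**2 <= q) = p  <=>  sqrt(q) = Phi^{-1}((1 + p) / 2).
    """
    z = NormalDist().inv_cdf((1.0 + p) / 2.0)
    return z * z


def wald_test(beta_hat, std_err, level=0.95):
    """Return (statistic, critical value, reject_null) for H0: beta = 0.

    Assumption: a Wald statistic with 1 degree of freedom; the patent's
    actual preset formula may differ.
    """
    stat = (beta_hat / std_err) ** 2
    critical = chi2_1df_quantile(level)
    return stat, critical, stat > critical


# Hypothetical coefficient estimates and standard errors.
print(wald_test(0.8, 0.2))  # large statistic: null hypothesis rejected
print(wald_test(0.1, 0.2))  # small statistic: null hypothesis not rejected
```

Under this reading, rejecting the null hypothesis (coefficient equal to zero) corresponds to treating the candidate influence factor as a target influence factor.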
Optionally, the apparatus further comprises:
a training data acquisition module, configured to acquire a training data set, wherein the training data set comprises feature data corresponding to the candidate influence factors to be tested and label data corresponding to each piece of feature data;
and a training module, configured to train, according to the training data set, a linear regression model corresponding to the candidate influence factors to be tested.
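A minimal sketch of such training, assuming a single candidate influence factor and ordinary least squares, is given below. The data values are invented for illustration, and the standard error is included only because a later hypothesis test on the coefficient would need it; the disclosure does not prescribe this particular fitting procedure.

```python
from math import sqrt


def fit_simple_ols(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (slope, intercept, slope_std_err). Needs at least 3 points so
    that the residual variance (n - 2 degrees of freedom) is defined.
    """
    n = len(xs)
    if n < 3 or n != len(ys):
        raise ValueError("need at least 3 paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("feature values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares and unbiased residual variance estimate.
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    sigma2 = rss / (n - 2)
    slope_std_err = sqrt(sigma2 / sxx)
    return slope, intercept, slope_std_err


# Hypothetical training data: feature values of one candidate influence
# factor and the corresponding label data.
features = [1.0, 2.0, 3.0, 4.0, 5.0]
labels = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept, slope_std_err = fit_simple_ols(features, labels)
print(slope, intercept, slope_std_err)
```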
Optionally, the first determining submodule is specifically configured to determine each influence factor whose SHAP value is higher than a preset value as a candidate influence factor, so as to obtain the candidate influence factor set.
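The thresholding step might look like the sketch below. How per-record SHAP values are aggregated into one value per influence factor is not stated in this passage, so the mean absolute SHAP value across records is an assumption, as are the factor names, the sample values, and the threshold of 0.8.

```python
def candidate_factors(shap_rows, factor_names, threshold):
    """Select factors whose mean absolute SHAP value exceeds a threshold.

    shap_rows holds one list of per-factor SHAP values for each piece of
    prediction data. Aggregating by mean absolute value is an assumption.
    """
    n_rows = len(shap_rows)
    importance = {
        name: sum(abs(row[j]) for row in shap_rows) / n_rows
        for j, name in enumerate(factor_names)
    }
    selected = [name for name in factor_names if importance[name] > threshold]
    return selected, importance


# Hypothetical factor names and SHAP values for three prediction records.
names = ["bid", "budget", "creative_score"]
rows = [
    [3.0, 2.0, 1.0],
    [-2.0, 0.5, 0.2],
    [1.0, -0.5, 0.0],
]
selected, importance = candidate_factors(rows, names, threshold=0.8)
print(selected)
print(importance)
```

Taking absolute values before averaging keeps a factor that pushes predictions strongly in both directions from cancelling itself out.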
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Based on the same inventive concept, embodiments of the present disclosure also provide a computer-readable medium on which a computer program is stored, wherein the program, when executed by a processing device, implements the steps of the above attribution method for advertisement delivery results.
Based on the same inventive concept, an embodiment of the present disclosure further provides an electronic device, including:
a storage device having a computer program stored thereon;
processing means for executing the computer program in the storage means to implement the steps of the attribution method of the advertisement delivery result described above.
Referring now to FIG. 5, a block diagram of an electronic device 500 suitable for implementing embodiments of the present disclosure is shown. The terminal device in the embodiments of the present disclosure may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), or a vehicle terminal (e.g., a car navigation terminal), and a stationary terminal such as a digital TV or a desktop computer. The electronic device shown in FIG. 5 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
As shown in FIG. 5, the electronic device 500 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 501 that may perform various appropriate actions and processes in accordance with a program stored in a read-only memory (ROM) 502 or a program loaded from a storage means 508 into a random access memory (RAM) 503. The RAM 503 also stores various programs and data necessary for the operation of the electronic device 500. The processing means 501, the ROM 502, and the RAM 503 are connected to each other through a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
Generally, the following devices may be connected to the I/O interface 505: input devices 506 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 507 including, for example, a Liquid Crystal Display (LCD), speakers, vibrators, and the like; storage devices 508 including, for example, magnetic tape, hard disk, etc.; and a communication device 509. The communication means 509 may allow the electronic device 500 to communicate with other devices wirelessly or by wire to exchange data. While fig. 5 illustrates an electronic device 500 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 509, or installed from the storage means 508, or installed from the ROM 502. The computer program performs the above-described functions defined in the methods of the embodiments of the present disclosure when executed by the processing device 501.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
In some implementations, the electronic devices may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communications network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquire a data set to be predicted, wherein the data set to be predicted comprises a plurality of pieces of prediction data generated during an advertisement delivery process, and each piece of prediction data comprises feature data of a plurality of influence factors; compute the SHAP value of each influence factor by applying a SHAP value algorithm to a trained random forest prediction model, wherein the random forest prediction model is used to obtain a prediction result based on the data set to be predicted, and the SHAP value of each influence factor represents the importance of that influence factor to the prediction result; and determine target influence factors according to the SHAP values of the influence factors.
Computer program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present disclosure may be implemented by software or hardware. The name of a module does not in some cases form a limitation of the module itself, and for example, an obtaining module may also be described as a "module that obtains a data set to be predicted".
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
Example 1 provides a method of attributing an advertisement placement result, according to one or more embodiments of the present disclosure, including:
acquiring a data set to be predicted, wherein the data set to be predicted comprises a plurality of pieces of prediction data generated in an advertisement putting process, and each piece of prediction data comprises feature data of a plurality of influence factors;
computing the SHAP value of each influence factor by applying a SHAP value algorithm to a trained random forest prediction model, wherein the random forest prediction model is used to obtain a prediction result based on the data set to be predicted, and the SHAP value of each influence factor represents the importance of that influence factor to the prediction result;
and determining target influence factors according to the SHAP values of the influence factors.
Example 2 provides the method of Example 1, wherein determining the target influence factors according to the SHAP values of the influence factors includes:
determining a candidate influence factor set according to the SHAP value of each influence factor;
determining the candidate influence factors in the candidate influence factor set that are linearly related to the prediction result as candidate influence factors to be tested;
determining a hypothesis test result for the candidate influence factors to be tested;
and determining, according to the hypothesis test result for the candidate influence factors to be tested, whether the candidate influence factors to be tested are target influence factors.
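One way to screen candidates for a linear relationship with the prediction result is sketched below. The text does not say how linear relatedness is decided, so using the Pearson correlation coefficient with a minimum absolute value of 0.7 is purely an assumption, and the factor names and data are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def linearly_related(features, predictions, min_abs_r=0.7):
    """Keep candidate factors whose |r| with the predictions reaches min_abs_r.

    The cutoff min_abs_r is an illustrative assumption, not a value from
    the disclosure.
    """
    return [
        name
        for name, values in features.items()
        if abs(pearson_r(values, predictions)) >= min_abs_r
    ]


# Hypothetical candidate factors and model predictions.
candidates = {
    "bid": [1.0, 2.0, 3.0, 4.0, 5.0],
    "hour_of_day": [1.0, 5.0, 2.0, 4.0, 3.0],
}
predictions = [2.0, 4.0, 6.0, 8.0, 10.0]
to_test = linearly_related(candidates, predictions)
print(to_test)
```

Only the factors that pass this screen would then go on to the hypothesis test, since a test on a linear regression coefficient is meaningful only when a linear model is a reasonable description of the relationship.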
Example 3 provides the method of Example 2, wherein the hypothesis test result for the candidate influence factors to be tested is determined by:
performing a hypothesis test on the model parameters of the linear regression model corresponding to the candidate influence factors to be tested;
and determining the hypothesis test result for the candidate influence factors to be tested according to the hypothesis test result for the model parameters of the linear regression model.
Example 4 provides the method of Example 3, wherein performing the hypothesis test on the model parameters of the linear regression model corresponding to the candidate influence factors to be tested includes:
setting a null hypothesis and an alternative hypothesis for the model parameters of the linear regression model;
obtaining a statistic of the linear regression model according to the model parameters of the linear regression model and a preset formula;
judging whether the statistic satisfies a preset condition;
and determining a hypothesis test result for the model parameters of the linear regression model according to the judgment result, wherein the hypothesis test result is either that the null hypothesis is rejected or that the null hypothesis is not rejected, and the null hypothesis for the model parameters of the linear regression model represents that the candidate influence factor to be tested is not a target influence factor.
Example 5 provides the method of Example 4, wherein judging whether the statistic satisfies the preset condition includes:
judging whether the statistic is greater than the quantile of the chi-square distribution at a preset value;
and determining that the statistic satisfies the preset condition when the statistic is greater than the quantile of the chi-square distribution at the preset value.
Example 6 provides the method of Example 3, according to one or more embodiments of the present disclosure, wherein the linear regression model is trained in the following manner:
acquiring a training data set, wherein the training data set comprises feature data corresponding to the candidate influence factors to be tested and label data corresponding to each piece of feature data;
and training, according to the training data set, a linear regression model corresponding to the candidate influence factors to be tested.
Example 7 provides the method of Example 2, wherein determining the candidate influence factor set according to the SHAP values of the influence factors includes:
determining each influence factor whose SHAP value is higher than a preset value as a candidate influence factor, so as to obtain the candidate influence factor set.
Example 8 provides an apparatus for attributing an advertisement delivery result, according to one or more embodiments of the present disclosure, comprising:
an acquisition module, configured to acquire a data set to be predicted, wherein the data set to be predicted comprises a plurality of pieces of prediction data generated during an advertisement delivery process, and each piece of prediction data comprises feature data of a plurality of influence factors;
a statistical module, configured to compute the SHAP value of each influence factor by applying a SHAP value algorithm to a trained random forest prediction model, wherein the random forest prediction model is used to obtain a prediction result based on the data set to be predicted, and the SHAP value of each influence factor represents the importance of that influence factor to the prediction result;
and a determining module, configured to determine target influence factors according to the SHAP values of the influence factors.
Example 9 provides a computer readable medium having stored thereon a computer program that, when executed by a processing apparatus, performs the steps of the method of any one of examples 1 to 7, in accordance with one or more embodiments of the present disclosure.
Example 10 provides, in accordance with one or more embodiments of the present disclosure, an electronic device comprising:
a storage device having a computer program stored thereon;
processing means for executing the computer program in the storage means to carry out the steps of the method of any one of examples 1 to 7.
The foregoing description is only exemplary of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the disclosure herein is not limited to the particular combination of features described above, but also encompasses other embodiments in which any combination of the features described above or their equivalents does not depart from the spirit of the disclosure. For example, a technical solution may be formed by replacing the above features with features having similar functions that are disclosed in (but not limited to) this disclosure.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims (10)

1. A method for attributing advertisement delivery results, comprising:
acquiring a data set to be predicted, wherein the data set to be predicted comprises a plurality of pieces of prediction data generated during an advertisement delivery process, and each piece of prediction data comprises feature data of a plurality of influence factors;
computing the SHAP value of each influence factor by applying a SHAP value algorithm to a trained random forest prediction model, wherein the random forest prediction model is used to obtain a prediction result based on the data set to be predicted, and the SHAP value of each influence factor represents the importance of that influence factor to the prediction result;
and determining target influence factors according to the SHAP values of the influence factors.
2. The method according to claim 1, wherein determining the target influence factors according to the SHAP values of the influence factors comprises:
determining a candidate influence factor set according to the SHAP value of each influence factor;
determining the candidate influence factors in the candidate influence factor set that are linearly related to the prediction result as candidate influence factors to be tested;
determining a hypothesis test result for the candidate influence factors to be tested;
and determining, according to the hypothesis test result for the candidate influence factors to be tested, whether the candidate influence factors to be tested are target influence factors.
3. The attribution method according to claim 2, wherein the hypothesis test result for the candidate influence factors to be tested is determined by:
performing a hypothesis test on the model parameters of the linear regression model corresponding to the candidate influence factors to be tested;
and determining the hypothesis test result for the candidate influence factors to be tested according to the hypothesis test result for the model parameters of the linear regression model.
4. The attribution method according to claim 3, wherein performing the hypothesis test on the model parameters of the linear regression model corresponding to the candidate influence factors to be tested comprises:
setting a null hypothesis and an alternative hypothesis for the model parameters of the linear regression model;
obtaining a statistic of the linear regression model according to the model parameters of the linear regression model and a preset formula;
judging whether the statistic satisfies a preset condition;
and determining a hypothesis test result for the model parameters of the linear regression model according to the judgment result, wherein the hypothesis test result is either that the null hypothesis is rejected or that the null hypothesis is not rejected, and the null hypothesis for the model parameters of the linear regression model represents that the candidate influence factor to be tested is not a target influence factor.
5. The method according to claim 4, wherein judging whether the statistic satisfies the preset condition comprises:
judging whether the statistic is greater than the quantile of the chi-square distribution at a preset value;
and determining that the statistic satisfies the preset condition when the statistic is greater than the quantile of the chi-square distribution at the preset value.
6. The attribution method of claim 3, wherein the linear regression model is trained by:
acquiring a training data set, wherein the training data set comprises feature data corresponding to the candidate influence factors to be tested and label data corresponding to each piece of feature data;
and training, according to the training data set, a linear regression model corresponding to the candidate influence factors to be tested.
7. The attribution method of claim 2, wherein determining the candidate influence factor set according to the SHAP value of each influence factor comprises:
determining each influence factor whose SHAP value is higher than a preset value as a candidate influence factor, so as to obtain the candidate influence factor set.
8. An apparatus for attributing advertisement delivery results, comprising:
an acquisition module, configured to acquire a data set to be predicted, wherein the data set to be predicted comprises a plurality of pieces of prediction data generated during an advertisement delivery process, and each piece of prediction data comprises feature data of a plurality of influence factors;
a statistical module, configured to compute the SHAP value of each influence factor by applying a SHAP value algorithm to a trained random forest prediction model, wherein the random forest prediction model is used to obtain a prediction result based on the data set to be predicted, and the SHAP value of each influence factor represents the importance of that influence factor to the prediction result;
and a determining module, configured to determine target influence factors according to the SHAP values of the influence factors.
9. A computer-readable medium, on which a computer program is stored, characterized in that the program, when being executed by processing means, carries out the steps of the method of any one of claims 1 to 7.
10. An electronic device, comprising:
a storage device having a computer program stored thereon;
processing means for executing the computer program in the storage means to carry out the steps of the method according to any one of claims 1 to 7.
CN202110886914.2A 2021-08-03 2021-08-03 Attribution method and device of advertisement putting result, storage medium and electronic equipment Pending CN113592557A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110886914.2A CN113592557A (en) 2021-08-03 2021-08-03 Attribution method and device of advertisement putting result, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110886914.2A CN113592557A (en) 2021-08-03 2021-08-03 Attribution method and device of advertisement putting result, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN113592557A true CN113592557A (en) 2021-11-02

Family

ID=78254448

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110886914.2A Pending CN113592557A (en) 2021-08-03 2021-08-03 Attribution method and device of advertisement putting result, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN113592557A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109242203A (en) * 2018-09-30 2019-01-18 中冶华天南京工程技术有限公司 A kind of water quality prediction of river and water quality impact factors assessment method
CN109409647A (en) * 2018-09-10 2019-03-01 昆明理工大学 A kind of analysis method of the salary level influence factor based on random forests algorithm
CN111325353A (en) * 2020-02-28 2020-06-23 深圳前海微众银行股份有限公司 Method, device, equipment and storage medium for calculating contribution of training data set
CN111340231A (en) * 2020-02-11 2020-06-26 深圳前海微众银行股份有限公司 SHAP feature attribution method, device, equipment and readable storage medium
CN112801693A (en) * 2021-01-18 2021-05-14 百果园技术(新加坡)有限公司 Advertisement characteristic analysis method and system based on high-value user

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116805045A (en) * 2023-08-17 2023-09-26 北京电科智芯科技有限公司 Meteorological prediction model correction method, device, equipment and readable storage medium
CN116805045B (en) * 2023-08-17 2024-01-23 北京电科智芯科技有限公司 Meteorological prediction model correction method, device, equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20211102