CN110689376A - Click rate prediction method and device and electronic equipment - Google Patents

Click rate prediction method and device and electronic equipment

Info

Publication number
CN110689376A
Authority
CN
China
Prior art keywords
item
historical
promotion item
historical promotion
click rate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910927957.3A
Other languages
Chinese (zh)
Inventor
赵嘉祥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd filed Critical Alipay Hangzhou Information Technology Co Ltd
Priority to CN201910927957.3A priority Critical patent/CN110689376A/en
Publication of CN110689376A publication Critical patent/CN110689376A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06Q30/0242 Determining effectiveness of advertisements
    • G06Q30/0244 Optimization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06Q30/0251 Targeted advertisements
    • G06Q30/0255 Targeted advertisements based on user history
    • G06Q30/0256 User search
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06Q30/0277 Online advertisement

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Strategic Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Game Theory and Decision Science (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiments of the present application provide a click rate prediction method and device and an electronic device. The method comprises: extracting basic features of a target promotion item to be predicted; inputting the basic features of the target promotion item into an extreme gradient boosting (XGBoost) model to obtain high-order features of the target promotion item, wherein the XGBoost model is trained on basic features of a first historical promotion item and a label of the first historical promotion item, the label indicating the click rate of the first historical promotion item after release; and inputting the high-order features of the target promotion item into a prediction model to obtain the click rate of the target promotion item, wherein the prediction model is trained on high-order features of a second historical promotion item and a label of the second historical promotion item, the label indicating the click rate of the second historical promotion item after release, and the high-order features of the second historical promotion item have the same feature dimension as the high-order features of the first historical promotion item.

Description

Click rate prediction method and device and electronic equipment
Technical Field
The embodiment of the application relates to the technical field of data processing, in particular to a click rate prediction method and device and electronic equipment.
Background
With the rapid development of network technology, internet advertising has become one of the most important sources of profit for internet enterprises. Research shows that accurately placed internet advertisements yield better advertising returns and can also effectively reduce the number of search requests that users send to a server because their search expectations were not met, thereby reducing the load on the server.
Click-through rate (CTR) is a key indicator of how accurately internet advertisements are placed. How to predict the click rate so as to provide data support for more appropriate advertisement placement decisions is a technical problem that urgently needs to be solved.
Disclosure of Invention
The embodiments of the present application aim to provide a click rate prediction method and device and an electronic device, which can predict the click rate and thereby provide data support for more appropriate advertisement placement decisions.
In order to achieve the above purpose, the embodiments of the present application are implemented as follows:
In a first aspect, a click rate prediction method is provided, including:
extracting basic features of a target promotion item to be predicted;
inputting the basic features of the target promotion item into an extreme gradient boosting (XGBoost) model to obtain high-order features of the target promotion item, wherein the XGBoost model is trained on basic features of a first historical promotion item and a label of the first historical promotion item, the label indicating the click rate of the first historical promotion item after release;
inputting the high-order features of the target promotion item into a prediction model to obtain the click rate of the target promotion item, wherein the prediction model is trained on high-order features of a second historical promotion item and a label of the second historical promotion item, the label indicating the click rate of the second historical promotion item after release, and the high-order features of the second historical promotion item have the same feature dimension as the high-order features of the first historical promotion item.
In a second aspect, a click rate prediction device is provided, including:
a basic feature extraction module, configured to extract basic features of a target promotion item to be predicted;
a high-order feature extraction module, configured to input the basic features of the target promotion item into an extreme gradient boosting (XGBoost) model to obtain high-order features of the target promotion item, wherein the XGBoost model is trained on basic features of a first historical promotion item and a label of the first historical promotion item, the label indicating the click rate of the first historical promotion item after release;
a click rate prediction module, configured to input the high-order features of the target promotion item into a prediction model to obtain the click rate of the target promotion item, wherein the prediction model is trained on high-order features of a second historical promotion item and a label of the second historical promotion item, the label indicating the click rate of the second historical promotion item after release, and the high-order features of the second historical promotion item have the same feature dimension as the high-order features of the first historical promotion item.
In a third aspect, an electronic device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, the computer program being executed by the processor to perform:
extracting basic features of a target promotion item to be predicted;
inputting the basic features of the target promotion item into an extreme gradient boosting (XGBoost) model to obtain high-order features of the target promotion item, wherein the XGBoost model is trained on basic features of a first historical promotion item and a label of the first historical promotion item, the label indicating the click rate of the first historical promotion item after release;
inputting the high-order features of the target promotion item into a prediction model to obtain the click rate of the target promotion item, wherein the prediction model is trained on high-order features of a second historical promotion item and a label of the second historical promotion item, the label indicating the click rate of the second historical promotion item after release, and the high-order features of the second historical promotion item have the same feature dimension as the high-order features of the first historical promotion item.
In a fourth aspect, a computer-readable storage medium is provided, having stored thereon a computer program which, when executed by a processor, performs the following steps:
extracting basic features of a target promotion item to be predicted;
inputting the basic features of the target promotion item into an extreme gradient boosting (XGBoost) model to obtain high-order features of the target promotion item, wherein the XGBoost model is trained on basic features of a first historical promotion item and a label of the first historical promotion item, the label indicating the click rate of the first historical promotion item after release;
inputting the high-order features of the target promotion item into a prediction model to obtain the click rate of the target promotion item, wherein the prediction model is trained on high-order features of a second historical promotion item and a label of the second historical promotion item, the label indicating the click rate of the second historical promotion item after release, and the high-order features of the second historical promotion item have the same feature dimension as the high-order features of the first historical promotion item.
According to the scheme of the embodiments of the present application, the basic features of a promotion item are first extracted and then input into the XGBoost model. High-order features of the promotion item are generated efficiently and automatically by exploiting properties of the XGBoost model such as multithreaded split finding, resistance to overfitting, and automatic learning of the split direction when feature values are missing, which avoids the limitations of hand-crafted high-order features. The high-order features are then input into a prediction model, which uses them as reference factors to predict the click rate of the promotion item. This provides data support for more appropriate release decisions and improves the hit rate of released promotion items, which in turn reduces, to some extent, the number of search requests that users send to the server because their search expectations were not met, and thus reduces the load on the server.
Drawings
To illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some of the embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a first flowchart of a method for predicting a click rate according to an embodiment of the present disclosure.
Fig. 2 is a schematic structural diagram of a device for predicting a click rate according to an embodiment of the present disclosure.
Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make those skilled in the art better understand the technical solutions in the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
As described above, accurate placement of internet advertisements not only yields better advertising returns but also effectively reduces the number of search requests that users send to the server, thereby reducing the load on the server. The click rate is a key indicator of how accurately internet advertisements are placed. The present application therefore provides a technical scheme for predicting the click rate, which can provide data support for release decisions.
FIG. 1 is a flowchart of a click rate prediction method according to an embodiment of the present application. The method shown in FIG. 1 may be performed by a corresponding apparatus and comprises:
Step S102, extracting basic features of the target promotion item to be predicted.
The target promotion item is a means of conveying information about goods or services. It may be a non-profit advertisement, such as a government announcement or a notice or statement relating to religion, education, culture, municipal administration, or social groups, or it may be a for-profit advertisement or coupon.
Specifically, in this step, the target promotion item to be predicted may be structured to obtain structured data of the target promotion item. Feature recognition is then performed on the structured data to obtain the basic features of the target promotion item. Note that feature recognition methods belong to the prior art; since the present application does not concern improvements in this respect, they are not described in detail here.
In addition, this step may also combine at least two of the extracted basic features to generate new basic features in addition to the original ones.
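The combination of basic features described above can be sketched as a pairwise feature cross. This is only an illustration: the function name `cross_features`, the field names, and the `a_x_b` naming scheme are hypothetical and do not come from the application.

```python
from itertools import combinations

def cross_features(base: dict[str, str], keys: list[str]) -> dict[str, str]:
    """Return the original basic features plus pairwise crosses of the chosen keys."""
    out = dict(base)  # the original basic features are kept
    for a, b in combinations(keys, 2):
        # a crossed feature is simply the pair of values joined together
        out[f"{a}_x_{b}"] = f"{base[a]}|{base[b]}"
    return out

# Hypothetical basic features of one promotion item
item = {"category": "coupon", "channel": "app_home", "weekday": "sat"}
crossed = cross_features(item, ["category", "channel"])
```

Here `crossed` contains the three original features and one new feature, `category_x_channel`.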
Step S104, inputting the basic features of the target promotion item into an extreme gradient boosting (XGBoost) model to obtain the high-order features of the target promotion item, wherein the XGBoost model is trained on the basic features of the first historical promotion item and the label of the first historical promotion item, the label indicating the click rate of the first historical promotion item after release.
It should be understood that the first historical promotion item is not limited to a single item and may refer broadly to a data set consisting of a plurality of historical promotion items.
In this step, the basic features of the first historical promotion item are used as the input of the XGBoost model to obtain the training result output by the model. A loss function for the XGBoost model is derived from maximum likelihood estimation, and the loss between the training result and the label is computed with this loss function. Finally, the weights of the basic features in the XGBoost model are optimized with the goal of reducing the loss, thereby training the model.
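As an illustration of a loss derived from maximum likelihood estimation, the sketch below assumes a Bernoulli likelihood with a sigmoid link, which gives the logistic (cross-entropy) loss. The application does not name the specific loss, so this choice and the function names are assumptions. The label `y` is a click rate in [0, 1] and `z` is the raw model score.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(y: float, z: float) -> float:
    """Negative Bernoulli log-likelihood of click rate y given raw score z."""
    p = sigmoid(z)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))

def grad_hess(y: float, z: float) -> tuple[float, float]:
    """First and second derivatives of log_loss with respect to z."""
    p = sigmoid(z)
    return p - y, p * (1.0 - p)

g, h = grad_hess(0.2, 0.0)  # raw score 0 means a predicted probability of 0.5
```

The two derivatives are the per-sample quantities that a second-order boosting method accumulates when it fits each new tree.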
Step S106, inputting the high-order features of the target promotion item into a prediction model to obtain the click rate of the target promotion item, wherein the prediction model is trained on the high-order features of the second historical promotion item and the label of the second historical promotion item, the label indicating the click rate of the second historical promotion item after release, and the high-order features of the second historical promotion item have the same feature dimension as the high-order features of the first historical promotion item.
It should be understood that the prediction model can be implemented in more than one way, and the embodiments of the present application do not limit this. As an example, the prediction model may be a deep neural network (DNN) model with a classification function, which classifies a promotion item under its corresponding click rate and thereby achieves the prediction.
Similarly, the second historical promotion item is not limited to a single item and may refer to a data set consisting of a plurality of historical promotion items.
In this step, the prediction model can be trained with the same supervised training approach used for the XGBoost model. Since the principle is the same, it is not described again here.
In addition, a reasonable prediction scheme infers future results from earlier data. Therefore, the time period corresponding to the first historical promotion item (used to train the XGBoost model) should be earlier than the time period corresponding to the second historical promotion item (used to train the prediction model).
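The time ordering of the two training sets can be sketched as a split at a cutoff date. The dates, item identifiers, and the helper name `split_by_time` are hypothetical; the application only requires that the first set precede the second.

```python
from datetime import date

def split_by_time(items, cutoff):
    """Split (release_date, item_id) pairs into an earlier and a later set."""
    first = [it for it in items if it[0] < cutoff]    # trains the XGBoost model
    second = [it for it in items if it[0] >= cutoff]  # trains the prediction model
    return first, second

# Hypothetical release history
history = [
    (date(2019, 7, 1), "a"),
    (date(2019, 8, 15), "b"),
    (date(2019, 9, 2), "c"),
]
first_set, second_set = split_by_time(history, date(2019, 9, 1))
```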
From the click rate prediction method shown in FIG. 1, it can be seen that, according to the scheme of the embodiments of the present application, the basic features of a promotion item are first extracted and then input into the XGBoost model. High-order features of the promotion item are generated efficiently and automatically by exploiting properties of the XGBoost model such as multithreaded split finding, resistance to overfitting, and automatic learning of the split direction when feature values are missing, which avoids the limitations of hand-crafted high-order features. The high-order features are then input into a prediction model, which uses them as reference factors to predict the click rate of the promotion item. This provides data support for more appropriate release decisions and improves the hit rate of released promotion items, which in turn reduces, to some extent, the number of search requests that users send to the server because their search expectations were not met, and thus reduces the load on the server.
The method of the embodiments of the present application is described in detail below.
The method of the embodiments of the present application predicts the click rate of a promotion item with an "XGBoost model" + "DNN model" structure. The main process includes:
Step S201, training the XGBoost model.
In this step, a training data set for the XGBoost model is first constructed. The sample objects in the training data set are the first historical promotion items described above. The training data includes the basic features of the first historical promotion item and the label of the first historical promotion item, where the label indicates the click rate of the first historical promotion item after release.
The XGBoost model is then trained on this training data set, with the basic features of the first historical promotion item as the input of the XGBoost model and the label of the first historical promotion item as its expected output.
The above training process involves the implementation of the XGBoost algorithm, which is described below by way of example.
Boosting is a method for improving the accuracy of weak classification algorithms. It is a sequential ensemble method: its prediction function is an ensemble of several base classifiers, and learning proceeds by learning the first (t-1) base classifiers and then the t-th base classifier. The XGBoost algorithm is an improvement on boosting whose principal base learner is CART (classification and regression tree), so its prediction function is:

$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in F$$

where $K$ is the number of decision trees, $f_k$ is the k-th tree, and $\hat{y}_i$ is the prediction for sample $x_i$. $F = \{f(x) = w_{q(x)}\}\ (q: \mathbb{R}^m \to T,\ w \in \mathbb{R}^T)$ is the decision tree space, where $m$ is the feature dimension of the data set, $T$ is the number of leaf nodes, $q$ is the structure of the tree, and $w$ is the vector of leaf scores. $\mathbb{R}^m$ is the sample space, and $q(x)$ maps an input sample $x$ to a leaf node of the tree, whose score is $w_{q(x)}$. The regularized objective function can therefore be written as:

$$L = \sum_{i} l(y_i, \hat{y}_i) + \sum_{k} \Omega(f_k)$$

where $l(y_i, \hat{y}_i)$ is the training error of sample $x_i$ and $\Omega(f_k)$ is the regularization term of the k-th tree:

$$\Omega(f) = \gamma T + \frac{1}{2} \lambda \lVert w \rVert^2$$

where $\gamma$ and $\lambda$ are penalty coefficients and $\lVert w \rVert^2$ is the L2 regularization of the leaf weights.
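Suppose the model obtained after the first t-1 iterations is $\hat{y}_i^{(t-1)}$. At step t the function to be solved for is $f_t(x_i)$, so the objective function at step t is:

$$L^{(t)} = \sum_{i=1}^{n} l\big(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\big) + \Omega(f_t) + C$$

where $C = \sum_{k=1}^{t-1} \Omega(f_k)$ is a constant term.

Taking a second-order Taylor expansion of the above formula gives:

$$L^{(t)} \approx \sum_{i=1}^{n} \Big[ l\big(y_i, \hat{y}_i^{(t-1)}\big) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \Big] + \Omega(f_t) + C$$

where

$$g_i = \frac{\partial\, l\big(y_i, \hat{y}_i^{(t-1)}\big)}{\partial\, \hat{y}_i^{(t-1)}}, \qquad h_i = \frac{\partial^2 l\big(y_i, \hat{y}_i^{(t-1)}\big)}{\partial \big(\hat{y}_i^{(t-1)}\big)^2}$$

are the first and second derivatives of the loss, respectively.

Removing the constant terms that do not depend on the function to be solved for gives the new optimization objective:

$$\tilde{L}^{(t)} = \sum_{i=1}^{n} \Big[ g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \Big] + \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^2$$

Rewriting the sum over samples as a sum over the leaf nodes of the tree gives:

$$\tilde{L}^{(t)} = \sum_{j=1}^{T} \Big[ \Big( \sum_{i \in I_j} g_i \Big) w_j + \frac{1}{2} \Big( \sum_{i \in I_j} h_i + \lambda \Big) w_j^2 \Big] + \gamma T$$

where $w_j$ is the output value of leaf node j in the model and $I_j = \{ i \mid q(x_i) = j \}$ is the set of samples assigned to leaf node j.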
Thus, for a given tree structure, the optimal weight of each leaf node can be computed directly:

$$w_j^{*} = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda}$$

where $I_j$ is the set of samples assigned to leaf node j.

The corresponding optimal value of the objective function is:

$$\tilde{L}^{*} = -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda} + \gamma T$$

where $G_j = \sum_{i \in I_j} g_i$ is the accumulated first derivative of the objective function over all samples within leaf node j, and $H_j = \sum_{i \in I_j} h_i$ is the accumulated second derivative of the objective function over all samples within leaf node j.
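As a small worked example of the closed-form solution, the sketch below evaluates the standard XGBoost expressions $w_j^{*} = -G_j / (H_j + \lambda)$ and $-\frac{1}{2} \sum_j G_j^2 / (H_j + \lambda) + \gamma T$ for a tree with two leaves. The numeric values and function names are made up for illustration.

```python
def leaf_weight(G: float, H: float, lam: float) -> float:
    """Optimal leaf score from the accumulated derivatives G and H."""
    return -G / (H + lam)

def structure_score(leaves, lam: float, gamma: float) -> float:
    """Optimal objective value for a fixed tree; leaves is a list of (G_j, H_j)."""
    return -0.5 * sum(G * G / (H + lam) for G, H in leaves) + gamma * len(leaves)

# Hypothetical per-leaf sums of first and second derivatives
leaves = [(-2.0, 3.0), (1.0, 1.0)]
weights = [leaf_weight(G, H, lam=1.0) for G, H in leaves]
score = structure_score(leaves, lam=1.0, gamma=0.5)
```

A lower `score` means a better tree structure, which is why the value can be used to compare candidate splits.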
Because a second-order Taylor expansion of the loss function is used and a regularization term is added to the objective function, the optimum is solved for as a whole, which balances the fit of the objective function against the complexity of the model and helps prevent overfitting. However, the structure of the tree is unknown, and it is not feasible to enumerate all possible tree structures. A greedy algorithm is therefore used to split the nodes: starting from the root node, all attributes are traversed, and for each attribute its possible values are traversed. If the set of samples assigned to the left subtree is $I_L$ and the set of samples assigned to the right subtree is $I_R$, the loss reduction resulting from splitting the node is:

$$L_{split} = \frac{1}{2} \left[ \frac{\big(\sum_{i \in I_L} g_i\big)^2}{\sum_{i \in I_L} h_i + \lambda} + \frac{\big(\sum_{i \in I_R} g_i\big)^2}{\sum_{i \in I_R} h_i + \lambda} - \frac{\big(\sum_{i \in I} g_i\big)^2}{\sum_{i \in I} h_i + \lambda} \right] - \gamma$$

where $\lambda$ damps the sensitivity of the split gain, $\gamma$ is the change in complexity caused by adding a new node, and, with $I = I_L \cup I_R$,

$$\sum_{i \in I} g_i = \sum_{i \in I_L} g_i + \sum_{i \in I_R} g_i, \qquad \sum_{i \in I} h_i = \sum_{i \in I_L} h_i + \sum_{i \in I_R} h_i$$
Here, an attribute and a corresponding split value must be found such that the value of the above expression is maximized. Because the tree structure is unknown, only a greedy algorithm can be used: starting from the root node, one attribute and a corresponding value are selected each time so that the loss function is reduced the most. The type of greedy algorithm is not particularly limited, and those skilled in the art may choose one according to the actual situation; an exact greedy algorithm for split finding is generally used.
The following describes the implementation of an exact greedy algorithm for split finding:
Input: I, the sample set of the current node;
Input: I_k = {i ∈ I | x_ik ≠ missing}, where x_ik is the value of the k-th feature of the i-th sample;
Input: m, the feature dimension;
score ← 0;
G ← Σ_{i∈I} g_i, H ← Σ_{i∈I} h_i;
for k = 1, …, m do:
    G_L ← 0, H_L ← 0;
    for j in sorted(I_k, in ascending order of x_jk) do:
        G_L ← G_L + g_j, H_L ← H_L + h_j, G_R ← G − G_L, H_R ← H − H_L,
        where G_L and H_L are the sums of the first and second derivatives over the left child node, and G_R and H_R are the sums of the first and second derivatives over the right child node;
        score ← max(score, G_L²/(H_L + λ) + G_R²/(H_R + λ) − G²/(H + λ));
    end;
end;
Output: the split with the largest score, together with its default direction.
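The split search above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the library's implementation: missing values are represented by `None` and always fall into the right child (a single fixed default direction), candidate thresholds are midpoints between adjacent distinct values, and the function name `best_split` is hypothetical. The gain uses the standard expression with the factor 1/2 and the penalty γ.

```python
def best_split(X, g, h, lam=1.0, gamma=0.0):
    """Exact greedy split search.

    X is a list of rows (None marks a missing value); g and h hold the
    per-sample first and second derivatives.
    Returns (gain, feature_index, threshold).
    """
    G, H = sum(g), sum(h)
    parent = G * G / (H + lam)
    best = (0.0, None, None)
    for k in range(len(X[0])):
        # samples with a present value for feature k, sorted ascending
        order = sorted((i for i in range(len(X)) if X[i][k] is not None),
                       key=lambda i: X[i][k])
        GL = HL = 0.0
        for pos in range(len(order) - 1):
            j, nxt = order[pos], order[pos + 1]
            GL += g[j]
            HL += h[j]
            if X[j][k] == X[nxt][k]:
                continue  # cannot split between equal values
            GR, HR = G - GL, H - HL  # includes samples with missing values
            gain = 0.5 * (GL * GL / (HL + lam) + GR * GR / (HR + lam) - parent) - gamma
            if gain > best[0]:
                best = (gain, k, (X[j][k] + X[nxt][k]) / 2)
    return best

# Toy data: one feature, the last sample has a missing value
X = [[1.0], [2.0], [3.0], [4.0], [None]]
g = [-1.0, -1.0, 1.0, 1.0, 1.0]
h = [1.0, 1.0, 1.0, 1.0, 1.0]
gain, feature, threshold = best_split(X, g, h, lam=1.0)
```

On this toy data the search picks the threshold that separates the negative gradients from the positive ones. A fuller sparsity-aware search would also try sending the missing samples to the left child and keep the better of the two directions.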
Step S202, extracting the basic features of the target promotion item to be predicted.
Specifically, the target promotion item to be predicted is first structured to obtain structured data of the target promotion item.
The structuring process may include, but is not limited to, the following:
1) Missing-value cleaning: removing unnecessary fields and reasonably filling in missing content.
2) Format cleaning: correcting or deleting content that does not conform to the expected format.
3) Logic-error cleaning: removing duplicate data, removing unreasonable values, and correcting contradictory content.
4) Cleaning of data that is not needed.
The basic features of the target promotion item are then extracted from the structured data.
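The cleaning steps listed above can be sketched as a single pass over raw records. The schema, the field names, and the validity rule below are hypothetical examples, and the application does not prescribe any particular implementation.

```python
def structure_records(records, schema, is_reasonable):
    """Clean raw records.

    schema maps each kept field to (cast, default); fields outside the
    schema are dropped, malformed or missing values fall back to the
    default, and unreasonable or duplicate rows are removed.
    """
    cleaned, seen = [], set()
    for rec in records:
        row = {}
        for field, (cast, default) in schema.items():
            value = rec.get(field)
            try:
                row[field] = default if value in (None, "") else cast(value)
            except (TypeError, ValueError):
                row[field] = default  # content that does not fit the format
        key = tuple(sorted(row.items()))
        if not is_reasonable(row) or key in seen:
            continue  # unreasonable value or duplicate row
        seen.add(key)
        cleaned.append(row)
    return cleaned

# Hypothetical raw promotion records
schema = {"title": (str, ""), "discount": (float, 0.0)}
raw = [
    {"title": "Coupon A", "discount": "0.2", "debug": "x"},
    {"title": "Coupon A", "discount": "0.2"},
    {"title": "Coupon B", "discount": "n/a"},
    {"title": "Coupon C", "discount": "7"},
]
rows = structure_records(raw, schema, lambda r: 0.0 <= r["discount"] <= 1.0)
```

In this example the `debug` field is dropped, the duplicate of Coupon A is removed, the malformed discount of Coupon B falls back to the default, and Coupon C is rejected as unreasonable.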
Step S203, inputting the basic features of the target promotion item into the XGBoost model to obtain the high-order features of the target promotion item.
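The application does not spell out how the high-order features are read from the trained trees. One common practice, assumed here purely for illustration, is to one-hot encode the index of the leaf that a sample reaches in each tree and concatenate the results, which yields a vector of fixed dimension. The toy tree encoding below is hypothetical.

```python
# A tree node is ("leaf", leaf_id) or ("split", feature_index, threshold, left, right).
def leaf_index(tree, x):
    """Walk one tree and return the id of the leaf that sample x reaches."""
    while tree[0] == "split":
        _, k, thr, left, right = tree
        tree = left if x[k] < thr else right
    return tree[1]

def high_order_features(trees, leaves_per_tree, x):
    """Concatenate the one-hot leaf indicators of every tree."""
    vec = []
    for tree, n_leaves in zip(trees, leaves_per_tree):
        one_hot = [0] * n_leaves
        one_hot[leaf_index(tree, x)] = 1
        vec.extend(one_hot)
    return vec

# Two hand-written toy trees standing in for a trained model
trees = [
    ("split", 0, 0.5, ("leaf", 0), ("split", 1, 2.0, ("leaf", 1), ("leaf", 2))),
    ("split", 1, 1.0, ("leaf", 0), ("leaf", 1)),
]
features = high_order_features(trees, [3, 2], [0.7, 1.5])
```

Every sample maps to a vector of the same length (here 3 + 2 = 5), which is consistent with the requirement that the high-order features of different promotion items share one feature dimension.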
Step S204, inputting the high-order features of the target promotion item into the DNN model to predict the click rate of the target promotion item.
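A DNN with a classification function can be sketched as a small feed-forward network whose output classes are click rate buckets. Everything in the sketch is an assumption made for illustration: the layer sizes, the untrained weights, the bucket centers, and the choice of returning the most probable bucket.

```python
import math

def dense(x, W, b):
    """Fully connected layer: one output per row of W."""
    return [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]

def relu(v):
    return [max(0.0, a) for a in v]

def softmax(v):
    m = max(v)
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]

def predict_ctr(x, hidden, output, bucket_centers):
    """Return (click rate of the most probable bucket, bucket probabilities)."""
    probs = softmax(dense(relu(dense(x, *hidden)), *output))
    best = max(range(len(probs)), key=probs.__getitem__)
    return bucket_centers[best], probs

# Hypothetical untrained parameters: 5 inputs -> 2 hidden units -> 3 buckets
hidden = ([[0.1, 0.4, -0.2, 0.0, 0.3], [-0.3, 0.2, 0.1, 0.5, -0.1]], [0.0, 0.1])
output = ([[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]], [0.0, 0.0, 0.0])
bucket_centers = [0.01, 0.03, 0.08]
ctr, probs = predict_ctr([0, 1, 0, 0, 1], hidden, output, bucket_centers)
```

In practice the weights would be learned from the high-order features and labels of the second historical promotion items.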
After the click rate of the target promotion item is predicted, the method of the embodiments of the present application can determine whether to release the target promotion item according to its click rate. For example, if the click rate of the target promotion item is lower than a preset threshold, it is determined that the target promotion item is not to be released, so that the delivery platform displays only high-quality promotion items.
In addition, the method of the embodiments of the present application can also determine the display position of the target promotion item on the delivery platform according to its click rate. For example, the click rate of the target promotion item may correspond to its display position on the delivery platform: the higher the click rate, the higher the display priority on the delivery platform, so that users can obtain the expected results with fewer searches and the load on the server is reduced.
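The release decision and the display ordering described above can be sketched together. The threshold value, the item identifiers, and the function name `plan_placement` are hypothetical.

```python
def plan_placement(predicted_ctr: dict[str, float], threshold: float) -> list[str]:
    """Drop items below the threshold and order the rest by predicted click rate."""
    kept = {item: ctr for item, ctr in predicted_ctr.items() if ctr >= threshold}
    return sorted(kept, key=kept.get, reverse=True)  # best display position first

# Hypothetical predicted click rates
slots = plan_placement({"ad_a": 0.031, "ad_b": 0.004, "ad_c": 0.052}, threshold=0.01)
```

Here `ad_b` is not released because its predicted click rate is below the threshold, and `ad_c` is displayed ahead of `ad_a`.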
In summary, the XGBoost model used in the embodiments of the present application adds the complexity of the tree model to the regularization term to avoid overfitting. Its loss function is expanded with a Taylor expansion that uses both the first and second derivatives, which speeds up optimization. When searching for the optimal split point, the XGBoost model takes into account the low efficiency of the traditional greedy algorithm and implements an approximate greedy algorithm to speed up the search and reduce memory consumption, while still determining the split direction automatically. In addition, the XGBoost model incorporates a sparsity-aware split finding algorithm that automatically exploits feature sparsity, with parallelized tree learning, so that it can effectively handle the high-dimensional sparse features of promotion item data and improve training efficiency.
Corresponding to the above method, as shown in fig. 2, an embodiment of the present application further provides a click rate prediction device 200, including:
a basic feature extraction module 210, configured to extract basic features of a target promotion item to be predicted;
a high-order feature extraction module 220, configured to input the basic features of the target promotion item into an extreme gradient boosting (xgboost) model to obtain high-order features of the target promotion item, wherein the xgboost model is trained based on basic features of a first historical promotion item and a label of the first historical promotion item, and the label of the first historical promotion item indicates the click rate of the first historical promotion item after it was released;
a click rate prediction module 230, configured to input the high-order features of the target promotion item into a prediction model to obtain the click rate of the target promotion item, wherein the prediction model is trained based on high-order features of a second historical promotion item and a label of the second historical promotion item, the label of the second historical promotion item indicates the click rate of the second historical promotion item after it was released, and the high-order features of the second historical promotion item have the same feature dimensions as the high-order features of the first historical promotion item.
As can be seen from the click rate prediction device shown in fig. 2: according to the scheme of the embodiment of the present application, the basic features of a promotion item are first extracted and then input into the xgboost model. The characteristics of the xgboost model, such as multithreaded split finding, overfitting prevention, and automatic learning of the split direction when feature values are missing, are used to generate the high-order features of the promotion item efficiently and automatically, avoiding the limitations of manually designed high-order features. The high-order features are then input into a prediction model, which predicts the click rate of the promotion item using the high-order features as further reference factors. This provides data support for making more appropriate release decisions and improves the release hit rate of promotion items, which in turn reduces, to a certain extent, the frequency of search requests that users send to the server because their search expectations were not met, and reduces the pressure on the server.
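The data flow through the three modules can be sketched in a few lines of plain Python. Several things here are assumptions made only for illustration: this passage does not specify how the high-order features are read out of the xgboost model, so the sketch uses the common approach of one-hot encoding the leaf that each tree routes a sample to; the two toy trees, the feature layout `[audience_age, historical_ctr]`, and the weights are hypothetical; and a single logistic unit stands in for the prediction model.

```python
import math
from typing import Callable, List, Sequence

# A toy "tree" maps a basic-feature vector to a leaf index in [0, n_leaves).
Tree = Callable[[Sequence[float]], int]


def high_order_features(x: Sequence[float], trees: List[Tree],
                        leaves_per_tree: List[int]) -> List[float]:
    """One-hot encode the leaf reached in each tree and concatenate the results."""
    encoded: List[float] = []
    for tree, n_leaves in zip(trees, leaves_per_tree):
        one_hot = [0.0] * n_leaves
        one_hot[tree(x)] = 1.0
        encoded.extend(one_hot)
    return encoded


def predict_click_rate(features: Sequence[float], weights: Sequence[float],
                       bias: float) -> float:
    """Stand-in prediction model: one logistic unit over the high-order features."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical two-tree ensemble over basic features [audience_age, historical_ctr].
toy_trees: List[Tree] = [
    lambda x: 0 if x[0] < 30 else 1,
    lambda x: 0 if x[1] < 0.02 else (1 if x[0] < 30 else 2),
]
toy_leaves = [2, 3]

basic = [25.0, 0.05]                                        # module 210 output
high_order = high_order_features(basic, toy_trees, toy_leaves)  # module 220 output
click_rate = predict_click_rate(high_order, [0.2, -0.1, -0.5, 0.4, 0.1], bias=-3.0)
print(high_order, click_rate)
```

Because every sample is encoded against the same trees, the high-order features of any two items have the same dimension (here 2 + 3 = 5), which matches the requirement that the first and second historical promotion items share feature dimensions. With the xgboost Python package, per-tree leaf indices can be obtained from `Booster.predict` with `pred_leaf=True`.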
Optionally, the time period corresponding to the first historical promotion item is earlier than the time period corresponding to the second historical promotion item.
Optionally, the high-order features of the second historical promotion item are obtained by inputting the basic features of the second historical promotion item into the xgboost model.
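The two optional features above amount to a time-ordered split of the historical data: the earlier period trains the xgboost model, and the later period, after being passed through that model, trains the prediction model. A minimal sketch of the split follows; the `HistoricalItem` record, its field names, and the cutoff date are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class HistoricalItem:
    period: str             # ISO date such as "2019-06-01"; sorts chronologically as text
    basic: Sequence[float]  # basic features of the promotion item
    label: float            # observed click rate after release


def split_by_period(history: List[HistoricalItem],
                    cutoff: str) -> Tuple[List[HistoricalItem], List[HistoricalItem]]:
    """Split history into first (before cutoff) and second (cutoff or later) items."""
    first = [item for item in history if item.period < cutoff]
    second = [item for item in history if item.period >= cutoff]
    return first, second


history = [
    HistoricalItem("2019-05-10", [25.0, 0.05], 0.04),
    HistoricalItem("2019-06-20", [41.0, 0.01], 0.01),
    HistoricalItem("2019-07-15", [33.0, 0.03], 0.02),
]
first_items, second_items = split_by_period(history, cutoff="2019-07-01")
# first_items would train the xgboost model; second_items would be encoded by
# that trained model and then used to train the prediction model.
print(len(first_items), len(second_items))  # prints 2 1
```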
Optionally, the basic features of the promotion item include at least one of the following feature dimensions:
profile features of the audience of the promotion item, interest features of the promotion item, and historical click rate features corresponding to the promotion item.
Optionally, the click rate of the target promotion item has a corresponding relationship with the display position of the target promotion item on the delivery platform.
Optionally, the predictive model comprises a deep neural network model.
Optionally, extracting the basic features of the target promotion item to be predicted includes:
structuring the data of the target promotion item to be predicted to obtain structured data of the target promotion item;
and performing feature recognition on the structured data of the target promotion item to obtain the basic features of the target promotion item.
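The two extraction steps can be illustrated with a small sketch. The raw record format (`key=value` pairs separated by semicolons), the field names in `FEATURE_ORDER`, and the use of NaN for missing values are all assumptions made for illustration; the application does not specify the data format.

```python
from typing import Dict, List

# Hypothetical fixed ordering of the basic feature dimensions.
FEATURE_ORDER = ["audience_age", "interest_score", "historical_ctr"]


def structure(raw: str) -> Dict[str, str]:
    """Structuring step: parse a 'key=value;key=value' record into a dictionary."""
    fields: Dict[str, str] = {}
    for part in raw.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            fields[key.strip()] = value.strip()
    return fields


def recognise_features(structured: Dict[str, str]) -> List[float]:
    """Feature recognition step: pick the known fields in a fixed order.

    A missing field becomes NaN, so the vector always has the same length.
    """
    return [float(structured.get(name, "nan")) for name in FEATURE_ORDER]


record = structure("audience_age=25; historical_ctr=0.05")
basic_features = recognise_features(record)
print(record, basic_features)
```

Keeping missing values explicit rather than dropping them fits the property described earlier, that the xgboost model can learn a split direction when feature values are missing.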
Obviously, the prediction device according to the embodiment of the present application may serve as the execution subject of the prediction method shown in fig. 1, and can therefore implement the functions of the prediction method in fig. 1. Since the principle is the same, the details are not repeated here.
Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application. Referring to fig. 3, at the hardware level, the electronic device includes a processor, and optionally further includes an internal bus, a network interface, and a memory. The memory may include internal memory, such as random-access memory (RAM), and may further include non-volatile memory, such as at least one disk memory. Of course, the electronic device may also include hardware required for other services.
The processor, the network interface, and the memory may be connected to each other via the internal bus, which may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one double-headed arrow is shown in fig. 3, but this does not mean that there is only one bus or one type of bus.
The memory is used to store a program. Specifically, the program may include program code, and the program code includes computer operating instructions. The memory may include internal memory and non-volatile memory, and provides instructions and data to the processor.
The processor reads the corresponding computer program from the non-volatile memory into the internal memory and then runs it, forming the click rate prediction device at the logical level. The processor executes the program stored in the memory, and is specifically configured to perform the following operations:
extracting basic features of a target promotion item to be predicted;
inputting the basic features of the target promotion item into an extreme gradient boosting (xgboost) model to obtain high-order features of the target promotion item, wherein the xgboost model is trained based on basic features of a first historical promotion item and a label of the first historical promotion item, and the label of the first historical promotion item indicates the click rate of the first historical promotion item after it was released;
inputting the high-order features of the target promotion item into a prediction model to obtain the click rate of the target promotion item, wherein the prediction model is trained based on high-order features of a second historical promotion item and a label of the second historical promotion item, the label of the second historical promotion item indicates the click rate of the second historical promotion item after it was released, and the high-order features of the second historical promotion item have the same feature dimensions as the high-order features of the first historical promotion item.
As can be seen from the electronic device shown in fig. 3: according to the scheme of the embodiment of the present application, the basic features of a promotion item are first extracted and then input into the xgboost model. The characteristics of the xgboost model, such as multithreaded split finding, overfitting prevention, and automatic learning of the split direction when feature values are missing, are used to generate the high-order features of the promotion item efficiently and automatically, avoiding the limitations of manually designed high-order features. The high-order features are then input into a prediction model, which predicts the click rate of the promotion item using the high-order features as further reference factors. This provides data support for making more appropriate release decisions and improves the release hit rate of promotion items, which in turn reduces, to a certain extent, the frequency of search requests that users send to the server because their search expectations were not met, and reduces the pressure on the server.
The click rate prediction method disclosed in the embodiment shown in fig. 1 of the present application may be applied to, or implemented by, a processor. The processor may be an integrated circuit chip with signal processing capability. In implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. Such a processor may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM, EPROM, or a register. The storage medium is located in the memory; the processor reads the information in the memory and completes the steps of the above method in combination with its hardware.
It should be understood that the electronic device according to the embodiment of the present application may implement the functions of the above prediction device in the embodiment shown in fig. 1, and the details are not repeated here.
Of course, in addition to the software implementation, the electronic device of the present application does not exclude other implementations, such as logic devices or a combination of software and hardware. That is, the execution subject of the following processing flow is not limited to logic units, and may also be hardware or a logic device.
Furthermore, an embodiment of the present application also provides a computer-readable storage medium storing one or more programs, where the one or more programs include instructions, which when executed by a portable electronic device including a plurality of application programs, can cause the portable electronic device to perform the method of the embodiment shown in fig. 1, and specifically to perform the following method:
extracting basic features of a target promotion item to be predicted;
inputting the basic features of the target promotion item into an extreme gradient boosting (xgboost) model to obtain high-order features of the target promotion item, wherein the xgboost model is trained based on basic features of a first historical promotion item and a label of the first historical promotion item, and the label of the first historical promotion item indicates the click rate of the first historical promotion item after it was released;
inputting the high-order features of the target promotion item into a prediction model to obtain the click rate of the target promotion item, wherein the prediction model is trained based on high-order features of a second historical promotion item and a label of the second historical promotion item, the label of the second historical promotion item indicates the click rate of the second historical promotion item after it was released, and the high-order features of the second historical promotion item have the same feature dimensions as the high-order features of the first historical promotion item.
It should be understood that the above instructions, when executed by a portable electronic device comprising a plurality of application programs, can enable the prediction apparatus described above to implement the functions of the embodiment shown in fig. 1, and will not be described in detail herein.
As will be appreciated by one skilled in the art, embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the description may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the description may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The above description is only an example of the present specification, and is not intended to limit the present specification. Various modifications and alterations to this description will become apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present specification should be included in the scope of the claims of the present specification.

Claims (10)

1. A click rate prediction method, comprising:
extracting basic features of a target promotion item to be predicted;
inputting the basic features of the target promotion item into an extreme gradient boosting (xgboost) model to obtain high-order features of the target promotion item, wherein the xgboost model is trained based on basic features of a first historical promotion item and a label of the first historical promotion item, and the label of the first historical promotion item indicates the click rate of the first historical promotion item after it was released;
inputting the high-order features of the target promotion item into a prediction model to obtain the click rate of the target promotion item, wherein the prediction model is trained based on high-order features of a second historical promotion item and a label of the second historical promotion item, the label of the second historical promotion item indicates the click rate of the second historical promotion item after it was released, and the high-order features of the second historical promotion item have the same feature dimensions as the high-order features of the first historical promotion item.
2. The method of claim 1, wherein
the time period corresponding to the first historical promotion item is earlier than the time period corresponding to the second historical promotion item.
3. The method of claim 1, wherein
the high-order features of the second historical promotion item are obtained by inputting the basic features of the second historical promotion item into the xgboost model.
4. The method according to any one of claims 1 to 3, wherein
the basic features of the promotion item include at least one of the following feature dimensions:
profile features of the audience of the promotion item, interest features of the promotion item, and historical click rate features corresponding to the promotion item.
5. The method according to any one of claims 1 to 3, wherein
the click rate of the target promotion item has a corresponding relationship with the display position of the target promotion item on the delivery platform.
6. The method according to any one of claims 1 to 3, wherein
the prediction model includes a deep neural network model.
7. The method according to any one of claims 1 to 3, wherein
extracting the basic features of the target promotion item to be predicted comprises:
structuring the target promotion item to be predicted to obtain structured data of the target promotion item;
and performing feature recognition on the structured data of the target promotion item to obtain the basic features of the target promotion item.
8. A click rate prediction device, comprising:
a basic feature extraction module, configured to extract basic features of a target promotion item to be predicted;
a high-order feature extraction module, configured to input the basic features of the target promotion item into an extreme gradient boosting (xgboost) model to obtain high-order features of the target promotion item, wherein the xgboost model is trained based on basic features of a first historical promotion item and a label of the first historical promotion item, and the label of the first historical promotion item indicates the click rate of the first historical promotion item after it was released;
a click rate prediction module, configured to input the high-order features of the target promotion item into a prediction model to obtain the click rate of the target promotion item, wherein the prediction model is trained based on high-order features of a second historical promotion item and a label of the second historical promotion item, the label of the second historical promotion item indicates the click rate of the second historical promotion item after it was released, and the high-order features of the second historical promotion item have the same feature dimensions as the high-order features of the first historical promotion item.
9. An electronic device, comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, performs the following:
extracting basic features of a target promotion item to be predicted;
inputting the basic features of the target promotion item into an extreme gradient boosting (xgboost) model to obtain high-order features of the target promotion item, wherein the xgboost model is trained based on basic features of a first historical promotion item and a label of the first historical promotion item, and the label of the first historical promotion item indicates the click rate of the first historical promotion item after it was released;
inputting the high-order features of the target promotion item into a prediction model to obtain the click rate of the target promotion item, wherein the prediction model is trained based on high-order features of a second historical promotion item and a label of the second historical promotion item, the label of the second historical promotion item indicates the click rate of the second historical promotion item after it was released, and the high-order features of the second historical promotion item have the same feature dimensions as the high-order features of the first historical promotion item.
10. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, performs the following steps:
extracting basic features of a target promotion item to be predicted;
inputting the basic features of the target promotion item into an extreme gradient boosting (xgboost) model to obtain high-order features of the target promotion item, wherein the xgboost model is trained based on basic features of a first historical promotion item and a label of the first historical promotion item, and the label of the first historical promotion item indicates the click rate of the first historical promotion item after it was released;
inputting the high-order features of the target promotion item into a prediction model to obtain the click rate of the target promotion item, wherein the prediction model is trained based on high-order features of a second historical promotion item and a label of the second historical promotion item, the label of the second historical promotion item indicates the click rate of the second historical promotion item after it was released, and the high-order features of the second historical promotion item have the same feature dimensions as the high-order features of the first historical promotion item.
CN201910927957.3A 2019-09-27 2019-09-27 Click rate prediction method and device and electronic equipment Pending CN110689376A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910927957.3A CN110689376A (en) 2019-09-27 2019-09-27 Click rate prediction method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910927957.3A CN110689376A (en) 2019-09-27 2019-09-27 Click rate prediction method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN110689376A true CN110689376A (en) 2020-01-14

Family

ID=69110775

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910927957.3A Pending CN110689376A (en) 2019-09-27 2019-09-27 Click rate prediction method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN110689376A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105631707A (en) * 2015-12-23 2016-06-01 北京奇虎科技有限公司 Advertisement click rate estimation method based on decision tree, application recommendation method and device
CN109299976A (en) * 2018-09-07 2019-02-01 深圳大学 Clicking rate prediction technique, electronic device and computer readable storage medium
CN109657696A (en) * 2018-11-05 2019-04-19 阿里巴巴集团控股有限公司 Multitask supervised learning model training, prediction technique and device
CN109960759A (en) * 2019-03-22 2019-07-02 中山大学 Recommender system clicking rate prediction technique based on deep neural network
CN110245987A (en) * 2019-06-17 2019-09-17 重庆金窝窝网络科技有限公司 A kind of ad click rate prediction technique, device, server and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
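刘旭 (Liu Xu): "Research on Internet Advertisement Click-Through Rate Estimation Methods Based on Deep Learning", China Excellent Doctoral and Master's Theses Full-text Database (Master's), Information Science and Technology Series *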

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200114