CN102663617A - Method and system for prediction of advertisement clicking rate - Google Patents

Info

Publication number
CN102663617A
Authority
CN
China
Prior art keywords
data
sample
set
click
training
Prior art date
Application number
CN201210074541XA
Other languages
Chinese (zh)
Inventor
李娜
罗峰
黄苏支
Original Assignee
亿赞普(北京)科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 亿赞普(北京)科技有限公司 filed Critical 亿赞普(北京)科技有限公司
Priority to CN201210074541XA priority Critical patent/CN102663617A/en
Publication of CN102663617A publication Critical patent/CN102663617A/en

Abstract

The invention provides a method and a system for predicting advertisement click-through rate, addressing the problem that severely imbalanced sample data in the original sample set degrades the accuracy of click-through rate prediction. The method comprises: extracting sample data to construct an original sample set, wherein the sample data comprises users' clicked data and unclicked data; constructing a training sample set by sampling the original sample set; constructing a prediction model using the sample data in the training sample set as model parameters; and using the prediction model to predict on a test sample set, thereby predicting the user click-through rate for each kind of advertisement. The method and system remove the severe imbalance between clicked data and unclicked data in the original sample set and construct a relatively balanced training sample set, which improves the prediction model's recognition rate for clicked data and the accuracy of click-through rate prediction.

Description

Method and system for predicting advertisement click-through rate

Technical Field

[0001] The present application relates to network technology, and in particular to a method and system for predicting the click-through rate of advertisements.

Background

[0002] The rise of the Internet allows different people to see different advertisements when browsing the same page, so that advertisements can be displayed in a personalized way. By testing click-through rates, it is possible to learn which advertisements interest different users and thus show each user the corresponding advertisement more precisely, which raises the click-through rate and improves both advertising effectiveness and page traffic.

[0003] Testing the click-through rate requires analyzing and modeling historical delivery results. Sample data must first be extracted to construct an original sample set from which the click-through rate is predicted, so the sample data includes users' clicked data and unclicked data. In this process, imbalance in the sample data is a major factor limiting modeling quality. Available figures indicate that, on average, only about 0.3% of users click on a delivered Internet advertisement; that is, for every 1000 times an advertisement is shown, there are only about 3 clicks.

[0004] Consequently, when the original sample set is used as the training sample set, the ratio between clicked data and unclicked data in the training sample set is severely imbalanced. In such severely imbalanced sample data, the sample features of unclicked data make up the vast majority. In the prior art, the original sample set is used directly as the training sample set to construct the prediction model, so the model's predictions are biased toward unclicked data and are inaccurate.

[0005] An imbalanced distribution of sample data generally leaves some class of samples very scarce; in this application, clicked data is the scarce class. In practical data mining, noisy data is unavoidable and affects prediction or classification models to some degree. In an imbalanced problem of this kind, because clicked data is itself very rare, it cannot supply enough statistics to be distinguished from noise, so its resistance to noise is relatively weak. As a result, a small number of noisy samples can affect both model training and the prediction results.

[0006] The original sample set usually contains noise. Because the ratio between clicked data and unclicked data is severely imbalanced, for example 3:997, a single noisy sample in the set has a relatively large effect on the clicked data and a relatively small effect on the unclicked data.
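The effect described in [0006] can be made concrete with a little arithmetic. The sketch below is illustrative only: the function name `noise_share` is hypothetical, and it assumes the single noisy sample is counted within whichever class it lands in.

```python
def noise_share(class_size: int, noisy_samples: int = 1) -> float:
    """Fraction of a class that is noise when `noisy_samples` fall into it."""
    if class_size <= 0:
        raise ValueError("class_size must be positive")
    return noisy_samples / class_size


# Class sizes taken from the 3:997 example in [0006].
clicked_total, unclicked_total = 3, 997

share_in_clicked = noise_share(clicked_total)      # 1 noisy sample out of 3
share_in_unclicked = noise_share(unclicked_total)  # 1 noisy sample out of 997

# The same noisy sample is a far larger fraction of the scarce class.
relative_impact = share_in_clicked / share_in_unclicked
```

Under these assumptions, one noisy sample makes up a third of the clicked class but only about a thousandth of the unclicked class, which is why the scarce class is the more fragile one.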

[0007] Therefore, when a prediction model is trained on such sample data, unclicked data has an excessive influence on the model, which biases the model further toward unclicked data. The predictions of the resulting model lean toward unclicked data, while clicked data occupies only a small part of the space. When test samples are evaluated with this model, the biased model yields biased predictions that favor the unclicked case, which reduces the accuracy of click-through rate prediction.

Summary

[0008] The present application provides a method and system for predicting advertisement click-through rate, to solve the problem that severely imbalanced sample data in the original sample set degrades the accuracy of click-through rate prediction.

[0009] To solve the above problem, the present application discloses a method for predicting advertisement click-through rate, comprising:

[0010] extracting sample data to construct an original sample set, wherein the sample data comprises users' clicked data and unclicked data;

[0011] constructing a training sample set by sampling the original sample set;

[0012] constructing a prediction model using the sample data in the training sample set as model parameters;

[0013] using the prediction model to predict on a test sample set, and predicting the user's click-through rate for each kind of advertisement.

[0014] Preferably, constructing a training sample set by sampling the original sample set comprises:

[0015] sampling the original training samples at a preset sampling ratio and constructing a training sample set corresponding to that sampling ratio, wherein the preset sampling ratio is a ratio of clicked data to unclicked data derived statistically.

[0016] Preferably, constructing a training sample set by sampling the original sample set comprises:

[0017] during sampling, adding all clicked data to the training sample set.

[0018] Preferably, extracting sample data to construct an original sample set comprises:

[0019] extracting data within a certain period of time from the delivery data as sample data to construct the original sample set;

[0020] and extracting the sample features corresponding to each piece of sample data in the original sample set, the sample features being used to describe the sample data;

[0021] wherein the number of user clicks in the delivery data is taken as the clicked data, and the number of times users did not click is taken as the unclicked data.

[0022] Preferably, the test sample set is a set constructed, for an advertisement delivery page, by extracting the users who click on that page as test sample data.

[0023] Preferably, the method further comprises:

[0024] for each user in the test sample set, displaying to that user, in the page, the advertisement with the highest click-through rate.

[0025] Correspondingly, the present application also discloses a system for predicting advertisement click-through rate, comprising:

[0026] an original sample set construction module, configured to extract sample data to construct an original sample set, wherein the sample data comprises users' clicked data and unclicked data;

[0027] a training sample set construction module, configured to construct a training sample set by sampling the original sample set;

[0028] a prediction model construction module, configured to construct a prediction model using the sample data in the training sample set as model parameters;

[0029] a click-through rate prediction module, configured to use the prediction model to predict on a test sample set and predict the user's click-through rate for each kind of advertisement.

[0030] Preferably, the training sample set construction module is configured to sample the original training samples at a preset sampling ratio and construct a training sample set corresponding to that sampling ratio, wherein the preset sampling ratio is a ratio of clicked data to unclicked data derived statistically.

[0031] Preferably, the original sample set construction module is configured to extract data within a certain period of time from the delivery data as sample data to construct the original sample set, and to extract the sample features corresponding to each piece of sample data, the sample features being used to describe the sample data; wherein the number of user clicks in the delivery data is taken as the clicked data, and the number of times users did not click is taken as the unclicked data.

[0032] Preferably, the system further comprises:

[0033] a test sample set construction module, configured to, for an advertisement delivery page, extract the users who click on that page as test sample data to construct the test sample set;

[0034] a display module, configured to, for each user in the test sample set, display to that user, in the page, the advertisement with the highest click-through rate.

[0035] Compared with the prior art, the present application has the following advantages:

[0036] The present application extracts sample data to construct an original sample set, wherein the sample data comprises users' clicked data and unclicked data, and then constructs a training sample set by sampling the original sample set. Rather than using the original sample set directly as the training sample set, the application optimizes the original sample set to build the training sample set. This removes the severe imbalance between clicked data and unclicked data in the original sample set and yields a relatively balanced training sample set, in which noisy data affects the clicked data less than it does in the original sample set. A prediction model is then constructed using the sample data in the training sample set as model parameters. Because the sample data is relatively balanced, the model's recognition rate for clicked data is comparatively high. The prediction model is used to predict on the test sample set and predict the user's click-through rate for each kind of advertisement; the model is no longer heavily biased toward unclicked data, which improves the accuracy of click-through rate prediction.

[0037] Second, the ratio of clicked data to unclicked data that corresponds to the sampling ratio is derived from the statistical results of repeated experiments and is therefore statistically accurate and objective. The data in the training sample set built by sampling at this ratio is likewise accurate and objective, which further improves the accuracy of click-through rate prediction.

[0038] Third, the application can sample the original sample set at a preset sampling ratio and add all clicked data to the training sample set during sampling. With the amount of clicked data unchanged, the amount of data in the training sample set is reduced, so less data is used for model training when building the prediction model. This lightens the load on the system, speeds up data processing, and improves the efficiency of model training.

[0039] Fourth, the application uses the preset sampling ratio as the basis for the number of samples drawn, so that when the test sample set is constructed, the samples that best match the characteristics of the set can be selected. This improves the accuracy and relevance of the samples and further improves the accuracy of testing.

Brief Description of the Drawings

[0040] FIG. 1 is a flowchart of a method for predicting advertisement click-through rate according to an embodiment of the present application;

[0041] FIG. 2 is a flowchart of a method for predicting advertisement click-through rate according to a preferred embodiment of the present application;

[0042] FIG. 3 is a schematic diagram of decision surfaces in feature space in the method for predicting advertisement click-through rate according to a preferred embodiment of the present application;

[0043] FIG. 4 is a schematic diagram of the method for predicting advertisement click-through rate according to a preferred embodiment of the present application;

[0044] FIG. 5 is a structural diagram of a system for predicting advertisement click-through rate according to an embodiment of the present application.

Detailed Description

[0045] To make the above objects, features and advantages of the present application clearer and easier to understand, the application is described in further detail below with reference to the drawings and specific embodiments.

[0046] In the prior art, the ratio between clicked data and unclicked data in the original sample set is severely imbalanced, which biases the prediction model's results toward unclicked data and thereby harms the performance and accuracy of click-through rate prediction.

[0047] The present application optimizes the original sample set to construct a training sample set, constructs a prediction model from the relatively balanced sample data in the training sample set, and uses the prediction model to predict on a test sample set, so that the user's click-through rate for each kind of advertisement can be predicted. The prediction model is then no longer biased toward unclicked data, which improves the accuracy of click-through rate prediction.

[0048] Referring to FIG. 1, a flowchart of a method for predicting advertisement click-through rate according to an embodiment of the present application is shown.

[0049] Step 11: extract sample data to construct an original sample set, wherein the sample data comprises users' clicked data and unclicked data;

[0050] Sample data must first be extracted to construct the original sample set from which the click-through rate is predicted, so the sample data includes users' clicked data and unclicked data.

[0051] In the original sample set, the ratio between clicked data and unclicked data is severely imbalanced.

[0052] Step 12: construct a training sample set by sampling the original sample set;

[0053] As discussed above, if the original sample set is used directly as the training sample set to construct the prediction model, the model's predictions will be biased toward unclicked data.

[0054] To solve this problem, the present application does not use the original sample set directly as the training sample set. Instead, it samples the original sample set and uses the sampled data to construct the training sample set. The sample data in the training sample set therefore also includes clicked data and unclicked data.

[0055] The sample data in the training sample set built after sampling is more evenly distributed, so if noisy data is present, its effect on clicked data and on unclicked data is also more balanced. The clicked data can then supply enough statistics to be distinguished from noise, so its resistance to noise is relatively high. Even a small number of noisy samples therefore does not noticeably affect model training or the prediction results, and the resulting prediction model is more accurate.

[0056] Step 13: construct a prediction model using the training sample set as model parameters;

[0057] Once the training sample set has been constructed, the sample data in it can be used as model parameters to construct the prediction model. Many kinds of prediction model exist and one can be chosen according to specific needs; the application places no limitation on this.

[0058] Step 14: use the prediction model to predict on the test sample set, and predict the user's click-through rate for each kind of advertisement.

[0059] When predicting on the test sample set, the user's click-through rate for each kind of advertisement can be predicted. For example: 50% for e-commerce advertisements, 20% for online game advertisements, 15% for website promotion advertisements, and 15% for others.
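Given per-category predictions like those in [0059], the display step summarized in [0024] (show the user the advertisement with the highest click-through rate) reduces to an argmax. A minimal sketch follows; the dictionary keys and the function name `best_ad` are hypothetical, and the values are the example figures from [0059].

```python
from typing import Dict


def best_ad(predicted_ctr: Dict[str, float]) -> str:
    """Return the advertisement category with the highest predicted CTR."""
    if not predicted_ctr:
        raise ValueError("no predictions available")
    # max() with key= picks the key whose value is largest.
    return max(predicted_ctr, key=predicted_ctr.get)


# Example figures from [0059]; the category labels are illustrative.
predicted = {
    "e-commerce": 0.50,
    "online games": 0.20,
    "website promotion": 0.15,
    "other": 0.15,
}
chosen = best_ad(predicted)
```

When several categories tie for the maximum, `max` returns the first one encountered; a real system would need its own tie-breaking rule.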

[0060] In summary, the present application extracts sample data to construct an original sample set, wherein the sample data comprises users' clicked data and unclicked data, and then constructs a training sample set by sampling the original sample set. Rather than using the original sample set directly as the training sample set, the application optimizes the original sample set to build the training sample set. This removes the severe imbalance between clicked data and unclicked data in the original sample set and yields a relatively balanced training sample set, in which noisy data affects the clicked data less than it does in the original sample set. A prediction model is then constructed using the sample data in the training sample set as model parameters. Because the sample data is relatively balanced, the model's recognition rate for clicked data is comparatively high. The prediction model is used to predict on the test sample set and predict the user's click-through rate for each kind of advertisement; the model is no longer heavily biased toward unclicked data, which improves the accuracy of click-through rate prediction.

[0061] Referring to FIG. 2, a flowchart of a method for predicting advertisement click-through rate according to a preferred embodiment of the present application is shown.

[0062] Step 21: extract data within a certain period of time from the delivery data as sample data to construct the original sample set;

[0063] For example, when testing the click-through rate of advertisements, there is a set of delivery data that includes the number of times a given advertisement was delivered and, among those deliveries, the corresponding numbers of user clicks and non-clicks.

[0064] Data within a certain period of time can therefore be obtained from the delivery data. For a given advertisement, the number of user clicks is counted as clicked data and the number of non-clicks as unclicked data, and the clicked data and unclicked data are used as sample data to construct the sample set.

[0065] For example, when predicting the click-through rate of an advertisement, the sample data includes whether the delivered advertisement was clicked. When predicting the click-through rate of entertainment news on a web page, the sample data includes the numbers of clicks and non-clicks on that entertainment news.

[0066] By counting, over a certain period of the delivery data, whether users clicked the delivered advertisement, the amounts of clicked data and unclicked data in the sample data can be determined. For example, if an advertisement was delivered 1000 times in the last 3 months, the count might give 2 clicked samples and 998 unclicked samples.
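The counting in [0066] can be sketched as follows. This is a minimal illustration, assuming each delivery in the chosen time window has already been reduced to a boolean "clicked" flag; the function name `count_click_data` is hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def count_click_data(clicked_flags: Iterable[bool]) -> Tuple[int, int]:
    """Return (clicked, unclicked) counts for a sequence of delivery outcomes."""
    counts = Counter(bool(flag) for flag in clicked_flags)
    # Counter returns 0 for a missing key, so an all-unclicked log is handled.
    return counts[True], counts[False]


# The example from [0066]: 1000 deliveries, 2 of which were clicked.
flags = [True] * 2 + [False] * 998
clicked, unclicked = count_click_data(flags)
```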

[0067] The sample data may of course include other data as well; the application places no limitation on this. For example, for advertisement click-through rate prediction:

[0068] The sample data for the advertisements to be delivered is first extracted from the advertisement delivery log. One piece of sample data may include the user identifier (ID) or the user's IP (Internet Protocol) address for that delivery, the delivery time of the advertisement, the URL (Uniform Resource Locator) of the delivered advertisement, and whether the user clicked the delivered advertisement.
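One way to picture the record described in [0068] is a small data class. This is only a sketch: the class name `DeliverySample`, its field names, and the validation rule are assumptions made for illustration, not a format defined by the application.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class DeliverySample:
    """One delivery of one advertisement to one user (hypothetical layout)."""
    user_id: Optional[str]      # user identifier, if known
    ip_address: Optional[str]   # user's IP address, if no ID is available
    delivered_at: datetime      # delivery time of the advertisement
    ad_url: str                 # URL of the delivered advertisement
    clicked: bool               # whether the user clicked the advertisement

    def __post_init__(self) -> None:
        # [0068] says a sample carries the user ID or the IP address,
        # so at least one of the two is required here.
        if self.user_id is None and self.ip_address is None:
            raise ValueError("a sample needs a user ID or an IP address")


sample = DeliverySample(
    user_id=None,
    ip_address="192.0.2.1",               # documentation-only address
    delivered_at=datetime(2012, 1, 1, 12, 0),
    ad_url="https://example.com/ad/1",    # placeholder URL
    clicked=False,
)
```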

[0069] Step 22: extract the sample features corresponding to each piece of sample data, the sample features being used to describe the sample data;

[0070] Constructing a sample set requires extracting not only the sample data but also the corresponding sample features. Sample features describe the sample data; each feature dimension is quantified by a corresponding feature value, and different samples can be distinguished by their feature values.

[0071] There are many ways to quantify sample features with feature values. For example, a feature can be measured by how frequently it occurs; as another example, a website's visit feature can be computed as the number of visits in a fixed period divided by the total number of registered users. The application places no limitation on this.
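The two quantification examples in [0071] can be written down directly. The sketch below is illustrative: the function names and the choice of relative frequency (count divided by total) are assumptions, since the application only says that occurrence frequency may be used.

```python
from collections import Counter
from typing import Dict, Sequence


def relative_frequencies(observations: Sequence[str]) -> Dict[str, float]:
    """Feature value = how often each feature occurs, as a fraction of all observations."""
    total = len(observations)
    if total == 0:
        return {}
    return {name: count / total for name, count in Counter(observations).items()}


def visit_rate(visits_in_window: int, registered_users: int) -> float:
    """Feature value = visits in a fixed time window divided by registered users."""
    if registered_users <= 0:
        raise ValueError("registered_users must be positive")
    return visits_in_window / registered_users


# Hypothetical inputs, chosen only to exercise the two functions.
freqs = relative_frequencies(["sports", "news", "sports", "sports"])
rate = visit_rate(visits_in_window=500, registered_users=2000)
```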

[0072] The sample features corresponding to the delivered sample data are extracted from the network message log, and include at least one of the following:

[0073] user-related features, delivery-URL-related features, and advertisement-related features.

[0074] User-related features may include: websites or web pages the user has visited, query terms the user has used, advertisements the user has previously clicked, and so on.

[0075] Delivery-URL-related features include: content features, anchor text features, features of the query terms corresponding to the URL, hyperlink features, and so on.

[0076] Advertisement-related features may include: advertisement dimensions, features of the advertisement's landing page (sometimes called the lead capture page), bid terms, description information, advertisement industry, and so on.

[0077] For example, in advertisement click-through rate prediction, the sample feature may be that the advertisement is an e-commerce advertisement, together with the corresponding link information, bid terms, and so on. As another example, for predicting the click-through rate of entertainment news on a web page, the sample features may be feature words, content features, and so on.

[0078] Step 23: sample the original training samples at a preset sampling ratio and construct the training sample set corresponding to that ratio;

[0079] A sampling ratio P can be set in advance, and the original training samples are then sampled at that preset ratio. The preset sampling ratio is a statistically derived ratio of clicked data to unclicked data, chosen so that the two are more evenly distributed than in the original sample set.

[0080] The sampling ratio may be determined as follows:

[0081] 1. Set i sampling ratios. For each sampling ratio, sample the original sample set to construct a training sample set, then construct a prediction model. Each prediction model has a corresponding prediction metric, which is a measure of the precision and recall of advertisement delivery.

[0082] The prediction metric for the first sampling ratio is A1, ..., and the prediction metric for the i-th sampling ratio is Ai.

[0083] 2. Construct a prediction model directly from the original sample set; the prediction metric of this model is B.

[0084] For the i prediction metrics Ai corresponding to the i sampling ratios, compare each Ai with B. Repeat this process multiple times and collect statistics on the comparison results.

[0085] If Ai is greater than or equal to B, the sampling ratio is suitable and can be used to sample the original sample set. Otherwise, the sampling ratio is unsuitable and cannot be used to sample the original sample set.
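The selection procedure in [0080] to [0085] can be sketched as a loop over candidate ratios. Several pieces here are assumptions rather than anything the application specifies: the `evaluate` callback (which stands in for "sample, train, and compute the prediction metric"), the use of `None` to mean "train on the original sample set", and the rule that a ratio counts as suitable when Ai >= B in at least `min_win_rate` of the repeated trials.

```python
from typing import Callable, List, Optional, Sequence


def select_ratios(
    candidates: Sequence[int],
    evaluate: Callable[[Optional[int], int], float],
    trials: int = 5,
    min_win_rate: float = 0.5,
) -> List[int]:
    """Keep the candidate ratios whose metric Ai is >= the baseline metric B.

    candidates: k values, where a sampling ratio of 1:k keeps k unclicked
        samples per clicked sample.
    evaluate(k, trial): metric of a model trained with ratio 1:k in the given
        trial; evaluate(None, trial) is the baseline B on the original set.
    """
    suitable = []
    for k in candidates:
        wins = 0
        for trial in range(trials):
            a_i = evaluate(k, trial)     # metric Ai for this sampling ratio
            b = evaluate(None, trial)    # metric B for the original sample set
            if a_i >= b:
                wins += 1
        if wins / trials >= min_win_rate:
            suitable.append(k)
    return suitable


def fake_evaluate(k: Optional[int], trial: int) -> float:
    """Stand-in metric with made-up values, used only to exercise the loop."""
    scores = {None: 0.60, 2: 0.70, 5: 0.65, 50: 0.40}
    return scores[k]


chosen_ratios = select_ratios([2, 5, 50], fake_evaluate)
```

In a real experiment `evaluate` would resample and retrain on every trial, so Ai and B would vary between trials; the fixed table above only makes the control flow easy to follow.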

[0086] To address the severe imbalance between clicked data and unclicked data in the original sample set, the sampling ratio P selected in this application may range from 1:2 to 1:10, which suits cases where the imbalance between clicked data and unclicked data in the original sample set exceeds 1:10.

[0087] After sampling at the preset ratio P, the sampled clicked data and unclicked data serve as the sample data from which the corresponding training sample set is constructed.

[0088] During sampling, all clicked data in the original sample set can be added to the training sample set, which retains as much clicked data as possible. If the original sample set contains n clicked samples, the training sample set contains n clicked samples and n × k unclicked samples, where the sampling ratio is P = 1:k.

[0089] For example, suppose the original sample set contains 1000 samples with a clicked-to-unclicked ratio of 2:998. Sampling can use the preset ratio P = 1:10, with all clicked data in the original sample set added to the training sample set. The training sample set then contains 22 samples: 2 clicked and 20 unclicked.
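The worked example in [0089] corresponds to downsampling the unclicked class while keeping every clicked sample. A minimal sketch follows, assuming samples are dictionaries with a boolean `"clicked"` key and that unclicked samples are drawn uniformly at random without replacement; the application does not say how the unclicked samples are chosen, so that part is an assumption, as is the function name.

```python
import random
from typing import Dict, List


def downsample(
    samples: List[Dict],
    unclicked_per_clicked: int,
    seed: int = 0,
) -> List[Dict]:
    """Keep all clicked samples and k unclicked samples per clicked sample (ratio 1:k)."""
    clicked = [s for s in samples if s["clicked"]]
    unclicked = [s for s in samples if not s["clicked"]]
    # Never ask for more unclicked samples than exist.
    wanted = min(len(unclicked), len(clicked) * unclicked_per_clicked)
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    return clicked + rng.sample(unclicked, wanted)


# The example from [0089]: 2 clicked and 998 unclicked samples, P = 1:10.
original = [{"id": i, "clicked": i < 2} for i in range(1000)]
training = downsample(original, unclicked_per_clicked=10)
n_clicked = sum(1 for s in training if s["clicked"])
n_unclicked = len(training) - n_clicked
```

With these inputs the training set holds 22 samples, 2 clicked and 20 unclicked, matching the figures in [0089].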

[0090] 此时保留了全部的点击数据,并且改善了原始样本集合中样本数据分布非常不均衡的问题。 [0090] In this case retains all the click data, and improve the original sample set of sample data distribution is very uneven problems. 其中预置的采样比是通过统计得出的,具有客观性和准确性。 In which the pre-sampling ratio is derived by statistical, objectivity and accuracy.

[0091] 步骤24,以所述训练样本的样本数据集合为模型參数构建预测模型; [0091] Step 24, the training data samples to a set of model parameters forecast model;

[0092] 以所述训练样本集合的样本数据为模型參数构建预测模型,例如构建中根据需求选择对应的预测模型,如BT (behavioral targeting,用户行为定向)模型、CM (contextualmatch,内容匹配)模型或捜索触发模型。 [0092] In the training sample data sample set construct a predictive model for the model parameters, for example, selecting a prediction model corresponding to the demand according to as BT (behavioral targeting, user behavior orientation) model, CM (contextualmatch, content matching) Construction Dissatisfied with the cable model or trigger model.

[0093] For example, suppose the prediction model is a Bayesian model based on probability estimation. If the sample data of the original sample set is used as model parameters and click data is scarce, the probability estimates for features that may lead to an advertisement click become less accurate, which lowers the recognition rate for data likely to produce clicks.

[0094] By contrast, when the sample data of the training sample set is used as model parameters, click data and non-click data are relatively balanced, which improves the accuracy of the probability estimates for features that may lead to an advertisement click and raises the recognition rate for data likely to produce clicks.
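One way to see why the class balance matters for a Bayesian model is to apply Bayes' rule to a single feature under two different class priors. The sketch below is illustrative only: the likelihood values are hypothetical, and the source does not specify the form of its Bayesian model. The two priors correspond to the 2 : 998 original set and the 2 : 20 training set of the earlier example.

```python
def posterior_click(p_feature_given_click, p_feature_given_non_click, prior_click):
    """Bayes' rule for P(click | feature) with a binary click label."""
    numerator = p_feature_given_click * prior_click
    denominator = numerator + p_feature_given_non_click * (1.0 - prior_click)
    return numerator / denominator

# Hypothetical likelihoods for one feature (not taken from the source).
p_f_click, p_f_non_click = 0.60, 0.05

prior_original = 2 / 1000  # click share in the original set (2 : 998)
prior_training = 2 / 22    # click share in the training set (2 : 20)

post_original = posterior_click(p_f_click, p_f_non_click, prior_original)
post_training = posterior_click(p_f_click, p_f_non_click, prior_training)
print(round(post_original, 4), round(post_training, 4))
```

With the original prior, even a feature that is twelve times more common among clicks yields a posterior far below one half, so a classifier thresholding at one half would never flag a click; with the balanced prior the same feature crosses that threshold.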

[0095] Referring to FIG. 3, a schematic diagram of the feature-space decision surface used in a click-through rate prediction method according to a preferred embodiment of the present application is shown.

[0096] As another example, suppose the prediction model is based on a decision surface in feature space; the goal of such a model is to find the optimal decision surface that minimizes structural risk. When the training samples are imbalanced, the selected support vectors are also unevenly distributed. While minimizing structural risk, the model then neglects the contribution of click data to the structural risk, which enlarges the decision boundary of the non-click data and causes the decision surface actually obtained by the model to deviate from the optimal decision surface.

[0097] FIG. 3 illustrates click data and non-click data when the feature dimension is 2. Circles denote non-click data, squares denote click data, the dashed line is the optimal decision surface, and the solid line is the actual decision surface.

[0098] Panel (a) shows the case of an imbalanced sample distribution. Because of noisy data, click data and non-click data overlap near the optimal decision surface; since non-click data dominates in sample count, the actual decision surface obtained by the model (solid line) is biased toward the non-click data.

[0099] Panel (b) shows an example of the sample data distribution of the training sample set constructed after sampling at a sampling ratio of 1 : 2. Sampling effectively suppresses the influence of non-click data on the decision surface, so the actual decision surface obtained is closer to the optimal decision surface.

[0100] Step 25: for an advertisement delivery page, extract the users who click on the advertisement delivery page as test sample data and construct a test sample set;

[0101] When predicting click-through rates, the prediction model must be run on a test sample set to obtain the predicted click-through rates, so a test sample set needs to be constructed.

[0102] Each advertisement delivery corresponds to a specific advertisement delivery page; for example, if an advertisement is placed on web page A, then web page A is the advertisement delivery page. When a user clicks on a page of a website, a page request is generated and sent to that website's server. If the page is an advertisement delivery page, the website's server also sends an advertisement request to the advertisement server, and the advertisement server predicts which advertisements the user is likely to click. Users who click on the delivery page are therefore extracted as test sample data to construct the test sample set, and the sample features corresponding to the sample data are extracted at the same time.

[0103] The sample data in the test sample set may be essentially the same in form as the sample data of the training sample set, including the user identifier (ID) or the user's IP address, the advertisement delivery time, the URL of the delivered advertisement, and so on.

[0104] Step 26: use the prediction model to make predictions on the test sample set, predicting the user's click-through rate for each type of advertisement;

[0105] For example, if the prediction model is a linear decision surface in feature space, the weight with which each feature contributes to producing click data can be distinguished. When the prediction model is applied to a test sample, the product of the sample's feature value in each dimension and the feature weight for that dimension is computed, and the products are summed over all feature dimensions. In this way, users whose sample data contains certain sample features are predicted to be likely to click a given type of advertisement, and their click-through rate for that advertisement is predicted, whereas users whose sample data lacks these features are predicted to be unlikely to click that advertisement.

[0106] For example, for online game advertisements, users whose sample features show that they have clicked on an online game website are likely to click online game advertisements, whereas users whose sample features show no such click are unlikely to click online game advertisements.
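The weighted sum described for the linear decision surface can be sketched as below. The feature names, weights, and bias are hypothetical, and the logistic mapping from score to a rate in (0, 1) is an assumption added for illustration; the source only states that per-dimension products are summed.

```python
import math

def linear_score(features, weights, bias=0.0):
    """Sum over all dimensions of feature value times feature weight."""
    return bias + sum(weights.get(name, 0.0) * value
                      for name, value in features.items())

def score_to_rate(score):
    """Map a score to (0, 1) with a logistic function (assumption, not from the source)."""
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical weights for an online game advertisement.
weights = {"clicked_game_site": 2.0, "visited_news_site": -0.5}

user_a = {"clicked_game_site": 1.0, "visited_news_site": 0.0}  # has clicked a game site
user_b = {"clicked_game_site": 0.0, "visited_news_site": 1.0}  # has not

score_a = linear_score(user_a, weights, bias=-1.0)  # 1.0
score_b = linear_score(user_b, weights, bias=-1.0)  # -1.5
print(score_to_rate(score_a) > score_to_rate(score_b))  # True
```

A positive score places the sample on the click side of the decision surface and a negative score on the non-click side, so user A is the likelier clicker in this toy setup.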

[0107] The method above predicts the advertisements a user is likely to click and the click-through rate for each. A user may be likely to click several types of advertisement; for example, the click-through rate may be 50% for e-commerce advertisements, 20% for online game advertisements, 15% for website promotion advertisements, and 15% for others.

[0108] Step 27: for the users in the test sample set, display to each user on the page the advertisement with the highest click-through rate.

[0109] Once the user's click-through rate for each type of advertisement has been predicted, the click-through rates of all advertisements can be sorted and the advertisement with the highest click-through rate selected. When the user opens a page, that advertisement can be displayed, i.e. the advertisement the user is most likely to click during this visit, which in turn can raise the click-through rate of the corresponding advertisement. Different users may therefore see different advertisements even when they open the same page.

[0110] For example, if the prediction results show that user 1 is likely to click e-commerce advertisements and user 2 is likely to click online game advertisements, then when user 1 and user 2 open the home page of the same website, user 1 sees an e-commerce advertisement and user 2 sees an online game advertisement.

[0111] As another example, suppose the predicted click-through rates for a user are 50% for e-commerce advertisements, 20% for online game advertisements, 15% for website promotion advertisements, and 15% for others. After the click-through rates of all advertisements are sorted, the user's click-through rate for e-commerce advertisements is the highest, so an e-commerce advertisement can be displayed to the user.
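The selection in step 27 amounts to sorting the predicted rates and taking the first entry. A minimal sketch using the figures from the example above; the dictionary layout and variable names are assumptions:

```python
# Predicted click-through rates for one user, taken from the example in the text.
predicted_ctr = {
    "e-commerce": 0.50,
    "online game": 0.20,
    "website promotion": 0.15,
    "other": 0.15,
}

# Sort advertisement categories by predicted rate, highest first.
ranked = sorted(predicted_ctr.items(), key=lambda item: item[1], reverse=True)
best_ad, best_rate = ranked[0]
print(best_ad, best_rate)  # e-commerce 0.5
```

Running this once per user with that user's own predictions is what lets two users who open the same page see different advertisements.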

[0112] For advertisement click-through rate prediction, the main strategies currently used for precisely targeted advertisement delivery include sponsored search, content matching, and behavioral targeting (BT).

[0113] Sponsored search retrieves advertisements according to the keywords a user submits to a search engine. Because the keywords directly reflect the user's current interests, advertisements related to the current search content can be pushed to the user.

[0114] Content matching models and analyzes the content of the web page the user is browsing and displays advertisements similar to the page content.

[0115] Behavioral targeting models and predicts the user's interests and behavior from records of the user's historical behavior, such as search history, web browsing history, and advertisement impression and click records, and selects advertisements that match the user's interests for display.

[0116] The corresponding strategy can be selected according to requirements to build the prediction model and perform click-through rate prediction.

[0117] Referring to FIG. 4, a schematic diagram of an advertisement click-through rate prediction method according to a preferred embodiment of the present application is shown.

[0118] In advertisement click-through rate prediction, the distribution of advertisement click sample data is severely imbalanced. In practice, when a sufficient amount of minority-class sample data (click data) must be guaranteed, this imbalance on the one hand causes the training sample data to swell dramatically, increasing the computation and storage resources required. On the other hand, sample data imbalance may negatively affect the training performance of the model.

[0119] Based on this observation, this patent provides an undersampling strategy. When sample data is imbalanced in this way, the strategy on the one hand reduces the amount of majority-class sample data, thereby shrinking the training sample set, saving the storage space and computing resources required, and improving training efficiency. On the other hand, because the sample data distribution becomes relatively balanced, the negative effect of the sample set itself on model performance is effectively avoided and the prediction results improve.

[0120] The advertisement click-through rate prediction method is discussed below.

[0121] For advertisement click-through rate prediction, sample features may first be extracted from network packet logs, and advertisement delivery samples, comprising click data and non-click data, extracted from advertisement delivery logs. An original sample set is then constructed from the samples and sample features; the distribution of click data and non-click data in the original sample set is usually highly imbalanced. The original sample set is therefore sampled to construct a training sample set in which click data and non-click data are relatively balanced. A prediction model is then constructed using the training sample set as model parameters, and the prediction model is used to predict click-through rates on the test sample set.
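The flow in this paragraph can be sketched end to end as follows. Everything here is a simplified stand-in under stated assumptions: the log layouts, the function names, the single categorical feature, and especially the count-based model with Laplace smoothing are hypothetical and are not the model the application describes.

```python
import random
from collections import defaultdict

def build_original_set(feature_log, delivery_log):
    """Join per-user features with delivery records into (feature, clicked) pairs."""
    return [(feature_log[user], clicked)
            for user, clicked in delivery_log if user in feature_log]

def build_training_set(original, non_clicks_per_click, seed=0):
    """Keep all click records; sample non-click records at a ratio of 1 : k."""
    clicks = [s for s in original if s[1]]
    non_clicks = [s for s in original if not s[1]]
    k = min(len(non_clicks), len(clicks) * non_clicks_per_click)
    return clicks + random.Random(seed).sample(non_clicks, k)

def train(training):
    """Estimate a smoothed click rate per feature value (stand-in model)."""
    shown, clicked = defaultdict(int), defaultdict(int)
    for feature, was_clicked in training:
        shown[feature] += 1
        clicked[feature] += int(was_clicked)
    # Laplace smoothing keeps the estimates away from exactly 0 or 1.
    return {f: (clicked[f] + 1) / (shown[f] + 2) for f in shown}

def predict(model, feature, default=0.5):
    """Look up the estimated rate; unseen feature values get a neutral default."""
    return model.get(feature, default)

# Hypothetical logs: 1000 users, one feature each, 10 clicks in total.
feature_log = {u: ("gamer" if u % 50 == 0 else "other") for u in range(1000)}
delivery_log = [(u, u % 100 == 0) for u in range(1000)]

original = build_original_set(feature_log, delivery_log)
training = build_training_set(original, non_clicks_per_click=2)
model = train(training)
print(len(original), len(training))  # 1000 30
print(predict(model, "gamer") > predict(model, "other"))  # True
```

Note that rates estimated on an undersampled set are relative scores rather than calibrated click-through rates; the sketch uses them only to rank feature values.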

[0122] In summary, the ratio of click data to non-click data that the sampling ratio of the present application represents is derived from the statistical results of repeated experiments and is therefore statistically accurate and objective. The data in the training sample set constructed by sampling at this ratio is consequently also accurate and objective, which further improves the accuracy of click-through rate prediction.

[0123] Second, because the ratio of click data to non-click data is severely imbalanced, a training sample set usually contains a correspondingly large amount of data. Model training therefore processes a large amount of data, which burdens the system, slows processing, and reduces training efficiency. The present application can sample the original sample set at a preset sampling ratio and add all click data to the training sample set during sampling. With the amount of click data unchanged, the data in the training sample set is reduced, so less data is needed to train and construct the prediction model, which lightens the system load, speeds up data processing, and improves training efficiency.

[0124] Third, the present application uses the preset sampling ratio as the basis for the number of samples drawn. When the test sample set is constructed, the samples that best match the characteristics of the set can therefore be selected, which improves the accuracy and relevance of the samples and further improves the accuracy of testing.

[0125] Referring to FIG. 5, a structural diagram of a click-through rate prediction system according to an embodiment of the present application is shown.

[0126] Correspondingly, the present application also provides a click-through rate prediction system, each module of which can be implemented by a computer. The system comprises an original sample set construction module 11, a training sample set construction module 12, a prediction model construction module 13, and a click-through rate prediction module 15, wherein:

[0127] The original sample set construction module 11 is configured to extract sample data and construct an original sample set, wherein the sample data comprises users' click data and non-click data;

[0128] The training sample set construction module 12 is configured to construct a training sample set by sampling the original sample set;

[0129] The prediction model construction module 13 is configured to construct a prediction model using the sample data in the training sample set as model parameters;

[0130] The click-through rate prediction module 15 is configured to use the prediction model to make predictions on a test sample set, predicting the user's click-through rate for each type of advertisement.

[0131] Preferably, the original sample set construction module 11 is configured to extract data within a certain time period from the delivery data as sample data to construct the original sample set, and to extract the sample features corresponding to each sample record, the sample features being used to describe the sample data.

[0132] The number of times users clicked in the delivery data is taken as the click data, and the number of times users did not click is taken as the non-click data.

[0133] Preferably, the training sample set construction module 12 is configured to sample the original training samples at a preset sampling ratio and to construct a training sample set corresponding to the sampling ratio.

[0134] The preset sampling ratio is a statistically derived ratio of click data to non-click data. During sampling, all click data is added to the training sample set.

[0135] Preferably, the system further comprises:

[0136] A test sample set construction module 14, configured to, for an advertisement delivery page, extract the users who click on the advertisement delivery page as test sample data and construct a test sample set.

[0137] A display module 16, configured to, for the users in the test sample set, display to each user on the page the advertisement with the highest click-through rate.

[0138] As the system embodiment is substantially similar to the method embodiment, it is described only briefly; for relevant details, refer to the corresponding description of the method embodiment.

[0139] The embodiments in this specification are described progressively. Each embodiment focuses on its differences from the others; for parts that are the same or similar across embodiments, the embodiments may be referred to one another.

[0140] Although preferred embodiments of the present application have been described, those skilled in the art can make further changes and modifications to these embodiments once they learn of the basic inventive concept. The appended claims are therefore intended to be construed as covering the preferred embodiments and all changes and modifications that fall within the scope of the present application.

[0141] Those skilled in the art should understand that embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.

[0142] The present application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present application. It should be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

[0143] These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

[0144] These computer program instructions may also be loaded onto a computer or other programmable data processing device, causing a series of operational steps to be performed on the computer or other programmable device to produce computer-implemented processing, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

[0145] The present application may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. The present application may also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.

[0146] Finally, it should also be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include", or any other variant thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or device comprising a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the existence of additional identical elements in the process, method, article, or device that comprises the element.

[0147] The advertisement click-through rate prediction method and system provided by the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the description of the above embodiments is intended only to help understand the method of the present application and its core idea. At the same time, those of ordinary skill in the art may, following the idea of the present application, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (11)

1. An advertisement click-through rate prediction method, characterized by comprising: extracting sample data to construct an original sample set, wherein the sample data comprises users' click data and non-click data; constructing a training sample set by sampling the original sample set; constructing a prediction model using the sample data in the training sample set as model parameters; and using the prediction model to make predictions on a test sample set, predicting the user's click-through rate for each type of advertisement.
2. The method according to claim 1, characterized in that constructing a training sample set by sampling the original sample set comprises: sampling the original training samples at a preset sampling ratio and constructing a training sample set corresponding to the sampling ratio, wherein the preset sampling ratio is a statistically derived ratio of click data to non-click data.
3. The method according to claim 1, characterized in that constructing a training sample set by sampling the original sample set comprises: during sampling, adding all click data to the training sample set.
4. The method according to claim 1, characterized in that extracting sample data to construct an original sample set comprises: extracting data within a certain time period from the delivery data as sample data to construct the original sample set; and extracting the sample features corresponding to each sample record in the original sample set, the sample features being used to describe the sample data; wherein the number of times users clicked in the delivery data is taken as the click data, and the number of times users did not click is taken as the non-click data.
5. The method according to claim 1, characterized in that the test sample set is: a test sample set constructed after extracting, for an advertisement delivery page, the users who click on the advertisement delivery page as test sample data.
6. The method according to claim 5, characterized by further comprising: for the users in the test sample set, displaying to each user on the page the advertisement with the highest click-through rate.
7. An advertisement click-through rate prediction system, characterized by comprising: an original sample set construction module, configured to extract sample data and construct an original sample set, wherein the sample data comprises users' click data and non-click data; a training sample set construction module, configured to construct a training sample set by sampling the original sample set; a prediction model construction module, configured to construct a prediction model using the sample data in the training sample set as model parameters; and a click-through rate prediction module, configured to use the prediction model to make predictions on a test sample set, predicting the user's click-through rate for each type of advertisement.
8. The system according to claim 7, characterized in that: the training sample set construction module is configured to sample the original training samples at a preset sampling ratio and to construct a training sample set corresponding to the sampling ratio, wherein the preset sampling ratio is a statistically derived ratio of click data to non-click data.
9. The system according to claim 7, characterized in that: the original sample set construction module is configured to extract data within a certain time period from the delivery data as sample data to construct the original sample set, and to extract the sample features corresponding to each sample record, the sample features being used to describe the sample data; wherein the number of times users clicked in the delivery data is taken as the click data, and the number of times users did not click is taken as the non-click data.
10. The system according to claim 9, characterized by further comprising: a test sample set construction module, configured to, for an advertisement delivery page, extract the users who click on the advertisement delivery page as test sample data and construct a test sample set.
11. The system according to claim 10, characterized by further comprising: a display module, configured to, for the users in the test sample set, display to each user on the page the advertisement with the highest click-through rate.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210074541XA CN102663617A (en) 2012-03-20 2012-03-20 Method and system for prediction of advertisement clicking rate

Publications (1)

Publication Number Publication Date
CN102663617A true CN102663617A (en) 2012-09-12

Family

ID=46773097

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210074541XA CN102663617A (en) 2012-03-20 2012-03-20 Method and system for prediction of advertisement clicking rate

Country Status (1)

Country Link
CN (1) CN102663617A (en)

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102880688A (en) * 2012-09-14 2013-01-16 北京百度网讯科技有限公司 Method, device and equipment for evaluating webpage
CN103235893A (en) * 2013-05-06 2013-08-07 重庆大学 User-commodity click rate self-adaptive prediction device and method
CN103246985A (en) * 2013-04-26 2013-08-14 北京亿赞普网络技术有限公司 Advertisement click rate predicting method and device
CN103310003A (en) * 2013-06-28 2013-09-18 华东师范大学 Method and system for predicting click rate of new advertisement based on click log
CN103345512A (en) * 2013-07-06 2013-10-09 北京品友互动信息技术有限公司 Online advertising click-through rate forecasting method and device based on user attribute
CN103746898A (en) * 2013-12-25 2014-04-23 新浪网技术(中国)有限公司 Sampling analysis based e-mail sending method and system
CN103853711A (en) * 2012-11-28 2014-06-11 中国移动通信集团广西有限公司 Text information processing method and device
CN103914475A (en) * 2013-01-05 2014-07-09 腾讯科技(北京)有限公司 Method, system and device for predicting video views
CN104090919A (en) * 2014-06-16 2014-10-08 华为技术有限公司 Advertisement recommending method and advertisement recommending server
CN104268644A (en) * 2014-09-23 2015-01-07 新浪网技术(中国)有限公司 Method and device for predicting click frequency of advertisement at advertising position
CN104536983A (en) * 2014-12-08 2015-04-22 北京掌阔技术有限公司 Method and device for predicting advertisement click rate
CN104778608A (en) * 2015-04-13 2015-07-15 合一信息技术(北京)有限公司 N+ advertisement putting and optimizing method
CN104933075A (en) * 2014-03-20 2015-09-23 百度在线网络技术(北京)有限公司 User attribute predicting platform and method
CN104951965A (en) * 2015-06-26 2015-09-30 深圳市腾讯计算机系统有限公司 Advertisement delivery method and device
CN105095625A (en) * 2014-05-14 2015-11-25 阿里巴巴集团控股有限公司 Click Through Ratio (CTR) prediction model establishing method and device, information providing method and information providing system
CN105654200A (en) * 2015-12-30 2016-06-08 上海珍岛信息技术有限公司 Deep learning-based advertisement click-through rate prediction method and device
CN105915438A (en) * 2016-04-15 2016-08-31 北京奇虎科技有限公司 Message pushing method, apparatus, and system
CN106227743A (en) * 2016-07-12 2016-12-14 精硕世纪科技(北京)有限公司 Advertisement target group touches and reaches ratio estimation method and device
WO2017107571A1 (en) * 2015-12-24 2017-06-29 北京大学 Method and system for determining quality of application on basis of user behaviors of application management
CN107124320A (en) * 2017-06-30 2017-09-01 北京金山安全软件有限公司 Traffic data monitoring method, device and server

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101203875A (en) * 2005-03-30 2008-06-18 谷歌公司 Adjusting an advertising cost, such as a per-ad impression cost, using a likelihood that the ad will be sensed or perceived by users
CN101385018A (en) * 2005-12-30 2009-03-11 谷歌公司 Using estimated ad qualities for ad filtering, ranking and promotion
CN101390118A (en) * 2005-12-30 2009-03-18 谷歌公司 Predicting ad quality
CN102110265A (en) * 2009-12-23 2011-06-29 深圳市腾讯计算机系统有限公司 Network advertisement effect estimating method and network advertisement effect estimating system
CN102346899A (en) * 2011-10-08 2012-02-08 亿赞普(北京)科技有限公司 Method and device for predicting advertisement click rate based on user behaviors

Cited By (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102880688B (en) * 2012-09-14 2016-07-27 北京百度网讯科技有限公司 Method, device and equipment for evaluating webpage
CN102880688A (en) * 2012-09-14 2013-01-16 北京百度网讯科技有限公司 Method, device and equipment for evaluating webpage
CN103853711A (en) * 2012-11-28 2014-06-11 中国移动通信集团广西有限公司 Text information processing method and device
CN103853711B (en) * 2012-11-28 2017-02-08 中国移动通信集团广西有限公司 Text information processing method and device
CN103914475B (en) * 2013-01-05 2018-05-04 腾讯科技(北京)有限公司 Method, system and device for predicting video views
CN103914475A (en) * 2013-01-05 2014-07-09 腾讯科技(北京)有限公司 Method, system and device for predicting video views
CN103246985A (en) * 2013-04-26 2013-08-14 北京亿赞普网络技术有限公司 Advertisement click rate predicting method and device
CN103246985B (en) * 2013-04-26 2016-12-28 北京亿赞普网络技术有限公司 Advertisement click rate predicting method and device
CN103235893B (en) * 2013-05-06 2016-03-23 重庆大学 User-commodity click rate self-adaptive prediction device and method
CN103235893A (en) * 2013-05-06 2013-08-07 重庆大学 User-commodity click rate self-adaptive prediction device and method
CN103310003A (en) * 2013-06-28 2013-09-18 华东师范大学 Method and system for predicting click rate of new advertisement based on click log
CN103345512A (en) * 2013-07-06 2013-10-09 北京品友互动信息技术有限公司 Online advertising click-through rate forecasting method and device based on user attribute
CN103746898A (en) * 2013-12-25 2014-04-23 新浪网技术(中国)有限公司 Sampling analysis based e-mail sending method and system
CN104933075A (en) * 2014-03-20 2015-09-23 百度在线网络技术(北京)有限公司 User attribute predicting platform and method
CN105095625A (en) * 2014-05-14 2015-11-25 阿里巴巴集团控股有限公司 Click Through Ratio (CTR) prediction model establishing method and device, information providing method and information providing system
WO2015192667A1 (en) * 2014-06-16 2015-12-23 华为技术有限公司 Advertisement recommending method and advertisement recommending server
CN104090919A (en) * 2014-06-16 2014-10-08 华为技术有限公司 Advertisement recommending method and advertisement recommending server
CN104090919B (en) * 2014-06-16 2017-04-19 华为技术有限公司 Advertisement recommending method and advertisement recommending server
CN104268644A (en) * 2014-09-23 2015-01-07 新浪网技术(中国)有限公司 Method and device for predicting click frequency of advertisement at advertising position
CN104536983A (en) * 2014-12-08 2015-04-22 北京掌阔技术有限公司 Method and device for predicting advertisement click rate
CN104778608A (en) * 2015-04-13 2015-07-15 合一信息技术(北京)有限公司 N+ advertisement putting and optimizing method
CN104951965A (en) * 2015-06-26 2015-09-30 深圳市腾讯计算机系统有限公司 Advertisement delivery method and device
WO2017107571A1 (en) * 2015-12-24 2017-06-29 北京大学 Method and system for determining quality of application on basis of user behaviors of application management
CN105654200A (en) * 2015-12-30 2016-06-08 上海珍岛信息技术有限公司 Deep learning-based advertisement click-through rate prediction method and device
CN105915438A (en) * 2016-04-15 2016-08-31 北京奇虎科技有限公司 Message pushing method, apparatus, and system
CN105915438B (en) * 2016-04-15 2019-02-19 北京奇虎科技有限公司 Information push method, apparatus and system
CN106227743A (en) * 2016-07-12 2016-12-14 精硕世纪科技(北京)有限公司 Advertisement target group reach ratio estimation method and device
CN106227743B (en) * 2016-07-12 2019-09-24 精硕科技(北京)股份有限公司 Advertisement target group reach ratio estimation method and device
CN107124320A (en) * 2017-06-30 2017-09-01 北京金山安全软件有限公司 Traffic data monitoring method, device and server

Similar Documents

Publication Publication Date Title
US8126874B2 (en) Systems and methods for generating statistics from search engine query logs
CN101071424B (en) Personalized information push system and method
US8291075B1 (en) Detecting events of interest
US8572075B1 (en) Framework for evaluating web search scoring functions
US8799069B2 (en) Mobile click fraud prevention
US10346436B2 (en) Method and medium for a personalized content delivery system
JP5450051B2 (en) Behavioral targeting system
US20130246383A1 (en) Cursor Activity Evaluation For Search Result Enhancement
JP2013506189A (en) Retrieving information based on general query attributes
US7519588B2 (en) Keyword characterization and application
US9479609B2 (en) System for prefetching digital tags
TWI512508B (en) Recommended methods and systems for recommending information
US20110246285A1 (en) Clickable Terms for Contextual Advertising
US10110687B2 (en) Session based web usage reporter
US8244752B2 (en) Classifying search query traffic
US20130110823A1 (en) System and method for recommending content based on search history and trending topics
US9996844B2 (en) Age-targeted online marketing using inferred age range information
US7974970B2 (en) Detection of undesirable web pages
US8352466B2 (en) System and method of geo-based prediction in search result selection
CN101025737A (en) Attention degree based same source information search engine aggregation display method and its related system
CN102346899A (en) Method and device for predicting advertisement click rate based on user behaviors
US20090327224A1 (en) Automatic Classification of Search Engine Quality
US20130275235A1 (en) Using linear and log-linear model combinations for estimating probabilities of events
CN104254851A (en) Method and system for recommending content to a user
CN103562946A (en) Multiple attribution models with return on ad spend

Legal Events

Date Code Title Description
C06 Publication
C10 Entry into substantive examination
C12 Rejection of a patent application after its publication