CN107423992A - Method and device for determining a prediction model of ad click-through rate - Google Patents

Method and device for determining a prediction model of ad click-through rate

Info

Publication number
CN107423992A
Authority
CN
China
Prior art keywords
advertisement
click
time
data
time window
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610346009.7A
Other languages
Chinese (zh)
Inventor
贾东
李斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Che Hui Interactive Advertising Co., Ltd.
Original Assignee
BEIJING YICHE INTERNET INFORMATION TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BEIJING YICHE INTERNET INFORMATION TECHNOLOGY Co Ltd
Priority to CN201610346009.7A
Publication of CN107423992A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06Q30/0242 Determining effectiveness of advertisements

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Strategic Management (AREA)
  • Finance (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a method and device for determining a prediction model of ad click-through rate. The method includes: extracting, from the historical exposure data of multiple advertisements, the lag time difference between each advertisement's first exposure time and the first click time for that advertisement, and determining a time window based on the lag time differences of the multiple advertisements; based on the time window, performing classification annotation on each advertisement using the historical exposure data of the multiple advertisements; and training on the ad annotation data of the annotated advertisements using a semi-supervised support vector machine model and a logistic regression model, to determine a prediction model for estimating ad click-through rate. The invention avoids directly treating an advertisement exposure that receives no click within the time window as one that also receives no click outside the time window, which reduces the imbalance of the training data used for estimation and further improves the accuracy of click-through rate estimation.

Description

Method and device for determining a prediction model of ad click-through rate
Technical field
The present invention relates to the field of computer technology. Specifically, it relates to a method for determining a prediction model of ad click-through rate, and to a device for determining a prediction model of ad click-through rate.
Background
With the rise of the Internet, online advertising has become the main source of profit for major portal websites, search engines, social networks and application programs on terminal communication devices. The Internet advertising market is growing rapidly, and the role of Internet advertising is increasingly important. Because the number of advertisement slots is limited, a scheme that accurately estimates ad click-through rate is needed to give advertisers and publishers a reliable reference and decision information. In the prior art, a time window is usually preset, and the click-through rate of an advertisement is estimated from data on whether users click on the advertisement exposure within that window. However, there is a time lag between the exposure of an advertisement and the user's click on the exposed advertisement, so click feedback lags behind exposure feedback. As a result, the estimates produced by prior-art schemes deviate considerably from the actual values, the click-through rate of the advertisement to be predicted cannot be estimated accurately, and no sound data reference is available for improving advertisement delivery.
Summary of the invention
To overcome the above technical problem, or at least partially solve it, the following technical solution is proposed:
An embodiment of the invention proposes a method for determining a prediction model of ad click-through rate, including:
extracting, from the historical exposure data of multiple advertisements, the lag time difference between each advertisement's first exposure time and the first click time for that advertisement, and determining a time window based on the lag time differences of the multiple advertisements;
based on the time window, performing classification annotation on each advertisement using the historical exposure data of the multiple advertisements;
training on the ad annotation data of the annotated advertisements using a semi-supervised support vector machine model and a logistic regression model, to determine a prediction model for estimating ad click-through rate.
Preferably, determining the time window based on the lag time differences of the multiple advertisements specifically includes:
estimating the expected value of the lag time difference by calculating the average of the lag time differences of the multiple advertisements;
determining the time window according to the expected value.
Preferably, performing classification annotation on each advertisement based on the time window, using the historical exposure data of the multiple advertisements, specifically includes:
extracting the click feedback information, relative to the time window, contained in the historical exposure data of the multiple advertisements;
performing classification annotation on each advertisement based on the click feedback information.
Performing classification annotation on each advertisement based on the click feedback information includes, but is not limited to, the following cases:
an advertisement whose click feedback information indicates a click within the time window is annotated as positive-class data;
an advertisement whose click feedback information indicates no click within the time window is annotated as negative-class data;
an advertisement that has no click feedback information within the time window is annotated as unlabeled data.
Preferably, training on the ad annotation data of the annotated advertisements using a semi-supervised support vector machine model and a logistic regression model, to determine a prediction model for estimating ad click-through rate, includes:
training the semi-supervised support vector machine model with the ad annotation data of the annotated advertisements, to determine the corresponding decision function;
based on the decision function, training a logistic regression model to determine the prediction model for estimating ad click-through rate.
The historical exposure data of an advertisement includes, but is not limited to:
the first exposure time; the first click time for the advertisement; click feedback information.
The ad annotation data includes, but is not limited to:
classification annotation information; advertisement-related feature information.
Another embodiment of the invention proposes a device for determining a prediction model of ad click-through rate, including:
a determining module, for extracting, from the historical exposure data of multiple advertisements, the lag time difference between each advertisement's first exposure time and the first click time for that advertisement, and for determining a time window based on the lag time differences of the multiple advertisements;
a classification annotation module, for performing classification annotation on each advertisement, based on the time window, using the historical exposure data of the multiple advertisements;
a training module, for training on the ad annotation data of the annotated advertisements using a semi-supervised support vector machine model and a logistic regression model, to determine a prediction model for estimating ad click-through rate.
Preferably, the determining module specifically includes:
an estimation unit, for estimating the expected value of the lag time difference by calculating the average of the lag time differences of the multiple advertisements;
a determining unit, for determining the time window according to the expected value.
Preferably, the classification annotation module specifically includes:
an extraction unit, for extracting the click feedback information, relative to the time window, contained in the historical exposure data of the multiple advertisements;
a classification annotation unit, for performing classification annotation on each advertisement based on the click feedback information.
Performing classification annotation on each advertisement based on the click feedback information includes, but is not limited to, the following cases:
an advertisement whose click feedback information indicates a click within the time window is annotated as positive-class data;
an advertisement whose click feedback information indicates no click within the time window is annotated as negative-class data;
an advertisement that has no click feedback information within the time window is annotated as unlabeled data.
Preferably, the training module includes:
a first training unit, for training the semi-supervised support vector machine model with the ad annotation data of the annotated advertisements, to determine the corresponding decision function;
a second training unit, for training a logistic regression model, based on the decision function, to determine the prediction model for estimating ad click-through rate.
The historical exposure data of an advertisement includes, but is not limited to:
the first exposure time; the first click time for the advertisement; click feedback information.
The ad annotation data includes, but is not limited to:
classification annotation information; advertisement-related feature information.
The embodiments of the invention propose a scheme for determining a prediction model of ad click-through rate. The lag time difference between each advertisement's first exposure time and the first click time for that advertisement is extracted from the historical exposure data of multiple advertisements, and a time window is determined based on the lag time differences of the multiple advertisements. With a manually set time window, the lag between click feedback and exposure feedback causes advertisement exposures whose clicks fall outside the window to be classified as exposures without clicks, so that the sampled training data deviates substantially from the true distribution; determining the window from the data avoids this and improves the accuracy of click-through rate estimation. Each advertisement is then given a classification annotation based on the time window and the historical exposure data of the multiple advertisements, which avoids directly treating an exposure with no click inside the time window as one with no click outside it either; this reduces the imbalance of the training data and further improves estimation accuracy. Finally, the ad annotation data of the annotated advertisements is used for training with a semi-supervised support vector machine model and a logistic regression model, which helps keep the class information in the data accurate, so that the click-through rate of an advertisement to be predicted can be estimated accurately and a sound data reference is available for improving advertisement delivery.
Additional aspects and advantages of the invention are set out in part in the description below; they will become apparent from that description or be learned through practice of the invention.
Brief description of the drawings
The above and/or additional aspects and advantages of the invention will become apparent and easy to understand from the following description of the embodiments with reference to the accompanying drawings, in which:
Fig. 1 is a flowchart of a method for determining a prediction model of ad click-through rate according to one embodiment of the invention;
Fig. 2 is a structural diagram of a device for determining a prediction model of ad click-through rate according to another embodiment of the invention.
Detailed description of the embodiments
Embodiments of the invention are described in detail below. Examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar elements, or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they serve only to explain the invention and are not to be construed as limiting the claims.
Those skilled in the art will understand that, unless expressly stated otherwise, the singular forms "a", "an", "said" and "the" used herein may also include the plural. It should further be understood that the word "comprising" as used in this specification indicates the presence of the stated features, integers, steps, operations, elements and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or groups thereof. When an element is said to be "connected" or "coupled" to another element, it may be directly connected or coupled to the other element, or intermediate elements may be present. In addition, "connected" or "coupled" as used herein may include a wireless connection or wireless coupling. The term "and/or" as used herein includes any one of, and all combinations of, one or more of the associated listed items.
Those skilled in the art will understand that, unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by a person of ordinary skill in the art to which the invention belongs. It should also be understood that terms such as those defined in general dictionaries are to be given a meaning consistent with their meaning in the context of the prior art and, unless specifically defined herein, are not to be interpreted in an idealized or overly formal sense.
Fig. 1 is a flowchart of a method for determining a prediction model of ad click-through rate according to one embodiment of the invention.
Step S110: extract, from the historical exposure data of multiple advertisements, the lag time difference between each advertisement's first exposure time and the first click time for that advertisement, and determine a time window based on the lag time differences of the multiple advertisements;
Step S120: based on the time window, perform classification annotation on each advertisement using the historical exposure data of the multiple advertisements;
Step S130: train on the ad annotation data of the annotated advertisements using a semi-supervised support vector machine model and a logistic regression model, to determine a prediction model for estimating ad click-through rate.
Step S110: extract, from the historical exposure data of multiple advertisements, the lag time difference between each advertisement's first exposure time and the first click time for that advertisement, and determine a time window based on the lag time differences of the multiple advertisements.
The historical exposure data of an advertisement includes, but is not limited to:
the first exposure time; the first click time for the advertisement; click feedback information.
Specifically, the first exposure time of each advertisement and the first click time for that advertisement are extracted from the historical exposure data of the multiple advertisements. This yields the lag time difference between each advertisement's first exposure time and its first click time, and the time window is determined based on the lag time differences of the multiple advertisements.
For example, the first exposure time of each advertisement and the first click time for each advertisement are extracted from the historical exposure data of advertisements A, B and C respectively. Advertisement A's first exposure time is "2015-12-12 01:00:00" and its first click time is "2015-12-12 01:01:00"; advertisement B's first exposure time is "2015-12-12 01:10:00" and its first click time is "2015-12-12 01:15:00"; advertisement C's first exposure time is "2015-12-12 21:20:00" and its first click time is "2015-12-12 21:23:00". The lag time differences between the first exposure time and the first click time of advertisements A, B and C are therefore 1 minute, 5 minutes and 3 minutes respectively, and the time window is then determined based on these lag time differences.
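The lag computation in this example can be sketched in a few lines of Python. This is only an illustration: the dictionary layout, the `HISTORY` name and the `lag_minutes` helper are assumptions made for the sketch and do not come from the patent text; only the timestamps are taken from the example.

```python
from datetime import datetime

# Timestamps come from the example above; the data layout is an assumption.
# Each entry maps an advertisement to (first exposure time, first click time).
HISTORY = {
    "A": ("2015-12-12 01:00:00", "2015-12-12 01:01:00"),
    "B": ("2015-12-12 01:10:00", "2015-12-12 01:15:00"),
    "C": ("2015-12-12 21:20:00", "2015-12-12 21:23:00"),
}

FMT = "%Y-%m-%d %H:%M:%S"


def lag_minutes(first_exposure: str, first_click: str) -> float:
    """Return the lag between first exposure and first click, in minutes."""
    delta = datetime.strptime(first_click, FMT) - datetime.strptime(first_exposure, FMT)
    return delta.total_seconds() / 60.0


# Lag time difference per advertisement (hypothetical helper output).
lags = {ad: lag_minutes(exposure, click) for ad, (exposure, click) in HISTORY.items()}
print(lags)  # {'A': 1.0, 'B': 5.0, 'C': 3.0}
```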
Preferably, step S110 includes step S111 (not shown) and step S112 (not shown). Step S111: estimate the expected value of the lag time difference by calculating the average of the lag time differences of the multiple advertisements. Step S112: determine the time window according to the expected value.
For example, the lag time differences of advertisements A, B and C are 1 minute, 5 minutes and 3 minutes respectively. Averaging them, (1+5+3)/3, gives an average lag time difference of 3 minutes for advertisements A, B and C, so the expected value of the lag time difference is estimated as 3 minutes. According to this 3-minute expected value, the time window is determined as the ad click period running from the time an advertisement is first exposed to 3 minutes afterwards.
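Steps S111 and S112 can be sketched as follows, assuming the lag time differences are already available in minutes. The function names `expected_lag` and `click_window` are hypothetical; the patent states only that the average estimates the expected value and that the window follows from it.

```python
from datetime import datetime, timedelta
from statistics import mean

FMT = "%Y-%m-%d %H:%M:%S"


def expected_lag(lags_in_minutes):
    """Step S111: estimate the expected lag as the sample mean."""
    return mean(lags_in_minutes)


def click_window(first_exposure: str, window_minutes: float):
    """Step S112: the window runs from first exposure to first exposure + window."""
    start = datetime.strptime(first_exposure, FMT)
    return start, start + timedelta(minutes=window_minutes)


# Lag time differences of advertisements A, B and C from the example.
window_minutes = expected_lag([1.0, 5.0, 3.0])
start, end = click_window("2015-12-12 01:00:00", window_minutes)
print(window_minutes)  # 3.0
print(end)             # 2015-12-12 01:03:00
```

Using the mean follows the text directly; a production system might prefer a more robust statistic, but that would go beyond what the description states.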
Step S120: based on the time window, perform classification annotation on each advertisement using the historical exposure data of the multiple advertisements.
Preferably, step S120 includes step S121 (not shown) and step S122 (not shown). Step S121: extract the click feedback information, relative to the time window, contained in the historical exposure data of the multiple advertisements. Step S122: perform classification annotation on each advertisement based on the click feedback information.
Performing classification annotation on each advertisement based on the click feedback information includes, but is not limited to, the following cases:
an advertisement whose click feedback information indicates a click within the time window is annotated as positive-class data;
an advertisement whose click feedback information indicates no click within the time window is annotated as negative-class data;
an advertisement that has no click feedback information within the time window is annotated as unlabeled data.
Specifically, the click feedback information, relative to the time window, contained in the historical exposure data of the multiple advertisements is extracted. Based on this click feedback information, an advertisement whose feedback indicates a click within the time window is annotated as positive-class data, an advertisement whose feedback indicates no click within the time window is annotated as negative-class data, and an advertisement that has no click feedback information within the time window is annotated as unlabeled data.
For example, the click feedback information relative to each advertisement's own time window is extracted from the historical exposure data of advertisements A, B and C. The click feedback information of advertisement A within its time window is "clicked", that of advertisement B within its time window is "not clicked", and advertisement C has "no click feedback" within its time window. Based on this click feedback information, advertisement A is annotated as positive-class data, advertisement B as negative-class data, and advertisement C as unlabeled data.
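The three annotation cases can be sketched as a small function. The encoding is an assumption made for illustration: click feedback is modelled as `True` (clicked), `False` (not clicked) or `None` (no click feedback in the window), and the classes as `1`, `-1` and `0`; the patent does not prescribe a representation.

```python
from typing import Optional

# Hypothetical numeric encoding of the three annotation classes.
POSITIVE, NEGATIVE, UNLABELED = 1, -1, 0


def annotate(feedback_in_window: Optional[bool]) -> int:
    """Map click feedback within the time window to a class annotation.

    True  -> clicked within the window      -> positive class
    False -> reported as not clicked        -> negative class
    None  -> no click feedback information  -> unlabeled
    """
    if feedback_in_window is None:
        return UNLABELED
    return POSITIVE if feedback_in_window else NEGATIVE


# Feedback for advertisements A, B and C as in the example above.
feedback = {"A": True, "B": False, "C": None}
annotations = {ad: annotate(fb) for ad, fb in feedback.items()}
print(annotations)  # {'A': 1, 'B': -1, 'C': 0}
```

Keeping the unlabeled class separate from the negative class is what lets a semi-supervised learner use those samples instead of counting them as non-clicks.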
Step S130: train on the ad annotation data of the annotated advertisements using a semi-supervised support vector machine model and a logistic regression model, to determine a prediction model for estimating ad click-through rate.
The ad annotation data includes, but is not limited to:
classification annotation information;
advertisement-related feature information, such as the text information of the advertisement, the picture information of the advertisement, the audio information of the advertisement, the interests of the users to whom the advertisement is pushed, the gender of those users, and the times at which those users click to view the advertisement.
Preferably, step S130 includes step S131 (not shown) and step S132 (not shown). Step S131: train the semi-supervised support vector machine model with the ad annotation data of the annotated advertisements, to determine the corresponding decision function. Step S132: based on the decision function, train a logistic regression model to determine the prediction model for estimating ad click-through rate.
The semi-supervised support vector machine model is expressed by formula (1).
In formula (1), $w$ is the normal vector of the resulting classification plane and $b$ is its bias; $\xi_i$ is the cost incurred when the classification plane misclassifies the $i$-th labeled sample; $\zeta_j$ is the cost of the classification plane wrongly assigning the $j$-th unlabeled sample to the positive class; $\delta_j$ is the cost of the classification plane wrongly assigning the $j$-th unlabeled sample to the negative class.
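The body of formula (1) is not present in this text. For orientation only, a commonly used semi-supervised SVM formulation that matches the symbols defined above is sketched below. It is an assumption, not a reconstruction of the patent's own formula: the penalty constant $C$, the number of labeled samples $l$, the number of unlabeled samples $k$ and the choice of norm are all introduced here for illustration.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi,\,\zeta,\,\delta}\quad
  & C\left[\sum_{i=1}^{l}\xi_i + \sum_{j=l+1}^{l+k}\min(\zeta_j,\ \delta_j)\right] + \lVert w\rVert \\
\text{s.t.}\quad
  & y_i\,(w^{T}x_i + b) + \xi_i \ge 1, \quad \xi_i \ge 0, \quad i = 1,\dots,l \\
  & w^{T}x_j + b + \zeta_j \ge 1, \quad \zeta_j \ge 0, \quad j = l+1,\dots,l+k \\
  & -(w^{T}x_j + b) + \delta_j \ge 1, \quad \delta_j \ge 0, \quad j = l+1,\dots,l+k
\end{aligned}
```

In this form each unlabeled sample contributes only the smaller of its two possible misclassification costs, so the optimizer in effect assigns it to whichever class fits the plane better.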
The decision function is $f(x) = w^{T}x + b$. This function is a linear classification plane, where $w$ is the normal vector of the plane and $b$ is its bias.
The logistic regression model is expressed by formula (2).
In formula (2), $f(x)$ is the function value obtained for the training data from the semi-supervised support vector machine, and $P(y=1 \mid f(x))$ is the estimated probability that the advertisement is clicked.
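The body of formula (2) is likewise not present in this text. A standard logistic model on the decision value, given here as an assumption consistent with the description, would be the following, where the scale $\alpha$ and offset $\beta$ are hypothetical parameters learned when the logistic regression model is trained:

```latex
P\bigl(y = 1 \mid f(x)\bigr) = \frac{1}{1 + \exp\bigl(-(\alpha\, f(x) + \beta)\bigr)}
```

With $\alpha > 0$, larger decision values map to higher estimated click probabilities, and $f(x) = -\beta/\alpha$ corresponds to a probability of 0.5.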
For example, the ad annotation data of advertisements A, B and C after classification annotation includes the classification annotation information (positive-class data for advertisement A, negative-class data for advertisement B, unlabeled data for advertisement C) and the related feature information of each advertisement, such as its text information, picture information and audio information, the interests of the users to whom it is pushed, the gender of those users, and the times at which those users click to view it. The semi-supervised support vector machine model of formula (1) is trained with the ad annotation data of advertisements A, B and C to determine the corresponding decision function $f(x) = w^{T}x + b$. Then, based on the decision function, the logistic regression model of formula (2) is trained to determine the prediction model for estimating ad click-through rate, from which the estimated probability that an advertisement is clicked can be obtained.
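The second training stage can be sketched in plain Python under stated assumptions. The first stage (training the semi-supervised SVM) is not shown: `w` and `b` below are hypothetical values standing in for its output, the feature vectors and labels are invented toy data, and the logistic model is assumed to take the form `sigmoid(alpha * f + beta)`.

```python
import math


def decision_function(w, b, x):
    """f(x) = w^T x + b for a linear classification plane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_on_scores(scores, labels, lr=0.1, epochs=2000):
    """Fit P(y=1|f) = sigmoid(alpha*f + beta) by batch gradient descent on log-loss.

    scores: decision values f(x); labels: 1 (clicked) or 0 (not clicked).
    """
    alpha, beta = 0.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_alpha = grad_beta = 0.0
        for f, y in zip(scores, labels):
            err = sigmoid(alpha * f + beta) - y  # derivative of log-loss w.r.t. the logit
            grad_alpha += err * f
            grad_beta += err
        alpha -= lr * grad_alpha / n
        beta -= lr * grad_beta / n
    return alpha, beta


# Hypothetical plane from stage one, plus toy labeled feature vectors.
w, b = [1.0, -1.0], 0.0
features = [[2.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 2.0], [0.5, 0.0], [0.0, 0.5]]
labels = [1, 1, 0, 0, 0, 1]

scores = [decision_function(w, b, x) for x in features]
alpha, beta = fit_logistic_on_scores(scores, labels)
print(sigmoid(alpha * 2.0 + beta) > sigmoid(alpha * -2.0 + beta))  # True
```

Fitting the logistic model on the one-dimensional decision value, rather than on the raw features, turns the SVM's signed distance into a probability, which is what a click-through rate estimate requires.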
In a preferred embodiment, the method further includes: when the click-through rate of a new advertisement D needs to be estimated, obtaining the estimated click-through rate of advertisement D from the trained prediction model of ad click-through rate, according to the annotation data of advertisement D.
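Applying a trained model to a new advertisement can be sketched as below. All numeric values (`w`, `b`, `alpha`, `beta` and the feature vector of advertisement D) are hypothetical placeholders, and the logistic form `sigmoid(alpha * f + beta)` is an assumption; the patent states only that the trained prediction model yields the estimate.

```python
import math


def predict_ctr(features, w, b, alpha, beta):
    """Estimate click probability: sigmoid(alpha * (w^T x + b) + beta)."""
    f = sum(wi * xi for wi, xi in zip(w, features)) + b  # decision value f(x)
    return 1.0 / (1.0 + math.exp(-(alpha * f + beta)))


# Hypothetical trained parameters and features for a new advertisement D.
w, b = [0.8, -0.3], 0.1
alpha, beta = 1.5, -0.2
features_d = [1.0, 2.0]

ctr_d = predict_ctr(features_d, w, b, alpha, beta)
print(f"estimated CTR for D: {ctr_d:.3f}")
```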
Fig. 2 is a structural diagram of a device for determining a prediction model of ad click-through rate according to another embodiment of the invention.
The determining module 210 extracts, from the historical exposure data of multiple advertisements, the lag time difference between each advertisement's first exposure time and the first click time for that advertisement, and determines a time window based on the lag time differences of the multiple advertisements. Based on the time window, the classification annotation module 220 performs classification annotation on each advertisement using the historical exposure data of the multiple advertisements. The training module 230 trains on the ad annotation data of the annotated advertisements using a semi-supervised support vector machine model and a logistic regression model, to determine a prediction model for estimating ad click-through rate.
Determining module 210 extracts the time for exposure first of each advertisement with being directed to from the history exposure data of multiple advertisements The lag time clicked on first between the time of the advertisement is poor, and determines time window based on the lag time difference of multiple advertisements Mouthful.
Wherein, the history exposure data of advertisement includes but is not limited to:
Time for exposure first;For the click time first of advertisement;Click on feedback information.
Specifically, time for exposure first of each advertisement and wide for this is extracted from the history exposure data of multiple advertisements The click time first accused, it can obtain the time for exposure first of each advertisement and between the click time first of the advertisement Lag time is poor, and determines time window based on the lag time difference of multiple advertisements.
For example, the first exposure time and the first click time of each advertisement are extracted from the historical exposure data of advertisement A, advertisement B and advertisement C respectively. Advertisement A has a first exposure time of "2015-12-12 01:00:00" and a first click time of "2015-12-12 01:01:00"; advertisement B has a first exposure time of "2015-12-12 01:10:00" and a first click time of "2015-12-12 01:15:00"; advertisement C has a first exposure time of "2015-12-12 21:20:00" and a first click time of "2015-12-12 21:23:00". The lag time differences between first exposure and first click for advertisements A, B and C are therefore 1 minute, 5 minutes and 3 minutes respectively, and the time window is determined based on these lag time differences.
Preferably, the determining module includes an estimating unit (not shown) and a determining unit (not shown). The estimating unit estimates the expected value of the lag time difference by calculating the average of the lag time differences of the multiple advertisements; the determining unit determines the time window according to the expected value.
For example, the lag time differences of advertisements A, B and C are 1 minute, 5 minutes and 3 minutes respectively. Their average is (1+5+3)/3 = 3 minutes, so the expected value of the lag time difference is estimated to be 3 minutes. According to this expected value, the time window is determined to be the ad click period running from the time an advertisement is first exposed to 3 minutes afterwards.
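The window calculation in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `history` dictionary layout and the helper names `lag_minutes` and `time_window` are hypothetical and do not come from the patent; the timestamps are the ones given for advertisements A, B and C.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"

# (first exposure time, first click time) per advertisement, from the example.
# The dictionary layout is an assumption made for this sketch.
history = {
    "A": ("2015-12-12 01:00:00", "2015-12-12 01:01:00"),
    "B": ("2015-12-12 01:10:00", "2015-12-12 01:15:00"),
    "C": ("2015-12-12 21:20:00", "2015-12-12 21:23:00"),
}


def lag_minutes(first_exposure, first_click):
    """Lag time difference between first exposure and first click, in minutes."""
    delta = datetime.strptime(first_click, FMT) - datetime.strptime(first_exposure, FMT)
    return delta.total_seconds() / 60.0


def time_window(history):
    """Estimate the expected lag as the mean lag and use it as the window length."""
    lags = [lag_minutes(exposure, click) for exposure, click in history.values()]
    return timedelta(minutes=sum(lags) / len(lags))


lags = {ad: lag_minutes(exposure, click) for ad, (exposure, click) in history.items()}
window = time_window(history)
```

Here `lags` holds 1, 5 and 3 minutes for A, B and C, and `window` is a 3-minute `timedelta` measured from each advertisement's first exposure.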
Based on the time window, the classification annotation module 220 classifies and annotates each advertisement using the historical exposure data of the multiple advertisements.
Preferably, the classification annotation module includes an extraction unit (not shown) and a classification annotation unit (not shown). The extraction unit extracts the click feedback information, relative to the time window, contained in the historical exposure data of the multiple advertisements; the classification annotation unit classifies and annotates each advertisement based on the click feedback information.
Classifying and annotating each advertisement based on the click feedback information includes, but is not limited to, the following cases:
an advertisement whose click feedback information within the time window indicates a click is annotated as positive-class data;
an advertisement whose click feedback information within the time window indicates no click is annotated as negative-class data;
an advertisement with no click feedback information within the time window is annotated as unlabeled data.
Specifically, the click feedback information relative to the time window is extracted from the historical exposure data of the multiple advertisements. Based on this information, an advertisement whose click feedback within the time window indicates a click is annotated as positive-class data, an advertisement whose click feedback within the time window indicates no click is annotated as negative-class data, and an advertisement with no click feedback information within the time window is annotated as unlabeled data.
For example, the click feedback information relative to the respective time windows is extracted from the historical exposure data of advertisements A, B and C. The click feedback information of advertisement A within its time window is "clicked", that of advertisement B within its time window is "not clicked", and that of advertisement C within its time window is "no click feedback". Based on this click feedback information, advertisement A is annotated as positive-class data, advertisement B as negative-class data, and advertisement C as unlabeled data.
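The three annotation cases can be illustrated with a small Python sketch. The encoding of click feedback as `True` (clicked), `False` (not clicked) or `None` (no click feedback within the window), and the names `Label` and `annotate`, are assumptions made for this illustration; the patent does not specify a data representation.

```python
from enum import Enum


class Label(Enum):
    POSITIVE = 1    # click feedback within the window reports a click
    NEGATIVE = -1   # click feedback within the window reports no click
    UNLABELED = 0   # no click feedback arrived within the window


def annotate(feedback_in_window):
    """Map click feedback observed inside the time window to a class label.

    feedback_in_window: True, False, or None (hypothetical encoding).
    """
    if feedback_in_window is None:
        return Label.UNLABELED
    return Label.POSITIVE if feedback_in_window else Label.NEGATIVE


# Click feedback of advertisements A, B and C from the example above.
feedback = {"A": True, "B": False, "C": None}
labels = {ad: annotate(fb) for ad, fb in feedback.items()}
```

Keeping "no feedback" distinct from "not clicked" is what leaves advertisement C available as unlabeled data for the semi-supervised training step, rather than forcing it into the negative class.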
The training module 230 uses a semi-supervised support vector machine model and a logistic regression model to train on the advertisement annotation data of the multiple advertisements after classification annotation, so as to determine the prediction model for estimating ad click-through rate.
The advertisement annotation data includes, but is not limited to:
classification annotation information;
advertisement-related feature information, such as the text information, picture information and audio information of the advertisement, and the hobbies, gender and click-and-view times of the users to whom the advertisement is pushed.
Preferably, the training module includes a first training unit (not shown) and a second training unit (not shown). The first training unit trains the semi-supervised support vector machine model using the advertisement annotation data of the multiple advertisements after classification annotation, so as to determine the corresponding decision function; based on the decision function, the second training unit determines the prediction model for estimating ad click-through rate by training the logistic regression model.
The semi-supervised support vector machine model is given by formula (1).
The logistic regression model is given by formula (2).
For example, the advertisement annotation data of advertisements A, B and C after classification annotation includes the classification annotation information (positive-class data for advertisement A, negative-class data for advertisement B, and unlabeled data for advertisement C) and the related feature information of each advertisement, such as the advertisement's text information, picture information and audio information, and the hobbies, gender and click-and-view times of the users to whom the advertisement is pushed. The semi-supervised support vector machine model of formula (1) is trained using this annotation data to determine the corresponding decision function f(x) = w^T x + b. Then, based on the decision function, the logistic regression model of formula (2) is trained to determine the prediction model for estimating ad click-through rate, from which the estimated probability that an advertisement is clicked can be obtained.
In a preferred embodiment, based on the obtained prediction model of ad click-through rate, when the click-through rate of a new advertisement D needs to be estimated, the estimated click-through rate of advertisement D can be obtained by applying the trained prediction model to the annotation data of advertisement D.
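The second training stage can be sketched as follows. Formulas (1) and (2) are not reproduced in this text, so the sketch rests on several assumptions: the weights `w` and bias `b` are hypothetical values standing in for the output of a trained semi-supervised support vector machine, and "training the logistic regression model based on the decision function" is read as fitting a one-dimensional logistic model p = sigmoid(a * f(x) + c) on the decision scores, in the manner of Platt scaling. The feature vectors and labels are invented toy data, not results from the patent.

```python
import math


def decision(w, b, x):
    """Linear decision function f(x) = w^T x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(scores, labels, lr=0.1, epochs=2000):
    """Fit p = sigmoid(a * f + c) by batch gradient descent on the log loss."""
    a, c = 0.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = grad_c = 0.0
        for f, y in zip(scores, labels):
            err = sigmoid(a * f + c) - y  # derivative of log loss w.r.t. the logit
            grad_a += err * f
            grad_c += err
        a -= lr * grad_a / n
        c -= lr * grad_c / n
    return a, c


# Hypothetical decision-function parameters (stand-in for a trained S3VM).
w, b = [0.8, -0.5], 0.1

# Toy feature vectors with labels 1 (positive class) and 0 (negative class).
X = [[2.0, 0.5], [1.5, 0.2], [-1.0, 1.0], [-0.5, 2.0]]
y = [1, 1, 0, 0]

scores = [decision(w, b, x) for x in X]
a, c = fit_logistic(scores, y)
ctr = [sigmoid(a * s + c) for s in scores]  # estimated click probabilities
```

Under the same assumptions, a new advertisement would be scored by computing `decision(w, b, x_new)` on its feature vector and passing the result through `sigmoid(a * score + c)`.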
Those skilled in the art will appreciate that the present invention covers one or more devices for performing the operations described herein. These devices may be specially designed and manufactured for the required purposes, or may comprise known devices in general-purpose computers. These devices have computer programs stored in them that are selectively activated or reconfigured. Such a computer program may be stored in a device-readable (for example, computer-readable) medium, or in any type of medium suitable for storing electronic instructions and coupled to a bus, including but not limited to any type of disk (including floppy disks, hard disks, optical discs, CD-ROMs and magneto-optical disks), ROM (Read-Only Memory), RAM (Random Access Memory), EPROM (Erasable Programmable Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), flash memory, magnetic cards, or optical cards. That is, a readable medium includes any medium that stores or transmits information in a form readable by a device (for example, a computer).
Those skilled in the art will appreciate that computer program instructions can be used to implement each block in these structural diagrams and/or block diagrams and/or flow diagrams, as well as combinations of blocks in them. Those skilled in the art will also appreciate that these computer program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus, so that the schemes specified in the block or blocks of the structural diagrams and/or block diagrams and/or flow diagrams disclosed in the present invention are carried out by the processor of the computer or other programmable data processing apparatus.
Those skilled in the art will appreciate that the steps, measures and schemes in the various operations, methods and flows discussed in the present invention may be alternated, changed, combined or deleted. Further, other steps, measures and schemes in the various operations, methods and flows discussed in the present invention may also be alternated, changed, rearranged, decomposed, combined or deleted. Further, steps, measures and schemes in the prior art that correspond to those in the various operations, methods and flows disclosed in the present invention may also be alternated, changed, rearranged, decomposed, combined or deleted.
The above describes only some embodiments of the present invention. It should be noted that those of ordinary skill in the art may make several improvements and refinements without departing from the principles of the present invention, and these improvements and refinements shall also be regarded as falling within the scope of protection of the present invention.

Claims (14)

1. A method for determining a prediction model of ad click-through rate, comprising:
extracting, from historical exposure data of multiple advertisements, the lag time difference between the first exposure time of each advertisement and the first click time for that advertisement, and determining a time window based on the lag time differences of the multiple advertisements;
based on the time window, classifying and annotating each advertisement using the historical exposure data of the multiple advertisements;
training on the advertisement annotation data of the multiple advertisements after classification annotation using a semi-supervised support vector machine model and a logistic regression model, so as to determine a prediction model for estimating ad click-through rate.
2. The method for determining a prediction model of ad click-through rate according to claim 1, wherein determining the time window based on the lag time differences of the multiple advertisements specifically comprises:
estimating the expected value of the lag time difference by calculating the average of the lag time differences of the multiple advertisements;
determining the time window according to the expected value.
3. The method for determining a prediction model of ad click-through rate according to claim 1 or 2, wherein classifying and annotating each advertisement, based on the time window, using the historical exposure data of the multiple advertisements specifically comprises:
extracting the click feedback information, relative to the time window, contained in the historical exposure data of the multiple advertisements;
classifying and annotating each advertisement based on the click feedback information.
4. The method for determining a prediction model of ad click-through rate according to claim 3, wherein classifying and annotating each advertisement based on the click feedback information includes at least the following cases:
annotating as positive-class data an advertisement whose click feedback information within the time window indicates a click;
annotating as negative-class data an advertisement whose click feedback information within the time window indicates no click;
annotating as unlabeled data an advertisement with no click feedback information within the time window.
5. The method for determining a prediction model of ad click-through rate according to claim 1, wherein training on the advertisement annotation data of the multiple advertisements after classification annotation using a semi-supervised support vector machine model and a logistic regression model, so as to determine a prediction model for estimating ad click-through rate, comprises:
training the semi-supervised support vector machine model using the advertisement annotation data of the multiple advertisements after classification annotation, so as to determine a corresponding decision function;
based on the decision function, determining the prediction model for estimating ad click-through rate by training the logistic regression model.
6. The method for determining a prediction model of ad click-through rate according to claim 1, wherein the historical exposure data of an advertisement includes at least:
the first exposure time; the first click time for the advertisement; click feedback information.
7. The method for determining a prediction model of ad click-through rate according to claim 1, wherein the advertisement annotation data includes at least:
classification annotation information; advertisement-related feature information.
8. A device for determining a prediction model of ad click-through rate, comprising:
a determining module, configured to extract, from historical exposure data of multiple advertisements, the lag time difference between the first exposure time of each advertisement and the first click time for that advertisement, and to determine a time window based on the lag time differences of the multiple advertisements;
a classification annotation module, configured to classify and annotate each advertisement, based on the time window, using the historical exposure data of the multiple advertisements;
a training module, configured to train on the advertisement annotation data of the multiple advertisements after classification annotation using a semi-supervised support vector machine model and a logistic regression model, so as to determine a prediction model for estimating ad click-through rate.
9. The device for determining a prediction model of ad click-through rate according to claim 8, wherein the determining module specifically comprises:
an estimating unit, configured to estimate the expected value of the lag time difference by calculating the average of the lag time differences of the multiple advertisements;
a determining unit, configured to determine the time window according to the expected value.
10. The device for determining a prediction model of ad click-through rate according to claim 8 or 9, wherein the classification annotation module specifically comprises:
an extraction unit, configured to extract the click feedback information, relative to the time window, contained in the historical exposure data of the multiple advertisements;
a classification annotation unit, configured to classify and annotate each advertisement based on the click feedback information.
11. The device for determining a prediction model of ad click-through rate according to claim 10, wherein classifying and annotating each advertisement based on the click feedback information includes at least the following cases:
annotating as positive-class data an advertisement whose click feedback information within the time window indicates a click;
annotating as negative-class data an advertisement whose click feedback information within the time window indicates no click;
annotating as unlabeled data an advertisement with no click feedback information within the time window.
12. The device for determining a prediction model of ad click-through rate according to claim 8, wherein the training module comprises:
a first training unit, configured to train the semi-supervised support vector machine model using the advertisement annotation data of the multiple advertisements after classification annotation, so as to determine a corresponding decision function;
a second training unit, configured to determine, based on the decision function, the prediction model for estimating ad click-through rate by training the logistic regression model.
13. The device for determining a prediction model of ad click-through rate according to claim 8, wherein the historical exposure data of an advertisement includes at least:
the first exposure time; the first click time for the advertisement; click feedback information.
14. The device for determining a prediction model of ad click-through rate according to claim 8, wherein the advertisement annotation data includes at least:
classification annotation information; advertisement-related feature information.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610346009.7A CN107423992A (en) 2016-05-23 2016-05-23 Determine the method and device of the prediction model of ad click rate

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610346009.7A CN107423992A (en) 2016-05-23 2016-05-23 Determine the method and device of the prediction model of ad click rate

Publications (1)

Publication Number Publication Date
CN107423992A true CN107423992A (en) 2017-12-01

Family

ID=60422302

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610346009.7A Pending CN107423992A (en) 2016-05-23 2016-05-23 Determine the method and device of the prediction model of ad click rate

Country Status (1)

Country Link
CN (1) CN107423992A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111754251A (en) * 2019-03-29 2020-10-09 北京达佳互联信息技术有限公司 Advertisement putting method, device, server and storage medium
CN111882349A (en) * 2020-07-14 2020-11-03 腾讯科技(深圳)有限公司 Data processing method, device and storage medium
CN113344615A (en) * 2021-05-27 2021-09-03 上海数鸣人工智能科技有限公司 Marketing activity prediction method based on GBDT and DL fusion model

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101075908A (en) * 2006-11-08 2007-11-21 腾讯科技(深圳)有限公司 Method and system for accounting network click numbers
CN103345512A (en) * 2013-07-06 2013-10-09 北京品友互动信息技术有限公司 Online advertising click-through rate forecasting method and device based on user attribute

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111754251A (en) * 2019-03-29 2020-10-09 北京达佳互联信息技术有限公司 Advertisement putting method, device, server and storage medium
CN111754251B (en) * 2019-03-29 2024-01-19 北京达佳互联信息技术有限公司 Advertisement putting method, advertisement putting device, server and storage medium
CN111882349A (en) * 2020-07-14 2020-11-03 腾讯科技(深圳)有限公司 Data processing method, device and storage medium
CN113344615A (en) * 2021-05-27 2021-09-03 上海数鸣人工智能科技有限公司 Marketing activity prediction method based on GBDT and DL fusion model
CN113344615B (en) * 2021-05-27 2023-12-05 上海数鸣人工智能科技有限公司 Marketing campaign prediction method based on GBDT and DL fusion model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20180125

Address after: 100044 Beijing city Haidian District Xizhimen Street No. 168 Tengda Building 29 room 01-07

Applicant after: Beijing Che Hui Interactive Advertising Co., Ltd.

Address before: 100044, Beijing, Haidian District Capital Stadium No. 6 South Road, New Century Hotel, office building ten, 3, D, E, F, G, H, J, units

Applicant before: BEIJING YICHE INTERNET INFORMATION TECHNOLOGY CO., LTD.

CB02 Change of applicant information

Address after: 100044 Tengda Building, 168 Xizhimenwai Street, Haidian District, Beijing, 2101-2103 and 2105-2111 on the 21st floor

Applicant after: Beijing Chehui Technology Co., Ltd.

Address before: 100044 01-07, 29 story, Tengda tower, 168 west gate, Haidian District, Beijing.

Applicant before: Beijing Che Hui Interactive Advertising Co., Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20171201