CN104133837B - Internet information delivery channel optimization system based on distributed computing - Google Patents


Info

Publication number
CN104133837B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410289052.5A
Other languages
Chinese (zh)
Other versions
CN104133837A (en)
Inventor
张娅
魏逸
王宇晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Media Intelligence Technology Co., Ltd.
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University
Priority to CN201410289052.5A
Publication of CN104133837A
Application granted
Publication of CN104133837B
Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching

Abstract

The invention provides an internet information delivery channel optimization system based on distributed computing. A data collection module collects user behavior; a data preprocessing module cleans, integrates, and reduces the data, simplifying and standardizing the collected user behavior information; a training module runs an EM-like iterative algorithm on the data in the training set to obtain the parameters of a cumulative probability model. The data in the test set are then fed into the cumulative probability model to predict the contribution of each delivery channel and whether each user will convert. Information is delivered through the websites or types with the highest contribution, and to the users most likely to convert. The invention also performs distributed computing on the Hadoop platform: resource-intensive computations are distributed across multiple nodes, achieving multi-node parallel processing.

Description

Internet information delivery channel optimization system based on distributed computing
Technical field
The present invention relates to the field of internet technology, and specifically to an internet information delivery channel optimization system based on distributed computing.
Background art
Over the past decade and more, the internet has grown explosively, and more and more people choose to socialize, play games, and shop online. Internet information recommendation has thus become one of the most effective ways to promote products. It also lets companies obtain large volumes of network data with which to track recommendation effectiveness and return on investment.
Research on the contribution of information delivery channels aims to quantify how strongly each channel influences user conversion behavior. Quantifying each channel's contribution makes it possible to compare the value of different marketing channels, such as email, affiliate marketing, display advertising, search advertising, and social media. Companies can also use these data to decide how much to invest in each delivery channel in future, so as to obtain the maximum public attention for their information at minimum cost.
In the prior art, systems that predict the contribution of internet information delivery channels generally fall into three kinds. (1) Systems based on single-source attribution models: the model assigns all of the credit to one event among many, for example systems based on the last-click method or the first-click method. Such systems are considered grossly inaccurate because they ignore the events that actually influenced the conversion. (2) Systems based on fractional attribution models: the models used include equal weighting, customer credit, and the U-shaped curve. Equal weighting gives every delivery channel the same weight. Customer credit assigns different weights by human judgment, based on past delivery performance. The U-shaped curve assigns all of the weight to the first and last touch before conversion and ignores the effect of the touches in between. These systems are clearly not convincing either, and in practice they do not evaluate contribution well. (3) Systems based on probabilistic models: different contribution degrees are assigned to the channels according to how the information a user has seen influences that user's conversion behavior, and the channels are then ranked by contribution to assess delivery channel contribution. The predictions of such systems are clearly more accurate and more reasonable.
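To make the difference between the single-source and fractional schemes concrete, here is a minimal Python sketch that splits one conversion's credit across a converting user's touch path. It is illustrative only: the function name, the scheme labels, and the 50/50 split used for the U-shaped rule (the text only says that all weight goes to the first and last touch) are assumptions, and customer-credit weighting is omitted because it depends on human judgment.

```python
from collections import defaultdict


def attribute(path, scheme="last_click"):
    """Split one unit of conversion credit across the channels in a
    touch path ordered from oldest to newest (hypothetical helper)."""
    n = len(path)
    if n == 0:
        return {}
    if scheme == "last_click":
        weights = [0.0] * (n - 1) + [1.0]
    elif scheme == "first_click":
        weights = [1.0] + [0.0] * (n - 1)
    elif scheme == "equal":
        weights = [1.0 / n] * n
    elif scheme == "u_shaped":
        # Assumption: all credit split evenly between first and last touch,
        # middle touches get nothing.
        weights = [1.0] if n == 1 else [0.5] + [0.0] * (n - 2) + [0.5]
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    credit = defaultdict(float)
    for channel, w in zip(path, weights):
        credit[channel] += w  # a channel touched twice accumulates credit
    return dict(credit)
```

For the path `["email", "search", "display", "search"]`, last-click gives all credit to `search`, while equal weighting gives `search` half because it appears twice. None of these rules looks at timing, which is the gap the probabilistic approach addresses.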
Summary of the invention
In view of the shortcomings of the prior art, the object of the invention is to provide an internet information delivery channel optimization system based on distributed computing that optimizes the choice of information delivery channels using users' browsing behavior, achieves more accurate internet information recommendation, and meets user needs.
To achieve the above object, the invention adopts the following technical solution:
The invention provides an internet information delivery channel optimization system based on distributed computing, comprising a data collection module, a data preprocessing module, a training module, an information delivery channel contribution prediction module, and a conversion rate prediction module, wherein:
Data collection module: collects user behavior data through a web server and divides the collected user behavior into two parts. One part records all browsing behavior of a given user; the other records the access features of the different channels delivering the same information;
Data preprocessing module: cleans, integrates, and reduces the user behavior data collected by the server, simplifying and standardizing the collected user behavior information;
Training module: takes the training set as input and runs an EM-like iterative algorithm until the two parameters of the cumulative probability model, the user influence intensity factor and the factor by which influence decays over time, converge, thereby completing the estimation of these two parameters;
Information delivery channel contribution prediction module: takes the test set as input and computes the contribution of each information delivery channel m; the contributions are then summed by the website or type to which each channel m belongs, giving the contribution of each website and each type. Finally, the websites and types are ranked by contribution from high to low, and information is pushed through the top-ranked websites or types to obtain better delivery results;
Conversion rate prediction module: takes the test set as input, scores each user with a survival function, predicts the users most likely to convert, and pushes internet information to those users.
Distributed computing based on the Hadoop platform: the computational parts of all the above modules run on the Hadoop platform, which distributes complex computations across multiple nodes. This enables parallel processing of multiple tasks, reduces waiting between tasks, makes resource allocation more reasonable, and greatly increases computation speed.
Compared with the prior art, the invention has the following beneficial effects:
The proposed system can greatly improve the accuracy of predicting information delivery channel contribution, making it easy to select the most effective websites or types for delivering information; it also selects the group of users most likely to convert, making information recommendation more targeted. The best recommendation results can therefore be obtained at minimum cost. In addition, all data processing in the invention is based on the Hadoop platform, so multiple computers process data in parallel. This greatly reduces the computing power and memory required when processing big data, and greatly increases computation speed.
Brief description of the drawings
Other features, objects, and advantages of the invention will become more apparent from the following detailed description of non-limiting embodiments with reference to the drawings:
Fig. 1 is a diagram of the server-based information delivery model in an embodiment of the invention;
Fig. 2 is a diagram of the internet information delivery channel optimization system based on distributed computing in an embodiment of the invention;
Fig. 3 is a diagram of the distributed computing framework in an embodiment of the invention;
Fig. 4 compares the performance of the system with existing systems in an embodiment of the invention.
Embodiment
The invention is described in detail below with reference to specific embodiments. The following embodiments will help those skilled in the art further understand the invention but do not limit it in any way. It should be noted that those of ordinary skill in the art can make various modifications and improvements without departing from the inventive concept; these all fall within the scope of protection of the invention.
As shown in Fig. 1, in the server-based information delivery model of the invention, user information collection, user profile construction, and the recommendation module built by the invention are all stored on and processed by the server. The client computer used by the user is not responsible for storing or processing user information.
As shown in Fig. 2, the internet information delivery channel optimization system based on distributed computing of the invention comprises:
Data collection module: collects user behavior through a web server and divides it into two parts, web browsing messages and information messages. Web browsing messages record all browsing behavior of a given user and reflect the features of that user's web browsing; information messages record the access features of the different channels delivering the same information and reflect the click history and features of each information delivery channel.
Data preprocessing module: performs data cleaning, integration, and reduction on the user behavior data collected by the server.
Training module: takes the training data as input and, based on maximum likelihood estimation, runs an EM-like iterative algorithm to estimate the parameters of the cumulative probability model;
Information delivery channel contribution prediction module and conversion rate prediction module: apply the parameters obtained from the training set to the test data to predict the contribution of each information delivery channel and whether each user will convert.
As shown in Fig. 3, the distributed computing framework of the invention is based on the Hadoop platform. The computational parts of all modules of the system run on the Hadoop platform, which distributes complex computations across multiple nodes for parallel processing, saving substantial system resources and greatly speeding up computation.
As shown in Fig. 4, this embodiment provides an internet information delivery channel optimization system based on distributed computing, trained and tested on a real data set. The embodiment compares it with the two systems most widely used for contribution prediction in internet information delivery: one based on the last-click method and one based on logistic regression. The experimental results show that the invention outperforms both systems, both in the accuracy of predicting the contributions of different channels and in the accuracy of predicting which users will convert. The invention can also output the top N users most likely to convert and the most effective information delivery channels.
In this embodiment, the method is applied to optimizing internet information delivery channels. The system comprises:
1. Data collection module
Based on the web server, this module records all browsing behavior of a given user through behavior tracking, and records the access features of the different channels delivering the same information through web log mining. This completes the collection of user information, which is stored on the web server.
2. Data preprocessing module
This module performs data cleaning, integration, and reduction. Data cleaning mainly ignores tuples with missing values and removes redundant records, because the proportion of collected data with missing values is very small. Data integration mainly unifies the units of the collected data. Data reduction is mainly numerosity reduction: click times are converted into the time values used by the model, finally forming a data set with four fields: user ID, information delivery channel, time, and click. Part of this data set is then extracted as the training set, and the remaining data serve as the test set. This yields standardized user information that is convenient for the subsequent steps.
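A minimal Python sketch of the preprocessing steps just described, under stated assumptions: the raw field names (`user_id`, `channel`, `time_ms`, `clicked`), converting millisecond timestamps into hours since the earliest event, and splitting train/test by user (so one user's history is not split across both sets) are illustrative choices, not details given in the text.

```python
import random


def preprocess(raw, train_fraction=0.8, seed=0):
    """Clean, integrate, and reduce raw click records (hypothetical schema).

    raw: iterable of dicts with keys user_id, channel, time_ms, clicked.
    Returns (train, test): lists of (user_id, channel, time_h, clicked).
    """
    seen = set()
    rows = []
    for r in raw:
        fields = (r.get("user_id"), r.get("channel"), r.get("time_ms"), r.get("clicked"))
        if any(f is None for f in fields):
            continue  # cleaning: ignore tuples with missing values
        if fields in seen:
            continue  # cleaning: remove redundant (duplicate) records
        seen.add(fields)
        rows.append(fields)
    if not rows:
        return [], []
    # Integration and reduction: one time unit (hours since earliest event)
    # and exactly four fields per record.
    t0 = min(r[2] for r in rows)
    data = [(u, c, (t - t0) / 3_600_000, int(k)) for u, c, t, k in rows]
    # Split by user: a random subset of users forms the training set.
    users = sorted({d[0] for d in data})
    random.Random(seed).shuffle(users)
    train_users = set(users[: int(len(users) * train_fraction)])
    train = [d for d in data if d[0] in train_users]
    test = [d for d in data if d[0] not in train_users]
    return train, test
```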
3. Training module
This module is trained on the data in the training set and estimates the parameters of the cumulative probability model.
Based on how information is actually delivered, the training module first makes the following assumptions:
(1) each information exposure exerts an influence on the user's conversion;
(2) the influence of each exposure on the user's conversion decays over time;
(3) for the same information, the influence and its rate of decay are the same for all users;
(4) the influences of information delivered through different channels add linearly;
(5) the user's instantaneous conversion probability is proportional to the total influence.
Based on the above assumptions, the training module builds a cumulative probability model, namely the user behavior conditional intensity function $\lambda_u(t)$:

$$\lambda_u(t)=\begin{cases}\sum_{t_i^u<t}\alpha_{ad_i^u}\,\omega_{ad_i^u}\exp\!\left(-\omega_{ad_i^u}\,(t-t_i^u)\right), & t<T_u\\ 0, & \text{otherwise}\end{cases}$$

where the set of users is $\{1,\dots,U\}$, the set of information channels is $\{1,\dots,n\}$, and the observed user behaviors form the set $\{C_1,\dots,C_U\}$. The behavior record $C_u$ of user $u$ is built from the following quantities: $ad_i^u$ is the information delivery channel ID of the $i$-th behavior of user $u$; $t_i^u$ is the time of the $i$-th behavior of user $u$; $x_u$ is the conversion result of user $u$ ($x_u=1$ means the user converted, $x_u=0$ the opposite); and $l_u$ is the total number of behaviors of user $u$. $\alpha$ is the factor for the intensity of the influence on users of information delivered through different channels, and $\omega$ is the factor by which that influence decays over time. $k$ is an information delivery channel ID, and $\alpha_k$ and $\omega_k$ are the influence intensity factor and the time-decay factor of delivery channel $k$, respectively. $T_u$ is the conversion time if user $u$ converted, and otherwise the end of the observation window.
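A minimal Python sketch of the conditional intensity of one user: each exposure before time t contributes alpha_k * omega_k * exp(-omega_k * (t - t_i)), the contributions add linearly (assumption 4), and the intensity is zero from T_u onward. Representing a behavior record as a list of `(channel_id, time)` pairs and the parameters as dicts keyed by channel ID is an assumption made for illustration.

```python
import math


def intensity(t, events, T_u, alpha, omega):
    """Conditional intensity lambda_u(t) of one user (hypothetical layout).

    events: list of (channel_id, t_i) pairs, i.e. (ad_i^u, t_i^u)
    T_u:    conversion time, or end of the observation window
    alpha, omega: dicts mapping channel_id -> alpha_k, omega_k
    """
    if t >= T_u:
        return 0.0  # the "otherwise" branch of the definition
    total = 0.0
    for k, t_i in events:
        if t_i < t:
            # Each earlier exposure adds an exponentially decaying term.
            total += alpha[k] * omega[k] * math.exp(-omega[k] * (t - t_i))
    return total
```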
Then, to represent the user conversion rate, the survival function $S_u(t)$ is set up:

$$S_u(t)=\exp\!\left(-\int_0^t \lambda_u(v)\,dv\right)$$
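Because every exposure term in the intensity is an exponential, the integral in S_u(t) = exp(-integral from 0 to t of lambda_u(v) dv) has a closed form: an exposure at t_i on channel k contributes alpha_k * (1 - exp(-omega_k * (t - t_i))). The sketch below uses that closed form; the function names and data layout (lists of `(channel_id, time)` pairs, parameter dicts) are hypothetical.

```python
import math


def cumulative_intensity(t, events, T_u, alpha, omega):
    """Closed-form integral of lambda_u(v) over [0, t]."""
    t = min(t, T_u)  # lambda_u is zero from T_u onward
    return sum(
        alpha[k] * (1.0 - math.exp(-omega[k] * (t - t_i)))
        for k, t_i in events
        if t_i < t
    )


def survival(t, events, T_u, alpha, omega):
    """S_u(t): model probability that user u has not converted by time t."""
    return math.exp(-cumulative_intensity(t, events, T_u, alpha, omega))
```

Using the closed form avoids numerical integration, so evaluating S_u costs one pass over the user's exposures.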
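The parameters are then estimated with an EM-like algorithm that iteratively maximizes the objective $L(\theta)$:

$$L(\theta)=\sum_{x_u=1}\log\Big(\sum_i \alpha_{ad_i^u}\,\omega_{ad_i^u}\exp\big(-\omega_{ad_i^u}(T_u-t_i^u)\big)\Big)+\sum_u\Big(-\sum_i \alpha_{ad_i^u}\big(1-\exp(-\omega_{ad_i^u}(T_u-t_i^u))\big)\Big)$$

E-step:

$$p_i^u=\begin{cases}\dfrac{\alpha_{ad_i^u}\,\omega_{ad_i^u}\exp\big(-\omega_{ad_i^u}(T_u-t_i^u)\big)}{\sum_{t_j^u<T_u}\alpha_{ad_j^u}\,\omega_{ad_j^u}\exp\big(-\omega_{ad_j^u}(T_u-t_j^u)\big)}, & x_u=1\\[2ex] 0, & x_u=0\end{cases}$$

M-step: set

$$\alpha_k=\frac{\sum_{x_u=1}\sum_{ad_i^u=k}p_i^u}{\sum_u\sum_{ad_i^u=k}\big(1-\exp(-\omega_{ad_i^u}^{(t)}(T_u-t_i^u))\big)}$$

$$\omega_k=\frac{\sum_{x_u=1}\sum_{ad_i^u=k}p_i^u}{\sum_u\sum_{ad_i^u=k}\big(p_i^u(T_u-t_i^u)+\alpha_k^{(t)}(T_u-t_i^u)\exp(-\omega_{ad_i^u}^{(t)}(T_u-t_i^u))\big)}$$

where the superscript $(t)$ denotes the value from the current ($t$-th) iteration. Repeating the two steps until $\alpha$ and $\omega$ converge completes the training process.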
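The EM-like training loop can be sketched on a single machine as below. The E-step gives each pre-conversion exposure of a converting user a share p_i^u of the conversion, proportional to its intensity term; the M-step then updates alpha_k and omega_k in closed form from those shares (the update rules are those stated in claim 3). This is a minimal sketch, assuming every training record is `(events, T_u, x_u)` and that a channel with no usable data keeps its previous values; the initial values, iteration limit, and tolerance in `fit` are arbitrary assumptions. The omega update uses the previous iterate's alpha and omega on the right-hand side, so it behaves like a fixed-point update rather than an exact maximization.

```python
import math
from collections import defaultdict


def em_step(users, alpha, omega):
    """One EM-like update. users: list of (events, T_u, x_u),
    where events is a list of (channel_id, t_i) pairs."""
    num = defaultdict(float)    # sum of p_i^u per channel (converters only)
    den_a = defaultdict(float)  # denominator of the alpha_k update
    den_w = defaultdict(float)  # denominator of the omega_k update
    for events, T_u, x_u in users:
        past = [(k, t_i) for k, t_i in events if t_i < T_u]
        # E-step: intensity term of each exposure at T_u
        terms = [alpha[k] * omega[k] * math.exp(-omega[k] * (T_u - t_i)) for k, t_i in past]
        z = sum(terms)
        for (k, t_i), term in zip(past, terms):
            d = T_u - t_i
            decay = math.exp(-omega[k] * d)
            p = term / z if (x_u == 1 and z > 0) else 0.0
            num[k] += p
            den_a[k] += 1.0 - decay
            den_w[k] += p * d + alpha[k] * d * decay
    # M-step: closed-form updates; channels without data keep old values
    new_alpha = {k: num[k] / den_a[k] if den_a[k] > 0 else alpha[k] for k in alpha}
    new_omega = {k: num[k] / den_w[k] if num[k] > 0 and den_w[k] > 0 else omega[k]
                 for k in omega}
    return new_alpha, new_omega


def fit(users, channels, iters=200, tol=1e-8):
    """Iterate em_step until the parameters stop changing (assumed stopping rule)."""
    alpha = {k: 0.5 for k in channels}
    omega = {k: 1.0 for k in channels}
    for _ in range(iters):
        new_alpha, new_omega = em_step(users, alpha, omega)
        delta = max(abs(new_alpha[k] - alpha[k]) + abs(new_omega[k] - omega[k])
                    for k in channels)
        alpha, omega = new_alpha, new_omega
        if delta < tol:
            break
    return alpha, omega
```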
4. Information delivery channel contribution prediction module
This module feeds the test set into the trained cumulative probability model to obtain the contribution of each information delivery channel.
The contribution of information delivery channel m can be written as:
The contributions are then summed by the website or type to which each delivery channel m belongs, giving the contribution of each website and each type. Finally, the websites or types with high contribution are selected for pushing information, which ensures that the selected push channels are effective.
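The formula for the contribution of channel m is not reproduced here, so the sketch below takes per-channel contribution scores as given input and shows only the aggregation and ranking step. The names `rank_sites`, `channel_contrib`, and `channel_site` are hypothetical.

```python
from collections import defaultdict


def rank_sites(channel_contrib, channel_site, top=None):
    """Sum per-channel contributions by website (or type), rank high to low.

    channel_contrib: dict channel_id -> contribution of that channel
    channel_site:    dict channel_id -> website or type it belongs to
    """
    totals = defaultdict(float)
    for channel, value in channel_contrib.items():
        totals[channel_site[channel]] += value
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return ranked if top is None else ranked[:top]
```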
5. Conversion rate prediction module
This module predicts whether a user will convert. It scores each user with $1-S_u(T_u)$, sorts the users by score, and selects the N users with the highest scores as those most likely to convert. Pushing information to these predicted users makes information recommendation more targeted and improves push effectiveness.
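A minimal sketch of the scoring step, assuming the trained alpha and omega are available as dicts and that for test users T_u is the end of the observation window. It uses the closed form of the cumulative intensity, so 1 - S_u(T_u) = 1 - exp(-sum over i of alpha_k * (1 - exp(-omega_k * (T_u - t_i)))). Function names and data layout are hypothetical.

```python
import math


def conversion_score(events, T_u, alpha, omega):
    """1 - S_u(T_u): model probability that the user converts by T_u."""
    cum = sum(
        alpha[k] * (1.0 - math.exp(-omega[k] * (T_u - t_i)))
        for k, t_i in events
        if t_i < T_u
    )
    return 1.0 - math.exp(-cum)


def top_n_users(users, alpha, omega, n):
    """users: dict user_id -> (events, T_u). Returns the n highest-scoring ids."""
    scores = {u: conversion_score(ev, T, alpha, omega) for u, (ev, T) in users.items()}
    return sorted(scores, key=scores.get, reverse=True)[:n]
```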
6. Distributed computing based on the Hadoop platform. The data in the data set are programmatically assigned to multiple mappers, which produce intermediate results as <key, value> pairs; the reducers then process the intermediate results, merging items that share the same key. The merged output is the result α, ω of the current iteration, which is fed back into the mappers as parameters, implementing the iterative parameter estimation. In this way, one complex task is divided into many finer-grained subtasks. These subtasks can be scheduled onto idle processing nodes, so that faster nodes process more tasks and slower nodes do not delay completion of the whole task, which increases computation speed. It also avoids waiting between tasks and saves system resources.
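The map, shuffle, and reduce round described above can be illustrated with a small in-process Python simulation. It does not use Hadoop or any Hadoop API; `mapper`, `reducer`, and `run_iteration` are hypothetical stand-ins showing how one EM-like iteration decomposes: each mapper emits per-channel partial sums keyed by channel ID, and each reducer adds them up and applies the alpha/omega updates. A driver would call `run_iteration` repeatedly, feeding alpha and omega back in, until they converge.

```python
import math
from collections import defaultdict


def mapper(user, alpha, omega):
    """Emit (channel_id, (p, den_alpha, den_omega)) partial sums for one user."""
    events, T_u, x_u = user
    past = [(k, t) for k, t in events if t < T_u]
    terms = [alpha[k] * omega[k] * math.exp(-omega[k] * (T_u - t)) for k, t in past]
    z = sum(terms)
    for (k, t), term in zip(past, terms):
        d = T_u - t
        decay = math.exp(-omega[k] * d)
        p = term / z if (x_u == 1 and z > 0) else 0.0
        yield k, (p, 1.0 - decay, p * d + alpha[k] * d * decay)


def reducer(key, values, alpha, omega):
    """Sum all partial statistics for one channel and apply the update rules."""
    num = den_a = den_w = 0.0
    for p, a, w in values:
        num += p
        den_a += a
        den_w += w
    new_a = num / den_a if den_a > 0 else alpha[key]
    new_w = num / den_w if num > 0 and den_w > 0 else omega[key]
    return key, (new_a, new_w)


def run_iteration(splits, alpha, omega):
    """One round: map every input split, group values by key, reduce."""
    grouped = defaultdict(list)  # plays the role of the shuffle phase
    for split in splits:  # on a cluster, each split would go to a mapper node
        for user in split:
            for key, value in mapper(user, alpha, omega):
                grouped[key].append(value)
    new_alpha, new_omega = dict(alpha), dict(omega)
    for key, values in grouped.items():
        _, (a, w) = reducer(key, values, alpha, omega)
        new_alpha[key], new_omega[key] = a, w
    return new_alpha, new_omega
```

Because the reducer only needs sums, the result does not depend on how users are partitioned into splits, which is what lets the work spread across nodes.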
Implementation results
The above technical solution was tested on a real data set.
First, the F1 score is used to evaluate the quality of each system.
Wherein, the method for F1 fractions is as follows:
Wherein, P is accuracy rate, is to call together equal to (predict the outcome be actually consistent ID numbers)/(the total ID numbers predicted the outcome) R The rate of returning, equal to (predict the outcome be actually consistent the ID numbers for having conversion)/(ID that test is concentrated with conversion is total).
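The metrics can be computed as in the sketch below, assuming predictions and ground truth are given as collections of user IDs (for example, the top N users returned by the conversion rate prediction module). F1 is the harmonic mean 2PR/(P + R); returning 0 for empty sets is an assumption.

```python
def precision_recall_f1(predicted_ids, converted_ids):
    """predicted_ids: IDs predicted to convert; converted_ids: IDs that did."""
    predicted, actual = set(predicted_ids), set(converted_ids)
    hits = len(predicted & actual)
    p = hits / len(predicted) if predicted else 0.0
    r = hits / len(actual) if actual else 0.0
    f1 = 2 * p * r / (p + r) if (p + r) > 0 else 0.0
    return p, r, f1
```

For example, predicting four users of whom two are among three actual converters gives P = 0.5, R = 2/3, and F1 = 4/7.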
As Fig. 4(a) shows, the F1 score of the invention is clearly higher than those of the last-click method and logistic regression, which indicates that the invention predicts the top N users most likely to convert considerably more accurately than the other two systems.
Next, the three systems are compared with precision on the abscissa and recall on the ordinate. Fig. 4(b) shows that at the same recall, the invention achieves higher precision than the other two systems. Notably, the invention performs very well when recall reaches about 0.9; that is, it remains highly practical even when nearly all of the data are covered.
The above tests show that the proposed internet information delivery channel optimization system based on distributed computing effectively improves the accuracy of predicting the contributions of different information delivery channels and of predicting user conversion, yielding better predictions that meet user needs. All data processing in the invention is based on the Hadoop platform, so multiple computers process data in parallel; this greatly reduces the computing power and memory required when processing big data and greatly increases computation speed.
Specific embodiments of the invention have been described above. It should be understood that the invention is not limited to these particular embodiments; those skilled in the art may make various variations or modifications within the scope of the claims without affecting the substance of the invention.

Claims (6)

1. An internet information delivery channel optimization system based on distributed computing, the system comprising:
a data collection module, which collects user behavior data through a web server and divides the collected user behavior into two parts: one part records all browsing behavior of a given user, and the other records the access features of the different channels delivering the same information;
a data preprocessing module, which processes the user behavior data collected by the web server: data cleaning is performed first, ignoring tuples with missing values and removing redundancy; data integration is then performed, unifying the units of the collected data; finally, data reduction is performed, converting click times into model parameters and forming a data set with four fields: user ID, information delivery channel, time, and click; part of this data set is extracted as the training set, and the remaining data serve as the test set;
a training module, which runs an EM-like iterative algorithm on the training set until the user influence intensity factor α and the time-decay factor ω of the cumulative probability model converge, thereby obtaining the parameters α and ω;
an information delivery channel contribution prediction module, whose input is the test set, which uses the influence intensity factor α and time-decay factor ω of the different channels obtained by the training module as model parameters to compute the contribution of each delivery channel m, sums the contributions by the website or type to which each delivery channel m belongs to obtain the contribution of each website and each type, and finally ranks the websites and types by contribution from high to low and delivers information through the top-ranked websites or types, thereby optimizing internet information delivery;
a conversion rate prediction module, whose input is the test set, which first sets up a survival function $S_u(t)$, then scores each user with $1-S_u(t)$, predicts the users most likely to convert, and pushes information to those users.
2. The internet information delivery channel optimization system based on distributed computing according to claim 1, characterized in that the data collection module records all browsing behavior of a given user by behavior tracking, records the access features of the different channels delivering the same information by log mining, completes the collection of user information, and stores the user information on the web server.
3. The internet information delivery channel optimization system based on distributed computing according to claim 1, characterized in that the training module sets up a cumulative probability model, namely the user behavior conditional intensity function $\lambda_u(t)$:

$$\lambda_u(t)=\begin{cases}\sum_{t_i^u<t}\alpha_{ad_i^u}\,\omega_{ad_i^u}\exp\!\left(-\omega_{ad_i^u}\,(t-t_i^u)\right), & t<T_u\\ 0, & \text{otherwise}\end{cases}$$
where the set of users is $\{1,\dots,U\}$, the set of information channels is $\{1,\dots,n\}$, and the observed user behaviors form the set $\{C_1,\dots,C_U\}$; the behavior record $C_u$ of user $u$ is built from the following quantities: $ad_i^u$ is the information delivery channel ID of the $i$-th behavior of user $u$; $t_i^u$ is the time of the $i$-th behavior of user $u$; $X_u$ is the conversion result of user $u$, with $X_u=1$ meaning the user converted and $X_u=0$ the opposite; $l_u$ is the total number of behaviors of user $u$; $\alpha$ is the factor for the intensity of the influence on users of information delivered through different channels, and $\omega$ is the factor by which that influence decays over time; $k$ is an information delivery channel ID, and $\alpha_k$ and $\omega_k$ are the influence intensity factor and the time-decay factor of delivery channel $k$, respectively; $T_u$ is the conversion time if user $u$ converted, and otherwise the end of the observation window;
To represent the user conversion rate, a survival function $S_u(t)$ is set up, where:

$$S_u(t)=\exp\!\left(-\int_0^t \lambda_u(v)\,dv\right)$$
The objective $L(\theta)$ is then maximized iteratively with the EM-like algorithm:

$$L(\theta)=\sum_{X_u=1}\log\Big(\sum_i \alpha_{ad_i^u}\,\omega_{ad_i^u}\exp\big(-\omega_{ad_i^u}(T_u-t_i^u)\big)\Big)+\sum_u\Big(-\sum_i \alpha_{ad_i^u}\big(1-\exp(-\omega_{ad_i^u}(T_u-t_i^u))\big)\Big)$$
where the E-step is:

$$p_i^u=\begin{cases}\dfrac{\alpha_{ad_i^u}\,\omega_{ad_i^u}\exp\big(-\omega_{ad_i^u}(T_u-t_i^u)\big)}{\sum_{t_j^u<T_u}\alpha_{ad_j^u}\,\omega_{ad_j^u}\exp\big(-\omega_{ad_j^u}(T_u-t_j^u)\big)}, & X_u=1\\[2ex] 0, & X_u=0\end{cases}$$
M-step: set
$$\alpha_k = \frac{\sum_{X_u=1}\sum_{ad_i^u=k} p_i^u}{\sum_u\sum_{ad_i^u=k}\Big(1 - \exp\big(-\omega_{ad_i^u}^{(t)}(T_u - t_i^u)\big)\Big)}$$
$$\omega_k = \frac{\sum_{X_u=1}\sum_{ad_i^u=k} p_i^u}{\sum_u\sum_{ad_i^u=k}\Big(p_i^u(T_u - t_i^u) + \alpha_k^{(t)}(T_u - t_i^u)\exp\big(-\omega_{ad_i^u}^{(t)}(T_u - t_i^u)\big)\Big)}$$
Alternating these two steps completes the training process.
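Putting the E-step and the M-step updates together gives a training loop. This is a sketch under stated assumptions: the same hypothetical `(impressions, T_u, X_u)` layout, uniform initial values for α_k and ω_k, and a fixed iteration count standing in for a stopping rule, which the claims do not specify. Both updates use the previous iterate α^(t), ω^(t) on the right-hand side, as in the formulas; keeping the old value when a denominator is zero is a guard added here. `e_step`, `m_step` and `fit` are illustrative names.

```python
import math
from collections import defaultdict

def e_step(users, alpha, omega):
    """Attribution weights p_i^u (hypothetical layout: (impressions, T_u, X_u))."""
    resp = []
    for impressions, T, X in users:
        terms = [alpha[k] * omega[k] * math.exp(-omega[k] * (T - t))
                 if (X == 1 and t < T) else 0.0
                 for k, t in impressions]
        z = sum(terms)
        resp.append([v / z if z > 0 else 0.0 for v in terms])
    return resp

def m_step(users, resp, alpha, omega):
    """One M-step; alpha/omega on the right-hand side are the previous iterate."""
    num = defaultdict(float)        # sum of p_i^u over impressions with ad_i^u = k
    den_alpha = defaultdict(float)  # sum of 1 - exp(-omega_k^(t) (T_u - t_i^u))
    den_omega = defaultdict(float)  # sum of p dt + alpha_k^(t) dt exp(-omega_k^(t) dt)
    for (impressions, T, _X), p_u in zip(users, resp):
        for (k, t), p in zip(impressions, p_u):
            if t >= T:
                continue  # only impressions before T_u enter the model
            dt = T - t
            decay = math.exp(-omega[k] * dt)
            num[k] += p  # p is 0 for non-converters, so this is the X_u = 1 sum
            den_alpha[k] += 1.0 - decay
            den_omega[k] += p * dt + alpha[k] * dt * decay
    # guard (not in the claims): keep the old value if a denominator is zero
    new_alpha = {k: num[k] / den_alpha[k] if den_alpha[k] > 0 else alpha[k] for k in alpha}
    new_omega = {k: num[k] / den_omega[k] if den_omega[k] > 0 else omega[k] for k in omega}
    return new_alpha, new_omega

def fit(users, channels, iters=50, init=1.0):
    """Alternate E- and M-steps for a fixed number of iterations (assumed stopping rule)."""
    alpha = {k: init for k in channels}
    omega = {k: init for k in channels}
    for _ in range(iters):
        resp = e_step(users, alpha, omega)
        alpha, omega = m_step(users, resp, alpha, omega)
    return alpha, omega
```

As a check against the formulas: for a single converter with one impression at T_u − t_i^u = 1 and α = ω = 1, one M-step gives α = 1/(1 − e^{−1}) and ω = 1/(1 + e^{−1}).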
4. The distributed-computing-based internet information delivery channel optimization system according to claim 3, characterized in that the information delivery channel contribution prediction module feeds the test set into the trained additive probability model to obtain the contribution of each delivery channel, the contribution of delivery channel m being written as:
$$\mathrm{contribution}(m) = \sum_{X_u=1}\sum_{ad_i^u=m}\frac{\alpha_{ad_i^u}\,\omega_{ad_i^u}\exp\big(-\omega_{ad_i^u}(T_u - t_i^u)\big)}{\sum_{t_i^u < T_u}\alpha_{ad_i^u}\,\omega_{ad_i^u}\exp\big(-\omega_{ad_i^u}(T_u - t_i^u)\big)} \Big/ \sum_{X_u=1} 1;$$
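The contribution formula averages, over converting users, the attribution weights that fall on channel m. A sketch in the same hypothetical `(impressions, T_u, X_u)` layout follows; `contribution` is an illustrative name. Because each converter's weights sum to 1, the contributions over all channels sum to 1 whenever every converter has at least one impression before T_u.

```python
import math
from collections import defaultdict

def contribution(users, alpha, omega):
    """contribution(m) for every channel m seen in converters' impressions.

    users: list of (impressions, T_u, X_u), impressions = [(channel, t_i^u), ...]
    (hypothetical layout); alpha/omega are the trained parameters.
    """
    totals = defaultdict(float)
    n_conv = 0
    for impressions, T, X in users:
        if X != 1:
            continue  # only converting users are summed
        n_conv += 1
        terms = [(k, alpha[k] * omega[k] * math.exp(-omega[k] * (T - t)))
                 for k, t in impressions if t < T]
        z = sum(v for _, v in terms)
        if z <= 0:
            continue  # no attributable impression for this converter
        for k, v in terms:
            totals[k] += v / z  # this impression's share of user u's conversion
    # divide by sum_{X_u=1} 1, i.e. the number of converters
    return {k: v / n_conv for k, v in totals.items()} if n_conv else {}
```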
The contributions are further summed according to the website or type to which each delivery channel m belongs, giving the contribution of each website and each type. Finally, the websites or types with high contribution are selected for information delivery, which completes the optimization of the internet information delivery channels.
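The roll-up from channels to websites or types is a group-by sum followed by a ranking. In this sketch the mapping `channel_to_group` and the cutoff `top_k` are hypothetical inputs: the claim only says that high-contribution websites or types are chosen, not how many.

```python
from collections import defaultdict

def aggregate_and_select(contrib, channel_to_group, top_k):
    """Sum channel contributions per website/type and pick the top_k groups.

    contrib: {channel: contribution(m)}; channel_to_group: hypothetical mapping
    from each delivery channel to the website or type it belongs to.
    """
    group_totals = defaultdict(float)
    for channel, value in contrib.items():
        group_totals[channel_to_group[channel]] += value
    # rank groups by total contribution, highest first
    ranked = sorted(group_totals, key=group_totals.get, reverse=True)
    return dict(group_totals), ranked[:top_k]
```

For example, two channels `banner_a` and `banner_b` mapped to the same website pool their contributions before the ranking, so a website with several modest channels can outrank one strong channel elsewhere.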
5. The distributed-computing-based internet information delivery channel optimization system according to claim 1, characterized in that, in the conversion rate prediction module, the conversion rate of user u is taken as 1 - S_u(T_u), where T_u denotes the conversion time or the end of the observation time window; users are then ranked by this score, and the N users with the highest scores are selected as the users most likely to exhibit conversion behavior.
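Claim 5 scores users by 1 - S_u(T_u). The sketch below assumes that S_u(T_u) is the survival function implied by the second sum of L(θ), i.e. S_u(T_u) = exp(-Σ_i α_{ad_i^u}(1 - exp(-ω_{ad_i^u}(T_u - t_i^u)))); this definition is an assumption here, since S_u is not restated in this section. Users are keyed by a hypothetical id, and `conversion_scores` and `top_n_users` are illustrative names.

```python
import math

def conversion_scores(users, alpha, omega):
    """1 - S_u(T_u) per user, assuming
    S_u(T_u) = exp(-sum_i alpha_k * (1 - exp(-omega_k * (T_u - t_i^u)))).

    users: {user_id: (impressions, T_u)}, impressions = [(channel, t_i^u), ...]
    (hypothetical layout).
    """
    scores = {}
    for uid, (impressions, T) in users.items():
        cumulative = sum(alpha[k] * (1.0 - math.exp(-omega[k] * (T - t)))
                         for k, t in impressions if t < T)
        scores[uid] = 1.0 - math.exp(-cumulative)
    return scores

def top_n_users(scores, n):
    """The n users with the highest scores, highest first."""
    return sorted(scores, key=scores.get, reverse=True)[:n]
```

A user with no impressions before T_u scores 0, and each additional impression can only raise the score, since every term of the accumulated sum is non-negative.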
6. The distributed-computing-based internet information delivery channel optimization system according to any one of claims 1-5, characterized in that the system performs its distributed computing on the Hadoop platform. Hadoop-based distributed computing divides a complex task into many finer-grained subtasks, which can be scheduled among idle processing nodes so that nodes with faster processing speed handle more tasks.
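The claims do not say which computations are split into Hadoop subtasks. One plausible split, shown here in plain Python rather than the Hadoop API, treats each user as an independent map task that runs the E-step locally and emits per-channel partial sums for the M-step; a reduce step adds them per channel. `map_user`, `reduce_channel`, `run_shards` and the shard layout are hypothetical. The point of the sketch is that the M-step result does not depend on how users are divided among nodes.

```python
import math
from collections import defaultdict

def map_user(user, alpha, omega):
    """Map task: E-step for one user, then emit per-channel M-step partial sums
    (p, 1 - exp(-omega dt), p*dt + alpha*dt*exp(-omega dt))."""
    impressions, T, X = user
    past = [(k, t) for k, t in impressions if t < T]
    terms = [alpha[k] * omega[k] * math.exp(-omega[k] * (T - t)) for k, t in past]
    z = sum(terms)
    for (k, t), v in zip(past, terms):
        p = v / z if (X == 1 and z > 0) else 0.0
        dt = T - t
        decay = math.exp(-omega[k] * dt)
        yield k, (p, 1.0 - decay, p * dt + alpha[k] * dt * decay)

def reduce_channel(stats):
    """Reduce task: add the partial sums emitted for one channel."""
    return tuple(sum(col) for col in zip(*stats))

def run_shards(shards, alpha, omega):
    """Simulate one MapReduce round; each shard stands in for a subtask on one node."""
    grouped = defaultdict(list)  # the "shuffle": group mapper output by channel key
    for shard in shards:         # on a cluster, shards would run in parallel
        for user in shard:
            for k, stats in map_user(user, alpha, omega):
                grouped[k].append(stats)
    sums = {k: reduce_channel(v) for k, v in grouped.items()}
    new_alpha = {k: s[0] / s[1] for k, s in sums.items()}
    new_omega = {k: s[0] / s[2] for k, s in sums.items()}
    return new_alpha, new_omega
```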

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410289052.5A CN104133837B 2014-06-24 2014-06-24 A kind of internet information based on Distributed Calculation delivers channel optimization systems

Publications (2)

Publication Number Publication Date
CN104133837A 2014-11-05
CN104133837B 2017-10-31

Family

ID=51806515

Country Status (1)

Country Link
CN (1) CN104133837B

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20181017

Address after: 200240 No. 800, Dongchuan Road, Shanghai, Minhang District

Co-patentee after: Wang Yanfeng

Patentee after: Zhang Ya

Address before: 200240 No. 800, Dongchuan Road, Shanghai, Minhang District

Patentee before: Shanghai Jiao Tong University

TR01 Transfer of patent right

Effective date of registration: 20181116

Address after: Room 387, Building 333, Hongqiao Road, Xuhui District, Shanghai 200030

Patentee after: Shanghai Media Intelligence Technology Co., Ltd.

Address before: 200240 No. 800, Dongchuan Road, Shanghai, Minhang District

Co-patentee before: Wang Yanfeng

Patentee before: Zhang Ya
