Background Art
At present, advertisements are delivered on websites in a variety of forms, such as banner advertisements and text-link advertisements. The earliest Internet advertising format is the banner advertisement: an image file created in a format such as GIF, JPG, or Flash and placed in a web page, mostly to display advertising content. Languages such as Java can also be used to make a banner interactive, and plug-in tools such as Shockwave can be used to make it more expressive. Banner advertisements fall into three types: static, dynamic, and interactive. A static banner shows a single fixed image on the web page. A dynamic banner usually uses the GIF89 format, in which a sequence of images is linked together to form an animation; most dynamic banners consist of 2 to 20 frames, and the different frames can convey more information to the viewer. This is also the dominant Internet advertising format at present. An interactive banner requires more direct interaction and allows the viewer to enter data in the advertisement or to make selections through drop-down menus and check boxes.
A text-link advertisement is an advertisement in the form of a text link: links to other websites are placed on the web pages of a popular website, so that part of the popular website's traffic clicks through to the linked websites. This format disturbs the viewer the least, yet is fairly effective.
Given these different delivery formats and the concern for delivery effect, advertisers have further asked that delivery be multi-dimensional, precise, and controllable in viewing frequency, which has given rise to a new and increasingly common website statistics metric: Nreach. ETL is the abbreviation of Extraction-Transformation-Loading, that is, extracting, transforming, and loading data. Nreach is a routine statistical metric for video websites, and computing it every day over different time intervals has become one of the largest demands. Producing these statistics across different dimensions increases the computational complexity of the system and reduces the operating efficiency of the servers.
The present invention proposes a computation rule based on MapReduce (that is, "map" and "reduce") that processes massive amounts of data and can generate data results for several kinds of multi-dimensional combinations. During computation, a "stackable calculation result" is generated as a data buffer layer for later reuse, which makes it convenient to compute Nreach data for the "current time" or for a "time span" with respect to a given index. The method can both compute Nreach results for multiple dimension combinations and efficiently speed up the program that computes cumulative Nreach.
The present invention greatly reduces the volume of input files and thereby improves computational efficiency. In addition, it provides stable, incremental daily data results that meet most requirements, and it reduces workload.
MapReduce as referred to here is a programming model mainly used for parallel computation over large-scale data sets (larger than 1 TB). Its principle is as follows: a Map function is specified to map a set of key-value pairs to a new set of key-value pairs, and a concurrent Reduce function is specified to ensure that all of the mapped key-value pairs that share the same key are grouped and processed together.
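The Map and Reduce roles described above can be illustrated with a minimal single-process sketch. It is not a distributed implementation; the names `map_reduce`, `count_words`, and `total` are hypothetical and chosen only for this illustration.

```python
from collections import defaultdict


def map_reduce(records, map_fn, reduce_fn):
    """Apply map_fn to every record, group the emitted pairs by key,
    then apply reduce_fn once per key (all in one process)."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def count_words(line):
    # Map: one (word, 1) pair per word in the line.
    for word in line.split():
        yield word, 1


def total(key, values):
    # Reduce: sum all values that share the same key.
    return sum(values)


result = map_reduce(["ad banner ad", "banner link"], count_words, total)
```

In a real cluster the grouping step is performed by the framework between the Map and Reduce phases; here a dictionary stands in for it.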
The new metric Nreach is defined with respect to reach, user, and a given dimension. A given user who has seen an advertisement at least once under a given dimension counts toward 1reach; a user who has seen the advertisement at least twice counts toward both 1reach and 2reach, and so on up to Nreach. Statistics are generally collected up to 20reach, and the later examples also use 20reach. Example: 1reach=1000 means that 1000 users have watched the advertisement at least once under the given dimension, and 2reach=500 means that 500 users have watched the advertisement at least twice under the given dimension. The value of 2reach is therefore always less than or equal to that of 1reach.
Nreach is an important statistical metric in the video industry and is widely significant. It is usually used to observe how many users an advertisement reaches under a dimension, and how often. Advertisers now want their advertisements to make a lasting impression, and some propose to pay only when an advertisement is shown to a user three or more times in a day. Nreach results can effectively measure the delivery effect of an advertisement. Nreach statistics usually revolve around one specific dimension, such as an advertising contract. Over the whole execution period of the contract, and according to dimensions such as date, website channel, and user location, Nreach data is then required either for each day separately or accumulated over several days.
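As a minimal sketch of this definition, the snippet below derives Nreach directly from per-user view counts under a single dimension. The function name `nreach` and the sample data are assumptions for illustration only; this is the plain definition, not the optimized procedure the invention describes.

```python
def nreach(view_counts, max_n=20):
    """Return a list in which element n-1 is the number of users
    who saw the advertisement at least n times."""
    return [
        sum(1 for count in view_counts.values() if count >= n)
        for n in range(1, max_n + 1)
    ]


# Hypothetical view counts per user for one dimension (e.g. one contract).
views = {"user_a": 1, "user_b": 2, "user_c": 5}
reach = nreach(views)
```

Because every user counted in (n+1)reach is also counted in nreach, the resulting list never increases from one element to the next.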
Summary of the invention
In view of the problems in the prior art, the object of the present invention is to provide a method for reducing computational complexity when compiling statistics on information delivery frequency, comprising the following steps: step (1) a user uses a trigger to trigger an information transfer; step (2) a central processing unit sends a transfer command to a transfer unit; step (3) the transfer unit traverses an information storage database to obtain the information with the highest priority, and when several pieces of information share the same priority, one is transferred at random; step (4) after the transfer ends, a corresponding log is written to a log storage database, the log being data on different dimension combinations and their corresponding values; step (5) a first MapReduce is performed on the log information to obtain the same-day value and the cumulative value under a given dimension combination; step (6) a second MapReduce is performed on the result of the first Reduce to compute the Nreach data under said dimension combination.
Further, steps (5) and (6) of the method further comprise: a first Map, in which the previous day's log is read from the log storage database as input, Key1 is a given dimension combination, and the output Value1 consists of two values v1 and v2 of an array data type, the first being the same-day value and the second the cumulative value; a first Reduce, in which the Value1 entries under the same Key1 are summed to obtain the number of times a user viewed the information under the given dimension combination; a second Map, in which the result of the first Reduce serves as input, Key2 is a given dimension combination, and Value2 is 40 values held in 40 positions, the first 20 positions holding the same-day reach data and the last 20 positions holding the cumulative reach data; if v1 has a value, 1 is added at position v1 of Value2, and when v1 exceeds 20 it is treated as 20; if v2 has a value, 1 is added at position v2+20 of Value2, and when v2 exceeds 20 the position is treated as 40; and a second Reduce, in which the results of the second Map are summed and then, starting from the N-th value, each value is added into the value before it, recursively, down to the 1st value, and, starting from the 40th value, each value is added into the value before it, recursively, down to the (40-N+1)-th value, where N is less than or equal to 20, finally yielding the Nreach data for the same-day value and for the cumulative value under the given dimension combination.
Further, for non-initial runs, steps (5) and (6) further comprise: the input of the first Map is the previous day's log or the previous day's first Reduce result; when the input is determined to be the previous day's log, the output Value1 is 1,1, and when it is determined to be the previous day's first Reduce result, the output Value1 is 0,1.
Further, the dimension combination in the method refers to the various combinations of contract, user, information category, and information channel.
Further, the Nreach data in the method refers to frequency metric data on how often the information under a given dimension combination reaches users.
In addition, the present invention provides a system for reducing system computational complexity, comprising: an information storage database that stores valid information of different priorities; a trigger, by which a user triggers an information transfer; a central processing unit that sends a transfer command to a transfer unit and, after the transfer ends, writes a corresponding log to a log storage database; the transfer unit, which traverses the information storage database to obtain the information with the highest priority and, when several pieces of information share the same priority, transfers one at random; the log storage database, which stores data on different dimension combinations and their corresponding values; and a statistics unit that performs a first MapReduce on the log information to obtain the same-day value and the cumulative value under a given dimension combination, and performs a second MapReduce on the result of the first Reduce to compute the Nreach data under said dimension combination.
The present invention has the following advantages: it greatly reduces the volume of input files and thereby improves computational efficiency. In addition, it provides stable, incremental daily data results that meet most requirements, reduces workload, and reduces computational complexity.
Detailed Description of the Embodiments
To make the above objects, features, and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the drawings and specific embodiments:
As shown in Figure 1, the present invention provides a method for reducing system computational complexity, comprising the following steps:
Step (1): a user uses a trigger to trigger an advertisement transfer;
Step (2): a central processing unit sends a transfer command to a transfer unit;
Step (3): the transfer unit traverses an advertisement storage database to obtain the advertisement with the highest priority, and when several advertisements share the same priority, one is transferred at random;
Step (4): after the transfer ends, a corresponding log is written to a log storage database, the log being data on different dimension combinations and their corresponding values;
Step (5): a first MapReduce is performed on the log information to obtain the same-day value and the cumulative value under a given dimension combination;
Step (6): a second MapReduce is performed on the result of the first Reduce to compute the Nreach data under said dimension combination.
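Step (3) above, choosing the highest-priority advertisement with a random choice among ties, can be sketched as follows. The `Ad` class, the `pick_ad` function, and the convention that a larger number means a higher priority are all assumptions made for this illustration; the source does not specify how priorities are encoded.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Ad:
    ad_id: str
    priority: int  # assumption: a larger number means a higher priority


def pick_ad(ads, rng=random):
    """Traverse all ads, keep those with the highest priority,
    and pick one of them at random."""
    top = max(ad.priority for ad in ads)  # raises ValueError if ads is empty
    candidates = [ad for ad in ads if ad.priority == top]
    return rng.choice(candidates)


ads = [Ad("a", 1), Ad("b", 3), Ad("c", 3)]
chosen = pick_ad(ads)
```

Passing a seeded `random.Random` instance as `rng` makes the tie-break reproducible, which is convenient for testing.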
The present invention also provides a system for reducing system computational complexity, comprising:
An advertisement storage database, which stores valid advertisements of different priorities;
A trigger, by which a user triggers an advertisement transfer;
A central processing unit, which sends a transfer command to the transfer unit and, after the transfer ends, writes a corresponding log to the log storage database;
A transfer unit, which traverses the advertisement storage database to obtain the advertisement with the highest priority and, when several advertisements share the same priority, transfers one at random;
A log storage database, which stores data on different dimension combinations and their corresponding values;
A statistics unit, which performs a first MapReduce on the log information to obtain the same-day value and the cumulative value under a given dimension combination, and performs a second MapReduce on the result of the first Reduce to compute the Nreach data under said dimension combination.
The statistics steps and the statistics unit operate as follows:
First, consider the characteristics of the Nreach requirement. It is usually built around one dimension, and every instance of that dimension shares a common feature. Take a contract as an example: it always covers a continuous period, running from one day to another. For cumulative data, therefore, the cumulative data up to the previous day plus the same day's data equals the full cumulative data from the start of the contract to the same day. This allows an optimization that reduces the volume of input files: there is no need to rerun all data files from the start of the contract to the same day.
Example: on January 1, 2012, user A1 saw advertisement A2 twice, so user A1 contributes once each to 1reach and 2reach. On January 2, 2012, user A1 saw advertisement A2 once more, so only 3reach needs to be incremented by one.
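The incremental idea in this example, previous cumulative data plus the same day's data, can be sketched as below. The function name `update_cumulative` and the (user, advertisement) key layout are hypothetical and used only for illustration.

```python
def update_cumulative(previous, today):
    """Add today's view counts to the cumulative counts carried over
    from the previous day, without re-reading any older logs."""
    merged = dict(previous)
    for key, count in today.items():
        merged[key] = merged.get(key, 0) + count
    return merged


# Keys are (user, advertisement) pairs; values are view counts.
jan_1 = {("A1", "A2"): 2}
jan_2 = {("A1", "A2"): 1}

cumulative = update_cumulative({}, jan_1)          # state after January 1
cumulative = update_cumulative(cumulative, jan_2)  # state after January 2
```

After the second update the cumulative count for user A1 is 3, so the user now also counts toward 3reach while the earlier contributions to 1reach and 2reach are unchanged.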
Next, consider both the diversity and the consistency of the dimensions in the data requirements. The present invention adopts the MapReduce algorithm. First, it satisfies the multi-dimensional requirement: files are loaded once and computed on many times. Because the files are large, loading them less often reduces time cost. Second, an accumulated, reusable intermediate result is needed. Finally, the output can cover combinations of multiple dimensions, and these can be told apart.
Because accumulation is involved, the program runs slightly differently on its initial run than on later runs. The two cases are described separately below:
Initial run:
First Map: the previous day's log is read from the log storage database as input. Key1: the date, the selected dimension combination (it holds multiple values internally, such as contract, client, a given category of user information, website channel, and so on, which are not listed one by one here; one log record generates as many Key1 entries as there are combinations), and the dimension combination name (the name is user-defined). Value1: two values, the first being the same-day value and the second the cumulative value. The output is fixed for every record: a value of an array data type, for example 1,1.
First Reduce: the Value1 entries under the same Key1 are summed, giving the count under that Key1. This can be read as the number of times a user viewed the advertisement under a given dimension, that is, VV.
The result of the first Reduce is output and saved.
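A minimal sketch of the initial-run first Map and first Reduce follows. The log field names, the `DIMENSION_COMBINATIONS` table, and the choice to include the user id in the key are assumptions for illustration; the source only states that one log record yields one Key1 per dimension combination, each with the fixed value 1,1.

```python
from collections import defaultdict

# Hypothetical dimension combinations: name -> log fields that form the key.
DIMENSION_COMBINATIONS = {
    "combination_one": ("channel", "region", "user_id", "contract_id"),
}


def first_map(log_record):
    """Emit one (key1, value1) pair per dimension combination."""
    for name, fields in DIMENSION_COMBINATIONS.items():
        dims = tuple(log_record[field] for field in fields)
        key1 = (log_record["date"], dims, name)
        yield key1, (1, 1)  # fixed output: same-day value, cumulative value


def first_reduce(pairs):
    """Sum value1 element-wise for every identical key1."""
    totals = defaultdict(lambda: [0, 0])
    for key1, (today, cumulative) in pairs:
        totals[key1][0] += today
        totals[key1][1] += cumulative
    return {key1: tuple(value) for key1, value in totals.items()}


record = {"date": "2012-02-03", "channel": "TV play", "region": "Beijing",
          "user_id": "u1", "contract_id": "8888"}
logs = [record] * 5  # the same user saw the same ad five times
pairs = [pair for rec in logs for pair in first_map(rec)]
view_counts = first_reduce(pairs)
```

With five identical log records, the single resulting key carries the value (5, 5), the same-day and cumulative view counts for that user.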
Second Map: the same day's first Reduce result serves as the input of the second Map. Key2: the date, the selected dimension combination, and the dimension combination name. Value2: 40 values held in 40 positions (the first 20 positions hold the same-day reach data and the last 20 hold the cumulative reach data); on the initial run, the same-day reach and the cumulative reach are identical. Value2 is initialized with all 40 positions set to 0. The Key1 records are read line by line. The two values of the Value1 corresponding to a Key1 are v1 and v2: v1 is the same-day value and v2 is the cumulative value. If v1 has a value, 1 is added at position v1 of Value2; when v1 exceeds 20, it is treated as 20. If v2 has a value, 1 is added at position v2+20 of Value2; when v2 exceeds 20, the position is treated as 40.
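The position rule of the second Map can be sketched as follows. The function name `second_map` is hypothetical, and "has a value" is interpreted here as "greater than zero", which is an assumption. Positions in the text are 1-based, so the code subtracts 1 for Python list indices.

```python
def second_map(value1, slots=20):
    """Turn one (v1, v2) pair into a 40-position vector with a 1 at
    position v1 (same-day half) and at position v2 + 20 (cumulative half)."""
    v1, v2 = value1
    value2 = [0] * (2 * slots)
    if v1 > 0:
        # Counts above 20 are treated as 20.
        value2[min(v1, slots) - 1] += 1
    if v2 > 0:
        # Counts above 20 land on position 40.
        value2[slots + min(v2, slots) - 1] += 1
    return value2


vector = second_map((5, 5))
```

For the input (5, 5) the vector has a 1 at positions 5 and 25 and zeros everywhere else.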
Example: Key1: 2012-02-03, TV play (website channel), Beijing (user area), 8888 (contract id), dimension combination one. Value1: 5,5. Then Key2: 2012-02-03, TV play (website channel), Beijing (user area), 8888 (contract id), dimension combination one. Value2: 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.
As shown in the table:
Second Reduce: the Value2 entries under the same Key2 are summed, giving the counts under that Key2. Then, for the first 20 values, starting from the 20th, each value is added into the value before it, recursively, down to the 1st value; for the last 20 values, starting from the 40th, each value is added into the value before it, recursively, down to the 21st value. The result at this point is the 20reach data. In general, the accumulation runs from the N-th value down to the 1st value, and from the 40th value down to the (40-N+1)-th value, where N is less than or equal to 20, finally yielding the Nreach data for the same-day value and for the cumulative value under a given dimension combination, that is, all same-day and cumulative Nreach data under the multi-dimensional combinations.
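The second Reduce amounts to an element-wise sum followed by a backward running sum within each half of the vector. The sketch below shows this for the 20reach case; the names `second_reduce` and `one_hot` are hypothetical helpers for this illustration.

```python
def second_reduce(value2_list, slots=20):
    """Sum the 40-position vectors, then accumulate each half from its
    last position back to its first, so that position n holds the number
    of users with at least n views."""
    totals = [0] * (2 * slots)
    for vector in value2_list:
        for i, x in enumerate(vector):
            totals[i] += x
    for start in (0, slots):
        # Walk backward: each value is added into the one before it.
        for i in range(start + slots - 2, start - 1, -1):
            totals[i] += totals[i + 1]
    return totals[:slots], totals[slots:]


def one_hot(v1, v2, slots=20):
    # Helper that builds one second-Map output vector for the example.
    vector = [0] * (2 * slots)
    vector[min(v1, slots) - 1] += 1
    vector[slots + min(v2, slots) - 1] += 1
    return vector


# Three users: (same-day views, cumulative views).
day_reach, cumulative_reach = second_reduce(
    [one_hot(1, 3), one_hot(2, 2), one_hot(5, 25)]
)
```

Here `day_reach[0]` is the same-day 1reach and `cumulative_reach[19]` is the cumulative 20reach; the user with 25 cumulative views is counted at every level because counts above 20 were placed on the last position.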
Non-initial runs:
First Map: the input is the previous day's log plus the previous day's first Reduce result. Key1 has two cases. In the first case, the input is the previous day's log: the date, the selected dimension combination, and the dimension combination name. In the second case, the input is the previous day's first Reduce result: the date, the selected dimension combination, and the dimension combination name. Value1: the location of the input file indicates whether a record comes from the previous day's log or from the previous day's first Reduce result. The output is two values: the first is the same-day count and the second is the cumulative count. The output is identical for every record from the same source:
Output when the input is determined to be the previous day's log: 1,1
Output when the input is determined to be the previous day's first Reduce result: 0,1
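The source-dependent output of the non-initial first Map can be sketched as below, taking the two stated output values literally. The directory prefixes `logs/` and `first_reduce/` and both function names are hypothetical; the source says only that the input file location distinguishes the two cases.

```python
def classify_input(path, log_dir="logs/", reduce_dir="first_reduce/"):
    """Decide from the (hypothetical) file location which kind of input
    a record comes from."""
    if path.startswith(log_dir):
        return "log"
    if path.startswith(reduce_dir):
        return "previous_reduce"
    raise ValueError(f"unrecognized input location: {path}")


def first_map_value(source):
    """Return the fixed value1 stated for each input source."""
    if source == "log":
        return (1, 1)  # contributes to the same-day and cumulative counts
    if source == "previous_reduce":
        return (0, 1)  # contributes nothing to the same-day count
    raise ValueError(f"unknown source: {source}")


log_value = first_map_value(classify_input("logs/2012-02-02.txt"))
carried_value = first_map_value(classify_input("first_reduce/2012-02-01.txt"))
```

Because carried-over records emit 0 as their first value, they cannot inflate the same-day count when the first Reduce sums the pairs.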
First Reduce: the Value1 entries under the same Key1 are summed. This can be read as the same-day/cumulative number of times that, on a given day, a user in a given city saw the advertisement of a given contract under a given website channel. The result of the first Reduce is output and saved.
Second Map: the input is the first Reduce result. Key2: the date, the selected dimension combination, and the dimension combination name. Value2: 40 values held in 40 positions (the first 20 positions hold the same-day reach data and the last 20 hold the cumulative reach data). All 40 values are initialized to 0. The Key1 records are read line by line. The two values of the Value1 corresponding to a Key1 are v1 and v2: v1 is the same-day value and v2 is the cumulative value. If v1 has a value, 1 is added at position v1 of Value2; when v1 exceeds 20, it is treated as 20. If v2 has a value, 1 is added at position v2+20 of Value2; when v2 exceeds 20, the position is treated as 40.
Example: Key1_today: 2012-02-03, TV play (website channel), Beijing (user area), why (user id), 8888 (contract id), dimension combination two. Value1_today: 5,5. Key1_accumulate: 2012-02-03, TV play (website channel), Beijing (user area), why (user id), 8888 (contract id), dimension combination two. Value1_accumulate: 0,1.
Then Key2: 2012-02-03, TV play (website channel), Beijing (user area), why (user id), 8888 (contract id), dimension combination two. Value2: 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.
As shown in the table:
Second Reduce: the values under the same key are summed. Then, for the first 20 values, starting from the 20th, each value is added into the value before it, recursively, down to the 1st value; for the last 20 values, starting from the 40th, each value is added into the value before it, recursively, down to the 21st value. The result at this point is the 20reach data. In general, the accumulation runs from the N-th value down to the 1st value, and from the 40th value down to the (40-N+1)-th value, where N is less than or equal to 20, finally yielding the Nreach data for the same-day value and for the cumulative value under a given dimension combination, that is, all same-day and cumulative Nreach data under the multi-dimensional combinations.
The above is a detailed description of the preferred embodiments of the present invention. Those of ordinary skill in the art should understand, however, that various improvements, additions, and substitutions are possible within the scope of the present invention and under the guidance of its spirit, and all of them fall within the scope of protection defined by the claims of the present invention.