CN103870671A - Method and device for extracting user sample from Cookies - Google Patents


Info

Publication number
CN103870671A
Authority
CN
China
Prior art keywords
cookie
probability distribution
behavior
sample
kinds
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201210552981.1A
Other languages
Chinese (zh)
Other versions
CN103870671B (en)
Inventor
陈家耀
欧阳佑
冯是聪
吴明辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING SIBOTU INFORMATION TECHNOLOGY Co Ltd
Original Assignee
BEIJING SIBOTU INFORMATION TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BEIJING SIBOTU INFORMATION TECHNOLOGY Co Ltd filed Critical BEIJING SIBOTU INFORMATION TECHNOLOGY Co Ltd
Priority to CN201210552981.1A priority Critical patent/CN103870671B/en
Publication of CN103870671A publication Critical patent/CN103870671A/en
Application granted granted Critical
Publication of CN103870671B publication Critical patent/CN103870671B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The invention discloses a method and a device for extracting a user sample from Cookies, and relates to Internet Cookie technology. The method comprises: determining the similarity among all Cookies and clustering the Cookies whose similarity reaches a set value into one class; and, for each class of Cookies, generating a sample individual and its access behavior by computing in real time the probability distribution of the browsing behavior of the class at each moment, building a probability distribution model from that distribution, randomly simulating the browsing behavior of the sample individual user according to the model, and calculating the weight of the class from the number of Cookies in the class. The invention further discloses a device for extracting a user sample from Cookies. This technical scheme keeps the sample representative, gives the sample a sufficiently long life cycle with continuous behavior, and allows sample behavior to be updated incrementally for easy maintenance.

Description

A method and device for extracting a user sample from Cookies
Technical field
The present invention relates to Internet Cookie technology, and in particular to a method and device for extracting a user sample from Cookies.
Background
Cookie technology is now widely used on the Internet. Websites use Cookies to track and record the access behavior of Internet users and to analyze their browsing habits, which provides data support for link structure optimization, pushing of relevant information, Internet advertising placement plans, and so on. For example, an e-commerce website finds through analysis of Cookie behavior that 80% of the users who buy product A go on to buy product B, and therefore pushes promotional information for product B to users who have bought product A. As another example, analysis carried out while drawing up an advertising placement plan shows that 60% of the users who visited website A also visit website B, whereas only 20% of the users who visited website A visit website C; to reach more people with the same advertising budget, the plan that places advertisements on websites A and C simultaneously is chosen.
When user behavior is analyzed, however, the total number of Cookies is often very large, possibly tens or even hundreds of millions; at the same time, some analysis tools and systems must perform complex calculations and can therefore only handle data of limited scale. It is thus necessary to extract a portion of all Cookies as a user sample and to process and analyze only that sample.
At present, one method of extracting a user sample from all Cookies is random selection. Random selection has the following shortcomings:
1) Cookies have a life cycle, and that life cycle is short, so Cookies extracted directly as a sample can only serve as a sample of Cookies, not as a sample of Internet users.
2) Different Cookies have different life cycles, and it is in fact unreasonable to compare and compute over sample individuals with different life cycles together. For example, suppose the sample is to be used to estimate the overlap between the visitors of three websites A, B and C on each day from January 1 to January 31. Some of the randomly selected Cookies may have survived only during the first half of the month and others only during the second half, so the figures estimated from such a sample are unreliable.
3) The behavior of a Cookie is not continuous and cannot meet the needs of some system analyses. For example, a system is to analyze the association between two behaviors, "bought an item of a certain brand on website A" and "viewed a certain web advertisement during the preceding 6 months". If most sample individuals have no more than two months of browsing behavior, such an analysis cannot be carried out.
Another method of extracting a user sample is to select the sample from Cookies whose life cycle is sufficiently long. For example, to estimate with a sample the overlap between the visitors of websites A, B and C from January 1 to January 31, the sample is drawn at random from the Cookies whose survival time covers the whole of January. This method has the following shortcomings:
1) The behavior distribution of samples with a long life cycle may be inconsistent with that of the overall population, so the sample may be poorly representative.
2) Some systems need to compute over data covering a long period, for example a time span of more than 6 months, while most Cookies survive for no more than 6 months; extracting a sample whose life cycle covers the whole period is then very difficult.
Summary of the invention
The technical problem to be solved by the present invention is to provide a method of extracting a user sample from Cookies that ensures the reliability of the extracted sample and its representativeness of the Internet user population.
To solve the above technical problem, the invention discloses a method of extracting a user sample from Cookies, comprising:
Determining the similarity between all Cookies, and clustering the Cookies whose similarity reaches a set value into one class of Cookies;
For each class of Cookies, generating a sample individual and its access behavior, and forming a sample from all generated sample individuals, wherein the sample individual and access behavior corresponding to each class of Cookies are generated as follows:
Computing in real time the probability distribution of the browsing behavior of each class of Cookies at each moment, building a probability distribution model from the probability distribution, randomly simulating the browsing behavior of the sample individual user according to the probability distribution model, and calculating the weight of the class according to the number of Cookies in the class.
Preferably, in the above method, determining the similarity between all Cookies means:
Calculating the similarity between all Cookies from the browsing behavior of all Cookies; or
Calculating the similarity between all Cookies from the information and the browsing behavior of all Cookies.
Preferably, the above method further comprises:
When a new Cookie joins, determining the similarity of the newly added Cookie and assigning it to the Cookies of the corresponding class according to the determined similarity;
Re-simulating the browsing behavior of the sample individual user of the class of Cookies that the new Cookie has joined.
Preferably, in the above method, calculating the weight of the class according to the number of Cookies in the class means:
For each class of Cookies, counting the number of Cookies of the class that are alive simultaneously on each day, and taking the maximum of the counted values as the weight of the class.
Preferably, in the above method, computing in real time the probability distribution of the browsing behavior of each class of Cookies at each moment means:
Computing in real time the independent browsing probability distribution of each class of Cookies on each website; or
Computing in real time the joint probability distribution of each class of Cookies over multiple websites.
Preferably, in the above method, the probability distribution model is a Poisson distribution model.
The invention also discloses a device for extracting a user sample from Cookies, comprising:
A classification unit, which determines the similarity between all Cookies and groups the Cookies whose similarity reaches a set value into one class of Cookies;
A first storage unit, which stores the classes of Cookies divided by the classification unit and their clustering information;
A statistics unit, which computes in real time the probability distribution of the browsing behavior, at each moment, of each class of Cookies in the first storage unit;
A model construction device, which uses the probability distribution computed by the statistics unit for each class of Cookies to build a probability distribution model for each class of Cookies in the first storage unit, and randomly simulates the browsing behavior of the corresponding sample individual user according to the probability distribution model of each class;
A second storage unit, which stores the browsing behavior of the sample individual user simulated by the model construction device for each class of Cookies, and calculates and stores the weight of each class according to the number of Cookies in the class;
A third unit, which forms a sample from all the sample individuals stored in the second storage unit.
Preferably, in the above device, the classification unit determining the similarity between all Cookies means:
Calculating the similarity between all Cookies from the browsing behavior of all Cookies; or
Calculating the similarity between all Cookies from the information and the browsing behavior of all Cookies.
Preferably, the above device further comprises a tracking and recording unit. The tracking and recording unit tracks and records the information and browsing behavior of all Cookies and, when a new Cookie joins, sends the information and browsing behavior of the newly added Cookie to the classification unit;
The classification unit determines the similarity of the newly added Cookie and assigns it to the Cookies of the corresponding class according to the determined similarity;
The first storage unit updates the stored class of Cookies to which the newly added Cookie is assigned, together with its clustering information;
The statistics unit computes in real time the probability distribution of the browsing behavior of the updated Cookie class at each moment;
The model construction device uses the probability distribution computed by the statistics unit for the updated Cookie class to update the probability distribution model built for that class, and randomly simulates the browsing behavior of the sample user according to the probability distribution model of each class;
The second storage unit stores the browsing behavior of the sample individual user of each class of Cookies as re-simulated by the model construction device, and calculates and stores the weight of the class according to the number of Cookies in the class.
Preferably, in the above device, calculating the weight of the class according to the number of Cookies in the class means:
For each class of Cookies, counting the number of Cookies of the class that are alive simultaneously on each day, and taking the maximum of the counted values as the weight of the class.
Preferably, in the above device, computing in real time the probability distribution of the browsing behavior of each class of Cookies at each moment means:
Computing in real time the independent browsing probability distribution of each class of Cookies on each website; or
Computing in real time the joint probability distribution of each class of Cookies over multiple websites.
Preferably, in the above device, the probability distribution model that the model construction device builds for each class of Cookies from the probability distribution computed by the statistics unit is a Poisson distribution model.
This technical scheme keeps the sample representative, ensures that the sample life cycle is long enough and that its behavior is continuous, and allows sample behavior to be updated incrementally for easy maintenance.
Brief description of the drawings
Fig. 1 is a schematic structural diagram of the device for extracting a user sample from Cookies in the present embodiment;
Fig. 2 is a schematic structural diagram of the device for extracting a user sample from Cookies in the preferred embodiment.
Embodiment
To make the object, technical solution and advantages of the present invention clearer, the technical solution is described in further detail below with reference to the accompanying drawings. It should be noted that, where no conflict arises, the embodiments of the present application and the features in the embodiments may be combined with one another arbitrarily.
Embodiment 1
The present embodiment provides a method of extracting a user sample from Cookies. The method clusters all Cookies so that Cookies with high similarity fall into the same class, and then generates one sample individual and its access behavior for each class of Cookies. Specifically, the method is implemented as follows:
Determining the similarity between all Cookies, and clustering the Cookies whose similarity reaches a set value into one class of Cookies;
For each class of Cookies, generating a sample individual and its access behavior, and then forming a sample from all generated sample individuals, wherein the sample individual and access behavior corresponding to each class of Cookies are generated as follows:
Computing in real time the probability distribution of the browsing behavior of each class of Cookies at each moment, building a probability distribution model from the probability distribution, randomly simulating the browsing behavior of the sample individual user according to the probability distribution model, and calculating the weight of the class according to the number of Cookies in the class.
It should be noted that, when the similarity between all Cookies is determined, Cookie similarity can be calculated from browsing behavior. More preferably, Cookie information such as machine information and IP information can also be included in the similarity calculation, so that the similarity between all Cookies is calculated from both the information and the browsing behavior of all Cookies.
When the weight of a class is calculated from the number of Cookies in the class, the number of Cookies of the class that are alive simultaneously on each day can be counted for each class, and the maximum of the counted values is taken as the weight of the class.
In addition, computing in real time the probability distribution of the browsing behavior of each class of Cookies at each moment may mean computing the independent browsing probability distribution of each class on each website, or computing the joint probability distribution of each class over multiple websites. Different probability distribution models, such as a Poisson distribution model, can be adopted as the case requires.
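Where a Poisson model is chosen, the fit and the simulation could look like the following minimal sketch. Estimating the rate as the mean number of visits per Cookie and sampling with Knuth's multiplication method are assumptions made here for illustration; the text only names the Poisson distribution as one possible model, and all function names are hypothetical.

```python
import math
import random


def fit_poisson_rate(visit_counts):
    """Maximum-likelihood Poisson rate: the mean number of visits per Cookie."""
    return sum(visit_counts) / len(visit_counts)


def poisson_pmf(n, rate):
    """P(x = n) under a Poisson distribution with the given rate."""
    return math.exp(-rate) * rate ** n / math.factorial(n)


def sample_poisson(rate, rng):
    """Draw one visit count with Knuth's method (adequate for small rates)."""
    threshold = math.exp(-rate)
    n, product = 0, rng.random()
    while product > threshold:
        n += 1
        product *= rng.random()
    return n


# Made-up visit counts of one class of Cookies to website A within one hour.
rate = fit_poisson_rate([0, 1, 2, 1])
rng = random.Random(0)  # fixed seed so that the sketch is repeatable
simulated_visits = sample_poisson(rate, rng)
```

A single rate per class, website and time slot keeps the model compact, at the cost of assuming that visit counts really are Poisson distributed.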
Once the sample individual corresponding to each class of Cookies and its weight have been determined in the above manner, a sample is obtained by weighting all the sample individuals.
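One way such a weighted sample might be used is to estimate the visitor overlap mentioned in the background section. The sketch below is an assumption about downstream use rather than part of the described method; the data layout, the function names and all values are hypothetical.

```python
# Each sample individual: (class weight, set of websites it was simulated to visit).
# All values are made up for illustration.
sample = [
    (1000, {"A", "B"}),
    (500, {"A"}),
    (1500, {"B", "C"}),
]


def weighted_audience(sample, site):
    """Estimated number of users who visited `site`."""
    return sum(weight for weight, sites in sample if site in sites)


def weighted_overlap(sample, site_x, site_y):
    """Estimated share of site_x visitors who also visited site_y."""
    both = sum(w for w, sites in sample if site_x in sites and site_y in sites)
    audience = weighted_audience(sample, site_x)
    return both / audience if audience else 0.0


share_a_to_b = weighted_overlap(sample, "A", "B")
```

Because each sample individual stands for a whole class, its weight is counted in place of the individual Cookies of that class.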
After a user sample has been extracted according to the above method, new Cookies may still join. A preferred scheme based on the above method therefore provides for updating the user sample: when a new Cookie is found to have joined at a later time, it is assigned, according to its similarity, to the Cookie class most similar to it, and the browsing behavior of the sample individual user of that class is simulated again. This ensures that sample behavior is updated incrementally and remains continuous. For the specific way of re-simulating the browsing behavior of the sample individual user of a Cookie class, refer to the method of extracting a user sample from Cookies described in the present embodiment.
Embodiment 2
The present embodiment introduces a device for extracting a user sample from Cookies. As shown in Fig. 1, the device comprises a classification unit, a first storage unit, a statistics unit, a model construction device, a second storage unit and a third unit.
The classification unit determines the similarity between all Cookies and groups the Cookies whose similarity reaches a set value into one class of Cookies;
The classification unit calculates the similarity between all Cookies from the browsing behavior of all Cookies, or from the information and the browsing behavior of all Cookies.
The first storage unit stores the classes of Cookies divided by the classification unit and their clustering information.
The statistics unit computes in real time the probability distribution of the browsing behavior, at each moment, of each class of Cookies in the first storage unit;
The statistics unit computes in real time the independent browsing probability distribution of each class of Cookies on each website, or the joint probability distribution of each class of Cookies over multiple websites.
The model construction device uses the probability distribution computed by the statistics unit for each class of Cookies to build a probability distribution model for each class of Cookies in the first storage unit, and randomly simulates the browsing behavior of the sample individual user according to the probability distribution model of each class;
The probability distribution model that the model construction device builds for each class of Cookies from the computed probability distribution can be a Poisson distribution model.
The second storage unit stores the browsing behavior of the sample individual user simulated by the model construction device for each class of Cookies, and calculates and stores the weight of each class according to the number of Cookies in the class;
For each class of Cookies, the second storage unit counts the number of Cookies of the class that are alive simultaneously on each day and takes the maximum of the counted values as the weight of the class.
The third unit forms a sample from all the sample individuals stored in the second storage unit.
Specifically, in the above device, the classification unit can store the information and browsing records of all Cookies: it assigns a unique ID to each Cookie and records, for each Cookie, the operating system type and version, the browser type and version, the total number of browsing events, the number of times each website channel was browsed, the browsing times, and so on. It then calculates the pairwise similarity between Cookies from their browsing behavior, as shown in Table 1.
Table 1
Cookie     Visitor ID   Browsing ratio of website A   Browsing ratio of website B   ...
Cookie1    10889560     4%                            8%                            ...
Cookie2    10889561     10%                           1%                            ...
The similarity between two Cookies equals the weighted sum of the similarities of their individual items of browsing information. If browsing information 1 (the browsing ratio of website channel A) is given weight w1, browsing information 2 (the browsing ratio of website channel B) is given weight w2, and so on, then the similarity of Cookie1 and Cookie2 = ∑ wi × (similarity of browsing information i).
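The weighted sum can be sketched as follows. The text does not say how the similarity of a single item of browsing information is measured, so the per-item measure 1 - |a - b| used here, the equal weights and the function names are all assumptions for illustration; the browsing ratios are those of Table 1.

```python
def feature_similarity(a, b):
    """Hypothetical per-item similarity for ratios in [0, 1]: 1 - |a - b|."""
    return 1.0 - abs(a - b)


def cookie_similarity(x, y, weights):
    """Weighted sum over items i of w_i * (similarity of browsing information i)."""
    return sum(
        w * feature_similarity(x.get(item, 0.0), y.get(item, 0.0))
        for item, w in weights.items()
    )


cookie1 = {"A": 0.04, "B": 0.08}  # browsing ratios of Cookie1 in Table 1
cookie2 = {"A": 0.10, "B": 0.01}  # browsing ratios of Cookie2 in Table 1
weights = {"A": 0.5, "B": 0.5}    # assumed weights w1 and w2
similarity = cookie_similarity(cookie1, cookie2, weights)
```

With these assumed weights the two Cookies score about 0.935, and a Cookie compared with itself scores 1.0.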
In the present embodiment, Cookies with high similarity are grouped into one class with the k-means clustering algorithm, and the clustering information is stored in the first storage unit. When k-means is used to cluster the Cookies, k is chosen to equal the number of sample individuals that are finally to be extracted; for example, if a sample with a capacity of 1,000,000 is to be extracted, k is 1,000,000. The clustering information stored in the first storage unit is: class 1 and all Cookie IDs belonging to class 1, class 2 and all Cookie IDs belonging to class 2, ..., class k and all Cookie IDs belonging to class k.
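The clustering step can be illustrated with a small, dependency-free k-means sketch. It clusters browsing-ratio vectors by squared Euclidean distance, which stands in for the weighted similarity and is an assumption; the Cookie IDs, the data and the function names are hypothetical, and a real deployment with k in the millions would need a scalable implementation.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=20, seed=0):
    """Return a list giving, for each point, its cluster index in range(k)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


# Made-up browsing ratios (website A, website B) keyed by hypothetical Cookie IDs.
cookies = {
    "c1": (0.04, 0.08),
    "c2": (0.05, 0.09),
    "c3": (0.60, 0.10),
    "c4": (0.62, 0.12),
}
ids = list(cookies)
labels = kmeans([cookies[i] for i in ids], k=2)

# Clustering information as kept in the first storage unit: class -> Cookie IDs.
clusters = {}
for cookie_id, label in zip(ids, labels):
    clusters.setdefault(label, []).append(cookie_id)
```

The resulting dictionary mirrors the stored clustering information: each class together with the IDs of all Cookies that belong to it.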
When the second storage unit calculates the weight of the sample individual corresponding to each class, it counts, for each class of Cookies, the number of Cookies alive simultaneously on each day, and takes the count of the day with the most live Cookies as the weight of the class. For example, if the number of simultaneously live Cookies of a certain class peaks on February 1 at 1000, the weight of the class is 1000.
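The weight rule can be sketched as below. Treating a Cookie as alive on every day from its first to its last observed day, inclusive, is an assumption, as are the function name and the dates.

```python
from datetime import date, timedelta


def class_weight(lifetimes):
    """Largest number of Cookies of one class that are alive on the same day."""
    alive_per_day = {}
    for first_seen, last_seen in lifetimes:
        day = first_seen
        while day <= last_seen:
            alive_per_day[day] = alive_per_day.get(day, 0) + 1
            day += timedelta(days=1)
    return max(alive_per_day.values(), default=0)


# Made-up (first seen, last seen) dates for three Cookies of one class.
lifetimes = [
    (date(2012, 1, 1), date(2012, 1, 15)),
    (date(2012, 1, 10), date(2012, 2, 1)),
    (date(2012, 1, 20), date(2012, 2, 5)),
]
weight = class_weight(lifetimes)
```

Here at most two of the three Cookies are alive on any one day, so the class weight is 2 rather than 3.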
The model construction device builds, for the sample individual of each class, the probability distribution model {P(x=n)} of browsing each website channel at each moment, where P(x=n) = (number of Cookies with n browsing events) / (sample weight). For example, if within a certain hour the number of Cookies of a class that visited website A once is 4% of the sample weight, the number that visited twice is 7%, and so on, then the probability distribution model of that class for website A in that hour is {P(x=1)=0.04, P(x=2)=0.07, ...}.
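Building {P(x=n)} from raw visit counts can be sketched as follows; the function name and the data are hypothetical, and the numbers are chosen to reproduce the 4% and 7% of the example.

```python
from collections import Counter


def visit_distribution(visit_counts, weight):
    """Return {n: P(x=n)} for n >= 1, with P(x=n) = Cookies with n visits / weight."""
    histogram = Counter(n for n in visit_counts if n > 0)
    return {n: histogram[n] / weight for n in sorted(histogram)}


# Made-up hour of data for one class with weight 100:
# 4 Cookies visited website A once, 7 visited twice, the rest did not visit.
visit_counts = [1] * 4 + [2] * 7 + [0] * 89
model = visit_distribution(visit_counts, weight=100)
```

The probabilities sum to less than 1; the remaining mass corresponds to not visiting the website in that hour.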
After the probability distribution models have been built, one sample individual is produced for each class, and its behavior is generated by stochastic simulation: the probability that the sample individual browses a given website n times at a given moment equals the browsing probability P(x=n) of the class for that website at that moment. The generated sample behavior is stored in the second storage unit. For example, given the probability distribution model {P(x=1)=0.04, P(x=2)=0.07, ...} of class 1 for website A at time t, a random number in the interval [0, 1.0] is generated. If the random number falls within [0, 0.04], the sample individual of the class is taken to have browsed website A once at time t; if it falls within [0.04, 0.04+0.07], the sample individual browsed website A twice at time t; and so on.
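The interval lookup described above is inverse-transform sampling over the cumulative distribution, sketched below. Treating any random number beyond the last interval as "no visit" is an assumption, and the function name is hypothetical.

```python
import random


def simulate_visits(distribution, rng):
    """Draw a visit count; mass not covered by the distribution means 0 visits."""
    u = rng.random()  # uniform random number in [0.0, 1.0)
    cumulative = 0.0
    for n, p in sorted(distribution.items()):
        cumulative += p
        if u < cumulative:
            return n
    return 0


model_a = {1: 0.04, 2: 0.07}  # class 1, website A, time t (values from the example)
rng = random.Random(42)       # fixed seed so that the sketch is repeatable
visits_per_slot = [simulate_visits(model_a, rng) for _ in range(24)]
```

Calling the function once per website and time slot yields the simulated browsing record of one sample individual.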
Some preferred schemes further add a tracking and recording unit to the above device:
The tracking and recording unit tracks and records the information and browsing behavior of all Cookies and, when a new Cookie joins, sends the information and browsing behavior of the newly added Cookie to the classification unit.
The classification unit then determines the similarity of the newly added Cookie and assigns it to the Cookies of the corresponding class according to the determined similarity;
The first storage unit updates the stored class of Cookies to which the newly added Cookie is assigned, together with its clustering information;
The statistics unit computes in real time the probability distribution of the browsing behavior of the updated Cookie class at each moment;
The model construction device uses the probability distribution computed by the statistics unit for the updated Cookie class to update the probability distribution model built for that class, and randomly simulates the browsing behavior of the sample individual user according to the probability distribution model of each class;
The second storage unit stores the browsing behavior of the sample individual user of each class of Cookies as re-simulated by the model construction device, and calculates and stores the weight of the class according to the number of Cookies in the class.
Specifically, when sample behavior for a subsequent period needs to be generated, the device for extracting a user sample from Cookies is as shown in Fig. 2. The tracking and recording unit stores the operating system type and version, browser type and version, total number of browsing events, number of times each website channel was browsed, browsing times, and so on, of each new Cookie. The second storage unit stores the same items for each Cookie of the sample individuals already extracted.
For each new Cookie, the classification unit calculates the similarity between the new Cookie and the existing sample individuals. This similarity equals the weighted sum of the similarities of the individual items of browsing information.
Each new Cookie is assigned to the sample class with which its similarity is highest, and the classification information of the sample classes is revised and stored in the first storage unit: class 1 and all Cookie IDs belonging to class 1 (including new Cookie IDs), class 2 and all Cookie IDs belonging to class 2 (including new Cookie IDs), ..., class k and all Cookie IDs belonging to class k (including new Cookie IDs).
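The incremental assignment of a new Cookie can be sketched as follows. Representing each class by one browsing-ratio profile, the per-item similarity 1 - |a - b|, the equal weights, the new Cookie ID and the function names are all assumptions for illustration.

```python
def similarity(x, y, weights):
    """Weighted sum of hypothetical per-item similarities 1 - |a - b|."""
    return sum(w * (1.0 - abs(x[item] - y[item])) for item, w in weights.items())


def assign_new_cookie(cookie_id, profile, class_profiles, members, weights):
    """Add the new Cookie ID to the most similar class and return that class."""
    best = max(
        class_profiles,
        key=lambda c: similarity(profile, class_profiles[c], weights),
    )
    members[best].append(cookie_id)
    return best


weights = {"A": 0.5, "B": 0.5}  # assumed weights
class_profiles = {1: {"A": 0.04, "B": 0.08}, 2: {"A": 0.10, "B": 0.01}}
members = {1: [10889560], 2: [10889561]}  # class -> Cookie IDs (first storage unit)

new_profile = {"A": 0.09, "B": 0.02}      # made-up browsing ratios of a new Cookie
best_class = assign_new_cookie(10889999, new_profile, class_profiles, members, weights)
```

Only the class that receives the new Cookie needs its distribution rebuilt and its sample individual re-simulated, which is what makes the update incremental.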
For each class, the model construction device builds the behavior probability distribution model for the new time period from all Cookies alive in the class (both new and original Cookies). For example, if within a certain hour the Cookies of a class that visited website A once amount to 4% of the sample individual weight, those that visited twice amount to 7%, and so on, then the probability distribution model of that class for website A in that hour is {P(x=1)=0.04, P(x=2)=0.07, ...}, where P(x=n) is the probability of browsing n times.
For each class, the browsing behavior of the sample individual in the new time period is generated by stochastic simulation and stored in the second storage unit. For example, given the probability distribution model {P(x=1)=0.04, P(x=2)=0.07, ...} of class 1 for website A at time t, a random number in the interval [0, 1.0] is generated. If the random number falls within [0, 0.04], the sample individual corresponding to the class is taken to have browsed website A once at time t; if it falls within [0.04, 0.04+0.07], the sample individual browsed website A twice at time t; and so on.
Finally, the third unit re-forms a sample from all the updated sample individuals stored in the second storage unit.
Those of ordinary skill in the art will appreciate that all or some of the steps of the above method can be completed by a program instructing the relevant hardware, and the program can be stored in a computer-readable storage medium such as a read-only memory, a magnetic disk or an optical disc. Alternatively, all or some of the steps of the above embodiments can be implemented with one or more integrated circuits. Accordingly, each module/unit in the above embodiments can be implemented in the form of hardware or in the form of a software function module. The present application is not limited to any particular combination of hardware and software.
The above are merely preferred embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (12)

1. A method of extracting a user sample from Cookies, characterized in that the method comprises:
Determining the similarity between all Cookies, and clustering the Cookies whose similarity reaches a set value into one class of Cookies;
For each class of Cookies, generating a sample individual and its access behavior, and forming a sample from all generated sample individuals, wherein the sample individual and access behavior corresponding to each class of Cookies are generated as follows:
Computing in real time the probability distribution of the browsing behavior of each class of Cookies at each moment, building a probability distribution model from the probability distribution, randomly simulating the browsing behavior of the sample individual user according to the probability distribution model, and calculating the weight of the class according to the number of Cookies in the class.
2. The method of claim 1, characterized in that determining the similarity between all Cookies means:
Calculating the similarity between all Cookies from the browsing behavior of all Cookies; or
Calculating the similarity between all Cookies from the information and the browsing behavior of all Cookies.
3. The method of claim 1 or 2, characterized in that the method further comprises:
When a new Cookie joins, determining the similarity of the newly added Cookie and assigning it to the Cookies of the corresponding class according to the determined similarity;
Re-simulating the browsing behavior of the sample individual user corresponding to the class of Cookies that the new Cookie has joined.
4. The method of claim 1 or 2, characterized in that calculating the weight of the class according to the number of Cookies in the class means:
For each class of Cookies, counting the number of Cookies of the class that are alive simultaneously on each day, and taking the maximum of the counted values as the weight of the class.
5. The method of claim 1 or 2, characterized in that computing in real time the probability distribution of the browsing behavior of each class of Cookies at each moment means:
Computing in real time the independent browsing probability distribution of each class of Cookies on each website; or
Computing in real time the joint probability distribution of each class of Cookies over multiple websites.
6. The method of claim 5, characterized in that
The probability distribution model is a Poisson distribution model.
7. A device for extracting a user sample from Cookies, characterized in that the device comprises:
A classification unit, which determines the similarity between all Cookies and groups the Cookies whose similarity reaches a set value into one class of Cookies;
A first storage unit, which stores the classes of Cookies divided by said computing unit and their clustering information;
A statistics unit, which counts in real time the probability distribution of the browsing behavior, at each moment, of each class of Cookies in said first storage unit;
A model construction device, which uses the probability distribution counted by said statistics unit for each class of Cookies to establish a probability distribution model for each class of Cookies in said first storage unit, and randomly simulates the browsing behavior of a sample individual user according to the probability distribution model of each class of Cookies;
A second storage unit, which stores the browsing behavior of the sample individual user of each class of Cookies simulated by said model construction device, and calculates and stores the weight of each class according to the number of Cookies in that class;
A third unit, which forms a sample from all the sample individuals stored in said second storage unit.
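The grouping performed by the classification unit can be sketched as a threshold-based clustering. The claim does not specify a clustering algorithm, so the greedy single-pass scheme, the Jaccard similarity, the threshold of 0.5 and all names below are assumptions chosen for brevity.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of visited sites."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_cookies(cookie_sites, threshold=0.5):
    """Greedy single-pass clustering.

    Each Cookie joins the first class whose representative (the class's first
    member) is at least `threshold` similar; otherwise it starts a new class.
    """
    classes = []  # list of lists of Cookie ids
    for cid, sites in cookie_sites.items():
        for members in classes:
            if jaccard(sites, cookie_sites[members[0]]) >= threshold:
                members.append(cid)
                break
        else:
            classes.append([cid])
    return classes

# Hypothetical Cookies and the sites each one visited.
cookie_sites = {
    "c1": {"news.example", "mail.example"},
    "c2": {"news.example", "mail.example", "video.example"},
    "c3": {"shop.example", "pay.example"},
    "c4": {"shop.example"},
}
classes = cluster_cookies(cookie_sites)
```

The result depends on the order in which Cookies are processed. A production system would more likely use a clustering method that is insensitive to input order, but the single-pass form maps naturally onto adding Cookies one at a time.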
8. The device as claimed in claim 7, characterized in that said classification unit determining the similarity between all Cookies refers to:
Calculating the similarity between all Cookies according to the browsing behavior of all Cookies; or
Calculating the similarity between all Cookies according to the information and the browsing behavior of all Cookies.
9. The device as claimed in claim 7 or 8, characterized in that the device further comprises a tracking and recording unit:
Said tracking and recording unit tracks and records the information and browsing behavior of all Cookies and, when a new Cookie is added, sends the information and browsing behavior of the newly added Cookie to said classification unit;
Said classification unit determines the similarity of the newly added Cookie and assigns the newly added Cookie to the corresponding class of Cookies according to the determined similarity;
The first storage unit updates the stored class of Cookies to which the newly added Cookie was assigned and its clustering information;
The statistics unit counts in real time the probability distribution of the browsing behavior, at each moment, of the updated class of Cookies;
The model construction device uses the probability distribution counted by said statistics unit for the updated class of Cookies to update the probability distribution model established for that class, and randomly simulates the browsing behavior of the sample individual user according to the probability distribution model of each class of Cookies;
The second storage unit stores the updated browsing behavior of the sample individual user of each class of Cookies simulated by said model construction device, and calculates and stores the weight of each class according to the number of Cookies in that class.
10. The device as claimed in claim 7 or 8, characterized in that calculating the weight of a class according to the number of Cookies in that class refers to:
For each class of Cookies, counting the number of Cookies of that class that are alive simultaneously on each day, and taking the maximum of the counted values as the weight of that class of Cookies.
11. The device as claimed in claim 7 or 8, characterized in that counting in real time the probability distribution of the browsing behavior of each class of Cookies at each moment refers to:
Counting in real time the independent browsing probability distribution of each class of Cookies on each website; or
Counting in real time the joint probability distribution of each class of Cookies over multiple websites.
12. The device as claimed in claim 11, characterized in that
The probability distribution model that said model construction device establishes for each class of Cookies, using the probability distribution counted by said statistics unit for each class of Cookies, is a Poisson distribution model.
CN201210552981.1A 2012-12-18 2012-12-18 Method and device for extracting user sample from Cookies Active CN103870671B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210552981.1A CN103870671B (en) 2012-12-18 2012-12-18 Method and device for extracting user sample from Cookies

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210552981.1A CN103870671B (en) 2012-12-18 2012-12-18 Method and device for extracting user sample from Cookies

Publications (2)

Publication Number Publication Date
CN103870671A true CN103870671A (en) 2014-06-18
CN103870671B CN103870671B (en) 2017-05-31

Family

ID=50909198

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210552981.1A Active CN103870671B (en) 2012-12-18 2012-12-18 Method and device for extracting user sample from Cookies

Country Status (1)

Country Link
CN (1) CN103870671B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7103616B1 (en) * 2003-02-19 2006-09-05 Veritas Operating Corporation Cookie-based directory name lookup cache for a cluster file system
CN101079063A (en) * 2007-06-25 2007-11-28 腾讯科技(深圳)有限公司 Method, system and apparatus for transmitting advertisement based on scene information
US20080243822A1 (en) * 2007-03-28 2008-10-02 Bruce Campbell System and method for associating a geographic location with an Internet protocol address
CN101685521A (en) * 2008-09-23 2010-03-31 北京搜狗科技发展有限公司 Method for showing advertisements in webpage and system
CN102103603A (en) * 2009-12-18 2011-06-22 百度在线网络技术(北京)有限公司 User behavior data analysis method and device
CN102681999A (en) * 2011-03-08 2012-09-19 阿里巴巴集团控股有限公司 Method and device for collecting and sending user action information

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106157067A (en) * 2015-03-23 2016-11-23 北京思博途信息技术有限公司 Method and apparatus for improving hotline service quality and assessing media advertisement effect
CN105447148A (en) * 2015-11-26 2016-03-30 上海晶赞科技发展有限公司 Cookie identifier association method and apparatus
CN105447148B (en) * 2015-11-26 2018-12-21 上海晶赞科技发展有限公司 Cookie identifier association method and device
CN106295513A (en) * 2016-07-26 2017-01-04 中电海康集团有限公司 Demographic method based on residence time probability distribution and device
CN108255880A (en) * 2016-12-29 2018-07-06 北京国双科技有限公司 Data processing method and device
CN108255880B (en) * 2016-12-29 2021-08-17 北京国双科技有限公司 Data processing method and device
CN112104703A (en) * 2020-08-18 2020-12-18 厦门网宿有限公司 Cookie management method, intermediate node and webvpn system

Also Published As

Publication number Publication date
CN103870671B (en) 2017-05-31

Similar Documents

Publication Publication Date Title
CN110400169B (en) Information pushing method, device and equipment
CN108521439B (en) Message pushing method and device
CA2700030C (en) Touchpoint customization system
CN103443781B (en) Data deliver
CN106528693A (en) Individualized learning-oriented educational resource recommendation method and system
Tirado et al. Predictive data grouping and placement for cloud-based elastic server infrastructures
CN105898209A (en) Video platform monitoring and analyzing system
CN103329151A (en) Recommendations based on topic clusters
CN105045916A (en) Mobile game recommendation system and recommendation method thereof
CN106557513A (en) Event information method for pushing and event information pusher
CN103870671A (en) Method and device for extracting user sample from Cookies
CN104394118A (en) User identity identification method and system
CN104967607A (en) Information processing method, terminal and server
CN104462593A (en) Method and device for providing user personalized resource message pushing
CN102831114B (en) Method and device for statistical analysis of Internet user access
CN102811371A (en) Method, system and device for recommending intelligent television application program
CN103001796A (en) Method and device for processing weblog data by server
CN104065672A (en) Advertisement pushing method, client and advertisement pushing system
CN104462594A (en) Method and device for providing user personalized resource message pushing
US20110225288A1 (en) Method and system for efficient storage and retrieval of analytics data
CN107145556B (en) Universal distributed acquisition system
CN105653545A (en) Method and device for providing service object information in page
CN104598518A (en) Content pushing method and device
CN104991898A (en) Processing method and apparatus for pushing information
CN103530304B (en) On-line recommendation method, system and mobile terminal based on self-adaption distributed computation

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100102 Beijing, Chaoyang District Fu Tong East Street, building 1, room 5, room 321008

Applicant after: The second hand information technology Co. Ltd.

Address before: 100012 Chaoyang District, Beiyuan Road, No. 32, a security building, No. 1, A District, Room 202, room two

Applicant before: Beijing Sibotu Information Technology Co., Ltd.

GR01 Patent grant