CN103051637A - User identification method and device - Google Patents

Info

Publication number
CN103051637A
Authority
CN
China
Prior art keywords
user
cookie
websites
information
access
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2012105932268A
Other languages
Chinese (zh)
Inventor
罗峰
黄苏支
李娜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING IZP TECHNOLOGIES Co Ltd
Original Assignee
BEIJING IZP TECHNOLOGIES Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BEIJING IZP TECHNOLOGIES Co Ltd
Priority to CN2012105932268A
Publication of CN103051637A
Legal status: Pending

Landscapes

  • Information Transfer Between Computers (AREA)

Abstract

The invention provides a user identification method and device. The method comprises: selecting, from network access log messages, the messages that share the same user identifier within a set time period and whose user identifier is in one-to-one correspondence with the website COOKIEs of set websites; obtaining four-tuple information from the selected messages, wherein the four-tuple information comprises the domain name of the website accessed by the user indicated by the user identifier, the user identifier, a COOKIE field of the accessed website, and the value of that COOKIE field; performing statistics on the four-tuple information to obtain access information for each accessed website; filtering the access information of each accessed website to obtain the COOKIE field with which that website identifies the user; and establishing a correspondence between the obtained COOKIE field and the user identifier, and identifying the user according to the correspondence. The invention achieves accurate identification of the user.

Description

User identification method and device
Technical field
The present invention relates to the field of network technology, and in particular to a user identification method and device.
Background art
As Internet technology becomes ever more widely used, much of people's daily work and entertainment takes place over the network. In many network application scenarios, the server needs to identify the user when the user accesses the network. Widely used identification methods include identification by IP address, by ADSL account, and by website COOKIE.
When users are identified by IP address, IP resources are limited while the number of Internet users keeps growing, so broadband users are now generally assigned dynamic IPs to keep a user who is not online from occupying a scarce IP address. As a result, the same IP address is used by several different users, and the user cannot be identified accurately.
When users are identified by ADSL account, the account is usually combined with the UA (the browser version used by the user). The granularity of ADSL+UA is too coarse, however: one ADSL account may correspond to several users, so it is again difficult to determine the user accurately.
When users are identified by website COOKIE, a website uses COOKIE technology to track the user's behavior on that website. Each website, however, can only track the user's access behavior on itself or on third-party websites that embed its COOKIE. It cannot track user behavior across the whole Internet, and so it cannot identify the user accurately either.
None of the above approaches, then, identifies the user accurately. Only when the server accurately identifies the client and the user can it carry out subsequent high-precision operations, such as precisely targeted advertising, so as to reduce the cost and volume of information exchange and improve the user's network access experience.
Summary of the invention
The invention provides a user identification method and device to solve the problem that existing schemes cannot identify users accurately.
To solve the above problem, the invention discloses a user identification method, comprising: obtaining, from network access log messages, the messages that have the same user identifier within a set time period and whose user identifier is in one-to-one correspondence with the website COOKIEs of set websites; obtaining four-tuple information from the obtained messages, wherein the four-tuple information comprises the domain name of the website accessed by the user indicated by the user identifier, the user identifier, a COOKIE field of the accessed website, and the value of the COOKIE field; performing statistics on the four-tuple information to obtain the access information of each accessed website; filtering the access information of each accessed website to obtain the COOKIE field with which each accessed website identifies the user; and establishing a correspondence between the obtained COOKIE field and the user identifier, and identifying the user according to the correspondence.
Preferably, the user identifier comprises a user account and a browser version number. The access information of an accessed website comprises: the domain name of the accessed website, the page views (PV) of the domain name, the PV share of the domain name, the COOKIE field of the accessed website, the number of page views with the same user identifier, the number of page views with different user identifiers, the ratio of page views with different user identifiers, the number of unique visitors (UV) with the same user identifier, the number of unique visitors with different user identifiers, and the ratio of unique visitors with different user identifiers.
Preferably, before the step of filtering the access information of each accessed website, the method further comprises: sorting the access information of each accessed website by the ratio of page views with different user identifiers and/or the ratio of unique visitors with different user identifiers.
Preferably, the step of filtering the access information of each accessed website to obtain the COOKIE field with which each accessed website identifies the user comprises: filtering the access information of each accessed website using the page views of the domain name, or mutual information, or information gain, to obtain the COOKIE field with which each accessed website identifies a single user.
Preferably, the user identification method further comprises: obtaining, from the obtained messages, the website access information of two websites that have the same COOKIE name, wherein the website access information comprises: the COOKIE field of the two websites, the value of the COOKIE field, the domain names of the two websites, the number of page views with the same user identifier, the number of page views with different user identifiers, the ratio of page views with different user identifiers, the number of unique visitors with the same user identifier, the number of unique visitors with different user identifiers, and the ratio of unique visitors with different user identifiers; sorting the access information of the two websites by the ratio of page views with different user identifiers and/or the ratio of unique visitors with different user identifiers; filtering the sorted access information to determine whether the two websites use the same COOKIE field; and if so, establishing an association between the two websites and identifying the user according to the association and the correspondence between the COOKIE field and the user identifier.
Preferably, the user identification method further comprises: if the COOKIE field used to identify the user contains the values of several COOKIE fields, associating the values of those COOKIE fields with one another; and identifying the user according to the association and the correspondence between the COOKIE field and the user identifier.
To solve the above problem, the invention also discloses a user identification device, comprising: a first acquisition module, configured to obtain, from network access log messages, the messages that have the same user identifier within a set time period and whose user identifier is in one-to-one correspondence with the website COOKIEs of set websites; a second acquisition module, configured to obtain four-tuple information from the obtained messages, wherein the four-tuple information comprises the domain name of the website accessed by the user indicated by the user identifier, the user identifier, a COOKIE field of the accessed website, and the value of the COOKIE field; a third acquisition module, configured to perform statistics on the four-tuple information to obtain the access information of each accessed website; a fourth acquisition module, configured to filter the access information of each accessed website to obtain the COOKIE field with which each accessed website identifies the user; and an identification module, configured to establish a correspondence between the obtained COOKIE field and the user identifier and to identify the user according to the correspondence.
Preferably, the user identifier comprises a user account and a browser version number. The access information of an accessed website comprises: the domain name of the accessed website, the page views (PV) of the domain name, the PV share of the domain name, the COOKIE field of the accessed website, the number of page views with the same user identifier, the number of page views with different user identifiers, the ratio of page views with different user identifiers, the number of unique visitors (UV) with the same user identifier, the number of unique visitors with different user identifiers, and the ratio of unique visitors with different user identifiers.
Preferably, the user identification device further comprises: a sorting module, configured to sort the access information of each accessed website by the ratio of page views with different user identifiers and/or the ratio of unique visitors with different user identifiers before the fourth acquisition module filters that access information.
Preferably, the fourth acquisition module is configured to filter the access information of each accessed website using the page views of the domain name, or mutual information, or information gain, to obtain the COOKIE field with which each accessed website identifies a single user.
Compared with the prior art, the invention has the following advantages:
In the invention, the one-to-one correspondence between the COOKIEs of the set websites and the user identifier is determined first. This establishes whether the user indicated by the user identifier is a single user, and the messages of that single user are then obtained.
The set websites are generally websites with large traffic whose COOKIEs are known and unique, so whether the user indicated by a user identifier is a single user can be determined from whether these websites' COOKIEs correspond one-to-one with the user identifier.
Once the user indicated by the user identifier has been determined to be a single user, the network access log messages of the websites that user accessed go through a series of extraction, statistics and filtering steps. This yields the COOKIE field with which each accessed website can uniquely identify the user, and a correspondence between that COOKIE field and the user identifier is then established. In subsequent accesses, the website can identify the user according to this correspondence.
Network access log messages contain a large amount of information, and some COOKIE fields in them can in fact serve as identity information. Exploiting this property of COOKIE information, the scheme of the invention automatically parses out of that information the COOKIE fields that can serve as identity information, determines by analysis which COOKIE field on each website uniquely identifies the user, establishes the correspondence between these COOKIE fields and the user identifier, and uses this correspondence to identify the user accurately.
The invention thus solves the problem that existing schemes cannot identify users accurately and achieves accurate user identification. Websites can then carry out subsequent high-precision operations based on this recognition result, such as precisely targeted advertising, which reduces the cost and volume of information exchange and improves the user's network access experience.
Brief description of the drawings
Fig. 1 is a flow chart of the steps of a user identification method according to Embodiment one of the invention;
Fig. 2 is a flow chart of the steps of a user identification method according to Embodiment two of the invention;
Fig. 3 is a flow chart of the steps of a user identification method according to Embodiment three of the invention;
Fig. 4 is a structural block diagram of a user identification device according to Embodiment four of the invention.
Detailed description
To make the above objects, features and advantages of the invention more apparent, the invention is described in further detail below with reference to the drawings and specific embodiments.
Embodiment one
Fig. 1 shows a flow chart of the steps of a user identification method according to Embodiment one of the invention.
The user identification method of this embodiment comprises the following steps:
Step S102: obtain, from network access log messages, the messages that have the same user identifier within a set time period and whose user identifier is in one-to-one correspondence with the website COOKIEs of set websites.
The set time period can be chosen by those skilled in the art according to the actual situation, for example one day, several hours or several days; the invention does not limit it. The set websites are usually websites with large traffic that users visit often, and whose identity-bearing COOKIE field can be obtained by statistical analysis, such as Baidu, Google and Taobao. The COOKIEs of these websites are known and unique, so whether the user indicated by a user identifier is a single user can be determined from whether the user identifier corresponds one-to-one with these websites' COOKIEs, and the messages of that single user can then be obtained.
Step S104: obtain four-tuple information from the obtained messages.
The four-tuple information comprises the domain name of the website accessed by the user indicated by the user identifier, the user identifier, a COOKIE field of the accessed website, and the value of that COOKIE field.
The four-tuple information obtained may cover the set websites mentioned above and may also cover websites other than the set websites.
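As an illustration of step S104, the following is a minimal sketch of turning one log record into four-tuples (domain name, user identifier, COOKIE field, field value). The source does not specify the log format, so the record layout (a dict with the hypothetical keys `host`, `user_id` and `cookie`, the last holding a raw `name=value; name=value` string) and the names `FourTuple` and `extract_four_tuples` are assumptions made only for this example.

```python
from typing import Iterator, NamedTuple


class FourTuple(NamedTuple):
    host: str          # domain name of the accessed website
    user_id: str       # user identifier, e.g. ADSL account + UA
    cookie_field: str  # name of the COOKIE field
    cookie_value: str  # value of that COOKIE field


def extract_four_tuples(record: dict) -> Iterator[FourTuple]:
    """Yield one four-tuple per COOKIE field found in a log record.

    The keys "host", "user_id" and "cookie" are hypothetical; real
    access logs would need their own parsing step.
    """
    for part in record.get("cookie", "").split(";"):
        name, sep, value = part.strip().partition("=")
        if sep and name:  # skip fragments that are not name=value pairs
            yield FourTuple(record["host"], record["user_id"], name, value)
```

Each COOKIE field in a record becomes its own four-tuple, so the later statistics can be computed per website and per field.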
Step S106: perform statistics on the four-tuple information to obtain the access information of each accessed website.
The access information of a website is information related to visits to that website, such as PV (Page View) information and UV (Unique Visitor) information. For example, from the access messages of each website, the number of distinct COOKIE values under the website and the number of distinct user identifiers (userid) can be counted to obtain the access information of each accessed website.
Step S108: filter the access information of each accessed website to obtain the COOKIE field with which each accessed website identifies the user.
The purpose of filtering the access information is to filter out the COOKIE fields that cannot identify a user.
Step S110: establish a correspondence between the obtained COOKIE field and the user identifier, and identify the user according to the correspondence.
For example, suppose the above process determines that the COOKIE ID in the COOKIE field of a certain website uniquely identifies a user. The correspondence established is then UID-COOKIE ID, where UID denotes the user identifier. If a user's user identifier is ADSL1+UA1 and the COOKIE ID is 4500, then when this user accesses the website and the server obtains the access message and finds COOKIE ID 4500 in it, the server can determine that the corresponding user is ADSL1+UA1.
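The lookup in this example can be sketched as a plain dictionary keyed by website, COOKIE field and value. The host `example.com`, the field name `COOKIE_ID` and the function `identify_user` are hypothetical names chosen for illustration; the source only states that a UID-COOKIE ID correspondence is kept.

```python
from typing import Dict, Optional, Tuple

# Hypothetical correspondence built in step S110:
# (host, COOKIE field, COOKIE value) -> user identifier
correspondence: Dict[Tuple[str, str, str], str] = {
    ("example.com", "COOKIE_ID", "4500"): "ADSL1+UA1",
}


def identify_user(host: str, cookie_field: str, cookie_value: str) -> Optional[str]:
    """Return the user identifier for a COOKIE value, or None if unknown."""
    return correspondence.get((host, cookie_field, cookie_value))
```

Keying on the host as well as the field and value keeps identical values on unrelated websites from colliding.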
In this embodiment, the one-to-one correspondence between the COOKIEs of the set websites and the user identifier is determined first, which establishes whether the user indicated by the user identifier is a single user, and the messages of that single user are then obtained. The set websites are generally websites with large traffic whose COOKIEs are known and unique, so whether the user indicated by a user identifier is a single user can be determined from whether these websites' COOKIEs correspond one-to-one with the user identifier. Once the user has been determined to be a single user, the network access log messages of the websites that user accessed go through a series of extraction, statistics and filtering steps, which yields the COOKIE field with which each accessed website can uniquely identify the user. A correspondence between that COOKIE field and the user identifier is then established, and in subsequent accesses the website can identify the user according to this correspondence. Network access log messages contain a large amount of information, and some COOKIE fields in them can in fact serve as identity information. Exploiting this property, the scheme automatically parses out the COOKIE fields that can serve as identity information, determines by analysis which COOKIE field on each website uniquely identifies the user, establishes the correspondence between these COOKIE fields and the user identifier, and uses it to identify the user accurately. This embodiment thus solves the problem that existing schemes cannot identify users accurately and achieves accurate user identification. Websites can then carry out subsequent high-precision operations based on this recognition result, such as precisely targeted advertising, which reduces the cost and volume of information exchange and improves the user's network access experience.
Embodiment two
Fig. 2 shows a flow chart of the steps of a user identification method according to Embodiment two of the invention.
The user identification method of this embodiment comprises the following steps:
Step S202: the server obtains, from network access log messages, the messages that have the same user identifier within a set time period and whose user identifier is in one-to-one correspondence with the website COOKIEs of set websites.
In this embodiment, the user identifier comprises a user account and a browser version number. User accounts include, but are not limited to, Internet access accounts such as an ADSL account or ADSL account + UA, user mailboxes, and so on.
Taking one user identifier as an example, suppose the user identifier is ADSL1+UA1. The server then obtains from the network access log messages all messages whose user identifier is ADSL1+UA1. The user account is of course not limited to an ADSL account; other user accounts are equally applicable.
Then, among the messages of the same user identifier, the messages within the set time period can be analyzed first, for example the messages of the current day for that user identifier. Because the COOKIEs of the set websites are generally representative and can identify the users who visit those websites, it is judged whether the user identifier corresponds one-to-one with the website COOKIEs of the set websites. If it does, the user identifier represents a single user, and the messages of that single user are obtained. If the user identifier and the website COOKIEs do not correspond one-to-one, the user identifier does not represent just one user, and messages of this type can be discarded.
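The one-to-one check described above can be sketched as follows. This is an illustrative reading of the step rather than the patented implementation: the input is assumed to be four-tuples (host, user identifier, COOKIE field, COOKIE value), and `set_site_fields`, the set of (host, COOKIE field) pairs belonging to the set websites, is a hypothetical input, since the source only says those COOKIEs are known.

```python
from collections import defaultdict


def single_user_ids(four_tuples, set_site_fields):
    """Return the user IDs that map one-to-one to the set websites' COOKIE values.

    four_tuples: iterable of (host, user_id, cookie_field, cookie_value).
    set_site_fields: set of (host, cookie_field) pairs for the set websites
    (an assumed input format).
    """
    values_per_user = defaultdict(set)  # (host, field, user_id) -> COOKIE values
    users_per_value = defaultdict(set)  # (host, field, value) -> user IDs
    for host, user_id, field, value in four_tuples:
        if (host, field) in set_site_fields:
            values_per_user[(host, field, user_id)].add(value)
            users_per_value[(host, field, value)].add(user_id)

    candidates = {user_id for _, _, user_id in values_per_user}
    rejected = set()
    for (host, field, user_id), values in values_per_user.items():
        if len(values) != 1:
            rejected.add(user_id)  # one user ID, several COOKIE values
        elif len(users_per_value[(host, field, next(iter(values)))]) != 1:
            rejected.add(user_id)  # one COOKIE value shared by several user IDs
    return candidates - rejected
```

A user identifier that never visits a set website is not returned, because nothing can be said about whether it represents a single user.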
Step S204: the server obtains four-tuple information from the obtained messages.
The four-tuple information comprises the domain name of the website accessed by the user indicated by the user identifier, the user identifier, a COOKIE field of the accessed website, and the value of that COOKIE field.
Step S206: the server performs statistics on the four-tuple information to obtain the access information of each accessed website.
In this embodiment, the access information of an accessed website comprises: the domain name of the accessed website, the PV (page views) of the domain name, the PV share of the domain name, the COOKIE field of the accessed website, the number of PV with the same user identifier, the number of PV with different user identifiers, the ratio of PV with different user identifiers, the number of UV (unique visitors) with the same user identifier, the number of UV with different user identifiers, and the ratio of UV with different user identifiers.
Step S208: the server sorts the access information of each accessed website by the ratio of page views with different user identifiers and/or the ratio of unique visitors with different user identifiers.
This step is preferred. Sorting the website access information allows the subsequent filtering to be carried out more effectively and quickly. Filtering directly without sorting is of course also feasible.
Step S210: the server filters the access information of each accessed website to obtain the COOKIE field with which each accessed website identifies the user.
Preferably, the server may filter the access information of each accessed website using the page views of the domain name, or mutual information, or information gain, to obtain the COOKIE field with which each accessed website identifies a single user.
Mutual information and information gain are two terms from information theory and are commonly used as metrics in text classification. Mutual information measures only the likelihood relation between two sets of events; in this embodiment it is used to judge whether a COOKIE field is an effective field by computing the correspondence between the user identifier (user ID) and the COOKIE value. Information gain is an asymmetric relation used to measure the difference between two probability distributions; here the judgment is made by setting different thresholds on the ratio corresponding to the user identifier and the ratio corresponding to the COOKIE. In other words, probabilities can be computed along several dimensions to determine the COOKIE field that uniquely identifies the user on each accessed website.
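The source does not give a formula for the mutual-information filter. As a sketch under that caveat, the standard definition I(X;Y) = Σ p(x,y) · log2( p(x,y) / (p(x) p(y)) ) can be evaluated over the observed (user identifier, COOKIE value) pairs of one COOKIE field on one website. The function name `mutual_information` and the choice of base-2 logarithm are assumptions of this example.

```python
import math
from collections import Counter


def mutual_information(pairs):
    """Mutual information (in bits) between user IDs and COOKIE values.

    pairs: iterable of (user_id, cookie_value) observations for one COOKIE
    field on one website. Uses the textbook definition, not a formula
    taken from the source.
    """
    pairs = list(pairs)
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    users = Counter(user for user, _ in pairs)
    values = Counter(value for _, value in pairs)
    # p(x,y) / (p(x) * p(y)) simplifies to count * n / (count_x * count_y)
    return sum(
        (count / n) * math.log2(count * n / (users[user] * values[value]))
        for (user, value), count in joint.items()
    )
```

A field whose values track user identifiers one-to-one scores high, while a field whose values are unrelated to the user (for example a page-layout preference) scores close to zero, so a threshold on this score is one possible way to realize the filter.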
Step S212: the server establishes a correspondence between the obtained COOKIE field and the user identifier, and identifies the user according to the correspondence.
Preferably, if the COOKIE field used to identify the user contains the values of several COOKIE fields, the values of those COOKIE fields are associated with one another, and the user is identified according to the association and the correspondence between the COOKIE field and the user identifier. For example, suppose the COOKIE field that uniquely identifies the user contains both the user's mailbox and the user's personal account. An association, that is, a correspondence, can then be established between the user's mailbox and personal account. The user identifier can then be obtained from the mailbox, from the personal account, or of course from the combination of mailbox and personal account.
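The mailbox/account association can be sketched as an index from every identifier seen in the COOKIE field to the user identifier. The sample mailbox and account strings, and the helper names `build_identifier_index` and `resolve`, are hypothetical and serve only to illustrate the idea.

```python
def build_identifier_index(observations):
    """Map each identifier (mailbox, account, ...) to its user ID.

    observations: iterable of (user_id, identifiers) where identifiers is
    the list of values found together in the identifying COOKIE field.
    """
    index = {}
    for user_id, identifiers in observations:
        for identifier in identifiers:
            index[identifier] = user_id
    return index


def resolve(index, *identifiers):
    """Return the user ID if the known identifiers agree on one, else None."""
    matches = {index[i] for i in identifiers if i in index}
    return matches.pop() if len(matches) == 1 else None
```

With this index the user identifier can be looked up from the mailbox alone, from the account alone, or from both together, which mirrors the three cases named in the text.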
It should be noted that in some cases two or more websites are related, such as Taobao and Tmall, and such websites may use the same COOKIE to identify the user. The scheme of this embodiment may therefore further comprise: obtaining, from the obtained messages, the website access information of two websites that have the same COOKIE name, wherein the website access information comprises: the COOKIE field of the two websites, the value of the COOKIE field, the domain names of the two websites, the number of page views with the same user identifier, the number of page views with different user identifiers, the ratio of page views with different user identifiers, the number of unique visitors with the same user identifier, the number of unique visitors with different user identifiers, and the ratio of unique visitors with different user identifiers; sorting the access information of the two websites by the ratio of page views with different user identifiers and/or the ratio of unique visitors with different user identifiers; filtering the sorted access information to determine whether the two websites use the same COOKIE field; and if so, establishing an association between the two websites and identifying the user according to the association and the correspondence between the COOKIE field and the user identifier. For example, suppose Taobao and Tmall have the same COOKIE name. The website access information of the two websites is obtained, sorted and filtered, and it is determined that the two websites use the same COOKIE field, for example the same COOKIE ID. A correspondence between Taobao and Tmall is then established. Whether the user accesses Taobao or Tmall, the server can use this COOKIE field in the access message, together with the correspondence between the user identifier and the website COOKIE, to determine the user identifier and hence the user accessing the website.
This embodiment, in addition to identifying users accurately, establishes associations between websites that use the same COOKIE field to identify users and, when a COOKIE field has several values, establishes associations among those values. This further unifies the organization and management of the associated information, improves the efficiency of user identification, and saves the resources the information occupies.
Embodiment three
Fig. 3 shows a flow chart of the steps of a user identification method according to Embodiment three of the invention.
The user identification method of this embodiment comprises the following steps:
Step S302: obtain the COOKIE fields with which a website can uniquely identify the user.
COOKIE information contains a large amount of unlabeled COOKIE field information, such as YYID=D4A741CDC23704C21D8E99150E94F9C4; SUID=96F7B43C26420A0A4EA94973000407AE, and such fields can be used as identity information. In this step, these fields are parsed out of the COOKIE automatically, and it is then judged which COOKIE field on each website uniquely identifies the user.
This step specifically comprises:
Step S3022: obtain one day of raw ptu logs (logs generated from messages) and, using online/offline information, label each log entry with its ADSL ID.
Step S3024: from the above log data, select the data of all single users.
A single user is one for whom ADSL ID+UA is the same and, within the access records of one day, the COOKIEs of several mainstream websites, such as the baidu cookie and the taobao cookie, correspond one-to-one with ADSL ID+UA.
Step S3026: from the single-user data, extract the four-tuple information {host, userid, cookie field, concrete value of the cookie field}.
Here host is the domain name of the website the user accesses; userid is the user identifier, which in this embodiment is ADSL ID+UA; cookie field is the COOKIE field of the website the user accesses; and the concrete value of the cookie field is the value of that COOKIE field.
Step S3028: from the four-tuple information, compute the following data: {host, host pv, host pv share, cookie field, number of pv with the same user id, number of pv with different user ids, ratio of pv with different user ids, number of uv with the same user id, number of uv with different user ids, ratio of uv with different user ids}.
As above, host is the domain name of the website the user accesses, pv is page views, uv is unique visitors, and user id is the user identifier, that is, userid.
Step S30210: using the data from the previous step, sort the entries of each website by the user-id difference ratio and the unique-user difference ratio, then filter by a host pv threshold, or by mutual information or information gain, to find which COOKIE field under each host is used to identify a unique user.
The user-id difference ratio is expressed as numerator/denominator. The numerator is, for one cookie field under one website, the number of distinct user ids that correspond to exactly one cookie value. The denominator is, for this cookie field under this website, the number of all distinct user ids, including those that correspond to several cookie values.
The unique-user difference ratio is likewise expressed as numerator/denominator. The numerator is, for one cookie field under one website, the number of cookie values that correspond to exactly one user id. The denominator is, for this cookie field under this website, the number of all distinct cookie values.
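The two ratios defined above can be computed directly from the (user id, cookie value) pairs of one cookie field under one website. The sketch below follows the numerator/denominator definitions in the text; the function name `uniqueness_ratios` and the decision to return 0.0 for empty input are choices made for this example.

```python
from collections import defaultdict


def uniqueness_ratios(pairs):
    """Compute the two ratios for one cookie field under one website.

    pairs: iterable of (user_id, cookie_value).
    Returns (user_ratio, value_ratio):
      user_ratio  = user ids with exactly one cookie value / all distinct user ids
      value_ratio = cookie values with exactly one user id / all distinct cookie values
    """
    values_by_user = defaultdict(set)
    users_by_value = defaultdict(set)
    for user_id, value in pairs:
        values_by_user[user_id].add(value)
        users_by_value[value].add(user_id)
    if not values_by_user:
        return 0.0, 0.0
    user_ratio = sum(len(v) == 1 for v in values_by_user.values()) / len(values_by_user)
    value_ratio = sum(len(u) == 1 for u in users_by_value.values()) / len(users_by_value)
    return user_ratio, value_ratio
```

A field for which both ratios are close to 1 behaves like a per-user identifier; sorting candidate fields by these ratios before applying a pv threshold matches the order of operations described in step S30210.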
In the middle of the data of the COOKIE field of identifying user, can analyze the information such as a large amount of mailboxes, account.
By said process, can find to automation each website can be used for the COOKIE field of identifying user.
Step S304: the COOKIE field that is used for identifying unique user of obtaining the inter-network station.
At many different web sites, such as " BAIDUID=" relevant information has been used in a lot of websites, at present except BAIDUID, also has the rule of some other websites such as taobao, cnzz, can use the method for statistics, excavate the similar inter-network station identifications of BAIDUID user's COOKIE field.
This step specifically comprises:
Step S3042: From the unique-user data, extract the following data: {cookie field, cookie field value, host1, host2, pv count with identical user id, pv count with differing user ids, pv ratio for differing user ids, uv count with identical user id, uv count with differing user ids, uv ratio for differing user ids}.
Preferably, hosts such as host1 and host2 may be given as top-level domains; user id is the userid.
Step S3044: Sort by the user-id difference count ratio and the unique-user difference ratio, then filter by a host pv threshold, or alternatively by mutual information or information gain, and determine for each pair of websites whether they share a COOKIE field.
Step S3046: If a pair of websites shares a COOKIE field, merge the hosts according to the data from the previous step to find the COOKIE fields that may appear on multiple websites. This yields data entries of the form {cookie field, cookie field value, [host1, host2, ...]}.
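The host merge in step S3046 can be sketched as a simple grouping by cookie field and value. The function name merge_hosts, the min_hosts parameter, and the tuple layout are hypothetical; the sorting and threshold filtering of step S3044 are assumed to have been applied already.

```python
from collections import defaultdict


def merge_hosts(quads, min_hosts=2):
    """Group hosts that share the same cookie field and value.

    quads: iterable of (host, user_id, cookie_field, cookie_value).
    Returns {(cookie_field, cookie_value): [host1, host2, ...]} for
    every pair seen on at least min_hosts distinct hosts.
    """
    hosts_by_cookie = defaultdict(set)
    for host, _user_id, field, value in quads:
        hosts_by_cookie[(field, value)].add(host)
    return {
        key: sorted(hosts)
        for key, hosts in hosts_by_cookie.items()
        if len(hosts) >= min_hosts
    }
```

Each returned entry has the shape of the data entry described above: a cookie field, its value, and the list of hosts on which that same value was observed.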
Through the above process, it is possible to find automatically which websites share the same COOKIE-setting algorithm, and these shared COOKIE fields can then be used to associate those websites with one another.
Step S306: Establish the association between the user-identifying COOKIE fields and the user ID.
This step comprises:
A single COOKIE field may contain two or more identifiers of the user at once (for example, both the user's mailbox and the user's account information). The relationship between these identifiers can be established, and from it the correspondence between these identifiers and the user ID;
Using cross-site COOKIEs as a bridge, establish mapping relationships among more identifiers. For example, if the cna cookie field of Taobao is consistent with the cna field of tmail and Alibaba, the other cookie fields under these websites can be associated through the shared cna field;
Using the account information obtained in step S302 (including general information such as user mailboxes and login accounts), establish the association between different IDs (i.e. user IDs, userid) and different COOKIEs;
Using refer tree information, establish the correspondence between different COOKIEs and accounts. For example, build a refer tree from the jump relationships among the websites the user visits, such as a user who reaches Sina through a Baidu search and then jumps from Sina to other websites, and then establish the correspondence between different COOKIEs and accounts according to this refer tree.
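The associations listed above all amount to declaring that two identifiers belong to the same user and then following such links transitively. One way to sketch this, which is a swapped-in technique and not something the patent specifies, is a union-find (disjoint-set) structure. The class name IdentifierLinker and the (kind, value) identifier tuples are hypothetical.

```python
class IdentifierLinker:
    """Union-find over identifiers such as mailboxes, accounts and cookies."""

    def __init__(self):
        self._parent = {}

    def _find(self, item):
        # Register unseen identifiers as their own group.
        self._parent.setdefault(item, item)
        root = item
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every node on the path at the root.
        while self._parent[item] != root:
            self._parent[item], item = root, self._parent[item]
        return root

    def link(self, a, b):
        """Record that identifiers a and b belong to the same user."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def same_user(self, a, b):
        """Return True if a and b have been linked, directly or indirectly."""
        return self._find(a) == self._find(b)
```

For example, linking a mailbox to an account because both appear in one COOKIE field, and then linking that account to a cross-site cookie value, makes the mailbox and the cookie value resolve to the same user.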
Step S308: Identify the user according to the established association between the COOKIE fields and the user ID.
With this embodiment, the COOKIE fields that serve as identity information can be parsed automatically from a large volume of information. Analysis then determines which COOKIE fields on each website uniquely identify the user, and the correspondence between these COOKIE fields and the user ID is established and used to identify the user accurately. This solves the problem that existing schemes cannot accurately identify users and achieves accurate user identification.
Embodiment four
Referring to Fig. 4, a structural block diagram of a user identification device according to Embodiment Four of the present invention is shown.
The user identification device of this embodiment comprises: a first acquisition module 402, configured to obtain, from network access log messages, messages that have the same user ID within a set time period and in which the user ID corresponds one-to-one with the website COOKIE of a set website; a second acquisition module 404, configured to obtain four-tuple information from the obtained messages, wherein the four-tuple information comprises the domain name of the website accessed by the user indicated by the user ID, the user ID, the COOKIE field of the website accessed by the user, and the value of the COOKIE field; a third acquisition module 406, configured to perform statistics on the four-tuple information to obtain the visit information of each website accessed by the user; a fourth acquisition module 408, configured to filter the visit information of each website accessed by the user to obtain the COOKIE field by which each website uniquely identifies the user; and an identification module 410, configured to establish a correspondence between the obtained COOKIE field and the user ID and to identify the user according to the correspondence.
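As an illustration of what the second acquisition module 404 produces, the sketch below splits one log record into (domain name, user ID, COOKIE field, COOKIE value) tuples. The record layout, a dict with "host", "user_id" and "cookie" keys holding a raw Cookie header string, is an assumption; the patent does not describe the log format.

```python
def extract_quads(record):
    """Split one log record into (host, user_id, cookie_field, cookie_value).

    record: dict with the hypothetical keys "host", "user_id" and "cookie",
    where "cookie" is a raw header string such as "a=1; b=2".
    """
    quads = []
    for part in record["cookie"].split(";"):
        name, sep, value = part.strip().partition("=")
        # Skip fragments that are not name=value pairs.
        if sep and name:
            quads.append((record["host"], record["user_id"], name, value))
    return quads
```

The resulting tuples are the input on which the statistics of the third acquisition module 406 would operate.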
Preferably, the user ID comprises a user account and a browser version number. The visit information of a website accessed by the user comprises: the domain name of the website, the page views of the domain name, the page-view share of the domain name, the COOKIE field of the website, the page views with identical user ID, the page views with differing user IDs, the page-view ratio for differing user IDs, the unique-visitor count with identical user ID, the unique-visitor count with differing user IDs, and the unique-visitor ratio for differing user IDs.
Preferably, the user identification device of this embodiment further comprises a sorting module 412, configured to sort the visit information of each website accessed by the user according to the page-view ratio for differing user IDs and/or the unique-visitor ratio for differing user IDs, before the fourth acquisition module 408 filters the visit information.
Preferably, the fourth acquisition module 408 is configured to filter the visit information of each website accessed by the user using the page views of the domain name, or mutual information, or information gain, to obtain the COOKIE field by which each website identifies the unique user.
Preferably, the user identification device of this embodiment further comprises a first association module 414, configured to: obtain, according to the obtained messages, the website visit information of two websites that have the same COOKIE name, wherein the website visit information comprises the COOKIE field of the two websites, the value of the COOKIE field, the domain names of the two websites, the page views with identical user ID, the page views with differing user IDs, the page-view ratio for differing user IDs, the unique-visitor count with identical user ID, the unique-visitor count with differing user IDs, and the unique-visitor ratio for differing user IDs; sort the visit information of the two websites according to the page-view ratio for differing user IDs and/or the unique-visitor ratio for differing user IDs; filter the sorted visit information to determine whether the two websites use the same COOKIE field; and, if so, establish an association between the two websites and identify the user according to the association and the correspondence between the COOKIE field and the user ID.
Preferably, the user identification device of this embodiment further comprises a second association module 416, configured to: if the COOKIE field used to identify the user contains the values of multiple COOKIE fields, associate the values of the multiple COOKIE fields with one another; and identify the user according to the association and the correspondence between the COOKIE field and the user ID.
The user identification device of this embodiment is used to implement the corresponding user identification methods of the foregoing method embodiments and has the beneficial effects of those method embodiments, which are not repeated here.
The embodiments in this specification are described progressively. Each embodiment focuses on its differences from the others, and the identical or similar parts of the embodiments may be cross-referenced. The device embodiment is described relatively briefly because it is substantially similar to the method embodiments; for the relevant parts, refer to the description of the method embodiments.
The user identification method and device provided by the present invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the present invention, and the description of the above embodiments is intended only to help in understanding the method of the present invention and its core idea. Meanwhile, a person of ordinary skill in the art may, following the idea of the present invention, make changes to the specific implementation and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (10)

1. A user identification method, characterized by comprising:
obtaining, from network access log messages, messages that have the same user ID within a set time period and in which the user ID corresponds one-to-one with the website COOKIE of a set website;
obtaining four-tuple information from the obtained messages, wherein the four-tuple information comprises the domain name of the website accessed by the user indicated by the user ID, the user ID, the COOKIE field of the website accessed by the user, and the value of the COOKIE field;
performing statistics on the four-tuple information to obtain the visit information of each website accessed by the user;
filtering the visit information of each website accessed by the user to obtain the COOKIE field by which each website identifies the user;
establishing a correspondence between the obtained COOKIE field and the user ID, and identifying the user according to the correspondence.
2. The method according to claim 1, characterized in that:
the user ID comprises a user account and a browser version number;
the visit information of the website accessed by the user comprises:
the domain name of the website accessed by the user, the page views of the domain name, the page-view share of the domain name, the COOKIE field of the website accessed by the user, the page views with identical user ID, the page views with differing user IDs, the page-view ratio for differing user IDs, the unique-visitor count with identical user ID, the unique-visitor count with differing user IDs, and the unique-visitor ratio for differing user IDs.
3. The method according to claim 2, characterized in that, before the step of filtering the visit information of each website accessed by the user, the method further comprises:
sorting the visit information of each website accessed by the user according to the page-view ratio for differing user IDs and/or the unique-visitor ratio for differing user IDs.
4. The method according to claim 2 or 3, characterized in that the step of filtering the visit information of each website accessed by the user to obtain the COOKIE field by which each website identifies the user comprises:
filtering the visit information of each website accessed by the user using the page views of the domain name, or mutual information, or information gain, to obtain the COOKIE field by which each website identifies the unique user.
5. The method according to claim 1, characterized by further comprising:
obtaining, according to the obtained messages, the website visit information of two websites that have the same COOKIE name, wherein the website visit information comprises:
the COOKIE field of the two websites, the value of the COOKIE field, the domain names of the two websites, the page views with identical user ID, the page views with differing user IDs, the page-view ratio for differing user IDs, the unique-visitor count with identical user ID, the unique-visitor count with differing user IDs, and the unique-visitor ratio for differing user IDs;
sorting the visit information of the two websites according to the page-view ratio for differing user IDs and/or the unique-visitor ratio for differing user IDs;
filtering the sorted visit information to determine whether the two websites use the same COOKIE field;
if so, establishing an association between the two websites, and identifying the user according to the association and the correspondence between the COOKIE field and the user ID.
6. The method according to claim 1, characterized by further comprising:
if the COOKIE field used to identify the user contains the values of multiple COOKIE fields, associating the values of the multiple COOKIE fields with one another;
identifying the user according to the association and the correspondence between the COOKIE field and the user ID.
7. A user identification device, characterized by comprising:
a first acquisition module, configured to obtain, from network access log messages, messages that have the same user ID within a set time period and in which the user ID corresponds one-to-one with the website COOKIE of a set website;
a second acquisition module, configured to obtain four-tuple information from the obtained messages, wherein the four-tuple information comprises the domain name of the website accessed by the user indicated by the user ID, the user ID, the COOKIE field of the website accessed by the user, and the value of the COOKIE field;
a third acquisition module, configured to perform statistics on the four-tuple information to obtain the visit information of each website accessed by the user;
a fourth acquisition module, configured to filter the visit information of each website accessed by the user to obtain the COOKIE field by which each website identifies the user;
an identification module, configured to establish a correspondence between the obtained COOKIE field and the user ID and to identify the user according to the correspondence.
8. The device according to claim 7, characterized in that:
the user ID comprises a user account and a browser version number;
the visit information of the website accessed by the user comprises: the domain name of the website accessed by the user, the page views of the domain name, the page-view share of the domain name, the COOKIE field of the website accessed by the user, the page views with identical user ID, the page views with differing user IDs, the page-view ratio for differing user IDs, the unique-visitor count with identical user ID, the unique-visitor count with differing user IDs, and the unique-visitor ratio for differing user IDs.
9. The device according to claim 8, characterized by further comprising:
a sorting module, configured to sort the visit information of each website accessed by the user according to the page-view ratio for differing user IDs and/or the unique-visitor ratio for differing user IDs, before the fourth acquisition module filters the visit information of each website accessed by the user.
10. The device according to claim 8 or 9, characterized in that the fourth acquisition module is configured to filter the visit information of each website accessed by the user using the page views of the domain name, or mutual information, or information gain, to obtain the COOKIE field by which each website identifies the unique user.
CN2012105932268A 2012-12-31 2012-12-31 User identification method and device Pending CN103051637A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2012105932268A CN103051637A (en) 2012-12-31 2012-12-31 User identification method and device

Publications (1)

Publication Number Publication Date
CN103051637A true CN103051637A (en) 2013-04-17

Family

ID=48064136

Country Status (1)

Country Link
CN (1) CN103051637A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020007317A1 (en) * 1998-03-30 2002-01-17 Patrick Joseph Callaghan Method, system and program products for sharing state information across domains
US20080235243A1 (en) * 2007-03-21 2008-09-25 Nhn Corporation System and method for expanding target inventory according to browser-login mapping
CN101523379A (en) * 2006-08-18 2009-09-02 阿卡麦科技公司 Method of data collection in a distributed network
CN101651671A (en) * 2008-08-14 2010-02-17 鸿富锦精密工业(深圳)有限公司 Inter-system subscriber identity authentication system and method
CN101847160A (en) * 2010-05-19 2010-09-29 深圳市五巨科技有限公司 Method and device for pushing personalized pages to mobile terminal
CN101860987A (en) * 2010-05-07 2010-10-13 中兴通讯股份有限公司 Mobile terminal and method for acquiring network information by same
CN101945234A (en) * 2010-09-13 2011-01-12 深圳市华曦达科技股份有限公司 Method and television terminal for formulating personalized menu based on use frequency of user
CN101968802A (en) * 2010-09-30 2011-02-09 百度在线网络技术(北京)有限公司 Method and equipment for recommending content of Internet based on user browse behavior
CN102333092A (en) * 2011-09-30 2012-01-25 北京亿赞普网络技术有限公司 Network user identification method and application server

Cited By (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103533530A (en) * 2013-09-26 2014-01-22 林毅 Cross-device user corresponding and user tracking methods and systems
CN103533530B (en) * 2013-09-26 2017-09-26 余飞 The user's correspondence and user tracking method, system of a kind of striding equipment
CN104717079A (en) * 2013-12-12 2015-06-17 华为技术有限公司 Network flow data processing method and device
CN103944995A (en) * 2014-04-28 2014-07-23 东华大学 Method for recognizing accounts of independent users in broadband network
CN103944916A (en) * 2014-04-28 2014-07-23 东华大学 Key Cookies identification method for Web session merging
CN103944995B (en) * 2014-04-28 2017-06-06 东华大学 A kind of method of separate user accounts in identification broadband network
CN103995907B (en) * 2014-06-13 2017-04-12 北京奇艺世纪科技有限公司 Determining method of access users
CN103995907A (en) * 2014-06-13 2014-08-20 北京奇艺世纪科技有限公司 Determining method of access users
CN104199849A (en) * 2014-08-08 2014-12-10 亿赞普(北京)科技有限公司 Advertisement injecting method and device
CN104484357A (en) * 2014-12-01 2015-04-01 北京国双科技有限公司 Data processing method and device and access frequency information processing method and device
CN105224593B (en) * 2015-08-25 2019-08-16 中国人民解放军信息工程大学 Frequent co-occurrence account method for digging in the of short duration online affairs of one kind
CN105224593A (en) * 2015-08-25 2016-01-06 中国人民解放军信息工程大学 Frequent co-occurrence account method for digging in a kind of of short duration online affairs
CN106656934A (en) * 2015-11-03 2017-05-10 中国移动通信集团公司 User identity mapping method and user identity mapping device based on operator gateway log
CN106656934B (en) * 2015-11-03 2020-02-14 中国移动通信集团公司 User identifier mapping method and device based on operator gateway log
CN105447148A (en) * 2015-11-26 2016-03-30 上海晶赞科技发展有限公司 Cookie identifier association method and apparatus
CN105447148B (en) * 2015-11-26 2018-12-21 上海晶赞科技发展有限公司 A kind of Cookie mark correlating method and device
CN106302797A (en) * 2016-08-31 2017-01-04 北京锐安科技有限公司 A kind of cookie accesses De-weight method and device
CN109388686A (en) * 2017-08-10 2019-02-26 北京国双科技有限公司 A kind of user identifier method and device
CN107592214A (en) * 2017-08-28 2018-01-16 杭州安恒信息技术有限公司 A kind of method for identifying Internet application system login username
CN107592214B (en) * 2017-08-28 2021-05-14 杭州安恒信息技术股份有限公司 Method for identifying login user name of internet application system
CN107767070A (en) * 2017-11-06 2018-03-06 泰康保险集团股份有限公司 method and device for information popularization
CN107767070B (en) * 2017-11-06 2021-06-11 泰康保险集团股份有限公司 Method and device for information popularization
CN108282475B (en) * 2018-01-18 2020-09-08 世纪龙信息网络有限责任公司 User identification information reading method and system, computer storage medium and device
CN108282475A (en) * 2018-01-18 2018-07-13 世纪龙信息网络有限责任公司 User identity information read method and system, computer storage media and equipment
CN108462615A (en) * 2018-02-05 2018-08-28 百川通联(北京)网络技术有限公司 A kind of network user's group technology and device
CN109583472A (en) * 2018-10-30 2019-04-05 中国科学院计算技术研究所 A kind of web log user identification method and system
CN111581235A (en) * 2020-03-25 2020-08-25 贝壳技术有限公司 Method and system for identifying common incidence relation
CN111581235B (en) * 2020-03-25 2021-08-03 贝壳找房(北京)科技有限公司 Method and system for identifying common incidence relation

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
AD01 Patent right deemed abandoned

Effective date of abandoning: 20161116

C20 Patent right or utility model deemed to be abandoned or is abandoned