CN103237094B - A kind of method and device identifying user - Google Patents

Publication number
CN103237094B
Authority
CN
China
Prior art keywords
cookie
user
value
field
website
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201310134318.4A
Other languages
Chinese (zh)
Other versions
CN103237094A (en
Inventor
罗峰
黄苏支
李娜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING IZP TECHNOLOGIES Co Ltd
Original Assignee
BEIJING IZP TECHNOLOGIES Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BEIJING IZP TECHNOLOGIES Co Ltd
Priority to CN201310134318.4A
Publication of CN103237094A
Application granted
Publication of CN103237094B
Legal status: Expired - Fee Related

Landscapes

  • Information Transfer Between Computers (AREA)

Abstract

The present invention discloses a method and device for identifying users. For each website, a long-lived cookie field that uniquely identifies a user is obtained statistically; these cookies are then associated according to the redirect relations among the user's access messages, producing user cookie-value relation tables and their corresponding user IDs. Access messages sent by users are collected and tagged with the user ID of the matching user cookie-value relation table, which identifies the user. This replaces identification based on information such as ADSL accounts or IP addresses and can effectively improve the accuracy and efficiency of user identification.

Description

A method and device for identifying a user
Technical field
The present invention relates to Internet information processing technology, and in particular to a method and device for identifying a user.
Background
On the Internet, users can be identified by ADSL account, IP address, or UA (User-Agent), but each of these approaches has practical limitations: 1) most access messages carry no ADSL information, so identifying users by ADSL leaves most users unidentified and recognition efficiency is low; 2) many user computers currently use dynamic IP addresses, which change frequently, so it is difficult to pin down a user accurately by IP address; 3) one user typically has several UAs, because he or she uses several browsers, and conversely one UA can correspond to many users, so a UA cannot identify a user accurately either.
Summary of the invention
In view of this, the technical problem to be solved by the present invention is to provide a method and device for identifying a user that associate users through cookies, so that each user has his or her own cookie list serving to identify that user, and that identify users in depth by parsing the cookies carried in access messages.
To achieve the above object, the present invention adopts the following technical solutions:
A method for identifying a user, the method comprising:
Generating user cookie-value relation tables and, for each table, a corresponding user ID that uniquely identifies a user, wherein each user cookie-value relation table records the association between the cookie fields that each website uses to identify a user and the user identity registration information;
Collecting the access messages sent by users;
When an access message carries cookie information, parsing it to obtain the cookie fields and field values contained in the cookie information;
When a cookie field is present in a user cookie-value relation table, adding the user ID corresponding to that user cookie-value relation table to the access message.
Further, the method also comprises: from the access messages to which a user ID has been added, extracting, according to the cookie-value pairs that the messages carry, the cookie values that correspond to a unique user.
Further, when the cookie field is not present in any user cookie-value relation table, the field value of the cookie field is matched against the extracted cookie values that correspond to a unique user; if the field value is identical to an extracted cookie value, the match succeeds, and the user ID of the cookie-value relation table corresponding to that extracted cookie value is added to the access message.
Further, the method also comprises: when the access message carries no cookie information, obtaining the information carried in the URL of the access message and matching it against the extracted cookie values that correspond to a unique user; if the information carried in the URL is identical to an extracted cookie value, the match succeeds, and the user ID of the cookie-value relation table corresponding to that extracted cookie value is added to the access message.
Further, if an extracted cookie value that corresponds to a unique user maps to multiple user IDs, the multiple user IDs are merged, the correspondence between that cookie value and each user ID is recorded, and the user ID corresponding to the user cookie-value relation table is added to the access message again.
Further, generating the user cookie-value relation tables and the corresponding user IDs that uniquely identify users comprises:
Screening for single users and for websites whose traffic reaches a preset traffic threshold;
Obtaining, from the single users, the cookie field that each of the websites uses to identify a user, and generating a domain-cookie dictionary;
Generating the redirect-relation graph of the users' access messages;
Generating, from the redirect-relation graph of the users' access messages and the domain-cookie dictionary, the user cookie-value relation tables and the corresponding user IDs that uniquely identify users.
Further, screening for single users comprises:
Collecting, over a period of time, the values of the cookie fields that can uniquely identify a user on the different websites visited by an ADSL terminal user; if the cookie field values remain unchanged within a predetermined period, the user is judged to be a single user.
Further, obtaining, from the single users, the cookie field that each of the websites uses to identify a user and generating the domain-cookie dictionary comprises:
Collecting the single users and the cookie values corresponding to each cookie field in each website's cookie;
For each cookie field, counting the cookie values that correspond to exactly one single user, and computing the frequency with which cookie values are in one-to-one correspondence with a single user;
For each cookie field, counting the single users that correspond to exactly one cookie value, and computing the frequency with which single users are in one-to-one correspondence with a cookie value;
Filtering each website's cookie according to the number of single users and the number of cookie values for each cookie field and the two one-to-one frequencies above, and selecting a cookie field with a large number of cookie values, a large number of single users, and high one-to-one frequencies as the cookie field that identifies users on that website;
Generating the domain-cookie dictionary from each website domain and its corresponding identity cookie field.
Further, generating the redirect-relation graph of the users' access messages comprises:
Collecting the access messages of all users over a period of time;
Grouping the messages first by ADSL+UA; access messages that carry no ADSL information are grouped by IP+UA; the messages in each group are then sorted by access time;
Building the redirect-relation graph of the users' access messages.
Further, building the user cookie-value relation tables from the redirect-relation graph of the users' access messages and the domain-cookie dictionary comprises:
S1: when the host domain name of an access message differs from the main domain (domain) in the redirect-relation graph and the cookies of both website domains are in the domain-cookie dictionary, associating the cookie values of the two website domains to form a cookie-value pair; each time such a cookie-value pair is established, the association count of that pair for this user is incremented by 1;
S2: generating a cookie correspondence graph from the users' access messages and counting the association count of each cookie-value pair in the graph;
S3: screening according to a preset association-count threshold, obtaining the connected components of the cookie correspondence graph, and generating the user cookie-value relation tables and the corresponding user IDs that uniquely identify users.
Correspondingly, the present invention also discloses a device for identifying a user, the device comprising:
A generation module, for generating user cookie-value relation tables and the corresponding user IDs that uniquely identify users, wherein each user cookie-value relation table records the association between the cookie fields that each website uses to identify a user and the user identity registration information;
A collection module, for collecting the access messages sent by users;
A first judgment module, for judging whether an access message carries cookie information;
A second judgment module, for judging, when the access message carries cookie information, whether the parsed cookie field is present in a user cookie-value relation table;
A first identification module, for parsing the cookie information carried in the access message to obtain the cookie fields and field values, and, when a cookie field is present in a user cookie-value relation table, adding the user ID corresponding to that user cookie-value relation table to the access message.
In the technical solution of the present invention, a long-lived cookie field that uniquely identifies a user is obtained statistically for each website; these cookies are then associated according to the redirect relations of the websites the user visits, producing user cookie-value relation tables and their corresponding user IDs. Access messages sent by users are collected and tagged with the user ID of the matching user cookie-value relation table, which identifies the user. This replaces identification based on information such as ADSL accounts or IP addresses and can effectively improve the accuracy and efficiency of user identification.
Brief description of the drawings
Fig. 1 is a flowchart of the method for identifying a user provided by the first embodiment of the present invention;
Fig. 2 is a flowchart of the method for identifying a user provided by the second embodiment of the present invention;
Fig. 3 is a flowchart, provided by an embodiment of the present invention, of generating the cookie-value relation tables and the corresponding user IDs that uniquely identify users;
Fig. 4 is a flowchart, provided by an embodiment of the present invention, of the method for screening the cookie fields that identify users;
Fig. 5 is a schematic diagram, provided by an embodiment of the present invention, of screening the cookie fields that identify users;
Fig. 6 is a structural block diagram of the device corresponding to the method for identifying a user provided by the first embodiment of the present invention;
Fig. 7 is a structural block diagram of the device corresponding to the method for identifying a user provided by the second embodiment of the present invention.
Detailed description
The invention is further described below with reference to the drawings and embodiments.
A cookie is a small piece of text that a browser stores on the computer and that travels between the Web server and the browser along with the user's requests and pages. Each time the user visits a site, the web application can read the information the cookie contains. A cookie stores only string values. If several browsers (UAs) are installed on one computer, each browser keeps its cookies in its own separate storage. Because a cookie can not only identify the user but also carry information about the computer and the browser, a user who logs in from different browsers or from different computers obtains different cookie information. Conversely, for several users who share the same browser on the same computer, cookies cannot tell them apart unless they log in under different user names.
Because reading and writing a cookie file depends entirely on the current browser, different browsers cannot share one cookie file, and cookies are easily deleted, either manually by the user or automatically by software, or they simply expire. Taking these properties of cookies into account, the technical solution of the present invention statistically obtains, for each website, a long-lived cookie field that uniquely identifies a user, then associates these cookies according to the redirect relations among the user's access messages to generate the cookie-value relation tables and their corresponding user IDs; collects the access messages sent by users; and tags the access messages with the user ID of the matching user cookie-value relation table to identify the user.
Fig. 1 is a flowchart of the method for identifying a user provided by the first embodiment of the present invention. As shown in Fig. 1, the method comprises the following steps:
Step 101: generate user cookie-value relation tables and the corresponding user IDs that uniquely identify users, wherein each user cookie-value relation table records the association between the cookie fields that each website uses to identify a user and the user identity registration information.
Fig. 3 is a flowchart, provided by an embodiment of the present invention, of generating the cookie-value relation tables and the corresponding user IDs that uniquely identify users. As shown in Fig. 3, the flow comprises the following steps:
Step 301: screen for single users and for websites whose traffic reaches a preset traffic threshold.
Single users are screened by collecting, over a period of time, the values of the cookie fields that can uniquely identify a user on the different websites visited by an ADSL terminal user; if the cookie field values remain unchanged within a predetermined time threshold, the user is judged to be a single user.
A website's cookie has many fields. For each ADSL terminal user (ADSL+UA), the values of the cookie fields that can identify a user on several top sites are collected over a period of time. Table 1 lists the identity cookie fields of several top sites, for example the QQ number of Tencent, which is the value of the o_cookie field on the Tencent website, and the BAIDUID of baidu, which is the value of the BAIDUID field on the baidu website. If the values of the identity cookie fields for an ADSL terminal user remain unchanged within the predetermined time threshold (for example 1 month or 2 months), the ADSL terminal user is a single user, meaning that only one person used that ADSL terminal during the threshold period. The time threshold can be set according to actual needs.
Website   Cookie field
qq        o_cookie
taobao    cna
baidu     BAIDUID
renren    id
sohu      SUV
pptv      PUID
weibo     un
sina      SINAGLOBAL
Table 1
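The single-user screening described above can be sketched as follows. This is only an illustration under stated assumptions: the observation tuples, the `(adsl_account, user_agent)` terminal key, and the function name `find_single_users` are hypothetical, and the observations are assumed to have already been restricted to the chosen time window.

```python
from collections import defaultdict

def find_single_users(observations):
    """Return the terminals whose identity-cookie values never changed.

    observations: iterable of (terminal, site, cookie_value) tuples, where
    terminal is a hypothetical (adsl_account, user_agent) key and
    cookie_value is the value of the site's identity cookie field.
    """
    seen = defaultdict(set)  # (terminal, site) -> distinct values observed
    for terminal, site, value in observations:
        seen[(terminal, site)].add(value)
    terminals = {terminal for terminal, _site in seen}
    # A terminal with more than one value on any site is not a single user.
    changed = {t for (t, _site), values in seen.items() if len(values) > 1}
    return terminals - changed

observations = [
    (("adsl-1", "UA-a"), "baidu", "B1"),
    (("adsl-1", "UA-a"), "baidu", "B1"),
    (("adsl-1", "UA-a"), "qq", "Q1"),
    (("adsl-2", "UA-a"), "baidu", "B2"),
    (("adsl-2", "UA-a"), "baidu", "B3"),  # value changed within the window
]
single_users = find_single_users(observations)
```

Here only the terminal `("adsl-1", "UA-a")` is kept, because the second terminal shows two different baidu values in the same window.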
When screening for websites whose traffic reaches the preset traffic threshold, the traffic threshold is set according to actual needs and limits the sample size of the websites included in the statistics. So that the statistical results generalize better, this embodiment takes as the sample the 3000 websites with the highest total traffic among all websites.
Step 302: obtain, from the single users, the cookie field that each of the websites uses to identify a user, and generate a domain-cookie dictionary.
Fig. 4 is a flowchart, provided by an embodiment of the present invention, of the method for screening the cookie fields that identify users. As shown in Fig. 4, the flow comprises the following steps:
Step 3021: collect the single users and the cookie values corresponding to each cookie field in each website's cookie;
Step 3022: for each cookie field, find the number of cookie values that correspond to exactly one single user, and compute the frequency with which cookie values are in one-to-one correspondence with a single user;
Step 3023: for each cookie field, find the number of single users that correspond to exactly one cookie value, and compute the frequency with which single users are in one-to-one correspondence with a cookie value;
Step 3024: filter each website's cookie according to the number of single users and the number of cookie values for each cookie field and the two one-to-one frequencies above, and select a cookie field with a large number of cookie values, a large number of single users, and high one-to-one frequencies as the cookie field that identifies users on that website;
Step 3025: generate the domain-cookie dictionary from each website domain and its corresponding identity cookie field.
In this embodiment, the cookie values and single users corresponding to every cookie field of every website among the top 3000 by traffic are counted. Take Tencent as an example. The Tencent website cookie contains several cookie fields. For the o_cookie field, suppose there are m cookie values and n single users. A cookie value may correspond to several single users, and a single user may correspond to several cookie values. For instance, a specific QQ number is one cookie value; the same QQ number can be logged in from different browsers, so that one QQ number corresponds to several users, and only some of the QQ numbers under this field are in one-to-one correspondence with users. Likewise, one single user may have several QQ numbers, in which case again only some QQ numbers are in one-to-one correspondence with single users. Checking both directions filters out the one-to-one relations between QQ numbers and users. The number of cookie values and the number of users for the field must also be considered, because statistics over a large sample generalize better. Screening every cookie field of the Tencent website in this way finds the field whose cookie values are in one-to-one correspondence with single users at high frequency and whose numbers of cookie values and single users are large. Such a field is considered usable for identifying users, which yields o_cookie as the identity cookie field of the Tencent website.
In the same way, the identity cookie field of every website among the top 3000 by traffic can be found, and the domain-cookie dictionary is generated from each website domain and its corresponding identity cookie field.
Fig. 5 is a schematic diagram, provided by an embodiment of the present invention, of screening the cookie fields that identify users. Specifically:
1) For each cookie field under a top-3000 host domain name, let V denote its values and U its users. The cookies stored on the user's client are unstable; for example, when a cookie value expires, a new cookie value is generated the next time the user visits the website. Suppose that under one cookie field there are m cookie values and n users, that cookie value Vi corresponds to Ki users, and that user Ui corresponds to Ti cookie values;
2) count the cookie values V that correspond to a unique user U, i.e. the number k of values with Ki == 1, and divide by the total number of values V to obtain k/m;
3) count the users U that correspond to a unique cookie value V, i.e. the number t of users with Ti == 1, and divide by the total number of users U to obtain t/n;
4) using m, n, k/m, and t/n of each cookie field under each host domain name as the filter condition, extract the cookie field under the host domain name that serves as the cookie identifying the user, and generate the domain-cookie dictionary from each website domain and its corresponding identity cookie field.
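The statistics m, n, k/m, and t/n defined in 1) to 4) can be computed as in the sketch below. The source does not give concrete filter thresholds, so `min_count` and `min_ratio` are illustrative assumptions, as are the function names.

```python
from collections import defaultdict

def field_stats(pairs):
    """Compute m, n, k/m and t/n for one cookie field of one host.

    pairs: iterable of (cookie_value, user) observations.
    """
    users_of = defaultdict(set)   # V_i -> users seen with it (size K_i)
    values_of = defaultdict(set)  # U_i -> values seen for it (size T_i)
    for value, user in pairs:
        users_of[value].add(user)
        values_of[user].add(value)
    m, n = len(users_of), len(values_of)
    k = sum(1 for users in users_of.values() if len(users) == 1)     # K_i == 1
    t = sum(1 for values in values_of.values() if len(values) == 1)  # T_i == 1
    return {"m": m, "n": n,
            "k/m": k / m if m else 0.0,
            "t/n": t / n if n else 0.0}

def is_identity_field(stats, min_count=2, min_ratio=0.9):
    # min_count and min_ratio are assumed thresholds, not values from the source.
    return (stats["m"] >= min_count and stats["n"] >= min_count
            and stats["k/m"] >= min_ratio and stats["t/n"] >= min_ratio)

# A stable field: every value belongs to exactly one user and vice versa.
stable = field_stats([("v1", "u1"), ("v2", "u2"), ("v3", "u3")])
# A noisy field: u1 has two values, and value s3 is shared by two users.
noisy = field_stats([("s1", "u1"), ("s2", "u1"), ("s3", "u2"), ("s3", "u3")])
```

With these inputs `stable` has k/m = t/n = 1.0 and passes the filter, while `noisy` has k/m = t/n = 2/3 and is rejected.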
Step 303: generate the redirect-relation graph of the users' access messages.
Generating the redirect-relation graph of the users' access messages comprises:
Collecting the access messages of all users over a period of time;
Grouping the messages by ADSL+UA and by IP+UA, and sorting the messages in each group by access time;
Building the redirect-relation graph of the users' access messages.
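The grouping and sorting just described might look like the following sketch. The message dictionary keys (`adsl`, `ip`, `ua`, `time`, `host`) are hypothetical names chosen for illustration; the source does not define a message format.

```python
from collections import defaultdict

def group_messages(messages):
    """Group access messages by ADSL+UA, falling back to IP+UA.

    messages: dicts with assumed keys 'adsl' (None when the message
    carries no ADSL information), 'ip', 'ua', 'time' and 'host'.
    """
    groups = defaultdict(list)
    for msg in messages:
        if msg.get("adsl"):
            key = ("adsl", msg["adsl"], msg["ua"])
        else:
            key = ("ip", msg["ip"], msg["ua"])  # no ADSL information
        groups[key].append(msg)
    for group in groups.values():
        group.sort(key=lambda m: m["time"])  # order by access time
    return dict(groups)

messages = [
    {"adsl": "acct-1", "ip": "10.0.0.1", "ua": "UA-a", "time": 3, "host": "b.example"},
    {"adsl": "acct-1", "ip": "10.0.0.9", "ua": "UA-a", "time": 1, "host": "a.example"},
    {"adsl": None, "ip": "10.0.0.2", "ua": "UA-b", "time": 2, "host": "c.example"},
]
groups = group_messages(messages)
```

The two `acct-1` messages land in one group even though their IP addresses differ, which is the reason for preferring the ADSL key when it is available.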
Step 304: generate, from the redirect-relation graph of the users' access messages and the domain-cookie dictionary, the user cookie-value relation tables and the corresponding user IDs that uniquely identify users.
This step associates the extracted cookie fields according to the user's redirect relations, so that each user corresponds to a set of cookie fields and values.
The correspondence between cookies is built from the redirect-relation graph of the access messages and the generated domain-cookie dictionary. The specific steps are:
(1) when the host domain name of an access message differs from the main domain (domain) in the redirect-relation graph and the cookies of both website domains are in the domain-cookie dictionary, the cookie values of the two website domains are associated to form a cookie-value pair; each time such a cookie-value pair is established, the association count of that pair for this user is incremented by 1;
(2) the counts are tallied and a cookie correspondence graph is generated, in which each node represents a cookie-value and each edge represents the association count between two cookie-values;
(3) edges whose association count is below the threshold are removed, and the connected components of the undirected graph are computed. Each component corresponds to one user, which yields the user cookie-value relation tables and their corresponding IDs;
(4) the nodes are labeled with the ID of their user cookie-value relation table, and steps (2)-(4) are repeated until the number of user cookie-value relation tables generated no longer changes.
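Steps (1) to (3) up to the thresholding can be sketched as below. Several things here are assumptions rather than details from the source: a redirect is approximated by two consecutive messages in a time-sorted group, each message is a dictionary with hypothetical `domain` and `cookies` keys, and the dictionary contains only two entries taken from Table 1.

```python
from collections import Counter

# Two entries of the domain-cookie dictionary, taken from Table 1.
DOMAIN_COOKIE = {"qq": "o_cookie", "baidu": "BAIDUID"}

def identity_node(msg, dictionary):
    """Return (domain, field, value) for the identity cookie, or None."""
    field = dictionary.get(msg["domain"])
    if field is None or field not in msg["cookies"]:
        return None
    return (msg["domain"], field, msg["cookies"][field])

def count_pairs(sorted_msgs, dictionary):
    """Count cookie-value pairs across consecutive cross-domain messages."""
    counts = Counter()
    for prev, cur in zip(sorted_msgs, sorted_msgs[1:]):
        if prev["domain"] == cur["domain"]:
            continue  # step (1) only links different domains
        a, b = identity_node(prev, dictionary), identity_node(cur, dictionary)
        if a is not None and b is not None:
            counts[frozenset((a, b))] += 1  # undirected edge weight
    return counts

def strong_edges(counts, threshold):
    """Keep the edges whose association count reaches the threshold."""
    return {pair for pair, count in counts.items() if count >= threshold}

group = [
    {"domain": "qq", "cookies": {"o_cookie": "10001"}},
    {"domain": "baidu", "cookies": {"BAIDUID": "B1"}},
    {"domain": "qq", "cookies": {"o_cookie": "10001"}},
    {"domain": "baidu", "cookies": {"BAIDUID": "B1"}},
    {"domain": "other", "cookies": {"sid": "x"}},  # not in the dictionary
]
counts = count_pairs(group, DOMAIN_COOKIE)
pair = frozenset({("qq", "o_cookie", "10001"), ("baidu", "BAIDUID", "B1")})
```

In this example the pair is counted three times, so it survives a threshold of 2 but not a threshold of 4. The surviving edges are the input to the connected-component search of step (3).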
Specifically, the access messages of all users over a period of time are collected and grouped by ADSL+UA; each group records the web pages that a user accessed with one UA during the period, with different groups corresponding to different users. Messages that contain no ADSL information are grouped by IP+UA in the same way. The messages grouped by ADSL+UA and by IP+UA are sorted in time order, and the redirect-relation graph of each group of access messages is built.
The cookie correspondence graph is built from the redirect-relation graph of each group of access messages and the generated domain-cookie dictionary, and the association count of each cookie-value pair is recorded. The pairs are screened against a preset association-count threshold, keeping the cookie-value pairs whose association count is greater than or equal to the threshold. A graph search algorithm is then used to obtain the connected components of the cookie correspondence graph; each connected component represents one user, which yields the user cookie-value relation tables and their corresponding user IDs. Those skilled in the art will readily understand that the graph search algorithm can be depth-first search, breadth-first search, or the like.
For example, let G denote a cookie correspondence graph generated from the redirect-relation graph of the access messages grouped by ADSL+UA and from the domain-cookie dictionary. Graph G consists of the chains A--B--C--D--B and E--F--M.
As the graph shows, it has 2 connected components, namely
G1: A--B--C--D--B
G2: E--F--M
G1 and G2 represent two users. Nodes A, B, C, and D in G1 are the cookie values with which user 1 accessed different websites; nodes E, F, and M in G2 are the cookie values with which user 2 accessed different websites. In this way each user obtains his or her own cookie list that serves to identify that user.
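The example graph G can be checked with a breadth-first search, one of the graph search algorithms the description mentions. The sketch below is a minimal illustration; the `user-1`, `user-2` ID format is an assumption, since the source does not say how user IDs are formed.

```python
from collections import defaultdict, deque

def connected_components(edges):
    """Return the connected components of an undirected graph via BFS."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, components = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbour in sorted(adjacency[node]):
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(sorted(component))
    return components

# Edges of graph G: A--B--C--D--B and E--F--M.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "B"), ("E", "F"), ("F", "M")]
components = connected_components(edges)
# One relation table per component; the ID format is hypothetical.
relation_tables = {f"user-{i}": nodes for i, nodes in enumerate(components, start=1)}
```

The search finds the two components {A, B, C, D} and {E, F, M}, matching G1 and G2 above.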
Step 102: collect the access messages sent by users;
Step 103: judge whether the access message contains cookie information; if so, perform step 104; otherwise, end.
When a user interacts with a server, the user's access messages carry cookie information in most cases. The access messages are tagged according to the users' cookie-value relation tables and the corresponding user IDs that uniquely identify users, which identifies the user.
Step 104: parse the cookie information and judge whether it is present in a user cookie-value relation table; if so, perform step 105; otherwise, end.
Step 105: add the user ID corresponding to that user cookie-value relation table to the access message.
With the method of this embodiment, an access message that carries cookie information whose parsed content is present in a user cookie-value relation table can be identified: the ID corresponding to the user cookie-value relation table is added to the message, which identifies the user and improves the accuracy and efficiency of user identification.
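Steps 103 to 105 can be sketched as follows, using `http.cookies.SimpleCookie` from the Python standard library to parse a `Cookie` header. The message layout (`site` and `cookie` keys), the flattened lookup table, and the sample values are assumptions made for illustration only.

```python
from http.cookies import SimpleCookie

# Hypothetical data: two dictionary entries from Table 1 and a relation
# table flattened to (site, field, value) -> user ID for fast lookup.
DOMAIN_COOKIE = {"baidu": "BAIDUID", "qq": "o_cookie"}
RELATION_TABLE = {
    ("baidu", "BAIDUID", "B1"): "user-1",
    ("qq", "o_cookie", "10001"): "user-1",
}

def parse_cookies(header):
    """Parse a Cookie header into a {field: value} dictionary."""
    jar = SimpleCookie()
    jar.load(header)
    return {name: morsel.value for name, morsel in jar.items()}

def tag_message(msg):
    """Return the message, with 'user_id' added when the cookie is known."""
    header = msg.get("cookie")
    if not header:                      # step 103: no cookie information
        return msg
    cookies = parse_cookies(header)     # step 104: parse and look up
    field = DOMAIN_COOKIE.get(msg["site"])
    user_id = RELATION_TABLE.get((msg["site"], field, cookies.get(field)))
    if user_id is None:
        return msg
    return {**msg, "user_id": user_id}  # step 105: add the user ID

tagged = tag_message({"site": "baidu", "cookie": "BAIDUID=B1; lang=zh"})
unknown = tag_message({"site": "baidu", "cookie": "BAIDUID=ZZ"})
```

In the first embodiment an unmatched message is simply left untagged, which is what `tag_message` does for `unknown`.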
Fig. 2 is a flowchart of the method for identifying a user provided by the second embodiment of the present invention. As shown in Fig. 2, the method comprises the following steps:
Step 201: generate user cookie-value relation tables and the corresponding user IDs that uniquely identify users, wherein each user cookie-value relation table records the association between the cookie fields that each website uses to identify a user and the user identity registration information.
For the process of generating the cookie-value relation tables and the corresponding user IDs that uniquely identify users, see the detailed description of this part given for Fig. 1.
Step 202: collect the access messages sent by users;
Step 203: judge whether the access message contains cookie information; if so, perform step 204; otherwise, perform step 209.
When a user interacts with a server, the user's access messages carry cookie information in most cases. The access messages are tagged according to the users' cookie-value relation tables and the corresponding user IDs that uniquely identify users, which identifies the user.
Step 204: parse the cookie information and judge whether it is present in a user cookie-value relation table; if so, perform step 205; otherwise, perform step 207.
Step 205: add the user ID corresponding to that user cookie-value relation table to the access message.
Step 206: from the access messages to which a user ID has been added, extract, according to the cookie-value pairs that the messages carry, the cookie values that correspond to a unique user.
The cookie values that correspond to a unique user include, for example, the user name of the user's mailbox or the user's QQ number.
Step 207: match the field value of the cookie field against the extracted cookie values that correspond to a unique user; if the match succeeds, perform step 208; otherwise, end.
A match succeeds when the field value of the cookie field is identical to an extracted cookie value that corresponds to a unique user, or when it satisfies a specified condition. For example, if the extracted cookie value is a 12-character string, the field value of the cookie field is compared with the 12 characters in order, and the match succeeds when they are all the same. When a specified condition is set, the match can also be judged successful if, for instance, the field value is compared with the 12-character string in order and only one character differs. The choice of the specified condition naturally affects the accuracy of identification.
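The matching rule of step 207 can be illustrated with a position-by-position comparison that tolerates a configurable number of differing characters. The function name and the `max_mismatches` parameter are hypothetical; the source only gives the 12-character example with at most one differing character.

```python
def values_match(candidate, reference, max_mismatches=0):
    """Compare two strings in order, allowing a few differing characters.

    max_mismatches=0 is the exact-match rule; max_mismatches=1 models the
    relaxed condition in which only one character may differ.
    """
    if len(candidate) != len(reference):
        return False
    mismatches = sum(1 for a, b in zip(candidate, reference) if a != b)
    return mismatches <= max_mismatches

exact = values_match("123456789012", "123456789012")
strict = values_match("123456789012", "123456789013")
relaxed = values_match("123456789012", "123456789013", max_mismatches=1)
```

Raising `max_mismatches` identifies more messages but also increases the risk of attributing a message to the wrong user, which is the accuracy trade-off the description points out.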
Step 208: add to the access message the user ID of the cookie-value relation table corresponding to the extracted cookie value that corresponds to a unique user.
Step 209: obtain the information carried in the URL of the access message.
Step 210: match the information carried in the URL against the extracted cookie values that correspond to a unique user; if the match succeeds, perform step 208; otherwise, end.
A match succeeds when the information carried in the URL is identical to an extracted cookie value that corresponds to a unique user, or when it satisfies a specified condition. For example, if the extracted cookie value is a 12-character string, the information carried in the URL is compared with the 12 characters in order, and the match succeeds when they are all the same. When a specified condition is set, the match can also be judged successful if, for instance, the information carried in the URL is compared with the 12-character string in order and only one character differs. The choice of the specified condition naturally affects the accuracy of identification.
If the cookie value corresponding to a unique user corresponds to multiple IDs, the IDs are merged, the correspondence between that cookie value and each ID is recorded, and the ID corresponding to the user cookie-value relation table is added to the message again. For example, the multiple IDs corresponding to the same mailbox user name or QQ number are merged, the correspondence between the mailbox user name or QQ number and the IDs is recorded, and the ID corresponding to this user cookie-value relation table is added to the message again.
The user identification method of this embodiment can cover 55% of messages. In addition, the multiple UAs belonging to one user are merged by means of the cookie value corresponding to a unique single user (such as a mailbox or QQ number). Compared with methods that identify users by ADSL, IP and similar information, it can significantly improve the accuracy and efficiency of user identification.
Fig. 6 is a structural block diagram of the device corresponding to the user identification method provided by the first embodiment of the present invention. As shown in Fig. 6, the device comprises:
Generation module 601, for generating a user cookie-value relation table and the respective user IDs that uniquely identify users, wherein the user cookie-value relation table records the association between the cookie field of each website that is used to identify user identity and user identity registration information;
Acquisition module 602, for collecting the access message sent by a user;
First judge module 603, for judging whether the access message carries cookie information;
Second judge module 604, for judging, when the access message carries cookie information, whether the parsed cookie field is present in the user cookie-value relation table;
First identification module 605, for parsing the cookie information carried in the access message to obtain the corresponding cookie field and field value and, when the cookie field is present in the user cookie-value relation table, adding the user ID corresponding to this user cookie-value relation table to the access message.
Fig. 7 is a structural block diagram of the device corresponding to the user identification method provided by the second embodiment of the present invention. As shown in Fig. 7, the device comprises:
Generation module 701, for generating a user cookie-value relation table and the respective user IDs that uniquely identify users, wherein the user cookie-value relation table records the association between the cookie field of each website that is used to identify user identity and user identity registration information. For the step of generating the cookie-value relation table and the respective user IDs that uniquely identify users, refer to the detailed description of this step given with reference to Fig. 1.
Acquisition module 702, for collecting the access message sent by a user.
First judge module 703, for judging whether the access message carries cookie information.
Second judge module 704, for judging, when the access message carries cookie information, whether the parsed cookie field is present in the user cookie-value relation table.
First identification module 705, for obtaining the cookie field and field value corresponding to the cookie information carried in the parsed access message and, when the cookie field is present in the user cookie-value relation table, adding the user ID corresponding to this user cookie-value relation table to the access message.
Information extraction module 706, for extracting, from the access message to which the user ID has been added, the cookie value corresponding to a unique user based on the cookie-value pairs carried in the message; the cookie value corresponding to a unique user includes, for example, the user name of the user's mailbox or the user's QQ number.
Second identification module 707, for matching, when the cookie field is not present in the user cookie-value relation table, the field value of the parsed cookie field against the extracted cookie value corresponding to a unique user; if the field value of the cookie field is identical to the extracted unique-user cookie value, the match succeeds, and the user ID of the cookie-value relation table corresponding to the extracted unique-user cookie value is added to the access message.
Third identification module 708, for obtaining, when the access message carries no cookie information, the information carried in the URL of the access message and matching it against the extracted cookie value corresponding to a unique user; if the information carried in the URL is identical to the extracted unique-user cookie value, the match succeeds, and the user ID of the cookie-value relation table corresponding to the extracted unique-user cookie value is added to the access message.
Information merging module 709, for collecting user IDs and, if the cookie value corresponding to a unique user corresponds to multiple IDs, merging the IDs, recording the correspondence between that cookie value and each ID, and adding the ID corresponding to the user cookie-value relation table to the message again.
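The way the judge modules and the three identification modules hand a message to one another can be sketched in Python as follows. This is an illustration under several assumptions rather than the patented implementation: messages are modelled as dictionaries with optional `cookie` and `url` keys, the relation table is modelled as a dictionary keyed by (cookie field, field value), `id_by_unique_value` is a hypothetical lookup from previously extracted unique-user values to user IDs, the information carried in the URL is taken to be its query-string values, and only exact matching is shown.

```python
from urllib.parse import parse_qsl, urlsplit


def parse_cookie(header):
    """Split a Cookie header such as "uid=abc; sid=xyz" into a dict."""
    pairs = {}
    for part in header.split(";"):
        name, sep, value = part.strip().partition("=")
        if sep:
            pairs[name] = value
    return pairs


def identify(message, relation_table, id_by_unique_value):
    """Return the user ID to add to the message, or None if none is found."""
    cookie_header = message.get("cookie")
    if cookie_header:  # first judge module: the message carries cookie information
        fields = parse_cookie(cookie_header)
        # Second judge module and first identification module: table lookup.
        for field, value in fields.items():
            if (field, value) in relation_table:
                return relation_table[(field, value)]
        # Second identification module: compare field values with the
        # extracted unique-user values (mailbox name, QQ number).
        for value in fields.values():
            if value in id_by_unique_value:
                return id_by_unique_value[value]
        return None
    # Third identification module: no cookie, so inspect the URL instead.
    query = urlsplit(message.get("url", "")).query
    for _, value in parse_qsl(query):
        if value in id_by_unique_value:
            return id_by_unique_value[value]
    return None
```

For example, with `relation_table = {("uid", "abc"): "U1"}` and `id_by_unique_value = {"10001": "U2"}`, a message whose cookie is `uid=abc` resolves to `U1`, while a cookie-less message for `http://example.com/?q=10001` resolves to `U2`.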
In the technical solution of the present invention, the long-term cookie that each website uses to uniquely identify user identity is obtained statistically, and the cookie fields that identify user identity are associated with users to generate the user cookie-value relation table and its corresponding user ID; the access message sent by a user is collected; and the access message is marked according to the user cookie-value relation table and its corresponding user ID in order to identify the user, which replaces user identification based on information such as ADSL and IP. In addition, the method can merge the multiple UAs belonging to one user by means of fixed information corresponding to a unique single user (such as a mailbox or QQ number), improving the accuracy and efficiency of user identification.
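The table-generation step summarized above, which claim 9 breaks down into counting cookie-value pair associations, screening them by a threshold, and taking connected components, can be sketched in Python. The node format (domain, cookie value), the function name `build_relation_tables`, the `user-N` ID format, and the use of "greater than or equal to" for the threshold are assumptions made for illustration.

```python
from collections import defaultdict


def build_relation_tables(pair_counts, min_count):
    """Group associated cookie values into per-user relation tables.

    pair_counts: dict mapping a pair of nodes (node_a, node_b) to its
                 association count; each node is a (domain, cookie_value) tuple.
    min_count:   preset association count threshold (assumed inclusive).
    Returns a dict mapping a generated user ID to the set of nodes of one
    connected component.
    """
    graph = defaultdict(set)
    for (a, b), count in pair_counts.items():
        if count >= min_count:  # keep only sufficiently frequent pairs
            graph[a].add(b)
            graph[b].add(a)

    tables = {}
    seen = set()
    for start in sorted(graph):  # sorted for a deterministic ID assignment
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(graph[node] - seen)
        tables[f"user-{len(tables) + 1}"] = component  # hypothetical ID format
    return tables
```

Each connected component plays the role of one user cookie-value relation table: every cookie value in it is taken to belong to the same user, so any of them can be used to look up that user's ID.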
A person of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiments may be carried out by hardware under the instruction of a program. The program may be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disc.
The above are only preferred embodiments of the present invention and the technical principles applied. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A method for identifying a user, characterized in that the method comprises:
generating a user cookie-value relation table and the respective user IDs that uniquely identify users, wherein the user cookie-value relation table records the association between the cookie field of each website that is used to identify user identity and user identity registration information;
collecting the access message sent by a user;
when the access message carries cookie information, parsing it to obtain the cookie field and field value corresponding to the cookie information;
when the cookie field is present in the user cookie-value relation table, adding the user ID corresponding to this user cookie-value relation table to the access message;
wherein generating the user cookie-value relation table and the respective user IDs that uniquely identify users comprises:
screening for single users and for websites whose traffic reaches a preset traffic threshold;
obtaining, according to the single users, the cookie field that each of the websites uses to identify user identity, and generating a domain-cookie dictionary;
generating a redirect relation graph of user access messages;
generating, according to the redirect relation graph of user access messages and the domain-cookie dictionary, the user cookie-value relation table and the respective user IDs that uniquely identify users.
2. the method for identification user according to claim 1, it is characterized in that, described method also comprises, from described interpolation user ID access message in, the cookie-value carried according to message extracts the cookie value of corresponding unique subscriber.
3. the method for identification user according to claim 2, it is characterized in that, described method also comprises, when described cookie field is not present in user cookie-value relation table, the cookie value of field value corresponding for described cookie field with the corresponding unique subscriber of described extraction is mated, if field value corresponding to described cookie field is identical with the cookie value of the corresponding unique subscriber of described extraction, then the match is successful, the user ID of cookie-value relation table corresponding for the cookie value of the corresponding unique subscriber of described extraction is added in described access message.
4. the method for identification user according to claim 2, it is characterized in that, described method also comprises, when described access message does not carry cookie information, obtain the information that in described access message, URL carries, and the information of being carried by described URL is mated with the cookie value of the corresponding unique subscriber of described extraction, if the information that described URL carries is identical with the cookie value of the corresponding unique subscriber of described extraction, then the match is successful, the user ID of cookie-value relation table corresponding for the cookie value of the corresponding unique subscriber of described extraction is added in described access message.
5. the method for the identification user according to Claims 2 or 3 or 4, it is characterized in that, if during the corresponding multiple user ID of the cookie value of the corresponding unique subscriber of described extraction, then merge described multiple user ID, and the corresponding relation recorded between the cookie value of described corresponding unique subscriber and each user ID, and again user ID corresponding for this user cookie-value relation table is added in described access message.
6. the method for identification user according to claim 1, it is characterized in that, described screening single user comprises,
What gather the different web sites that ADSL terminal use is corresponding in a period of time can the cookie field value of unique identification user identity, if described cookie field value remains unchanged within a predetermined period of time, then judges that this user is as single user.
7. the method for identification user according to claim 1, is characterized in that, describedly obtains Zhong Ge website, described website according to single user and is used for the cookie field of identifying user identity, generates domain-cookie dictionary and comprises,
Gather single user and cookie value that in each website cookie, each cookie field is corresponding;
Add up the quantity of the cookie value of the corresponding unique single user of each cookie field cookie value, the frequency that cookie value when calculating this cookie field cookie value and single user one_to_one corresponding occurs;
Add up the quantity of the single user of the corresponding unique cookie value of each cookie field single user, the frequency that single user when calculating this cookie field cookie value and single user one_to_one corresponding occurs;
The frequency that the frequency occurred according to cookie value when the quantity of the quantity of single user corresponding to each cookie field, cookie value, cookie value and single user one_to_one corresponding and single user occur is filtered each website cookie, selects cookie value quantity, single user quantity and cookie value and the high cookie field of the single user one_to_one corresponding frequency of occurrences as the cookie field for identifying user identity of corresponding website;
Cookie field according to the identifying user identity of each website domain and correspondence thereof generates domain-cookie dictionary.
8. the method for identification user according to claim 1, is characterized in that, the redirect graph of a relation that described generation user accesses message comprises,
Gather the access message of all users in a period of time;
First according to the mode of ADSL+UA, described message is divided into groups, if there is the access message not carrying ADSL information, then divide into groups according to the mode of IP+UA, and sort according to the access time to often organizing message;
Set up the redirect graph of a relation that user accesses message.
9. the method for identification user according to claim 1, is characterized in that, describedly accesses the redirect graph of a relation of message and described domain-cookie dictionary according to user and sets up user cookie-value relation table and comprise,
S1: the host domain name of access message is not identical with the Main Domain (domain) of redirect graph of a relation, and cookie corresponding to two website domain names is in domain-cookie dictionary, the cookie value of described two website domain names is associated, generate cookie-value couple, as as described in cookie-value to setting up, then under this user, the cookie-value of two websites adds 1 to degree of incidence;
S2: access message according to user and generate cookie corresponding relation figure, the degree of incidence that in statistical chart, cookie-value is right;
S3: the degree of incidence threshold value according to presetting is screened, and obtains the connected component of cookie corresponding relation figure, generate user's cookie-value relation table and each self-corresponding can the user ID of unique identification user.
10. A device for identifying a user, characterized in that the device comprises:
a generation module, for generating a user cookie-value relation table and the respective user IDs that uniquely identify users, wherein the user cookie-value relation table records the association between the cookie field of each website that is used to identify user identity and user identity registration information;
an acquisition module, for collecting the access message sent by a user;
a first judge module, for judging whether the access message carries cookie information;
a second judge module, for judging, when the access message carries cookie information, whether the parsed cookie field is present in the user cookie-value relation table;
a first identification module, for parsing the cookie information carried in the access message to obtain the corresponding cookie field and field value and, when the cookie field is present in the user cookie-value relation table, adding the user ID corresponding to this user cookie-value relation table to the access message;
wherein generating the user cookie-value relation table and the respective user IDs that uniquely identify users comprises:
screening for single users and for websites whose traffic reaches a preset traffic threshold;
obtaining, according to the single users, the cookie field that each of the websites uses to identify user identity, and generating a domain-cookie dictionary;
generating a redirect relation graph of user access messages;
generating, according to the redirect relation graph of user access messages and the domain-cookie dictionary, the user cookie-value relation table and the respective user IDs that uniquely identify users.
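As an illustration of the message grouping recited in claim 8, the following Python sketch groups access messages by ADSL+UA, falls back to IP+UA when a message carries no ADSL information, and sorts each group by access time. The dictionary keys `adsl`, `ip`, `ua` and `time` and the function name `group_messages` are hypothetical; the claims do not prescribe a message format.

```python
from collections import defaultdict


def group_messages(messages):
    """Group access messages per presumed user session and sort them by time.

    Each message is a dict with "ua" and "time" keys, an "ip" key, and an
    optional "adsl" key (all key names are assumptions for this sketch).
    """
    groups = defaultdict(list)
    for msg in messages:
        adsl = msg.get("adsl")
        if adsl:
            key = ("adsl", adsl, msg["ua"])      # preferred grouping: ADSL+UA
        else:
            key = ("ip", msg["ip"], msg["ua"])   # fallback grouping: IP+UA
        groups[key].append(msg)
    for msgs in groups.values():
        msgs.sort(key=lambda m: m["time"])       # order each group by access time
    return dict(groups)
```

Consecutive messages within one sorted group are the natural candidates for edges of the redirect relation graph, since they represent successive accesses from the same line and browser.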
CN201310134318.4A 2013-04-17 2013-04-17 A kind of method and device identifying user Expired - Fee Related CN103237094B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310134318.4A CN103237094B (en) 2013-04-17 2013-04-17 A kind of method and device identifying user

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310134318.4A CN103237094B (en) 2013-04-17 2013-04-17 A kind of method and device identifying user

Publications (2)

Publication Number Publication Date
CN103237094A CN103237094A (en) 2013-08-07
CN103237094B true CN103237094B (en) 2016-04-13

Family

ID=48885110

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310134318.4A Expired - Fee Related CN103237094B (en) 2013-04-17 2013-04-17 A kind of method and device identifying user

Country Status (1)

Country Link
CN (1) CN103237094B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105100295B (en) * 2014-05-21 2019-01-15 北京秒针信息咨询有限公司 A kind of method and apparatus identifying isolated user
CN104199849A (en) * 2014-08-08 2014-12-10 亿赞普(北京)科技有限公司 Advertisement injecting method and device
CN106855864A (en) * 2015-12-09 2017-06-16 北京秒针信息咨询有限公司 A kind of method and apparatus of extraction information
CN107426133B (en) * 2016-05-23 2020-06-30 株式会社理光 Method and device for identifying user identity information
CN107659602B (en) * 2016-07-26 2020-12-29 株式会社理光 Method, device and system for associating user access records
CN106302797B (en) * 2016-08-31 2019-08-13 北京锐安科技有限公司 A kind of cookie access De-weight method and device
CN107092535B (en) * 2017-04-18 2020-06-19 上海雷腾软件股份有限公司 Method and apparatus for data storage of test interface
CN109388686A (en) * 2017-08-10 2019-02-26 北京国双科技有限公司 A kind of user identifier method and device
CN108536831A (en) * 2018-04-11 2018-09-14 上海驰骛信息科技有限公司 A kind of user's identifying system and method based on multi-parameter
CN108595657B (en) * 2018-04-28 2020-10-09 成都智信电子技术有限公司 Data table classification mapping method and device of HIS (hardware-in-the-system)
CN110995887B (en) * 2019-12-17 2021-09-24 武汉绿色网络信息服务有限责任公司 ID association method and device
CN112152873B (en) * 2020-09-02 2022-10-21 杭州安恒信息技术股份有限公司 User identification method and device, computer equipment and storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102333092A (en) * 2011-09-30 2012-01-25 北京亿赞普网络技术有限公司 Network user identification method and application server

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100910517B1 (en) * 2007-03-21 2009-07-31 엔에이치엔비즈니스플랫폼 주식회사 System and method for expanding target inventory according to borwser-login mapping

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20160413

Termination date: 20170417