CN105721629B - User identifier matching process and device - Google Patents

User identifier matching process and device

Info

Publication number
CN105721629B
Authority
CN
China
Prior art keywords
user
matched
user identifier
address
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610172168.XA
Other languages
Chinese (zh)
Other versions
CN105721629A (en
Inventor
程允胜
吴海山
周景博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201610172168.XA
Publication of CN105721629A
Application granted
Publication of CN105721629B
Active
Anticipated expiration

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L2101/00Indexing scheme associated with group H04L61/00
    • H04L2101/60Types of network addresses
    • H04L2101/69Types of network addresses using geographic information, e.g. room number
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/50Network services
    • H04L67/52Network services specially adapted for the location of the user terminal

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

This application discloses a user identifier matching method and device. One specific embodiment of the method includes: analyzing a prestored user operation information set to obtain at least one localization region over which each Internet Protocol (IP) address recorded in the set is distributed, together with the weight of each localization region, wherein each item of user operation information in the set includes a user identifier, an IP address, and anchor point coordinates; obtaining, according to the localization regions over which the IP addresses associated with a user identifier are distributed and the weights of those regions, the location information similarity between a user identifier to be matched and each of the other user identifiers recorded in the set; and determining, according to the location information similarity, the other user identifiers that match the user identifier to be matched. This embodiment matches user identifiers accurately and reliably.

Description

User identifier matching process and device
Technical field
This application relates to the field of computer technology, specifically to the field of user portrait technology, and more particularly to a user identifier matching method and device.
Background
With the rapid development of the Internet, the demand for precisely analyzing the attributes and relationships of each user through user portrait data is becoming increasingly clear. A user portrait is a virtual representation of a real user: a target user model built on a body of real data. Through user research, users are understood and divided into different types according to differences in their goals, behaviors, and viewpoints; typical features are then extracted from each type and given descriptions such as a name, a photo, demographic elements, and scenarios, forming user portrait data. User portraits enable an enterprise to obtain broader user feedback through the Internet and provide a sufficient data basis for analyzing important business information, such as users' behavioral and consumption habits, more precisely and quickly.
Currently, large Internet enterprises usually have multiple product lines, each with its own user information. To extract user portrait data more accurately, the user identifiers in the different product lines need to be matched to determine whether user identifiers from different product lines belong to the same user. Existing user identifier matching methods usually match user identifiers based solely on the IP (Internet Protocol) addresses associated with them, or solely on the location information associated with them.
However, because the IP address allocation mechanisms of telecommunications carriers differ and addresses are usually allocated randomly, schemes that match user identifiers based on IP addresses alone have low reliability. Meanwhile, because users usually choose to block unnecessary location requests when accessing Internet services, users' location information is often incomplete, which makes it difficult to match user identifiers accurately from partially missing location information.
Summary of the invention
The purpose of this application is to propose a user identifier matching method and device to solve the technical problems mentioned in the Background section above.
In a first aspect, this application provides a user identifier matching method, the method comprising: analyzing a prestored user operation information set to obtain at least one localization region over which each Internet Protocol (IP) address recorded in the user operation information set is distributed and the weight of each localization region, wherein the user operation information in the set includes the following information: user identifier, IP address, and anchor point coordinates; obtaining, according to the localization regions over which the IP addresses associated with a user identifier are distributed and the weight of each localization region, the location information similarity between a user identifier to be matched and each of the other user identifiers recorded in the user operation information set; and determining, according to the location information similarity, the other user identifiers that match the user identifier to be matched.
In some embodiments, analyzing the prestored user operation information set to obtain the at least one localization region over which each IP address recorded in the set is distributed and the weight of each localization region comprises: obtaining the anchor point coordinate set associated with each IP address recorded in the user operation information set; for each IP address, performing cluster analysis on the anchor point coordinate set associated with the IP address to obtain at least one corresponding cluster as the localization regions over which the IP address is distributed; and, for each IP address, determining the weight of each localization region over which the IP address is distributed.
In some embodiments, determining, for each IP address, the weight of each localization region over which the IP address is distributed comprises: deleting any IP address for which the number of localization regions is greater than a preset quantity threshold, or for which the average distance between the anchor point coordinates in a localization region and the center point coordinates is greater than a preset distance threshold; and, for each remaining IP address, determining the weight of each localization region over which the IP address is distributed.
In some embodiments, determining the weight of each localization region over which the IP address is distributed comprises: determining the initial weight of each localization region according to the number and range of the anchor point coordinates in each localization region over which the IP address is distributed; taking the center point coordinates of each localization region over which the IP addresses associated with a user identifier are distributed as the center point coordinates corresponding to that user identifier, and gridding the center point coordinates corresponding to the user identifiers recorded in the user operation information set according to geographic layout to generate at least two grids; obtaining, for each user identifier recorded in the set, the sum of the initial weights of the localization regions whose center point coordinates correspond to that user identifier and lie in each grid, as the frequency corresponding to that grid and that user identifier, and obtaining the sum of the initial weights of the localization regions whose center point coordinates lie in each grid, as the total user frequency corresponding to that grid; and calculating the weight of each localization region from these frequencies by the TF-IDF algorithm.
In some embodiments, the method further comprises: calculating the IP address similarity between the user identifier to be matched and each of the other user identifiers; and determining, according to the location information similarity, the other user identifiers that match the user identifier to be matched comprises: determining the other user identifiers that match the user identifier to be matched according to both the location information similarity and the IP address similarity between the user identifier to be matched and each of the other user identifiers.
In some embodiments, determining the other user identifiers that match the user identifier to be matched according to the location information similarity and the IP address similarity comprises: obtaining the feature information corresponding to the user identifier to be matched and each other user identifier, the feature information including the IP address similarity and the location information similarity between the user identifier to be matched and the other user identifier; obtaining, based on that feature information and through a pre-trained ranking model, the probability that the user identifier to be matched matches each other user identifier; and determining that the other user identifiers whose corresponding probability is greater than a predetermined threshold match the user identifier to be matched.
In some embodiments, the user operation information in the user operation information set further includes terminal model information and operating system information; and the feature information further includes at least one of the following: the number of identical IP addresses shared by the user identifier to be matched and the other user identifier, the number of coinciding corresponding center point coordinates, and the terminal model information and operating system information associated with the user identifier to be matched and with the other user identifier.
In some embodiments, the user identifiers recorded in the user operation information set include first user identifiers and second user identifiers, and the user identifier to be matched and each other user identifier belong to the first user identifiers and the second user identifiers, respectively.
In some embodiments, after obtaining the location information similarity between the user identifier to be matched and each other user identifier recorded in the user operation information set, the method further comprises: selecting, in descending order of location information similarity with the user identifier to be matched, a predetermined number of second user identifiers from the second user identifiers recorded in the set, to obtain a candidate second user identifier set; and determining, according to the location information similarity, the other user identifiers that match the user identifier to be matched comprises: determining the second user identifier that matches the first user identifier to be matched according to the location information similarity between the user identifier to be matched and each second user identifier in the candidate second user identifier set.
In some embodiments, before determining the second user identifier that matches the first user identifier to be matched according to the location information similarity between the user identifier to be matched and each second user identifier in the candidate second user identifier set, the method further comprises: for each second user identifier in the candidate second user identifier set, obtaining the location information similarity between that second user identifier and each first user identifier; selecting, in descending order of location information similarity with that second user identifier, a predetermined number of first user identifiers to obtain a candidate first user identifier set; and, if the user identifier to be matched is not in the candidate first user identifier set, deleting that second user identifier from the candidate second user identifier set.
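By way of non-limiting illustration, the two-way candidate selection described in the preceding two paragraphs might be sketched as follows. The sketch assumes that location information similarities are already available in a nested dictionary `sim[first_id][second_id]`; the names `top_k`, `reciprocal_candidates`, and `sim`, and the value of `k`, are hypothetical and are not taken from this application.

```python
def top_k(scores, k):
    """Return the k keys of `scores` with the highest values, in descending order."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def reciprocal_candidates(target, sim, k):
    """Keep a second user identifier only if `target` is also among its own top k.

    sim: {first_id: {second_id: location information similarity}} (assumed layout).
    """
    # Candidate second user identifier set: the k most similar to the target.
    candidates = top_k(sim[target], k)
    kept = []
    for second_id in candidates:
        # Similarity of this second identifier to every first identifier.
        column = {first_id: row[second_id]
                  for first_id, row in sim.items() if second_id in row}
        # Candidate first user identifier set for this second identifier.
        if target in top_k(column, k):
            kept.append(second_id)
    return kept


sim = {
    "a1": {"b1": 0.9, "b2": 0.8, "b3": 0.1},
    "a2": {"b1": 0.2, "b2": 0.95, "b3": 0.3},
    "a3": {"b1": 0.1, "b2": 0.9, "b3": 0.2},
}
print(reciprocal_candidates("a1", sim, k=2))
```

In this toy data, `b2` is close to `a1` but is even closer to `a2` and `a3`, so it is removed from the candidate set, while `b1` is retained.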
In a second aspect, this application provides a user identifier matching device, the device comprising: a location information obtaining unit, configured to analyze a prestored user operation information set to obtain at least one localization region over which each Internet Protocol (IP) address recorded in the user operation information set is distributed and the weight of each localization region, wherein the user operation information in the set includes the following information: user identifier, IP address, and anchor point coordinates; a location information similarity obtaining unit, configured to obtain, according to the localization regions over which the IP addresses associated with a user identifier are distributed and the weight of each localization region, the location information similarity between a user identifier to be matched and each of the other user identifiers recorded in the user operation information set; and a matching unit, configured to determine, according to the location information similarity, the other user identifiers that match the user identifier to be matched.
In some embodiments, the location information obtaining unit includes: a coordinate set obtaining subunit, configured to obtain the anchor point coordinate set associated with each IP address recorded in the user operation information set; a clustering subunit, configured to perform, for each IP address, cluster analysis on the anchor point coordinate set associated with the IP address to obtain at least one corresponding cluster as the localization regions over which the IP address is distributed; and a weight determining subunit, configured to determine, for each IP address, the weight of each localization region over which the IP address is distributed.
In some embodiments, the weight determining subunit includes: a broad IP removal module, configured to delete any IP address for which the number of localization regions is greater than a preset quantity threshold, or for which the average distance between the anchor point coordinates in a localization region and the center point coordinates is greater than a preset distance threshold; and a weight determining module, configured to determine, for each remaining IP address, the weight of each localization region over which the IP address is distributed.
In some embodiments, the weight determining subunit includes: an initial weight determining module, configured to determine the initial weight of each localization region according to the number and range of the anchor point coordinates in each localization region over which the IP address is distributed; a gridding module, configured to take the center point coordinates of each localization region over which the IP addresses associated with a user identifier are distributed as the center point coordinates corresponding to that user identifier, and to grid the center point coordinates corresponding to the user identifiers recorded in the user operation information set according to geographic layout to generate at least two grids; a frequency obtaining module, configured to obtain, for each user identifier recorded in the set, the sum of the initial weights of the localization regions whose center point coordinates correspond to that user identifier and lie in each grid, as the frequency corresponding to that grid and that user identifier, and to obtain the sum of the initial weights of the localization regions whose center point coordinates lie in each grid, as the total user frequency corresponding to that grid; and a weight calculation module, configured to calculate the weight of each cluster from these frequencies by the TF-IDF algorithm.
In some embodiments, the device further includes: an IP similarity calculation unit, configured to calculate the IP address similarity between the user identifier to be matched and each of the other user identifiers; and the matching unit is further configured to determine the other user identifiers that match the user identifier to be matched according to both the location information similarity and the IP address similarity between the user identifier to be matched and each of the other user identifiers.
In some embodiments, the matching unit includes: a feature information obtaining subunit, configured to obtain the feature information corresponding to the user identifier to be matched and each other user identifier, the feature information including the IP address similarity and the location information similarity between the user identifier to be matched and the other user identifier; a ranking subunit, configured to obtain, based on that feature information and through a pre-trained ranking model, the probability that the user identifier to be matched matches each other user identifier; and a matching subunit, configured to determine that the other user identifiers whose corresponding probability is greater than a predetermined threshold match the user identifier to be matched.
In some embodiments, the user operation information in the user operation information set further includes terminal model information and operating system information; and the feature information further includes at least one of the following: the number of identical IP addresses shared by the user identifier to be matched and the other user identifier, the number of coinciding corresponding center point coordinates, and the terminal model information and operating system information associated with the user identifier to be matched and with the other user identifier.
In some embodiments, the user identifiers recorded in the user operation information set include first user identifiers and second user identifiers, and the user identifier to be matched and each other user identifier belong to the first user identifiers and the second user identifiers, respectively.
In some embodiments, the device further includes: a first selection unit, configured to select, after the location information similarity obtaining unit has obtained the location information similarity between the user identifier to be matched and each other user identifier recorded in the user operation information set, and in descending order of location information similarity with the user identifier to be matched, a predetermined number of second user identifiers from the second user identifiers recorded in the set, to obtain a candidate second user identifier set; and the matching unit is further configured to determine the second user identifier that matches the first user identifier to be matched according to the location information similarity between the user identifier to be matched and each second user identifier in the candidate second user identifier set.
In some embodiments, the location information similarity obtaining unit is further configured to obtain, for each second user identifier in the candidate second user identifier set, the location information similarity between that second user identifier and each first user identifier, before the matching unit determines the second user identifier that matches the first user identifier to be matched according to the location information similarity between the user identifier to be matched and each second user identifier in the candidate second user identifier set; and the device further includes: a second selection unit, configured to select, in descending order of location information similarity with that second user identifier, a predetermined number of first user identifiers to obtain a candidate first user identifier set; and a candidate filtering unit, configured to delete that second user identifier from the candidate second user identifier set, before the matching unit makes that determination, when the user identifier to be matched is not in the candidate first user identifier set.
The user identifier matching method and device provided by this application supplement and complete the location information corresponding to each user identifier by obtaining at least one localization region over which each IP address recorded in the user operation information set is distributed and the weight of each localization region. The location information similarity between the user identifier to be matched and each other user identifier recorded in the set is then obtained according to the localization regions over which the IP addresses associated with a user identifier are distributed and the weights of those regions, and the other user identifiers that match the user identifier to be matched are determined according to that similarity, so that user identifiers are matched accurately and reliably.
Brief description of the drawings
Other features, objects, and advantages of this application will become more apparent upon reading the following detailed description of non-limiting embodiments with reference to the accompanying drawings:
Fig. 1 is a diagram of an exemplary system architecture to which this application can be applied;
Fig. 2 is a flow chart of one embodiment of the user identifier matching method according to this application;
Fig. 3A is an exemplary schematic diagram of some of the data processing in one embodiment of the user identifier matching method according to this application;
Fig. 3B is an exemplary schematic diagram of other data processing in one embodiment of the user identifier matching method according to this application;
Fig. 4 is a matching effect comparison diagram for one embodiment of the user identifier matching method according to this application;
Fig. 5 is a flow chart of another embodiment of the user identifier matching method according to this application;
Fig. 6 is a matching effect comparison diagram for another embodiment of the user identifier matching method according to this application;
Fig. 7 is a structural schematic diagram of one embodiment of the user identifier matching device according to this application;
Fig. 8 is a structural schematic diagram of a computer system suitable for implementing the server of the embodiments of this application.
Detailed description of the embodiments
This application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are used only to explain the related invention and do not limit it. It should also be noted that, for ease of description, only the parts relevant to the related invention are shown in the drawings.
It should be noted that, where no conflict arises, the embodiments of this application and the features in those embodiments may be combined with one another. This application is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
Fig. 1 shows an exemplary system architecture 100 to which embodiments of the user identifier matching method or user identifier matching device of this application can be applied.
As shown in Fig. 1, the system architecture 100 may include terminal devices 101, 102, and 103, a network 104, and a server 105. The network 104 is the medium that provides communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links or fiber optic cables.
A user may use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104, for example to receive or send messages. Various client applications may be installed on the terminal devices 101, 102, 103, such as browser applications, search applications, shopping applications, and social platform software.
The terminal devices 101, 102, 103 may be various electronic devices that support browser applications and search applications, including but not limited to smartphones, tablet computers, laptop computers, and desktop computers.
The server 105 may be a server that provides various services, such as a database server or cloud server that supports the browser applications, search applications, and the like on the terminal devices 101, 102, 103. The server may store, integrate, analyze, and otherwise process the received user operation information in order to match user identifiers.
It should be noted that the user identifier matching method provided by the embodiments of this application is generally executed by the server 105. Accordingly, the user identifier matching device is generally provided in the server 105.
It should be understood that the numbers of terminal devices, networks, and servers in Fig. 1 are merely illustrative. There may be any number of terminal devices, networks, and servers, as the implementation requires.
With continued reference to Fig. 2, Fig. 2 shows a flow 200 of one embodiment of the user identifier matching method according to this application.
As shown in Fig. 2, the user identifier matching method of this embodiment includes the following steps:
Step 201: analyze the prestored user operation information set to obtain at least one localization region over which each IP address recorded in the user operation information set is distributed and the weight of each localization region.
The user operation information in the user operation information set includes the following information: user identifier, IP address, and anchor point coordinates. In this embodiment, the electronic device on which the user identifier matching method runs (for example, the server shown in Fig. 1) may obtain the user operation information set locally or remotely and, for each IP address recorded in the obtained set, obtain the anchor point coordinates in the set that are associated with that IP address. The anchor point coordinates associated with the IP address may then be divided into at least one localization region according to the distances between them, each region containing at least one anchor point coordinate. After that, the weight of each localization region may be determined according to the number and range of the anchor point coordinates in the region and/or the time the user stays in the region, or by another weight calculation algorithm (such as the TF-IDF algorithm).
In some optional implementations of this embodiment, the electronic device may obtain the anchor point coordinate set associated with each IP address recorded in the user operation information set; for each IP address, perform cluster analysis on the anchor point coordinate set associated with the IP address to obtain at least one corresponding cluster as the localization regions over which the IP address is distributed; and, for each IP address, determine the weight of each localization region over which the IP address is distributed. The electronic device may perform the cluster analysis on the anchor point coordinate set associated with the IP address by the K-means algorithm to obtain the at least one corresponding cluster.
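By way of non-limiting illustration, the per-IP clustering of this optional implementation might be sketched as follows. The sketch makes several assumptions that are not stated in this application: anchor point coordinates are treated as planar (x, y) values in meters rather than latitude and longitude, the number of clusters `k` is fixed in advance, and the first `k` distinct points seed the cluster centers. The names `kmeans` and `regions_by_ip` are hypothetical.

```python
from collections import defaultdict
from math import dist


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; returns a list of (center, member_points) pairs."""
    # Assumed seeding: the first k distinct points become the initial centers.
    centers = list(dict.fromkeys(points))[:k]
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members.
        new_centers = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else centers[i]
            for i, members in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    # Empty clusters are dropped; each remaining cluster is one localization region.
    return [(c, m) for c, m in zip(centers, clusters) if m]


def regions_by_ip(records, k=2):
    """records: iterable of (user_id, ip, (x, y)) user operation records."""
    points = defaultdict(list)
    for _user_id, ip, point in records:
        points[ip].append(point)
    return {ip: kmeans(pts, k) for ip, pts in points.items()}


records = [
    ("u1", "1.1.1.1", (0, 0)), ("u1", "1.1.1.1", (0, 2)),
    ("u2", "1.1.1.1", (100, 100)), ("u2", "1.1.1.1", (100, 102)),
]
for center, members in regions_by_ip(records)["1.1.1.1"]:
    print(center, len(members))
```

In practice the number of clusters would have to be chosen per IP address, and geographic distances would be computed on latitude and longitude; both are left out of this sketch.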
Step 202: according to the localization regions over which the IP addresses associated with a user identifier are distributed and the weight of each localization region, obtain the location information similarity between the user identifier to be matched and each other user identifier recorded in the user operation information set.
In this embodiment, for each pair consisting of the user identifier to be matched and another user identifier, the electronic device may generate two vectors from the localization regions over which the IP addresses associated with the two user identifiers are distributed and the weights of those regions, and calculate the similarity of the two vectors by, for example, the cosine similarity algorithm, the Jaccard similarity algorithm, or another similarity algorithm, as the location information similarity between the two user identifiers.
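By way of non-limiting illustration, the two similarity measures named above might be computed over sparse region-weight vectors as follows. The sketch assumes that each user identifier is represented as a dictionary mapping a region key to that region's weight; the region keys (`"home"`, `"office"`) and the function names are hypothetical.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors stored as {region_key: weight}."""
    dot = sum(w * b[r] for r, w in a.items() if r in b)
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a user identifier with no weighted regions matches nothing
    return dot / (norm_a * norm_b)


def jaccard_similarity(a, b):
    """Jaccard similarity of the region sets; the weights are ignored."""
    union = set(a) | set(b)
    return len(set(a) & set(b)) / len(union) if union else 0.0


user_a = {"home": 1.0, "office": 1.0}
user_b = {"home": 1.0}
print(cosine_similarity(user_a, user_b))
print(jaccard_similarity(user_a, user_b))
```

Cosine similarity uses the region weights, whereas this Jaccard variant only compares which regions the two user identifiers share.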
Step 203: according to the location information similarity, determine the other user identifiers that match the user identifier to be matched.
In this embodiment, the electronic device may determine that the other user identifiers whose location information similarity with the user identifier to be matched is greater than a predetermined similarity threshold are the other user identifiers that match the user identifier to be matched. Alternatively, the electronic device may use the location information similarity together with other feature information (for example, the terminal model information and operating system information associated with the user identifiers) to calculate, through a pre-trained ranking model, the probability that the user identifier to be matched matches each other user identifier, and determine that the other user identifiers whose corresponding probability is greater than a predetermined threshold match the user identifier to be matched.
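By way of non-limiting illustration, the threshold-based determination might be sketched as follows. The threshold value of 0.8 and the names `match_by_threshold` and `similarities` are hypothetical; this application does not specify a threshold value, and the ranking-model variant is not shown.

```python
def match_by_threshold(similarities, threshold=0.8):
    """Return the other user identifiers whose similarity exceeds the threshold.

    similarities: {other_user_id: location information similarity with the
    user identifier to be matched}. Results are ordered from most to least similar.
    """
    matched = [uid for uid, s in similarities.items() if s > threshold]
    return sorted(matched, key=lambda uid: similarities[uid], reverse=True)


print(match_by_threshold({"b1": 0.9, "b2": 0.5, "b3": 0.95}))
```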
In some optional implementations of this embodiment, the processing in step 201 of determining, for each IP address, the weight of each localization region over which the IP address is distributed may include: deleting any IP address for which the number of localization regions is greater than a preset quantity threshold (for example, 5), or for which the average distance between the anchor point coordinates in a localization region and the center point coordinates is greater than a preset distance threshold (for example, 3000 meters); and, for each remaining IP address, determining the weight of each localization region over which the IP address is distributed. In this way, IP addresses that have no fixed user group, cover a wide area, and have no fixed geographic coverage (such as mobile cellular IP addresses) are removed, and only IP addresses with a relatively fixed set of users (such as the egress IP addresses of homes and companies) are analyzed, which improves the accuracy of user identifier matching.
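By way of non-limiting illustration, the removal of broadly distributed IP addresses might be sketched as follows, using the example thresholds given above (5 regions, 3000 meters). The sketch assumes planar coordinates in meters and assumes that a single region exceeding the distance threshold is enough to remove the IP address, which the text above does not state explicitly. The names `drop_broad_ips` and `mean_distance_to_center` are hypothetical.

```python
from math import dist


def mean_distance_to_center(center, points):
    """Average distance between a region's anchor points and its center point."""
    return sum(dist(center, p) for p in points) / len(points)


def drop_broad_ips(regions_by_ip, max_regions=5, max_mean_distance=3000.0):
    """regions_by_ip: {ip: [(center, [anchor points]), ...]}; returns the kept IPs."""
    kept = {}
    for ip, regions in regions_by_ip.items():
        too_many = len(regions) > max_regions
        # Assumption: one overly spread-out region disqualifies the whole IP address.
        too_spread = any(
            mean_distance_to_center(center, pts) > max_mean_distance
            for center, pts in regions
        )
        if not (too_many or too_spread):
            kept[ip] = regions
    return kept


regions = {
    "home-ip": [((0, 0), [(0, 10), (0, -10)])],
    "cellular-ip": [((0, 0), [(5000, 0), (-5000, 0)])],
}
print(sorted(drop_broad_ips(regions)))
```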
In addition, in some optional implementations of the present embodiment, the processing in step 201 of determining the weight of each localization region over which the IP address is distributed may include: determining an initial weight of each localization region according to the number and range of the anchor point coordinates in each localization region over which the IP address is distributed; taking the center point coordinates of the localization regions over which the IP addresses associated with a user identifier are distributed as the center point coordinates corresponding to that user identifier, and gridding the center point coordinates corresponding to the user identifiers recorded in the user operation information set according to geographic layout to generate at least two grids; obtaining, for each user identifier recorded in the user operation information set, the sum of the initial weights of the localization regions whose center point coordinates correspond to that user identifier and fall in each grid, as the frequency of each grid for each user identifier, and obtaining the sum of the initial weights of the localization regions whose center point coordinates fall in each grid, as the total user frequency of each grid; and, based on these frequencies, calculating the weight of each localization region by the TF-IDF (Term Frequency-Inverse Document Frequency) algorithm.
Here, the range of the anchor point coordinates in a localization region can be expressed by the average distance between the anchor point coordinates in the localization region and the center point coordinate. The electronic device may determine the initial weight of a localization region from the ratio of the number of anchor point coordinates in that localization region to the number of anchor point coordinates in all localization regions over which the IP address is distributed, together with the above range, where the larger the ratio, the higher the weight, and the smaller the range, the higher the weight. The TF-IDF algorithm is commonly used to assess how important a word is to a document set or to one document in a corpus. The main idea of the algorithm is: if a word or phrase occurs with a high frequency (TF) in one article and rarely occurs in other articles, the word or phrase is considered to have good class discrimination ability and is suitable for classification. In the present embodiment, the electronic device may treat a grid as a word and a user identifier as a document, and calculate the weight of each localization region by the TF-IDF algorithm. Therefore, in the present embodiment, the higher the frequency of a grid for the user identifier and the lower the total user frequency of that grid, the higher the weight of the localization region located in that grid.
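The embodiment states only the direction of the two effects on the initial weight (a larger ratio and a smaller range both raise it) and gives no formula. The sketch below uses one hypothetical formula with those properties, dividing the ratio by `1 + range / 1000` with the range in meters; the formula and the function name are assumptions for illustration.

```python
def initial_weights(regions):
    """regions: list of (number of anchor points, range in meters) for one IP address.

    Returns one initial weight per localization region. The formula is an
    illustrative assumption: ratio of anchor points, damped by the range.
    """
    n_total = sum(n for n, _ in regions)
    return [(n / n_total) / (1.0 + spread_m / 1000.0) for n, spread_m in regions]


# Hypothetical IP address with a dense, compact region and a sparse, wide one.
w = initial_weights([(80, 120.0), (20, 900.0)])
```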
In this implementation, the initial weight of each localization region is first determined according to the number and range of the anchor point coordinates in each localization region over which the IP address is distributed, and the weight of each localization region is then calculated from the initial weights by the TF-IDF algorithm. The independence, activity level and activity range of the localization regions are thereby considered together, so that more reasonable weights are determined.
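The grid-as-word, user-identifier-as-document weighting can be sketched as below. The embodiment does not give the exact TF or IDF expressions, so the ones used here (TF as the user's share of initial weight in a grid, IDF as `log(1 + grand total / total user frequency of the grid)`), the grid size of 0.01 degrees and all names are assumptions for illustration.

```python
import math
from collections import defaultdict


def grid_cell(center, cell_deg=0.01):
    """Map a (lat, lon) center point coordinate to a grid index."""
    lat, lon = center
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def tfidf_region_weights(user_regions, cell_deg=0.01):
    """user_regions: user identifier -> list of (center, initial weight)."""
    freq = defaultdict(float)   # (user, grid) -> frequency of the grid for the user
    total = defaultdict(float)  # grid -> total user frequency
    for user, regions in user_regions.items():
        for center, w0 in regions:
            cell = grid_cell(center, cell_deg)
            freq[(user, cell)] += w0
            total[cell] += w0
    grand_total = sum(total.values())
    weights = {}
    for user, regions in user_regions.items():
        user_total = sum(w0 for _, w0 in regions)
        for center, _ in regions:
            cell = grid_cell(center, cell_deg)
            tf = freq[(user, cell)] / user_total
            idf = math.log(1.0 + grand_total / total[cell])
            weights[(user, center)] = tf * idf
    return weights


# Hypothetical data: "home" is visited only by u1, "mall" by all three users.
home, mall, other = (39.905, 116.405), (39.955, 116.455), (39.805, 116.305)
user_regions = {
    "u1": [(home, 0.7), (mall, 0.3)],
    "u2": [(mall, 0.5), (other, 0.5)],
    "u3": [(mall, 1.0)],
}
weights = tfidf_region_weights(user_regions)
```

In this made-up data the region that only u1 uses ends up with a higher weight for u1 than the region shared by all users, which is the behavior the paragraph above describes.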
In some optional implementations of the present embodiment, the user identifiers recorded in the user operation information set include first user identifiers and second user identifiers, and the user identifier to be matched and each of the other user identifiers belong to the first user identifiers and the second user identifiers, respectively. The electronic device may distinguish first user identifiers from second user identifiers by a flag bit. The first user identifiers and the second user identifiers may be the user identifiers of two different product lines, for example the user identifier used when searching in a web page through a browser and the user identifier used when searching through a search APP. If the user identifier to be matched is a first user identifier, then when the user identifier to be matched is matched, only the second user identifiers undergo similarity calculation and other processing with the user identifier to be matched, which reduces the amount of data to be calculated and speeds up matching.
Based on the preceding implementation, in some optional implementations of the present embodiment, after step 202 the user identifier matching method of the present embodiment may further include: in descending order of location information similarity with the user identifier to be matched, selecting a predetermined number (for example, 50) of second user identifiers from the second user identifiers recorded in the user operation information set, to obtain a candidate second user identifier set. Step 203 may then include: according to the location information similarity between the user identifier to be matched and each second user identifier in the candidate second user identifier set, determining the second user identifiers that match the first user identifier to be matched. This implementation reduces the amount of calculation in step 203 and improves the efficiency of user identifier matching.
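The candidate selection can be sketched with a top-N query. The function name and the sample identifiers are hypothetical; the default of 50 follows the example value given in the text.

```python
import heapq


def top_candidates(similarities, n=50):
    """Return the n second user identifiers with the highest location
    information similarity to the user identifier to be matched, in
    descending order of similarity."""
    best = heapq.nlargest(n, similarities.items(), key=lambda kv: kv[1])
    return [uid for uid, _ in best]


# Hypothetical similarities of three second user identifiers.
similarities = {"b1": 0.9, "b2": 0.2, "b3": 0.7}
candidates = top_candidates(similarities, n=2)
```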
Based on the preceding implementation, in some optional implementations of the present embodiment, before step 203 the user identifier matching method of the present embodiment may further include: for each second user identifier in the candidate second user identifier set, obtaining the location information similarity between that second user identifier and each first user identifier; in descending order of location information similarity with that second user identifier, selecting a predetermined number (for example, 50) of first user identifiers to obtain a candidate first user identifier set; and, if the user identifier to be matched is not in the candidate first user identifier set, deleting that second user identifier from the candidate second user identifier set. This implementation ensures that every other user identifier taking part in the processing of step 203 both ranks within the predetermined top positions for the user identifier to be matched by location information similarity and, conversely, has the user identifier to be matched within its own predetermined top positions. The vast majority of unrelated user identifiers are thereby removed, noisy data is reduced, and the matching efficiency, accuracy rate and recall rate are improved.
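The two-way filtering can be sketched as follows. The layout of the similarity table (a dict keyed by a pair of first and second user identifier), the function names and the sample values are assumptions for illustration.

```python
import heapq


def top_n(scores, n):
    """Return the set of keys with the n largest scores."""
    return {k for k, _ in heapq.nlargest(n, scores.items(), key=lambda kv: kv[1])}


def mutual_candidates(target, sim, n=50):
    """sim[(first_id, second_id)] -> location information similarity.

    Keep a second user identifier only if it is in the top n for the target
    first user identifier AND the target is in the top n for it.
    """
    by_first = {s: v for (f, s), v in sim.items() if f == target}
    kept = []
    for second in sorted(top_n(by_first, n)):
        by_second = {f: v for (f, s), v in sim.items() if s == second}
        if target in top_n(by_second, n):
            kept.append(second)
    return kept


# Hypothetical similarities; "b2" is close to a1 but closer to a2 and a3.
sim = {
    ("a1", "b1"): 0.9, ("a1", "b2"): 0.8,
    ("a2", "b1"): 0.3, ("a2", "b2"): 0.95,
    ("a3", "b2"): 0.9,
}
kept = mutual_candidates("a1", sim, n=2)
```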
Some example data processing flows of the present embodiment are described below with reference to Fig. 3A and Fig. 3B. The user identifier matching method of the present embodiment may first obtain the anchor point coordinate set associated with each IP address recorded in the user operation information set. The anchor point coordinate set associated with a given IP address may be as shown in Fig. 3A, in which the dots represent the anchor point coordinates in that set. Next, at least one localization region over which each IP address recorded in the user operation information set is distributed, and the weight of each localization region, are obtained through an analysis algorithm such as clustering. The result may be as shown in Fig. 3B, in which the 4 marked points represent the center coordinate points of the localization regions over which the IP address is distributed, and the value next to each marked point is the weight of the corresponding localization region. The localization regions and weights shown in Fig. 3B can then be used to calculate the location information similarity between user identifiers, and the other user identifiers that match the user identifier to be matched are determined according to the location information similarity.
Fig. 4 is a matching effect comparison diagram for one embodiment of the user identifier matching method according to the present application. It shows the accuracy rate and recall rate of the matching results obtained by matching on IP address alone, as in the prior art, and by the user identifier matching method of the present embodiment. As can be seen from Fig. 4, the user identifier matching method of the present embodiment improves both the accuracy rate and the recall rate to some extent.
The user identifier matching method provided by the present embodiment obtains at least one localization region over which each Internet Protocol (IP) address recorded in the user operation information set is distributed, together with the weight of each localization region, thereby supplementing and refining the location information corresponding to the user identifiers. According to the localization regions over which the IP addresses associated with the user identifiers are distributed and the weights of those localization regions, it obtains the location information similarity between the user identifier to be matched and each other user identifier recorded in the user operation information set and, according to the location information similarity, determines the other user identifiers that match the user identifier to be matched, thereby achieving accurate and reliable user identifier matching.
With continued reference to Fig. 5, a flow 500 of another embodiment of the user identifier matching method according to the present application is shown.
As shown in Fig. 5, the user identifier matching method of the present embodiment includes the following steps:
Step 501: analyze the pre-stored user operation information set to obtain at least one localization region over which each Internet Protocol (IP) address recorded in the user operation information set is distributed, and the weight of each localization region.
Here, each piece of user operation information in the user operation information set includes the following information: a user identifier, an IP address, and an anchor point coordinate.
In the present embodiment, for the specific processing of step 501, reference may be made to the description of step 201 in the embodiment corresponding to Fig. 2, which is not repeated here.
Step 502: according to the localization regions over which the IP addresses associated with the user identifiers are distributed and the weights of those localization regions, obtain the location information similarity between the user identifier to be matched and each other user identifier recorded in the user operation information set.
In the present embodiment, for the specific processing of step 502, reference may be made to the description of step 202 in the embodiment corresponding to Fig. 2, which is not repeated here.
Step 503: calculate the IP address similarity between the user identifier to be matched and each other user identifier.
In the present embodiment, the electronic device on which the user identifier matching method runs (for example, the server shown in Fig. 1) may first obtain at least one IP address associated with each user, then calculate the weight of each IP address by the TF-IDF algorithm or another weight calculation method, and finally calculate the IP address similarity between the user identifier to be matched and each other user identifier according to the associated IP addresses and the weights of those IP addresses.
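The embodiment names TF-IDF or another weighting method but does not fix the similarity formula. The sketch below is one possible reading under stated assumptions: each IP address gets an IDF-style weight `log(1 + number of users / number of users seen on the IP)`, and the similarity is a weighted Jaccard ratio over the two identifiers' IP sets. The function names, formulas and sample addresses are hypothetical.

```python
import math


def ip_weights(user_ips):
    """user_ips: user identifier -> set of associated IP addresses.

    IP addresses shared by many users (for example, public hotspots)
    receive lower weights than IP addresses used by few users.
    """
    n_users = len(user_ips)
    counts = {}
    for ips in user_ips.values():
        for ip in ips:
            counts[ip] = counts.get(ip, 0) + 1
    return {ip: math.log(1.0 + n_users / c) for ip, c in counts.items()}


def ip_similarity(a, b, user_ips, weights):
    """Weighted Jaccard similarity of the IP sets of two user identifiers."""
    union = user_ips[a] | user_ips[b]
    if not union:
        return 0.0
    shared = user_ips[a] & user_ips[b]
    return sum(weights[ip] for ip in shared) / sum(weights[ip] for ip in union)


# Hypothetical data using documentation-range IP addresses.
user_ips = {
    "web_u1": {"198.51.100.1", "198.51.100.9"},
    "app_u1": {"198.51.100.1", "203.0.113.5"},
    "app_u2": {"203.0.113.5"},
}
weights = ip_weights(user_ips)
sim_1 = ip_similarity("web_u1", "app_u1", user_ips, weights)
sim_2 = ip_similarity("web_u1", "app_u2", user_ips, weights)
```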
It should be noted that step 503 may be performed at the same time as step 501 or step 502, before step 501, or after step 502; the present embodiment does not limit its execution order.
Step 504: according to the location information similarity and the IP address similarity between the user identifier to be matched and each other user identifier, determine the other user identifiers that match the user identifier to be matched.
In the present embodiment, the electronic device may calculate a comprehensive relevance between the user identifier to be matched and each other user identifier according to preset weights for the location information similarity and the IP address similarity, and determine the other user identifiers that match the user identifier to be matched according to the comprehensive relevance. The weights of the location information similarity and the IP address similarity may be preset manually based on experience and actual conditions.
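A weighted sum is one simple way to combine the two similarities. The sketch below assumes a linear combination; the weights 0.6 and 0.4, the decision threshold 0.5 and the sample scores are made-up values, since the embodiment leaves them to manual presetting.

```python
def combined_relevance(loc_sim, ip_sim, w_loc=0.6, w_ip=0.4):
    """Linear combination of location information similarity and IP address similarity."""
    return w_loc * loc_sim + w_ip * ip_sim


# Hypothetical (location similarity, IP similarity) pairs per other user identifier.
scores = {"app_u1": (0.8, 0.5), "app_u2": (0.3, 0.1)}
relevance = {uid: combined_relevance(loc, ip) for uid, (loc, ip) in scores.items()}
matched = [uid for uid, r in relevance.items() if r > 0.5]
```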
In some optional implementations of the present embodiment, step 504 may include: obtaining the feature information corresponding to the user identifier to be matched and each other user identifier, the feature information including the IP address similarity and the location information similarity between the user identifier to be matched and the other user identifier; based on the feature information corresponding to the user identifier to be matched and each other user identifier, obtaining, through a pre-trained ranking model, the probability that the user identifier to be matched matches each other user identifier; and determining that the other user identifiers whose probability is greater than a predetermined threshold match the user identifier to be matched. The ranking model may be trained on a labeled training sample set by an LTR (Learning To Rank) method such as Pairwise. The feature information corresponding to two user identifiers known to belong to the same user may be used as a positive sample, and the feature information corresponding to two user identifiers known to belong to unrelated users may be used as a negative sample. This implementation calculates the probability that another user identifier matches the user identifier to be matched through a pre-trained ranking model, based on the IP address similarity and the location information similarity between them, which is more accurate and reliable than calculating a comprehensive relevance with manually set weights.
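To illustrate the idea of learning the combination from labeled samples, the sketch below trains a plain logistic regression by gradient descent. This is a pointwise stand-in chosen for brevity and is not the pairwise LTR training named in the text; the training samples, learning rate and epoch count are made-up values.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(samples, labels, lr=0.5, epochs=2000):
    """Fit weights and a bias by batch gradient descent on the logistic loss."""
    n_features = len(samples[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(samples, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def match_probability(model, features):
    """Probability that two user identifiers match, given their feature vector."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, features)) + b)


# Features: (IP address similarity, location information similarity).
# Label 1: identifiers known to belong to the same user; 0: unrelated users.
samples = [(0.9, 0.8), (0.7, 0.9), (0.1, 0.2), (0.2, 0.1)]
labels = [1, 1, 0, 0]
model = train_logistic(samples, labels)

candidates = {"app_u1": (0.8, 0.85), "app_u2": (0.15, 0.15)}
matched = [uid for uid, f in candidates.items() if match_probability(model, f) > 0.5]
```

Further features mentioned in the embodiment, such as terminal model or operating system information, could be appended to the feature tuples in the same way.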
Based on the preceding implementation, in some optional implementations of the present embodiment, the user operation information in the user operation information set may further include: terminal model information and operating system information. The feature information then further includes at least one of the following: the number of identical IP addresses shared by the user identifier to be matched and the other user identifier, the number of coinciding corresponding center point coordinates, and the terminal model information and operating system information associated with the user identifier to be matched and the other user identifier. When obtaining the probability that the user identifier to be matched matches each other user identifier, this implementation takes more influencing factors into account, which makes the matching more accurate.
Fig. 6 is a matching effect comparison diagram for another embodiment of the user identifier matching method according to the present application. It shows the accuracy rate and recall rate of the matching results obtained by matching on IP address alone, as in the prior art, by the user identifier matching method of the embodiment corresponding to Fig. 2, and by the user identifier matching method of the present embodiment. As can be seen from Fig. 6, the accuracy rate and recall rate of the user identifier matching method of the present embodiment are further improved to some extent over those of the user identifier matching method of the embodiment corresponding to Fig. 2.
As can be seen from Fig. 5 and Fig. 6, compared with the embodiment corresponding to Fig. 2, the flow 500 of the user identifier matching method in the present embodiment adds the IP address similarity as a reference factor for user identifier matching. The scheme described in the present embodiment can therefore take more comprehensive influencing factors into account and improve the matching accuracy.
With further reference to Fig. 7, as an implementation of the methods shown in the above figures, the present application provides an embodiment of a user identifier matching device. The device embodiment corresponds to the method embodiment shown in Fig. 2, and the device can be applied to various electronic devices.
As shown in Fig. 7, the user identifier matching device 700 of the present embodiment includes: a location information acquiring unit 701, a location information similarity acquiring unit 702 and a matching unit 703. The location information acquiring unit 701 is configured to analyze the pre-stored user operation information set to obtain at least one localization region over which each Internet Protocol (IP) address recorded in the user operation information set is distributed, and the weight of each localization region, where each piece of user operation information in the user operation information set includes the following information: a user identifier, an IP address, and an anchor point coordinate. The location information similarity acquiring unit 702 is configured to obtain, according to the localization regions over which the IP addresses associated with the user identifiers are distributed and the weights of those localization regions, the location information similarity between the user identifier to be matched and each other user identifier recorded in the user operation information set. The matching unit 703 is configured to determine, according to the location information similarity, the other user identifiers that match the user identifier to be matched.
In the present embodiment, for the specific processing of the location information acquiring unit 701, the location information similarity acquiring unit 702 and the matching unit 703, reference may be made to the processing of step 201, step 202 and step 203, respectively, in the embodiment corresponding to Fig. 2, which is not repeated here.
In some optional implementations of the present embodiment, the location information acquiring unit 701 may include: a coordinate set acquiring subunit 7011, configured to obtain the anchor point coordinate set associated with each IP address recorded in the user operation information set; a clustering subunit 7012, configured to perform, for each IP address, cluster analysis on the anchor point coordinate set associated with the IP address to obtain at least one corresponding cluster as the localization regions over which the IP address is distributed; and a weight determining subunit 7013, configured to determine, for each IP address, the weight of each localization region over which the IP address is distributed.
The clustering subunit 7012 may perform cluster analysis on the anchor point coordinate set associated with the IP address by the K-means algorithm to obtain at least one corresponding cluster.
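A minimal K-means sketch for clustering the anchor point coordinates of one IP address is given below. It assumes planar coordinates with plain Euclidean distance, a deterministic initialization from the first k points, and a fixed number of iterations; these choices and the sample points are illustrative assumptions, and a production system would more likely rely on an established library implementation.

```python
def kmeans(points, k, iters=20):
    """Cluster 2-D points into k clusters; returns (centers, clusters)."""
    centers = list(points[:k])  # deterministic initialization: first k points
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest center (squared Euclidean distance).
            nearest = min(
                range(k),
                key=lambda j: (p[0] - centers[j][0]) ** 2 + (p[1] - centers[j][1]) ** 2,
            )
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        centers = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical anchor point coordinates forming two well-separated groups.
points = [(0.0, 0.0), (10.0, 10.0), (0.2, 0.1), (10.1, 9.9), (0.1, 0.2)]
centers, clusters = kmeans(points, k=2)
```

Each resulting cluster plays the role of one localization region, and its center is the center point coordinate used in the weighting steps.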
In some optional implementations of the present embodiment, the weight determining subunit 7013 may include: a wide-range IP removal module (not shown), configured to delete the IP addresses for which the number of localization regions is greater than a preset quantity threshold, or for which the average distance between the anchor point coordinates in a localization region and the center point coordinate is greater than a preset distance threshold; and a weight determining module (not shown), configured to determine, for each remaining IP address, the weight of each localization region over which the IP address is distributed. For the specific processing of the wide-range IP removal module and the weight determining module and the technical effects they bring, reference may be made to the description of the corresponding implementation in the embodiment corresponding to Fig. 2, which is not repeated here.
In addition, in some optional implementations of the present embodiment, the weight determining subunit 7013 may include: an initial weight determining module (not shown), configured to determine the initial weight of each localization region according to the number and range of the anchor point coordinates in each localization region over which the IP address is distributed; a gridding module (not shown), configured to take the center point coordinates of the localization regions over which the IP addresses associated with a user identifier are distributed as the center point coordinates corresponding to that user identifier, and to grid the center point coordinates corresponding to the user identifiers recorded in the user operation information set according to geographic layout to generate at least two grids; a frequency acquiring module (not shown), configured to obtain, for each user identifier recorded in the user operation information set, the sum of the initial weights of the localization regions whose center point coordinates correspond to that user identifier and fall in each grid, as the frequency of each grid for each user identifier, and to obtain the sum of the initial weights of the localization regions whose center point coordinates fall in each grid, as the total user frequency of each grid; and a weight calculating module (not shown), configured to calculate the weight of each cluster by the TF-IDF algorithm based on these frequencies. For the specific processing of the initial weight determining module, the gridding module, the frequency acquiring module and the weight calculating module and the technical effects they bring, reference may be made to the description of the corresponding implementation in the embodiment corresponding to Fig. 2, which is not repeated here.
In some optional implementations of the present embodiment, the user identifier matching device of the present embodiment may further include: an IP similarity calculating unit 704, configured to calculate the IP address similarity between the user identifier to be matched and each other user identifier. The matching unit 703 may then be further configured to determine, according to the location information similarity and the IP address similarity between the user identifier to be matched and each other user identifier, the other user identifiers that match the user identifier to be matched. For the specific processing of this implementation and the technical effects it brings, reference may be made to the description of step 503 and step 504 in the embodiment corresponding to Fig. 5, which is not repeated here.
Based on the preceding implementation, in some optional implementations of the present embodiment, the matching unit 703 may include: a feature information acquiring subunit 7031, configured to obtain the feature information corresponding to the user identifier to be matched and each other user identifier, the feature information including the IP address similarity and the location information similarity between the user identifier to be matched and the other user identifier; a ranking subunit 7032, configured to obtain, through a pre-trained ranking model and based on the feature information corresponding to the user identifier to be matched and each other user identifier, the probability that the user identifier to be matched matches each other user identifier; and a matching subunit 7033, configured to determine that the other user identifiers whose probability is greater than a predetermined threshold match the user identifier to be matched. For the specific processing of the feature information acquiring subunit 7031, the ranking subunit 7032 and the matching subunit 7033 and the technical effects they bring, reference may be made to the description of the corresponding implementation in the embodiment corresponding to Fig. 5, which is not repeated here.
Based on the preceding implementation, in some optional implementations of the present embodiment, the user operation information in the user operation information set may further include: terminal model information and operating system information. The feature information then further includes at least one of the following: the number of identical IP addresses shared by the user identifier to be matched and the other user identifier, the number of coinciding corresponding center point coordinates, and the terminal model information and operating system information associated with the user identifier to be matched and the other user identifier. For the specific processing of this implementation and the technical effects it brings, reference may be made to the description of the corresponding implementation in the embodiment corresponding to Fig. 5, which is not repeated here.
In some optional implementations of the present embodiment, the user identifiers recorded in the user operation information set include first user identifiers and second user identifiers, and the user identifier to be matched and each of the other user identifiers belong to the first user identifiers and the second user identifiers, respectively. For the specific processing of this implementation and the technical effects it brings, reference may be made to the description of the corresponding implementation in the embodiment corresponding to Fig. 2, which is not repeated here.
Based on the preceding implementation, in some optional implementations of the present embodiment, the user identifier matching device of the present embodiment may further include: a first selecting unit (not shown), configured to, after the location information similarity acquiring unit obtains the location information similarity between the user identifier to be matched and each other user identifier recorded in the user operation information set, select a predetermined number of second user identifiers from the second user identifiers recorded in the user operation information set, in descending order of location information similarity with the user identifier to be matched, to obtain a candidate second user identifier set. The matching unit 703 may then be further configured to determine, according to the location information similarity between the user identifier to be matched and each second user identifier in the candidate second user identifier set, the second user identifiers that match the first user identifier to be matched. For the specific processing of this implementation and the technical effects it brings, reference may be made to the description of the corresponding implementation in the embodiment corresponding to Fig. 2, which is not repeated here.
Based on the preceding implementation, in some optional implementations of the present embodiment, the location information similarity acquiring unit 702 may be further configured to, before the matching unit determines the second user identifiers that match the first user identifier to be matched according to the location information similarity between the user identifier to be matched and each second user identifier in the candidate second user identifier set, obtain, for each second user identifier in the candidate second user identifier set, the location information similarity between that second user identifier and each first user identifier. The user identifier matching device of the present embodiment may then further include: a second selecting unit (not shown), configured to select a predetermined number of first user identifiers in descending order of location information similarity with the second user identifier, to obtain a candidate first user identifier set; and a candidate filtering unit (not shown), configured to, before the matching unit determines the second user identifiers that match the first user identifier to be matched according to the location information similarity between the user identifier to be matched and each second user identifier in the candidate second user identifier set, delete the second user identifier from the candidate second user identifier set when the user identifier to be matched is not in the candidate first user identifier set. For the specific processing of this implementation and the technical effects it brings, reference may be made to the description of the corresponding implementation in the embodiment corresponding to Fig. 2, which is not repeated here.
In the user identifier matching device provided by the present embodiment, the location information acquiring unit 701 obtains at least one localization region over which each Internet Protocol (IP) address recorded in the user operation information set is distributed, together with the weight of each localization region, thereby supplementing and refining the location information corresponding to the user identifiers. The location information similarity acquiring unit 702 obtains, according to the localization regions over which the IP addresses associated with the user identifiers are distributed and the weights of those localization regions, the location information similarity between the user identifier to be matched and each other user identifier recorded in the user operation information set, and the matching unit 703 determines, according to the location information similarity, the other user identifiers that match the user identifier to be matched, thereby achieving accurate and reliable user identifier matching.
Referring now to Fig. 8, a schematic structural diagram of a computer system 800 of a server suitable for implementing the embodiments of the present application is shown.
As shown in Fig. 8, the computer system 800 includes a central processing unit (CPU) 801, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 802 or a program loaded from a storage section 806 into a random access memory (RAM) 803. The RAM 803 also stores the various programs and data required for the operation of the system 800. The CPU 801, the ROM 802 and the RAM 803 are connected to one another by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
The following components are connected to the I/O interface 805: a storage section 806 including a hard disk and the like; and a communication section 807 including a network interface card such as a LAN card or a modem. The communication section 807 performs communication processing via a network such as the Internet. A driver 808 is also connected to the I/O interface 805 as needed. A removable medium 809, such as a magnetic disk, an optical disk, a magneto-optical disk or a semiconductor memory, is mounted on the driver 808 as needed, so that a computer program read from it can be installed into the storage section 806 as needed.
In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program containing program code for performing the methods shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 807, and/or installed from the removable medium 809. When the computer program is executed by the central processing unit (CPU) 801, the above functions defined in the method of the present application are performed.
The flowcharts and block diagrams in the drawings illustrate the architecture, functions and operations of possible implementations of systems, methods and computer program products according to the various embodiments of the present application. In this regard, each box in a flowchart or block diagram may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the boxes may occur in an order different from that marked in the drawings. For example, two boxes shown in succession may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each box in the block diagrams and/or flowcharts, and any combination of boxes in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present application may be implemented in software or in hardware. The described units may also be provided in a processor; for example, this may be described as: a processor including a location information acquiring unit, a location information similarity acquiring unit and a matching unit. The names of these units do not, in some cases, limit the units themselves; for example, the matching unit may also be described as "a unit that determines, according to the location information similarity, the other user identifiers that match the user identifier to be matched".
In another aspect, the present application also provides a non-volatile computer storage medium. The non-volatile computer storage medium may be the one included in the device described in the above embodiments, or it may be a non-volatile computer storage medium that exists independently and is not assembled into a terminal. The non-volatile computer storage medium stores one or more programs which, when executed by a device, cause the device to: analyze a pre-stored user operation information set to obtain at least one localization region over which each Internet Protocol (IP) address recorded in the user operation information set is distributed, and a weight of each localization region, wherein the user operation information in the user operation information set includes the following information: user identifier, IP address, and anchor point coordinates; obtain, according to the localization regions over which the IP addresses associated with user identifiers are distributed and the weight of each localization region, a location information similarity between a user identifier to be matched and each other user identifier recorded in the user operation information set; and determine, according to the location information similarity, the other user identifiers that match the user identifier to be matched.
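As an illustration of the last two steps, the following is a minimal Python sketch of how a location information similarity could be computed and thresholded. It is not taken from the application: the cosine measure, the `{region_id: weight}` representation, the function names and the threshold value are all assumptions made for the example.

```python
import math


def location_similarity(regions_a, regions_b):
    """Cosine similarity between two {region_id: weight} maps.

    The cosine measure is an assumption; the text above only requires
    that the similarity depend on the regions and their weights.
    """
    shared = set(regions_a) & set(regions_b)
    dot = sum(regions_a[r] * regions_b[r] for r in shared)
    norm_a = math.sqrt(sum(w * w for w in regions_a.values()))
    norm_b = math.sqrt(sum(w * w for w in regions_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def matching_identifiers(target_regions, others, threshold=0.5):
    """Return the other identifiers whose similarity exceeds a (hypothetical) threshold."""
    return [
        other_id
        for other_id, regions in others.items()
        if location_similarity(target_regions, regions) > threshold
    ]
```

Here each user identifier is represented by the weighted localization regions of the IP addresses associated with it, so two identifiers that share heavily weighted regions score close to 1 and identifiers with no shared region score 0.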
The above description is only a preferred embodiment of the present application and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in the present application is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the present application.

Claims (18)

1. A user identifier matching method, characterized in that the method comprises:
analyzing a pre-stored user operation information set to obtain at least one localization region over which each Internet Protocol (IP) address recorded in the user operation information set is distributed, and a weight of each localization region, wherein the user operation information in the user operation information set includes the following information: user identifier, IP address, and anchor point coordinates;
obtaining, according to the localization regions over which the IP addresses associated with user identifiers are distributed and the weight of each localization region, a location information similarity between a user identifier to be matched and each other user identifier recorded in the user operation information set;
determining, according to the location information similarity, the other user identifiers that match the user identifier to be matched;
wherein analyzing the pre-stored user operation information set to obtain the at least one localization region over which each IP address recorded in the user operation information set is distributed and the weight of each localization region comprises:
obtaining the anchor point coordinate set associated with each IP address recorded in the user operation information set;
for each IP address, performing cluster analysis on the anchor point coordinate set associated with the IP address to obtain at least one corresponding cluster as the localization regions over which the IP address is distributed;
for each IP address, determining the weight of each localization region over which the IP address is distributed.
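The grouping and clustering steps of claim 1 can be illustrated with a minimal Python sketch. The claim does not name a clustering algorithm, so the greedy radius-based grouping below is an assumption, as are the function names, the planar `(x, y)` coordinates and the `radius` parameter; real latitude/longitude data would need a geodesic distance.

```python
import math
from collections import defaultdict


def group_by_ip(records):
    """Collect anchor point coordinates per IP address.

    records: iterable of (user_id, ip_address, (x, y)) tuples.
    """
    points = defaultdict(list)
    for _user_id, ip, coord in records:
        points[ip].append(coord)
    return dict(points)


def cluster_points(points, radius):
    """Greedy clustering (an assumed stand-in for the cluster analysis).

    A point joins the first cluster whose current centroid lies within
    `radius`; otherwise it starts a new cluster.
    """
    clusters = []
    for p in points:
        for c in clusters:
            cx = sum(q[0] for q in c) / len(c)
            cy = sum(q[1] for q in c) / len(c)
            if math.hypot(p[0] - cx, p[1] - cy) <= radius:
                c.append(p)
                break
        else:
            clusters.append([p])
    return clusters


def localization_regions(records, radius):
    """Map each IP address to its clusters, i.e. its localization regions."""
    return {ip: cluster_points(pts, radius) for ip, pts in group_by_ip(records).items()}
```

Each returned cluster plays the role of one localization region of the IP address; a density-based method could be substituted for `cluster_points` without changing the surrounding steps.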
2. The method according to claim 1, characterized in that determining, for each IP address, the weight of each localization region over which the IP address is distributed comprises:
deleting the IP addresses for which the number of localization regions is greater than a preset quantity threshold, or for which the average distance between the anchor point coordinates in a localization region and the center point coordinates is greater than a preset distance threshold;
for each remaining IP address, determining the weight of each localization region over which the IP address is distributed.
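The deletion step of claim 2 can be sketched in Python as follows. This is an illustration under assumptions: the center point is taken to be the centroid of a region, an IP address is dropped if any one of its regions is too spread out, coordinates are planar, and all names and thresholds are hypothetical.

```python
import math


def centroid(points):
    """Arithmetic mean of a non-empty list of (x, y) points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def mean_distance_to_center(points):
    """Average distance between the points of a region and its centroid."""
    cx, cy = centroid(points)
    return sum(math.hypot(p[0] - cx, p[1] - cy) for p in points) / len(points)


def drop_generalized_ips(regions_by_ip, max_regions, max_mean_distance):
    """Keep only IP addresses that have few, compact localization regions.

    regions_by_ip: {ip: [region, ...]}, each region a list of (x, y) points.
    """
    kept = {}
    for ip, regions in regions_by_ip.items():
        too_many = len(regions) > max_regions
        # Assumption: one overly spread region is enough to drop the IP.
        too_spread = any(mean_distance_to_center(r) > max_mean_distance for r in regions)
        if not (too_many or too_spread):
            kept[ip] = regions
    return kept
```

The intent is that an IP address seen in many places, or over a very wide area, carries little information about where a particular user is and is therefore excluded before weights are computed.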
3. The method according to claim 1, characterized in that determining the weight of each localization region over which the IP address is distributed comprises:
determining an initial weight of each localization region according to the number and range of the anchor point coordinates in each localization region over which the IP address is distributed;
taking the center point coordinates of each localization region over which the IP addresses associated with a user identifier are distributed as the center point coordinates corresponding to the user identifier, and gridding, according to geographic layout, the center point coordinates corresponding to the user identifiers recorded in the user operation information set, to generate at least two grids;
obtaining, for each user identifier recorded in the user operation information set, the sum of the initial weights of the localization regions in which the corresponding center point coordinates in each grid are located, as the frequency of each grid corresponding to each user identifier; and obtaining the sum of the initial weights of the localization regions in which the center point coordinates in each grid are located, as the total user frequency corresponding to each grid;
calculating the weight of each localization region by the TF-IDF algorithm based on the frequencies.
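The gridding and TF-IDF steps of claim 3 can be sketched in Python as follows. The claim does not give the exact formula, so the version below is an assumed reading: the term frequency is a user's frequency in a grid divided by that user's total, and the inverse document frequency is the logarithm of the overall total divided by the grid's total user frequency. Square grid cells, planar coordinates, positive initial weights and all names are assumptions for the example.

```python
import math
from collections import defaultdict


def grid_cell(center, cell_size):
    """Map a center point (x, y) to the index of its square grid cell."""
    return (math.floor(center[0] / cell_size), math.floor(center[1] / cell_size))


def tfidf_weights(regions, cell_size):
    """Compute an assumed TF-IDF weight per (user_id, grid cell).

    regions: list of (user_id, center_point, initial_weight), with
    initial_weight > 0 (assumed, to avoid division by zero).
    """
    user_grid = defaultdict(float)   # (user, cell) -> frequency for that user
    grid_total = defaultdict(float)  # cell -> total user frequency
    user_total = defaultdict(float)  # user -> sum of that user's frequencies
    for user, center, weight in regions:
        cell = grid_cell(center, cell_size)
        user_grid[(user, cell)] += weight
        grid_total[cell] += weight
        user_total[user] += weight

    overall = sum(grid_total.values())
    weights = {}
    for (user, cell), freq in user_grid.items():
        tf = freq / user_total[user]
        idf = math.log(overall / grid_total[cell])
        weights[(user, cell)] = tf * idf
    return weights
```

Under this reading, a grid in which many users appear (for example a busy public place) receives a low weight, while a grid that is frequent for one user but rare overall receives a high weight.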
4. The method according to any one of claims 1-3, characterized in that the method further comprises:
calculating an IP address similarity between the user identifier to be matched and each of the other user identifiers; and
determining, according to the location information similarity, the other user identifiers that match the user identifier to be matched comprises:
determining, according to the location information similarity and the IP address similarity between the user identifier to be matched and each of the other user identifiers, the other user identifiers that match the user identifier to be matched.
5. The method according to claim 4, characterized in that determining, according to the location information similarity and the IP address similarity between the user identifier to be matched and each of the other user identifiers, the other user identifiers that match the user identifier to be matched comprises:
obtaining feature information corresponding to the user identifier to be matched and each of the other user identifiers, the feature information including: the IP address similarity and the location information similarity between the user identifier to be matched and the other user identifier;
obtaining, based on the feature information corresponding to the user identifier to be matched and each of the other user identifiers, the probability that the user identifier to be matched matches each of the other user identifiers by means of a pre-trained ranking model;
determining that the other user identifiers whose corresponding probability is greater than a predetermined threshold match the user identifier to be matched.
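The scoring and thresholding steps of claim 5 can be sketched in Python as follows. The claim does not specify the form of the pre-trained ranking model, so a logistic scorer with hypothetical coefficients stands in for it here; the feature names, coefficient values and threshold are assumptions for the example.

```python
import math

# Hypothetical coefficients standing in for a pre-trained ranking model.
COEFFS = {"ip_similarity": 2.0, "location_similarity": 3.0}
BIAS = -2.5


def match_probability(features):
    """Logistic score in (0, 1) computed from the feature information."""
    z = BIAS + sum(COEFFS[name] * features[name] for name in COEFFS)
    return 1.0 / (1.0 + math.exp(-z))


def select_matches(candidates, threshold):
    """Return the other identifiers whose match probability exceeds the threshold.

    candidates: {other_user_id: feature dict for (target, other)}.
    """
    return [
        other_id
        for other_id, features in candidates.items()
        if match_probability(features) > threshold
    ]
```

Additional features, such as those listed in claim 6, could be added to the feature dictionary and to `COEFFS` in the same way.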
6. The method according to claim 5, characterized in that the user operation information in the user operation information set further includes: terminal type information and operating system information; and
the feature information further includes at least one of the following: the number of identical IP addresses between the user identifier to be matched and the other user identifier, the number of coinciding corresponding center point coordinates, and the terminal type information and operating system information associated with the user identifier to be matched and the other user identifier.
7. The method according to any one of claims 1-3, characterized in that the user identifiers recorded in the user operation information set include first user identifiers and second user identifiers, and the user identifier to be matched and each of the other user identifiers belong to the first user identifiers and the second user identifiers, respectively.
8. The method according to claim 7, characterized in that, after obtaining the location information similarity between the user identifier to be matched and each other user identifier recorded in the user operation information set, the method further comprises:
selecting, in descending order of location information similarity with the user identifier to be matched, a predetermined number of second user identifiers from the second user identifiers recorded in the user operation information set, to obtain a candidate second user identifier set;
and
determining, according to the location information similarity, the other user identifiers that match the user identifier to be matched comprises:
determining, according to the location information similarity between the user identifier to be matched and each second user identifier in the candidate second user identifier set, the second user identifier that matches the first user identifier to be matched.
9. The method according to claim 8, characterized in that, before determining, according to the location information similarity between the user identifier to be matched and each second user identifier in the candidate second user identifier set, the second user identifier that matches the first user identifier to be matched, the method further comprises:
for each second user identifier in the candidate second user identifier set, obtaining the location information similarity between the second user identifier and each first user identifier;
selecting, in descending order of location information similarity with the second user identifier, a predetermined number of first user identifiers to obtain a candidate first user identifier set;
if the user identifier to be matched is not in the candidate first user identifier set, deleting the second user identifier from the candidate second user identifier set.
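The candidate selection of claim 8 and the reverse check of claim 9 can be sketched together in Python. This is an illustration under assumptions: the similarities are assumed to be precomputed in a nested dictionary, the same predetermined number `k` is used in both directions, and the function names are hypothetical.

```python
def top_k(scores, k):
    """Return the k keys with the highest scores, highest first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def mutual_candidates(target, sim, k):
    """Candidate second identifiers that also rank the target in their own top k.

    sim[first_id][second_id] is the location information similarity
    between a first user identifier and a second user identifier.
    """
    # Claim 8: the k second identifiers most similar to the target.
    candidates = top_k(sim[target], k)
    kept = []
    for second in candidates:
        # Claim 9: rank all first identifiers from this second identifier's side.
        reverse_scores = {
            first: row[second] for first, row in sim.items() if second in row
        }
        if target in top_k(reverse_scores, k):
            kept.append(second)
    return kept
```

The reverse check removes second user identifiers that look similar to the target only because the target has few alternatives, while the second identifier itself is clearly closer to some other first user identifier.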
10. A user identifier matching device, characterized in that the device comprises:
a location information acquiring unit, configured to analyze a pre-stored user operation information set to obtain at least one localization region over which each Internet Protocol (IP) address recorded in the user operation information set is distributed, and a weight of each localization region, wherein the user operation information in the user operation information set includes the following information: user identifier, IP address, and anchor point coordinates;
a location information similarity acquiring unit, configured to obtain, according to the localization regions over which the IP addresses associated with user identifiers are distributed and the weight of each localization region, a location information similarity between a user identifier to be matched and each other user identifier recorded in the user operation information set;
a matching unit, configured to determine, according to the location information similarity, the other user identifiers that match the user identifier to be matched;
wherein the location information acquiring unit includes:
a coordinate set obtaining subunit, configured to obtain the anchor point coordinate set associated with each IP address recorded in the user operation information set;
a clustering subunit, configured to perform, for each IP address, cluster analysis on the anchor point coordinate set associated with the IP address to obtain at least one corresponding cluster as the localization regions over which the IP address is distributed;
a weight determining subunit, configured to determine, for each IP address, the weight of each localization region over which the IP address is distributed.
11. The device according to claim 10, characterized in that the weight determining subunit includes:
a generalized IP removal module, configured to delete the IP addresses for which the number of localization regions is greater than a preset quantity threshold, or for which the average distance between the anchor point coordinates in a localization region and the center point coordinates is greater than a preset distance threshold;
a weight determination module, configured to determine, for each remaining IP address, the weight of each localization region over which the IP address is distributed.
12. The device according to claim 10, characterized in that the weight determining subunit includes:
an initial weight determining module, configured to determine an initial weight of each localization region according to the number and range of the anchor point coordinates in each localization region over which the IP address is distributed;
a gridding module, configured to take the center point coordinates of each localization region over which the IP addresses associated with a user identifier are distributed as the center point coordinates corresponding to the user identifier, and to grid, according to geographic layout, the center point coordinates corresponding to the user identifiers recorded in the user operation information set, to generate at least two grids;
a frequency obtaining module, configured to obtain, for each user identifier recorded in the user operation information set, the sum of the initial weights of the localization regions in which the corresponding center point coordinates in each grid are located, as the frequency of each grid corresponding to each user identifier, and to obtain the sum of the initial weights of the localization regions in which the center point coordinates in each grid are located, as the total user frequency corresponding to each grid;
a weight calculation module, configured to calculate the weight of each cluster by the TF-IDF algorithm based on the frequencies.
13. The device according to any one of claims 10-12, characterized in that the device further comprises:
an IP similarity calculating unit, configured to calculate an IP address similarity between the user identifier to be matched and each of the other user identifiers; and
the matching unit is further configured to determine, according to the location information similarity and the IP address similarity between the user identifier to be matched and each of the other user identifiers, the other user identifiers that match the user identifier to be matched.
14. The device according to claim 13, characterized in that the matching unit includes:
a feature information obtaining subunit, configured to obtain feature information corresponding to the user identifier to be matched and each of the other user identifiers, the feature information including: the IP address similarity and the location information similarity between the user identifier to be matched and the other user identifier;
a ranking subunit, configured to obtain, based on the feature information corresponding to the user identifier to be matched and each of the other user identifiers, the probability that the user identifier to be matched matches each of the other user identifiers by means of a pre-trained ranking model;
a matching subunit, configured to determine that the other user identifiers whose corresponding probability is greater than a predetermined threshold match the user identifier to be matched.
15. The device according to claim 14, characterized in that the user operation information in the user operation information set further includes: terminal type information and operating system information; and
the feature information further includes at least one of the following: the number of identical IP addresses between the user identifier to be matched and the other user identifier, the number of coinciding corresponding center point coordinates, and the terminal type information and operating system information associated with the user identifier to be matched and the other user identifier.
16. The device according to any one of claims 10-12, characterized in that the user identifiers recorded in the user operation information set include first user identifiers and second user identifiers, and the user identifier to be matched and each of the other user identifiers belong to the first user identifiers and the second user identifiers, respectively.
17. The device according to claim 16, characterized in that the device further comprises:
a first selection unit, configured to, after the location information similarity acquiring unit obtains the location information similarity between the user identifier to be matched and each other user identifier recorded in the user operation information set, select, in descending order of location information similarity with the user identifier to be matched, a predetermined number of second user identifiers from the second user identifiers recorded in the user operation information set, to obtain a candidate second user identifier set; and
the matching unit is further configured to determine, according to the location information similarity between the user identifier to be matched and each second user identifier in the candidate second user identifier set, the second user identifier that matches the first user identifier to be matched.
18. The device according to claim 17, characterized in that the location information similarity acquiring unit is further configured to, before the matching unit determines, according to the location information similarity between the user identifier to be matched and each second user identifier in the candidate second user identifier set, the second user identifier that matches the first user identifier to be matched, obtain, for each second user identifier in the candidate second user identifier set, the location information similarity between the second user identifier and each first user identifier; and
the device further comprises:
a second selection unit, configured to select, in descending order of location information similarity with the second user identifier, a predetermined number of first user identifiers to obtain a candidate first user identifier set;
a candidate filtering unit, configured to, before the matching unit determines, according to the location information similarity between the user identifier to be matched and each second user identifier in the candidate second user identifier set, the second user identifier that matches the first user identifier to be matched, delete the second user identifier from the candidate second user identifier set when the user identifier to be matched is not in the candidate first user identifier set.
CN201610172168.XA 2016-03-24 2016-03-24 User identifier matching process and device Active CN105721629B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610172168.XA CN105721629B (en) 2016-03-24 2016-03-24 User identifier matching process and device

Publications (2)

Publication Number Publication Date
CN105721629A CN105721629A (en) 2016-06-29
CN105721629B true CN105721629B (en) 2019-04-26

Family

ID=56159077

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610172168.XA Active CN105721629B (en) 2016-03-24 2016-03-24 User identifier matching process and device

Country Status (1)

Country Link
CN (1) CN105721629B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106228187A (en) * 2016-07-21 2016-12-14 贵州力创科技发展有限公司 Individual recognizer model based on multiple user's detail data and treatment technology
CN106789411B (en) * 2016-12-07 2020-01-21 北京亚鸿世纪科技发展有限公司 Method and device for acquiring active IP data in machine room
US10348745B2 (en) 2017-01-05 2019-07-09 Cisco Technology, Inc. Associating a user identifier detected from web traffic with a client address
CN109104506B (en) * 2017-06-20 2021-05-14 腾讯科技(深圳)有限公司 Method and device for determining domain name resolution rule and computer readable storage medium
CN109005513B (en) * 2018-06-26 2021-03-19 北京酷云互动科技有限公司 Mobile phone terminal association method and mobile phone terminal association system
CN109447114B (en) * 2018-09-25 2020-11-06 北京酷云互动科技有限公司 Method and system for evaluating association degree between places
CN110493368B (en) * 2019-08-21 2022-02-25 北京明略软件系统有限公司 Matching method and device of equipment identifiers
CN111026937B (en) * 2019-11-13 2021-02-19 百度在线网络技术(北京)有限公司 Method, device and equipment for extracting POI name and computer storage medium
CN111127094B (en) * 2019-12-19 2023-08-25 秒针信息技术有限公司 Account matching method and device, electronic equipment and storage medium
CN117172792A (en) * 2023-11-02 2023-12-05 赞塔(杭州)科技有限公司 Customer information management method and device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101409868A (en) * 2008-12-01 2009-04-15 腾讯科技(深圳)有限公司 Method, system and equipment for matching object in mobile terminal
CN102056079A (en) * 2009-10-30 2011-05-11 中国移动通信集团上海有限公司 Method, device and system for determining information to be pushed
CN105187237A (en) * 2015-08-12 2015-12-23 百度在线网络技术(北京)有限公司 Method and device for searching associated user identifications

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120174205A1 (en) * 2010-12-31 2012-07-05 International Business Machines Corporation User profile and usage pattern based user identification prediction

Also Published As

Publication number Publication date
CN105721629A (en) 2016-06-29

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant