CN105721629B - User identifier matching method and device
- Publication number
- CN105721629B (application CN201610172168.XA)
- Authority
- CN
- China
- Prior art keywords
- user
- matched
- user identifier
- address
- similarity
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L2101/00—Indexing scheme associated with group H04L61/00
- H04L2101/60—Types of network addresses
- H04L2101/69—Types of network addresses using geographic information, e.g. room number
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/50—Network services
- H04L67/52—Network services specially adapted for the location of the user terminal
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Mobile Radio Communication Systems (AREA)
Abstract
This application discloses a user identifier matching method and device. In one specific embodiment, the method includes: analyzing a pre-stored user operation information set to obtain at least one localization region over which each Internet Protocol (IP) address recorded in the set is distributed, together with the weight of each localization region, where the user operation information in the set includes a user identifier, an IP address, and anchor point coordinates; obtaining, from the localization regions over which the IP addresses associated with the user identifiers are distributed and the weights of those regions, the location information similarity between a user identifier to be matched and each other user identifier recorded in the set; and determining, according to the location information similarity, the other user identifiers that match the user identifier to be matched. This embodiment matches user identifiers accurately and reliably.
Description
Technical field
This application relates to the field of computer technology, specifically to user profiling technology, and in particular to a user identifier matching method and device.
Background
With the rapid growth of the internet, the need to precisely analyze each user's attributes and relationships through user profile data has become increasingly clear. A user profile is a virtual representation of a real user: a target user model built on a body of real data. User research is used to understand users and divide them into types according to their goals, behaviors, and views; typical characteristics are then extracted from each type and given a name, a photo, some demographic elements, scenarios, and similar descriptions, which together form the user profile data. User profiles let an enterprise gather broader user feedback over the internet and provide a sufficient data basis for analyzing important business information, such as users' behavioral and consumption habits, accurately and quickly.
At present, large internet companies usually have multiple product lines, each with its own user information. To extract user profile data more accurately, the user identifiers in the different product lines need to be matched to determine whether they belong to the same user. Existing user identifier matching methods usually match identifiers based solely on the IP (Internet Protocol) addresses associated with them, or solely on the location information associated with them.
However, IP address allocation mechanisms differ between carriers, and addresses are usually assigned randomly, so matching user identifiers on IP address alone is unreliable. Moreover, users often block unnecessary location requests when accessing internet services, so their location information is usually incomplete, and it is difficult to match user identifiers accurately from partially missing location information.
Summary of the invention
The purpose of this application is to propose a user identifier matching method and device that solve the technical problems mentioned in the Background section above.
In a first aspect, this application provides a user identifier matching method, the method comprising: analyzing a pre-stored user operation information set to obtain at least one localization region over which each IP address recorded in the user operation information set is distributed and the weight of each localization region, wherein the user operation information in the set includes the following information: user identifier, IP address, and anchor point coordinates; obtaining, according to the localization regions over which the IP addresses associated with the user identifiers are distributed and the weight of each localization region, the location information similarity between a user identifier to be matched and each other user identifier recorded in the user operation information set; and determining, according to the location information similarity, the other user identifiers that match the user identifier to be matched.
In some embodiments, analyzing the pre-stored user operation information set to obtain the at least one localization region over which each IP address recorded in the set is distributed and the weight of each localization region comprises: obtaining the set of anchor point coordinates associated with each IP address recorded in the user operation information set; for each IP address, performing cluster analysis on the set of anchor point coordinates associated with that IP address to obtain at least one corresponding cluster, which serves as the localization regions over which the IP address is distributed; and, for each IP address, determining the weight of each localization region over which the IP address is distributed.
In some embodiments, determining, for each IP address, the weight of each localization region over which the IP address is distributed comprises: deleting any IP address for which the number of localization regions exceeds a preset quantity threshold, or for which the average distance between the anchor point coordinates in a localization region and the region's center point coordinates exceeds a preset distance threshold; and, for each remaining IP address, determining the weight of each localization region over which the IP address is distributed.
In some embodiments, determining the weight of each localization region over which the IP address is distributed comprises: determining an initial weight for each localization region according to the number and range of the anchor point coordinates in that region; taking the center point coordinates of each localization region over which the IP addresses associated with a user identifier are distributed as the center point coordinates corresponding to that user identifier, and gridding the center point coordinates corresponding to the user identifiers recorded in the user operation information set according to geographic layout to generate at least two grids; obtaining, for each user identifier recorded in the set, the sum of the initial weights of the localization regions whose center point coordinates correspond to that user identifier and fall within each grid, as the frequency of that grid for that user identifier, and obtaining the sum of the initial weights of the localization regions of all center point coordinates within each grid, as the total user frequency of that grid; and calculating the weight of each localization region from these frequencies using the TF-IDF algorithm.
In some embodiments, the method further comprises: calculating the IP address similarity between the user identifier to be matched and each other user identifier; and determining, according to the location information similarity, the other user identifiers that match the user identifier to be matched comprises: determining the other user identifiers that match the user identifier to be matched according to both the location information similarity and the IP address similarity between the user identifier to be matched and each other user identifier.
In some embodiments, determining the other user identifiers that match the user identifier to be matched according to the location information similarity and the IP address similarity comprises: obtaining the feature information corresponding to the user identifier to be matched and each other user identifier, the feature information including the IP address similarity and the location information similarity between the user identifier to be matched and the other user identifier; obtaining, from this feature information and through a pre-trained ranking model, the probability that the user identifier to be matched matches each other user identifier; and determining that the other user identifiers whose probability exceeds a predetermined threshold match the user identifier to be matched.
In some embodiments, the user operation information in the user operation information set further includes terminal model information and operating system information; and the feature information further includes at least one of the following: the number of identical IP addresses shared by the user identifier to be matched and the other user identifier, the number of coinciding center point coordinates, and the terminal model information and operating system information associated with the user identifier to be matched and the other user identifier.
In some embodiments, the user identifiers recorded in the user operation information set include first user identifiers and second user identifiers, and the user identifier to be matched and each other user identifier belong to the first user identifiers and the second user identifiers, respectively.
In some embodiments, after obtaining the location information similarity between the user identifier to be matched and the other user identifiers recorded in the user operation information set, the method further comprises: selecting, in descending order of location information similarity with the user identifier to be matched, a predetermined number of second user identifiers from the second user identifiers recorded in the set, to obtain a candidate second user identifier set; and determining, according to the location information similarity, the other user identifiers that match the user identifier to be matched comprises: determining the second user identifier that matches the first user identifier to be matched according to the location information similarity between the user identifier to be matched and each second user identifier in the candidate second user identifier set.
In some embodiments, before determining the second user identifier that matches the first user identifier to be matched according to the location information similarity between the user identifier to be matched and each second user identifier in the candidate second user identifier set, the method further comprises: for each second user identifier in the candidate second user identifier set, obtaining the location information similarity between that second user identifier and each first user identifier; selecting, in descending order of location information similarity with that second user identifier, a predetermined number of first user identifiers to obtain a candidate first user identifier set; and, if the user identifier to be matched is not in the candidate first user identifier set, deleting that second user identifier from the candidate second user identifier set.
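The two-way candidate selection described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the nested-dictionary layout of `sim`, the function names `top_k` and `mutual_candidates`, and the use of a single `k` for both directions are assumptions made for the example.

```python
def top_k(scores, k):
    """Return the k identifiers with the highest similarity, in descending order."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [uid for uid, _ in ranked[:k]]


def mutual_candidates(target, sim, k):
    """Keep a second user identifier only if the target first user identifier
    is also among that second identifier's own top-k first user identifiers.

    sim is assumed to be {first_id: {second_id: location_similarity}}.
    """
    # Forward pass: candidate second user identifiers for the target.
    candidates = top_k(sim[target], k)
    kept = []
    for second in candidates:
        # Reverse pass: similarity of this second identifier to every first identifier.
        reverse_scores = {first: row[second] for first, row in sim.items() if second in row}
        if target in top_k(reverse_scores, k):
            kept.append(second)
    return kept
```

With `k` small, the reverse pass discards second user identifiers that happen to resemble the target but resemble several other first user identifiers even more.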
In a second aspect, this application provides a user identifier matching device, the device comprising: a location information acquiring unit, configured to analyze a pre-stored user operation information set to obtain at least one localization region over which each IP address recorded in the user operation information set is distributed and the weight of each localization region, wherein the user operation information in the set includes the following information: user identifier, IP address, and anchor point coordinates; a location information similarity acquiring unit, configured to obtain, according to the localization regions over which the IP addresses associated with the user identifiers are distributed and the weight of each localization region, the location information similarity between a user identifier to be matched and each other user identifier recorded in the user operation information set; and a matching unit, configured to determine, according to the location information similarity, the other user identifiers that match the user identifier to be matched.
In some embodiments, the location information acquiring unit includes: a coordinate set acquiring subunit, configured to obtain the set of anchor point coordinates associated with each IP address recorded in the user operation information set; a clustering subunit, configured to perform, for each IP address, cluster analysis on the set of anchor point coordinates associated with that IP address to obtain at least one corresponding cluster, which serves as the localization regions over which the IP address is distributed; and a weight determining subunit, configured to determine, for each IP address, the weight of each localization region over which the IP address is distributed.
In some embodiments, the weight determining subunit includes: a generalized IP removal module, configured to delete any IP address for which the number of localization regions exceeds a preset quantity threshold, or for which the average distance between the anchor point coordinates in a localization region and the region's center point coordinates exceeds a preset distance threshold; and a weight determining module, configured to determine, for each remaining IP address, the weight of each localization region over which the IP address is distributed.
In some embodiments, the weight determining subunit includes: an initial weight determining module, configured to determine an initial weight for each localization region according to the number and range of the anchor point coordinates in each localization region over which the IP address is distributed; a gridding module, configured to take the center point coordinates of each localization region over which the IP addresses associated with a user identifier are distributed as the center point coordinates corresponding to that user identifier, and to grid the center point coordinates corresponding to the user identifiers recorded in the user operation information set according to geographic layout to generate at least two grids; a frequency obtaining module, configured to obtain, for each user identifier recorded in the set, the sum of the initial weights of the localization regions whose center point coordinates correspond to that user identifier and fall within each grid, as the frequency of that grid for that user identifier, and to obtain the sum of the initial weights of the localization regions of all center point coordinates within each grid, as the total user frequency of that grid; and a weight calculation module, configured to calculate the weight of each cluster from these frequencies using the TF-IDF algorithm.
In some embodiments, the device further includes: an IP similarity calculating unit, configured to calculate the IP address similarity between the user identifier to be matched and each other user identifier; and the matching unit is further configured to determine the other user identifiers that match the user identifier to be matched according to both the location information similarity and the IP address similarity between the user identifier to be matched and each other user identifier.
In some embodiments, the matching unit includes: a feature information acquiring subunit, configured to obtain the feature information corresponding to the user identifier to be matched and each other user identifier, the feature information including the IP address similarity and the location information similarity between the user identifier to be matched and the other user identifier; a ranking subunit, configured to obtain, from this feature information and through a pre-trained ranking model, the probability that the user identifier to be matched matches each other user identifier; and a matching subunit, configured to determine that the other user identifiers whose probability exceeds a predetermined threshold match the user identifier to be matched.
In some embodiments, the user operation information in the user operation information set further includes terminal model information and operating system information; and the feature information further includes at least one of the following: the number of identical IP addresses shared by the user identifier to be matched and the other user identifier, the number of coinciding center point coordinates, and the terminal model information and operating system information associated with the user identifier to be matched and the other user identifier.
In some embodiments, the user identifiers recorded in the user operation information set include first user identifiers and second user identifiers, and the user identifier to be matched and each other user identifier belong to the first user identifiers and the second user identifiers, respectively.
In some embodiments, the device further includes: a first selecting unit, configured to select, after the location information similarity acquiring unit obtains the location information similarity between the user identifier to be matched and the other user identifiers recorded in the user operation information set, a predetermined number of second user identifiers from the second user identifiers recorded in the set, in descending order of location information similarity with the user identifier to be matched, to obtain a candidate second user identifier set; and the matching unit is further configured to determine the second user identifier that matches the first user identifier to be matched according to the location information similarity between the user identifier to be matched and each second user identifier in the candidate second user identifier set.
In some embodiments, the location information similarity acquiring unit is further configured to obtain, for each second user identifier in the candidate second user identifier set, the location information similarity between that second user identifier and each first user identifier, before the matching unit determines the second user identifier that matches the first user identifier to be matched according to the location information similarity between the user identifier to be matched and each second user identifier in the candidate second user identifier set; and the device further includes: a second selecting unit, configured to select, in descending order of location information similarity with the second user identifier, a predetermined number of first user identifiers to obtain a candidate first user identifier set; and a candidate filtering unit, configured to delete the second user identifier from the candidate second user identifier set, before the matching unit makes that determination, when the user identifier to be matched is not in the candidate first user identifier set.
The user identifier matching method and device provided by this application supplement and complete the location information corresponding to each user identifier by obtaining at least one localization region over which each IP address recorded in the user operation information set is distributed and the weight of each localization region. They then obtain, according to the localization regions over which the IP addresses associated with the user identifiers are distributed and the weight of each localization region, the location information similarity between the user identifier to be matched and the other user identifiers recorded in the user operation information set, and determine, according to the location information similarity, the other user identifiers that match the user identifier to be matched, so that user identifiers are matched accurately and reliably.
Brief description of the drawings
Other features, objects, and advantages of this application will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the accompanying drawings:
Fig. 1 is an exemplary system architecture diagram to which this application can be applied;
Fig. 2 is a flowchart of one embodiment of the user identifier matching method of this application;
Fig. 3A is an exemplary schematic diagram of some of the data processing in one embodiment of the user identifier matching method of this application;
Fig. 3B is an exemplary schematic diagram of other data processing in one embodiment of the user identifier matching method of this application;
Fig. 4 is a comparison chart of matching results for one embodiment of the user identifier matching method of this application;
Fig. 5 is a flowchart of another embodiment of the user identifier matching method of this application;
Fig. 6 is a comparison chart of matching results for another embodiment of the user identifier matching method of this application;
Fig. 7 is a schematic structural diagram of one embodiment of the user identifier matching device of this application;
Fig. 8 is a schematic structural diagram of a computer system suitable for implementing the server of the embodiments of this application.
Detailed description of the embodiments
This application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts relevant to the invention.
It should be noted that, provided they do not conflict, the embodiments of this application and the features in those embodiments may be combined with one another. This application is described in detail below with reference to the drawings and in conjunction with the embodiments.
Fig. 1 shows an exemplary system architecture 100 to which embodiments of the user identifier matching method or user identifier matching device of this application can be applied.
As shown in Fig. 1, the system architecture 100 may include terminal devices 101, 102, and 103, a network 104, and a server 105. The network 104 is the medium that provides communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired links, wireless communication links, or fiber optic cables.
A user may use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104, for example to receive or send messages. Various client applications may be installed on the terminal devices 101, 102, 103, such as browser applications, search applications, shopping applications, and social platform software.
The terminal devices 101, 102, 103 may be any electronic devices that support browser applications and search applications, including but not limited to smartphones, tablet computers, laptop computers, and desktop computers.
The server 105 may be a server that provides various services, for example a database server or cloud server that supports the browser applications and search applications on the terminal devices 101, 102, 103. The server may store, integrate, analyze, and otherwise process the user operation information it receives in order to match user identifiers.
It should be noted that the user identifier matching method provided by the embodiments of this application is generally executed by the server 105. Correspondingly, the user identifier matching device is generally located in the server 105.
It should be understood that the numbers of terminal devices, networks, and servers in Fig. 1 are merely illustrative. There may be any number of terminal devices, networks, and servers, as the implementation requires.
With continued reference to Fig. 2, a flow 200 of one embodiment of the user identifier matching method of this application is shown.
As shown in Fig. 2, the user identifier matching method of this embodiment includes the following steps:
Step 201: analyze the pre-stored user operation information set to obtain at least one localization region over which each IP address recorded in the set is distributed and the weight of each localization region.
The user operation information in the user operation information set includes the following information: user identifier, IP address, and anchor point coordinates. In this embodiment, the electronic device on which the user identifier matching method runs (for example, the server shown in Fig. 1) may obtain the user operation information set locally or remotely and, for each IP address recorded in the obtained set, obtain the anchor point coordinates associated with that IP address in the set. The anchor point coordinates associated with the IP address may then be divided into at least one localization region according to the distances between them, each region containing at least one anchor point coordinate. After that, the weight of each localization region may be determined from the number and range of the anchor point coordinates in the region and/or the time users stay in the region, or by another weight calculation algorithm (such as the TF-IDF algorithm).
In some optional implementations of this embodiment, the electronic device may obtain the set of anchor point coordinates associated with each IP address recorded in the user operation information set; for each IP address, perform cluster analysis on the set of anchor point coordinates associated with that IP address to obtain at least one corresponding cluster, which serves as the localization regions over which the IP address is distributed; and, for each IP address, determine the weight of each localization region over which the IP address is distributed. The electronic device may use the K-means algorithm to perform the cluster analysis on the set of anchor point coordinates associated with the IP address and obtain the at least one corresponding cluster.
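The per-IP clustering step can be illustrated with a small, dependency-free K-means sketch. Several things here are assumptions made for the example rather than details from the text: coordinates are treated as planar `(x, y)` values (for instance, already projected to meters), the first `k` distinct points seed the centers, the number of clusters `k` is supplied by the caller, and the names `kmeans` and `regions_by_ip` are hypothetical.

```python
from collections import defaultdict
from math import dist


def kmeans(points, k, iterations=20):
    """Plain Lloyd's K-means on 2-D points.

    Returns a list of (center, member_points) pairs, one per non-empty cluster.
    """
    # Deterministic seeding: the first k distinct points (an assumption of this sketch).
    centers = list(dict.fromkeys(points))[:k]
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members.
        new_centers = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return [(centers[i], c) for i, c in enumerate(clusters) if c]


def regions_by_ip(records, k=2):
    """Group anchor points by IP address and cluster each group.

    records: iterable of (user_id, ip, (x, y)) tuples.
    Returns {ip: [(center, member_points), ...]}.
    """
    points_by_ip = defaultdict(list)
    for _user, ip, point in records:
        points_by_ip[ip].append(point)
    return {ip: kmeans(pts, min(k, len(set(pts)))) for ip, pts in points_by_ip.items()}
```

Each returned cluster plays the role of one localization region, and its center is the region's center point coordinate.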
Step 202: according to the localization regions over which the IP addresses associated with the user identifiers are distributed and the weight of each localization region, obtain the location information similarity between the user identifier to be matched and each other user identifier recorded in the user operation information set.
In this embodiment, for each pair consisting of the user identifier to be matched and another user identifier, the electronic device may generate two vectors from the localization regions over which the IP addresses associated with the two user identifiers are distributed and the weights of those regions, and calculate the similarity between the two vectors using, for example, the cosine similarity algorithm, the Jaccard similarity algorithm, or another similarity algorithm. The result serves as the location information similarity between the two user identifiers.
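The vector comparison can be sketched with sparse dictionaries. The representation is an assumption of this example: each user identifier is mapped to `{region_key: weight}`, where the region key is any hashable label for a localization region; the function names are hypothetical.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity between two sparse weight vectors {region_key: weight}."""
    dot = sum(weight * b[key] for key, weight in a.items() if key in b)
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # An empty vector has no direction; treat it as dissimilar.
    return dot / (norm_a * norm_b)


def jaccard_similarity(a, b):
    """Jaccard similarity over the sets of regions, ignoring the weights."""
    keys_a, keys_b = set(a), set(b)
    union = keys_a | keys_b
    return len(keys_a & keys_b) / len(union) if union else 0.0
```

Cosine similarity uses the region weights, while the Jaccard variant shown here only compares which regions the two identifiers share.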
Step 203: according to the location information similarity, determine the other user identifiers that match the user identifier to be matched.
In this embodiment, the electronic device may determine that the other user identifiers whose location information similarity with the user identifier to be matched exceeds a predetermined similarity threshold are the ones that match the user identifier to be matched. Alternatively, the electronic device may calculate, from the location information similarity and some other feature information (for example, the terminal model information and operating system information associated with the user identifiers) and through a pre-trained ranking model, the probability that the user identifier to be matched matches each other user identifier, and determine that the other user identifiers whose probability exceeds a predetermined threshold match the user identifier to be matched.
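Both decision rules can be sketched briefly. The text does not specify the form of the ranking model, so the logistic scorer below is only a stand-in; its feature names, weights, and bias, as well as the default threshold values, are hypothetical.

```python
from math import exp


def match_by_threshold(similarities, threshold=0.8):
    """Return the identifiers whose location information similarity exceeds the threshold.

    similarities: {other_user_id: similarity with the identifier to be matched}.
    """
    return sorted(uid for uid, s in similarities.items() if s > threshold)


def match_probability(features, weights, bias=0.0):
    """Logistic stand-in for a pre-trained ranking model (assumed form).

    features and weights are dictionaries keyed by feature name.
    """
    z = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + exp(-z))


def match_by_model(feature_rows, weights, bias=0.0, threshold=0.5):
    """Return the identifiers whose predicted match probability exceeds the threshold."""
    return sorted(
        uid for uid, feats in feature_rows.items()
        if match_probability(feats, weights, bias) > threshold
    )
```

In practice the weights would come from training on identifier pairs that are known to match or not match; here they are simply passed in.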
In some optional implementations of this embodiment, the processing in step 201 that determines, for each IP address, the weight of each localization region over which the IP address is distributed may include: deleting any IP address for which the number of localization regions exceeds a preset quantity threshold (for example, 5), or for which the average distance between the anchor point coordinates in a localization region and the region's center point coordinates exceeds a preset distance threshold (for example, 3000 meters); and, for each remaining IP address, determining the weight of each localization region over which the IP address is distributed. In this way, IP addresses whose users are not fixed, that cover a wide geographic area, and whose coverage area is not fixed (such as mobile cellular IP addresses) are removed, and only IP addresses with a relatively fixed set of users (such as the egress IP addresses of homes and companies) are analyzed, which improves the accuracy of user identifier matching.
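The filtering rule can be sketched as below, using the example thresholds from the text (5 regions, 3000 meters). The input layout `{ip: [(center, member_points), ...]}`, the planar distance calculation, and the function names are assumptions of this sketch.

```python
from math import dist


def mean_radius(center, points):
    """Average distance from a region's anchor points to its center point."""
    return sum(dist(center, p) for p in points) / len(points)


def drop_unstable_ips(regions_by_ip, max_regions=5, max_mean_distance=3000.0):
    """Remove IP addresses that are spread over too many regions or too wide a region.

    regions_by_ip: {ip: [(center, member_points), ...]}, coordinates assumed in meters.
    """
    kept = {}
    for ip, regions in regions_by_ip.items():
        too_many = len(regions) > max_regions
        too_wide = any(mean_radius(center, pts) > max_mean_distance for center, pts in regions)
        if not (too_many or too_wide):
            kept[ip] = regions
    return kept
```

An address that fails either test is dropped entirely, so none of its regions contribute to the later weight calculation.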
In addition, in some optional implementations of this embodiment, the processing in step 201 that determines the weight of each localization region over which the IP address is distributed may include: determining an initial weight for each localization region according to the number and range of the anchor point coordinates in each localization region over which the IP address is distributed; taking the center point coordinates of each localization region over which the IP addresses associated with a user identifier are distributed as the center point coordinates corresponding to that user identifier, and gridding the center point coordinates corresponding to the user identifiers recorded in the user operation information set according to geographic layout to generate at least two grids; obtaining, for each user identifier recorded in the set, the sum of the initial weights of the localization regions whose center point coordinates correspond to that user identifier and fall within each grid, as the frequency of that grid for that user identifier, and obtaining the sum of the initial weights of the localization regions of all center point coordinates within each grid, as the total user frequency of that grid; and calculating the weight of each localization region from these frequencies using the TF-IDF (Term Frequency-Inverse Document Frequency) algorithm.
Here, the range of the anchor point coordinates in a localization region may be represented by the average distance between the anchor point coordinates and the center point coordinates in that region. The electronic device may determine the initial weight of a localization region from the ratio of the number of anchor point coordinates in that region to the number of anchor point coordinates in all localization regions over which the IP address is distributed, together with the above range: the larger the ratio, the higher the weight, and the smaller the range, the higher the weight. The TF-IDF algorithm is generally used to assess how important a word is to one document in a document set or corpus. Its main idea is that if a word or phrase occurs with high frequency (TF) in one article and seldom occurs in other articles, the word or phrase is considered to have good discriminating power and to be suitable for classification. In this embodiment, the electronic device may treat a grid as a word and a user identifier as a document, and calculate the weight of each localization region by the TF-IDF algorithm. Accordingly, in this embodiment, a localization region receives a higher weight when the grid containing it has a higher frequency for the user identifier and a lower total user frequency.
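The weighting scheme described above can be sketched as follows. The text does not give closed-form expressions, so the concrete formulas here are assumptions chosen only to respect the stated monotonic behavior: `initial_weight`, the 0.01-degree cell size, the `scale_m` parameter and the smoothed logarithmic inverse-frequency factor are all hypothetical.

```python
import math
from collections import defaultdict

def initial_weight(n_points, n_points_all_regions, mean_dist_m, scale_m=1000.0):
    """Assumed form: grows with the region's share of points, shrinks with its range."""
    share = n_points / n_points_all_regions
    return share / (1.0 + mean_dist_m / scale_m)

def grid_cell(center, cell_deg=0.01):
    """Map a (lat, lon) center point to a grid cell index (cell size is assumed)."""
    return (math.floor(center[0] / cell_deg), math.floor(center[1] / cell_deg))

def tfidf_region_weights(user_regions, cell_deg=0.01):
    """user_regions maps user_id -> [(center, initial_weight), ...].

    Returns user_id -> {cell: weight}, treating a grid cell as the "word"
    and a user identifier as the "document".
    """
    user_freq = defaultdict(lambda: defaultdict(float))  # frequency per (user, cell)
    total_freq = defaultdict(float)                      # total user frequency per cell
    for user, regions in user_regions.items():
        for center, w0 in regions:
            cell = grid_cell(center, cell_deg)
            user_freq[user][cell] += w0
            total_freq[cell] += w0
    grand_total = sum(total_freq.values())
    weights = {}
    for user, cells in user_freq.items():
        user_total = sum(cells.values())
        weights[user] = {
            # TF: the cell's share of this user's frequency.
            # IDF-like factor: larger when the cell's total user frequency is low.
            cell: (f / user_total) * math.log(1.0 + grand_total / total_freq[cell])
            for cell, f in cells.items()
        }
    return weights
```

With this sketch, a cell that only one user identifier frequents (for example, a home) outweighs a cell shared by many user identifiers (for example, a shopping district), which matches the behavior the text describes.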
In this implementation, the initial weight of each localization region is first determined according to the number and range of the anchor point coordinates in each localization region over which the IP address is distributed, and the weight of each localization region is then calculated from the initial weights by the TF-IDF algorithm. The distinctiveness, activity level and activity range of each localization region are thereby considered together, yielding a more reasonable weight.
In some optional implementations of this embodiment, the user identifiers recorded in the user operation information set include first user identifiers and second user identifiers, and the user identifier to be matched and each other user identifier belong to the first user identifiers and the second user identifiers respectively. The electronic device may distinguish first user identifiers from second user identifiers by a flag bit. The first user identifiers and the second user identifiers may be user identifiers of two different product lines, for example the user identifier used when searching web pages through a browser and the user identifier used when searching through a search APP. If the user identifier to be matched is a first user identifier, then only the second user identifiers undergo similarity calculation and related processing with the user identifier to be matched, which reduces the amount of data to be computed and improves matching efficiency.
Based on the previous implementation, in some optional implementations of this embodiment, after step 202 the user identifier matching method of this embodiment may further include: selecting, from the second user identifiers recorded in the user operation information set, a predetermined number (for example, 50) of second user identifiers in descending order of their location information similarity with the user identifier to be matched, to obtain a candidate second user identifier set. Step 203 may then include: determining the second user identifier that matches the first user identifier to be matched according to the location information similarity between the user identifier to be matched and each second user identifier in the candidate second user identifier set. This implementation reduces the amount of computation in step 203 and improves the efficiency of user identifier matching.
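The candidate selection above amounts to a top-N selection by similarity. A minimal sketch, assuming the similarities have already been computed and are held in a dictionary (the function name and input layout are hypothetical):

```python
import heapq

def top_candidates(similarities, n=50):
    """similarities maps second_user_id -> location information similarity.

    Returns up to n second user identifiers in descending order of similarity.
    """
    return heapq.nlargest(n, similarities, key=similarities.get)
```

Using `heapq.nlargest` avoids sorting the full set of second user identifiers when only a small predetermined number is needed.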
Based on the previous implementation, in some optional implementations of this embodiment, before step 203 the user identifier matching method of this embodiment may further include: for each second user identifier in the candidate second user identifier set, obtaining the location information similarity between that second user identifier and each first user identifier; selecting a predetermined number (for example, 50) of first user identifiers in descending order of their location information similarity with that second user identifier, to obtain a candidate first user identifier set; and, if the user identifier to be matched is not in the candidate first user identifier set, deleting that second user identifier from the candidate second user identifier set. This implementation ensures that every other user identifier taking part in step 203 ranks within the predetermined top positions by similarity for the user identifier to be matched and, conversely, that the user identifier to be matched ranks within the predetermined top positions by similarity for that other user identifier. The great majority of unrelated user identifiers are thereby removed, which reduces noisy data and improves matching efficiency, accuracy and recall.
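The two-way filtering above can be sketched as a mutual top-N check. This is an illustration under assumptions: `sim` is a hypothetical callable returning the location information similarity of a (first identifier, second identifier) pair, and the same predetermined number is used in both directions.

```python
import heapq

def mutual_candidates(target, sim, first_ids, second_ids, n=50):
    """Keep only second user identifiers that are in the target's top n
    AND have the target in their own top n first user identifiers.

    target: the first user identifier to be matched.
    sim(first_id, second_id): hypothetical similarity function.
    """
    candidates = heapq.nlargest(n, second_ids, key=lambda s: sim(target, s))
    kept = []
    for s in candidates:
        # Reverse direction: rank all first user identifiers against s.
        back = heapq.nlargest(n, first_ids, key=lambda f: sim(f, s))
        if target in back:
            kept.append(s)
    return kept
```

Only the identifiers returned here would go on to the final matching step.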
Some example data processing of this embodiment is described below with reference to Fig. 3A and Fig. 3B. The user identifier matching method of this embodiment may first obtain the anchor point coordinate set associated with each IP address recorded in the user operation information set. The anchor point coordinate set associated with one IP address may be as shown in Fig. 3A, where the dotted shading represents the anchor point coordinates in the set. Next, at least one localization region over which each IP address recorded in the user operation information set is distributed, and the weight of each localization region, are obtained by an analysis algorithm such as clustering. The result may be as shown in Fig. 3B, where the 4 marked points represent the center coordinate points of the localization regions over which the IP address is distributed, and the value beside each marked point represents the weight of that localization region. Then, using the localization regions and weights shown in Fig. 3B, the location information similarity between user identifiers can be calculated, and the other user identifiers matching the user identifier to be matched can be determined according to the location information similarity.
Fig. 4 is a matching effect comparison diagram according to one embodiment of the user identifier matching method of the present application. It shows the accuracy and recall of the matching results obtained by matching based solely on IP addresses, as in the prior art, and by the user identifier matching method of this embodiment. As can be seen from Fig. 4, the user identifier matching method of this embodiment improves both accuracy and recall to a certain extent.
The user identifier matching method provided in this embodiment obtains at least one localization region over which each network protocol (IP) address recorded in the user operation information set is distributed, together with the weight of each localization region, thereby supplementing and refining the location information corresponding to each user identifier. According to the localization regions over which the IP addresses associated with the user identifiers are distributed and the weight of each localization region, it obtains the location information similarity between the user identifier to be matched and each other user identifier recorded in the user operation information set, and determines the other user identifiers matching the user identifier to be matched according to the location information similarity. User identifiers are thereby matched accurately and reliably.
With continued reference to Fig. 5, a flow 500 of another embodiment of the user identifier matching method according to the present application is shown.
As shown in Fig. 5, the user identifier matching method of this embodiment includes the following steps:
Step 501: analyze the pre-stored user operation information set to obtain at least one localization region over which each network protocol (IP) address recorded in the user operation information set is distributed, and the weight of each localization region.
Here, each piece of user operation information in the user operation information set includes the following information: a user identifier, an IP address, and anchor point coordinates.
In this embodiment, for the specific processing of step 501, reference may be made to the description of step 201 in the embodiment corresponding to Fig. 2, which is not repeated here.
Step 502: according to the localization regions over which the IP addresses associated with the user identifiers are distributed and the weight of each localization region, obtain the location information similarity between the user identifier to be matched and each other user identifier recorded in the user operation information set.
In this embodiment, for the specific processing of step 502, reference may be made to the description of step 202 in the embodiment corresponding to Fig. 2, which is not repeated here.
Step 503: calculate the IP address similarity between the user identifier to be matched and each other user identifier.
In this embodiment, the electronic device on which the user identifier matching method runs (for example, the server shown in Fig. 1) may first obtain at least one IP address associated with each user identifier, then calculate the weight of each IP address by the TF-IDF algorithm or another weight calculation method, and finally calculate the IP address similarity between the user identifier to be matched and each other user identifier according to the associated IP addresses and the weights of those IP addresses.
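One possible reading of step 503 is sketched below. The text names TF-IDF as one option for weighting IP addresses but does not say how the weighted IP addresses are turned into a similarity, so the inverse-frequency weight and the weighted cosine similarity used here are assumptions, as are the function names.

```python
import math
from collections import Counter

def ip_weights(user_ips):
    """user_ips maps user_id -> iterable of IP addresses.

    Assumed IDF-style weight: an IP address shared by fewer user
    identifiers is more informative and gets a larger weight.
    """
    n_users = len(user_ips)
    doc_freq = Counter(ip for ips in user_ips.values() for ip in set(ips))
    return {ip: math.log(1.0 + n_users / count) for ip, count in doc_freq.items()}

def ip_similarity(ips_a, ips_b, weights):
    """Weighted cosine similarity between two sets of IP addresses."""
    a, b = set(ips_a), set(ips_b)
    dot = sum(weights[ip] ** 2 for ip in a & b)
    norm_a = math.sqrt(sum(weights[ip] ** 2 for ip in a))
    norm_b = math.sqrt(sum(weights[ip] ** 2 for ip in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

Under this sketch, two user identifiers that share a rarely used IP address score higher than two that share only a widely used one.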
It should be noted that step 503 may be performed simultaneously with step 501 or step 502, before step 501, or after step 502; this embodiment does not limit its order of execution.
Step 504: according to the location information similarity and the IP address similarity between the user identifier to be matched and each other user identifier, determine the other user identifiers matching the user identifier to be matched.
In this embodiment, the electronic device may calculate a comprehensive relevance between the user identifier to be matched and each other user identifier according to preset weights for the location information similarity and the IP address similarity, and determine the other user identifiers matching the user identifier to be matched according to the comprehensive relevance. The weights of the location information similarity and the IP address similarity may be preset manually based on experience and actual conditions.
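The comprehensive relevance described above can be read as a weighted sum of the two similarities. The sketch below assumes that reading; the weights 0.6 and 0.4 are placeholder values rather than values from the text, and the text does not state how the final match is picked from the ranking.

```python
def combined_relevance(loc_sim, ip_sim, w_loc=0.6, w_ip=0.4):
    """Weighted sum of the two similarities (weights are assumed placeholders)."""
    return w_loc * loc_sim + w_ip * ip_sim

def rank_by_relevance(similarities, w_loc=0.6, w_ip=0.4):
    """similarities maps other_id -> (loc_sim, ip_sim).

    Returns [(other_id, relevance), ...] in descending order of relevance.
    """
    scored = {
        uid: combined_relevance(loc, ip, w_loc, w_ip)
        for uid, (loc, ip) in similarities.items()
    }
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)
```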
In some optional implementations of this embodiment, step 504 may include: obtaining the feature information corresponding to the user identifier to be matched and each other user identifier, the feature information including the IP address similarity and the location information similarity between the user identifier to be matched and the other user identifier; obtaining, based on that feature information and through a pre-trained ranking model, the probability that the user identifier to be matched matches each other user identifier; and determining that each other user identifier whose probability is greater than a predetermined threshold matches the user identifier to be matched. The ranking model may be trained on a labeled training sample set by an LTR (Learning To Rank) method such as Pairwise. The feature information corresponding to two user identifiers known to belong to the same user may be used as a positive sample, and the feature information corresponding to two user identifiers known to belong to unrelated users may be used as a negative sample. This implementation uses a pre-trained ranking model to calculate, from the IP address similarity and the location information similarity between the user identifier to be matched and each other user identifier, the probability that the other user identifier matches the user identifier to be matched. Compared with calculating a comprehensive relevance from manually set weights, this is more accurate and reliable.
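To make the pairwise idea concrete, the following sketch trains a linear scoring function with a pairwise logistic loss, so that every positive sample scores above every negative sample. It is an illustration only: the text names pairwise LTR but not a specific model, so the linear form, the loss, the learning rate and the function names are assumptions. Converting the scores into calibrated match probabilities for thresholding is not shown.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_pairwise(positives, negatives, epochs=200, lr=0.1):
    """Learn linear weights from feature vectors such as
    [ip_address_similarity, location_information_similarity].

    For each (positive, negative) pair, take a gradient step that
    increases log(sigmoid(score(positive) - score(negative))).
    """
    w = [0.0] * len(positives[0])
    for _ in range(epochs):
        for p in positives:
            for n in negatives:
                diff = [pi - ni for pi, ni in zip(p, n)]
                margin = sum(wi * di for wi, di in zip(w, diff))
                step = lr * (1.0 - sigmoid(margin))
                w = [wi + step * di for wi, di in zip(w, diff)]
    return w

def score(w, features):
    """Ranking score of one candidate pair; higher means more likely a match."""
    return sum(wi * fi for wi, fi in zip(w, features))
```

Unlike the manually weighted sum, the weights here are fitted to labeled pairs, which is the advantage the text attributes to the ranking model.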
Based on the previous implementation, in some optional implementations of this embodiment, the user operation information in the user operation information set may further include terminal model information and operating system information. The feature information then further includes at least one of the following: the number of identical IP addresses between the user identifier to be matched and the other user identifier, the number of coinciding corresponding center point coordinates, and the terminal model information and operating system information associated with the user identifier to be matched and the other user identifier. When obtaining the probability that the user identifier to be matched matches each other user identifier, this implementation takes more influencing factors into account, which makes the matching more accurate.
Fig. 6 is a matching effect comparison diagram according to another embodiment of the user identifier matching method of the present application. It shows the accuracy and recall of the matching results obtained by matching solely through IP addresses, as in the prior art, by the user identifier matching method of the embodiment corresponding to Fig. 2, and by the user identifier matching method of this embodiment. As can be seen from Fig. 6, the accuracy and recall of the user identifier matching method of this embodiment are further improved to a certain extent over those of the method of the embodiment corresponding to Fig. 2.
As can be seen from Fig. 5 and Fig. 6, compared with the embodiment corresponding to Fig. 2, the flow 500 of the user identifier matching method in this embodiment adds the IP address similarity as a reference factor for user identifier matching. The scheme described in this embodiment can therefore take more comprehensive influencing factors into account and improve matching accuracy.
With further reference to Fig. 7, as an implementation of the methods shown in the above figures, the present application provides one embodiment of a user identifier matching device. This device embodiment corresponds to the method embodiment shown in Fig. 2, and the device may be applied in various electronic devices.
As shown in Fig. 7, the user identifier matching device 700 of this embodiment includes a location information acquiring unit 701, a location information similarity acquiring unit 702 and a matching unit 703. The location information acquiring unit 701 is configured to analyze the pre-stored user operation information set to obtain at least one localization region over which each network protocol (IP) address recorded in the user operation information set is distributed and the weight of each localization region, where each piece of user operation information in the set includes the following information: a user identifier, an IP address, and anchor point coordinates. The location information similarity acquiring unit 702 is configured to obtain, according to the localization regions over which the IP addresses associated with the user identifiers are distributed and the weight of each localization region, the location information similarity between the user identifier to be matched and each other user identifier recorded in the user operation information set. The matching unit 703 is configured to determine, according to the location information similarity, the other user identifiers matching the user identifier to be matched.
In this embodiment, for the specific processing of the location information acquiring unit 701, the location information similarity acquiring unit 702 and the matching unit 703, reference may be made to the processing of step 201, step 202 and step 203 respectively in the embodiment corresponding to Fig. 2, which is not repeated here.
In some optional implementations of this embodiment, the location information acquiring unit 701 may include: a coordinate set acquiring subunit 7011, configured to obtain the anchor point coordinate set associated with each IP address recorded in the user operation information set; a clustering subunit 7012, configured to perform, for each IP address, cluster analysis on the anchor point coordinate set associated with that IP address to obtain at least one corresponding cluster, as the localization regions over which that IP address is distributed; and a weight determining subunit 7013, configured to determine, for each IP address, the weight of each localization region over which that IP address is distributed.
Here, the clustering subunit 7012 may perform cluster analysis on the anchor point coordinate set associated with the IP address by the K-means algorithm to obtain at least one corresponding cluster.
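For reference, a bare-bones version of the K-means (Lloyd's) iteration over anchor point coordinates is sketched below. It is a simplified illustration: the text does not say how the number of clusters is chosen, so `k` is taken as given; the centers are initialized from the first `k` points for determinism; and plain Euclidean distance on (lat, lon) pairs is used as an approximation.

```python
def kmeans(points, k, iterations=20):
    """Cluster (lat, lon) points into k groups.

    Returns (centers, clusters); each cluster would serve as one
    localization region and its center as the region's center point.
    Assumes the first k points are distinct (used as initial centers).
    """
    centers = list(points[:k])
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: (p[0] - centers[i][0]) ** 2 + (p[1] - centers[i][1]) ** 2,
            )
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        centers = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters
```

In practice a library implementation with better initialization would normally be preferred; the loop above only shows the assignment and update steps.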
In some optional implementations of this embodiment, the weight determining subunit 7013 may include: a wide-range IP removal module (not shown), configured to delete each IP address for which the number of localization regions it is distributed over is greater than a preset quantity threshold, or for which the average distance between the anchor point coordinates and the center point coordinates in a localization region is greater than a preset distance threshold; and a weight determining module (not shown), configured to determine, for each remaining IP address, the weight of each localization region over which that IP address is distributed. For the specific processing of the wide-range IP removal module and the weight determining module and the technical effects they bring, reference may be made to the description of the corresponding implementation in the embodiment corresponding to Fig. 2, which is not repeated here.
In addition, in some optional implementations of this embodiment, the weight determining subunit 7013 may include: an initial weight determining module (not shown), configured to determine the initial weight of each localization region according to the number and range of the anchor point coordinates in each localization region over which the IP address is distributed; a gridding module (not shown), configured to take the center point coordinates of each localization region over which the IP addresses associated with a user identifier are distributed as the center point coordinates corresponding to that user identifier, and to grid the center point coordinates corresponding to the user identifiers recorded in the user operation information set according to geographic layout to generate at least two grids; a frequency acquiring module (not shown), configured to obtain, for each user identifier recorded in the user operation information set, the sum of the initial weights of the localization regions whose center point coordinates corresponding to that user identifier fall in each grid, as the frequency of each grid for that user identifier, and to obtain the sum of the initial weights of the localization regions of the center point coordinates in each grid, as the total user frequency of each grid; and a weight calculating module (not shown), configured to calculate the weight of each cluster from these frequencies by the TF-IDF algorithm. For the specific processing of the initial weight determining module, the gridding module, the frequency acquiring module and the weight calculating module and the technical effects they bring, reference may be made to the description of the corresponding implementation in the embodiment corresponding to Fig. 2, which is not repeated here.
In some optional implementations of this embodiment, the user identifier matching device of this embodiment may further include an IP similarity calculating unit 704, configured to calculate the IP address similarity between the user identifier to be matched and each other user identifier. The matching unit 703 may be further configured to determine the other user identifiers matching the user identifier to be matched according to the location information similarity and the IP address similarity between the user identifier to be matched and each other user identifier. For the specific processing of this implementation and the technical effects it brings, reference may be made to the description of step 503 and step 504 in the embodiment corresponding to Fig. 5, which is not repeated here.
Based on the previous implementation, in some optional implementations of this embodiment, the matching unit 703 may include: a feature information acquiring subunit 7031, configured to obtain the feature information corresponding to the user identifier to be matched and each other user identifier, the feature information including the IP address similarity and the location information similarity between the user identifier to be matched and the other user identifier; a ranking subunit 7032, configured to obtain, based on the feature information corresponding to the user identifier to be matched and each other user identifier and through a pre-trained ranking model, the probability that the user identifier to be matched matches each other user identifier; and a matching subunit 7033, configured to determine that each other user identifier whose probability is greater than a predetermined threshold matches the user identifier to be matched. For the specific processing of the feature information acquiring subunit 7031, the ranking subunit 7032 and the matching subunit 7033 and the technical effects they bring, reference may be made to the description of the corresponding implementation in the embodiment corresponding to Fig. 5, which is not repeated here.
Based on the previous implementation, in some optional implementations of this embodiment, the user operation information in the user operation information set may further include terminal model information and operating system information. The feature information then further includes at least one of the following: the number of identical IP addresses between the user identifier to be matched and the other user identifier, the number of coinciding corresponding center point coordinates, and the terminal model information and operating system information associated with the user identifier to be matched and the other user identifier. For the specific processing of this implementation and the technical effects it brings, reference may be made to the description of the corresponding implementation in the embodiment corresponding to Fig. 5, which is not repeated here.
In some optional implementations of this embodiment, the user identifiers recorded in the user operation information set include first user identifiers and second user identifiers, and the user identifier to be matched and each other user identifier belong to the first user identifiers and the second user identifiers respectively. For the specific processing of this implementation and the technical effects it brings, reference may be made to the description of the corresponding implementation in the embodiment corresponding to Fig. 2, which is not repeated here.
Based on the previous implementation, in some optional implementations of this embodiment, the user identifier matching device of this embodiment may further include a first selecting unit (not shown), configured to, after the location information similarity acquiring unit obtains the location information similarity between the user identifier to be matched and each other user identifier recorded in the user operation information set, select a predetermined number of second user identifiers from the second user identifiers recorded in the user operation information set in descending order of their location information similarity with the user identifier to be matched, to obtain a candidate second user identifier set. The matching unit 703 may be further configured to determine the second user identifier that matches the first user identifier to be matched according to the location information similarity between the user identifier to be matched and each second user identifier in the candidate second user identifier set. For the specific processing of this implementation and the technical effects it brings, reference may be made to the description of the corresponding implementation in the embodiment corresponding to Fig. 2, which is not repeated here.
Based on the previous implementation, in some optional implementations of this embodiment, the location information similarity acquiring unit 702 may be further configured to, before the matching unit determines the second user identifier that matches the first user identifier to be matched according to the location information similarity between the user identifier to be matched and each second user identifier in the candidate second user identifier set, obtain, for each second user identifier in the candidate second user identifier set, the location information similarity between that second user identifier and each first user identifier. The user identifier matching device of this embodiment may further include: a second selecting unit (not shown), configured to select a predetermined number of first user identifiers in descending order of their location information similarity with that second user identifier, to obtain a candidate first user identifier set; and a candidate filtering unit (not shown), configured to, before the matching unit performs the above determination, delete that second user identifier from the candidate second user identifier set when the user identifier to be matched is not in the candidate first user identifier set. For the specific processing of this implementation and the technical effects it brings, reference may be made to the description of the corresponding implementation in the embodiment corresponding to Fig. 2, which is not repeated here.
In the user identifier matching device provided in this embodiment, the location information acquiring unit 701 obtains at least one localization region over which each network protocol (IP) address recorded in the user operation information set is distributed, together with the weight of each localization region, thereby supplementing and refining the location information corresponding to each user identifier. The location information similarity acquiring unit 702 obtains, according to the localization regions over which the IP addresses associated with the user identifiers are distributed and the weight of each localization region, the location information similarity between the user identifier to be matched and each other user identifier recorded in the user operation information set, and the matching unit 703 determines the other user identifiers matching the user identifier to be matched according to the location information similarity. User identifiers are thereby matched accurately and reliably.
Reference is now made to Fig. 8, which shows a schematic structural diagram of a computer system 800 of a server suitable for implementing the embodiments of the present application.
As shown in Fig. 8, the computer system 800 includes a central processing unit (CPU) 801, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 802 or a program loaded from a storage section 806 into a random access memory (RAM) 803. The RAM 803 also stores the various programs and data required for the operation of the system 800. The CPU 801, the ROM 802 and the RAM 803 are connected to one another by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
The following components are connected to the I/O interface 805: a storage section 806 including a hard disk or the like; and a communication section 807 including a network interface card such as a LAN card or a modem. The communication section 807 performs communication processing via a network such as the Internet. A drive 808 is also connected to the I/O interface 805 as needed. A removable medium 809, such as a magnetic disk, an optical disc, a magneto-optical disk or a semiconductor memory, is mounted on the drive 808 as needed, so that a computer program read from it can be installed into the storage section 806 as needed.
In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program containing program code for performing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 807 and/or installed from the removable medium 809. When the computer program is executed by the central processing unit (CPU) 801, the above functions defined in the method of the present application are performed.
The flowcharts and block diagrams in the drawings illustrate the architecture, functions and operations that may be implemented by systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of such blocks, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present application may be implemented in software or in hardware. The described units may also be provided in a processor; for example, a processor may be described as including a location information acquiring unit, a location information similarity acquiring unit, and a matching unit. The names of these units do not, in some cases, limit the units themselves; for example, the matching unit may also be described as "a unit that determines, according to the location information similarity, the other user identifiers that match the user identifier to be matched".
In another aspect, the present application further provides a non-volatile computer storage medium. The non-volatile computer storage medium may be the one included in the device described in the above embodiments, or it may exist separately without being assembled into a terminal. The non-volatile computer storage medium stores one or more programs which, when executed by a device, cause the device to: analyze a pre-stored user operation information set to obtain at least one localization region over which each Internet Protocol (IP) address recorded in the user operation information set is distributed, and the weight of each localization region, wherein each item of user operation information in the set includes a user identifier, an IP address, and anchor point coordinates; obtain, according to the localization regions over which the IP addresses associated with the user identifiers are distributed and the weight of each localization region, the location information similarity between a user identifier to be matched and each of the other user identifiers recorded in the user operation information set; and determine, according to the location information similarity, the other user identifiers that match the user identifier to be matched.
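The three steps recited above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the record layout `(user_id, ip, (lat, lon))`, the function names, the fixed-size cell bucketing used in place of cluster analysis, the share-of-points region weight, the cosine similarity, and the 0.5 threshold are all hypothetical choices that the text does not specify.

```python
import math
from collections import defaultdict


def regions_by_ip(records, cell=0.01):
    """Map each IP to {region: weight}.

    Assumption: cluster analysis is approximated by bucketing anchor points
    into square cells of `cell` degrees; a region's weight is its share of
    the IP's anchor points.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for _user, ip, (lat, lon) in records:
        region = (math.floor(lat / cell), math.floor(lon / cell))
        counts[ip][region] += 1
    return {
        ip: {region: n / sum(regions.values()) for region, n in regions.items()}
        for ip, regions in counts.items()
    }


def user_vectors(records, ip_regions):
    """Sum the region weights of every distinct IP associated with a user."""
    ips = defaultdict(set)
    for user, ip, _point in records:
        ips[user].add(ip)
    vectors = {}
    for user, user_ips in ips.items():
        vec = defaultdict(float)
        for ip in user_ips:
            for region, weight in ip_regions[ip].items():
                vec[region] += weight
        vectors[user] = dict(vec)
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse {region: weight} vectors (assumed metric)."""
    dot = sum(w * b.get(region, 0.0) for region, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
        sum(w * w for w in b.values())
    )
    return dot / norm if norm else 0.0


def match(records, target, threshold=0.5):
    """Return the other identifiers whose similarity to `target` meets the threshold."""
    vectors = user_vectors(records, regions_by_ip(records))
    scores = {
        user: cosine(vectors[target], vec)
        for user, vec in vectors.items()
        if user != target
    }
    matched = [user for user, score in scores.items() if score >= threshold]
    return sorted(matched, key=lambda user: -scores[user])
```

With records in which two identifiers share an IP whose anchor points fall in the same cell, `match` returns the second identifier for the first; an identifier seen only on an unrelated IP in a distant cell is not returned.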
The above description is only a preferred embodiment of the present application and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in the present application is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the present application.
Claims (18)
1. A user identifier matching method, characterized in that the method comprises:
analyzing a pre-stored user operation information set to obtain at least one localization region over which each Internet Protocol (IP) address recorded in the user operation information set is distributed, and the weight of each localization region, wherein the user operation information in the user operation information set includes the following information: a user identifier, an IP address, and anchor point coordinates;
obtaining, according to the localization regions over which the IP addresses associated with user identifiers are distributed and the weight of each localization region, the location information similarity between a user identifier to be matched and each of the other user identifiers recorded in the user operation information set;
determining, according to the location information similarity, the other user identifiers that match the user identifier to be matched;
wherein analyzing the pre-stored user operation information set to obtain the at least one localization region over which each IP address recorded in the user operation information set is distributed, and the weight of each localization region, comprises:
obtaining the anchor point coordinate set associated with each IP address recorded in the user operation information set;
for each IP address, performing cluster analysis on the anchor point coordinate set associated with the IP address to obtain at least one corresponding cluster, which serves as the localization regions over which the IP address is distributed;
for each IP address, determining the weight of each localization region over which the IP address is distributed.
2. The method according to claim 1, characterized in that determining, for each IP address, the weight of each localization region over which the IP address is distributed comprises:
deleting any IP address for which the number of localization regions over which it is distributed is greater than a preset quantity threshold, or for which the average distance between the anchor point coordinates in a localization region and the center point coordinates is greater than a preset distance threshold;
for each remaining IP address, determining the weight of each localization region over which the IP address is distributed.
3. The method according to claim 1, characterized in that determining the weight of each localization region over which the IP address is distributed comprises:
determining the initial weight of each localization region according to the number and range of the anchor point coordinates in each localization region over which the IP address is distributed;
taking the center point coordinates of each localization region over which the IP addresses associated with a user identifier are distributed as the center point coordinates corresponding to the user identifier, and gridding, according to geographic layout, the center point coordinates corresponding to the user identifiers recorded in the user operation information set to generate at least two grids;
obtaining, for each user identifier recorded in the user operation information set, the sum of the initial weights of the localization regions in which the corresponding center point coordinates within each grid are located, as the frequency of each grid corresponding to each user identifier, and obtaining the sum of the initial weights of the localization regions in which the center point coordinates within each grid are located, as the total user frequency corresponding to each grid;
calculating the weight of each localization region by the TF-IDF algorithm based on the frequencies.
4. The method according to any one of claims 1 to 3, characterized in that the method further comprises:
calculating the IP address similarity between the user identifier to be matched and each of the other user identifiers; and
determining, according to the location information similarity, the other user identifiers that match the user identifier to be matched comprises:
determining the other user identifiers that match the user identifier to be matched according to the location information similarity and the IP address similarity between the user identifier to be matched and each of the other user identifiers.
5. The method according to claim 4, characterized in that determining the other user identifiers that match the user identifier to be matched according to the location information similarity and the IP address similarity between the user identifier to be matched and each of the other user identifiers comprises:
obtaining the characteristic information corresponding to the user identifier to be matched and each of the other user identifiers, the characteristic information including the IP address similarity and the location information similarity between the user identifier to be matched and the other user identifier;
obtaining, based on the characteristic information corresponding to the user identifier to be matched and each of the other user identifiers, the probability that the user identifier to be matched matches each of the other user identifiers through a pre-trained ranking model;
determining that the other user identifiers whose corresponding probability is greater than a predetermined threshold match the user identifier to be matched.
6. The method according to claim 5, characterized in that the user operation information in the user operation information set further includes terminal type information and operating system information; and
the characteristic information further includes at least one of the following: the number of identical IP addresses between the user identifier to be matched and the other user identifier, the number of coinciding corresponding center point coordinates, and the terminal type information and operating system information associated with the user identifier to be matched and with the other user identifier.
7. The method according to any one of claims 1 to 3, characterized in that the user identifiers recorded in the user operation information set include first user identifiers and second user identifiers, and the user identifier to be matched and each of the other user identifiers belong to the first user identifiers and the second user identifiers, respectively.
8. The method according to claim 7, characterized in that, after obtaining the location information similarity between the user identifier to be matched and each of the other user identifiers recorded in the user operation information set, the method further comprises:
selecting, in descending order of location information similarity to the user identifier to be matched, a predetermined number of second user identifiers from the second user identifiers recorded in the user operation information set, to obtain a candidate second user identifier set;
and
determining, according to the location information similarity, the other user identifiers that match the user identifier to be matched comprises:
determining the second user identifier that matches the first user identifier to be matched according to the location information similarity between the user identifier to be matched and each second user identifier in the candidate second user identifier set.
9. The method according to claim 8, characterized in that, before determining the second user identifier that matches the first user identifier to be matched according to the location information similarity between the user identifier to be matched and each second user identifier in the candidate second user identifier set, the method further comprises:
for each second user identifier in the candidate second user identifier set, obtaining the location information similarity between the second user identifier and each first user identifier;
selecting, in descending order of location information similarity to the second user identifier, a predetermined number of first user identifiers to obtain a candidate first user identifier set;
if the user identifier to be matched is not in the candidate first user identifier set, deleting the second user identifier from the candidate second user identifier set.
10. A user identifier matching device, characterized in that the device comprises:
a location information acquiring unit, configured to analyze a pre-stored user operation information set to obtain at least one localization region over which each Internet Protocol (IP) address recorded in the user operation information set is distributed, and the weight of each localization region, wherein the user operation information in the user operation information set includes the following information: a user identifier, an IP address, and anchor point coordinates;
a location information similarity acquiring unit, configured to obtain, according to the localization regions over which the IP addresses associated with user identifiers are distributed and the weight of each localization region, the location information similarity between a user identifier to be matched and each of the other user identifiers recorded in the user operation information set;
a matching unit, configured to determine, according to the location information similarity, the other user identifiers that match the user identifier to be matched;
wherein the location information acquiring unit comprises:
a coordinate set obtaining subunit, configured to obtain the anchor point coordinate set associated with each IP address recorded in the user operation information set;
a clustering subunit, configured to perform, for each IP address, cluster analysis on the anchor point coordinate set associated with the IP address to obtain at least one corresponding cluster, which serves as the localization regions over which the IP address is distributed;
a weight determining subunit, configured to determine, for each IP address, the weight of each localization region over which the IP address is distributed.
11. The device according to claim 10, characterized in that the weight determining subunit comprises:
a generalized IP removal module, configured to delete any IP address for which the number of localization regions over which it is distributed is greater than a preset quantity threshold, or for which the average distance between the anchor point coordinates in a localization region and the center point coordinates is greater than a preset distance threshold;
a weight determination module, configured to determine, for each remaining IP address, the weight of each localization region over which the IP address is distributed.
12. The device according to claim 10, characterized in that the weight determining subunit comprises:
an initial weight determining module, configured to determine the initial weight of each localization region according to the number and range of the anchor point coordinates in each localization region over which the IP address is distributed;
a gridding module, configured to take the center point coordinates of each localization region over which the IP addresses associated with a user identifier are distributed as the center point coordinates corresponding to the user identifier, and to grid, according to geographic layout, the center point coordinates corresponding to the user identifiers recorded in the user operation information set to generate at least two grids;
a frequency obtaining module, configured to obtain, for each user identifier recorded in the user operation information set, the sum of the initial weights of the localization regions in which the corresponding center point coordinates within each grid are located, as the frequency of each grid corresponding to each user identifier, and to obtain the sum of the initial weights of the localization regions in which the center point coordinates within each grid are located, as the total user frequency corresponding to each grid;
a weight calculation module, configured to calculate the weight of each cluster by the TF-IDF algorithm based on the frequencies.
13. The device according to any one of claims 10 to 12, characterized in that the device further comprises:
an IP similarity calculating unit, configured to calculate the IP address similarity between the user identifier to be matched and each of the other user identifiers; and
the matching unit is further configured to determine the other user identifiers that match the user identifier to be matched according to the location information similarity and the IP address similarity between the user identifier to be matched and each of the other user identifiers.
14. The device according to claim 13, characterized in that the matching unit comprises:
a characteristic information obtaining subunit, configured to obtain the characteristic information corresponding to the user identifier to be matched and each of the other user identifiers, the characteristic information including the IP address similarity and the location information similarity between the user identifier to be matched and the other user identifier;
a ranking subunit, configured to obtain, based on the characteristic information corresponding to the user identifier to be matched and each of the other user identifiers, the probability that the user identifier to be matched matches each of the other user identifiers through a pre-trained ranking model;
a matching subunit, configured to determine that the other user identifiers whose corresponding probability is greater than a predetermined threshold match the user identifier to be matched.
15. The device according to claim 14, characterized in that the user operation information in the user operation information set further includes terminal type information and operating system information; and
the characteristic information further includes at least one of the following: the number of identical IP addresses between the user identifier to be matched and the other user identifier, the number of coinciding corresponding center point coordinates, and the terminal type information and operating system information associated with the user identifier to be matched and with the other user identifier.
16. The device according to any one of claims 10 to 12, characterized in that the user identifiers recorded in the user operation information set include first user identifiers and second user identifiers, and the user identifier to be matched and each of the other user identifiers belong to the first user identifiers and the second user identifiers, respectively.
17. The device according to claim 16, characterized in that the device further comprises:
a first selection unit, configured to select, after the location information similarity acquiring unit obtains the location information similarity between the user identifier to be matched and each of the other user identifiers recorded in the user operation information set, and in descending order of location information similarity to the user identifier to be matched, a predetermined number of second user identifiers from the second user identifiers recorded in the user operation information set, to obtain a candidate second user identifier set; and
the matching unit is further configured to determine the second user identifier that matches the first user identifier to be matched according to the location information similarity between the user identifier to be matched and each second user identifier in the candidate second user identifier set.
18. The device according to claim 17, characterized in that the location information similarity acquiring unit is further configured to obtain, before the matching unit determines the second user identifier that matches the first user identifier to be matched according to the location information similarity between the user identifier to be matched and each second user identifier in the candidate second user identifier set, and for each second user identifier in the candidate second user identifier set, the location information similarity between the second user identifier and each first user identifier; and
the device further comprises:
a second selection unit, configured to select, in descending order of location information similarity to the second user identifier, a predetermined number of first user identifiers to obtain a candidate first user identifier set;
a candidate filtering unit, configured to delete the second user identifier from the candidate second user identifier set when the user identifier to be matched is not in the candidate first user identifier set, before the matching unit determines the second user identifier that matches the first user identifier to be matched according to the location information similarity between the user identifier to be matched and each second user identifier in the candidate second user identifier set.
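Two refinements recited in the claims lend themselves to a short Python sketch: the grid-based TF-IDF weighting (claims 3 and 12) and the reciprocal top-N candidate filter (claims 8-9 and 17-18). The claims name TF-IDF but give no formula, so the normalizations below (a per-user term frequency and a logarithmic inverse grid frequency) are assumptions, as are the function names, the `(user_id, (lat, lon), initial_weight)` input layout, the cell size, and the nested-dictionary similarity table.

```python
import math
from collections import defaultdict


def tfidf_grid_weights(centers, cell=0.1):
    """Weight each (user, grid) pair by an assumed TF-IDF formula.

    centers: iterable of (user_id, (lat, lon), initial_weight), one entry per
    localization-region center point. Grids are square cells of `cell` degrees.
    """
    freq = defaultdict(lambda: defaultdict(float))  # user -> grid -> frequency
    grid_total = defaultdict(float)                 # grid -> total user frequency
    for user, (lat, lon), weight in centers:
        grid = (math.floor(lat / cell), math.floor(lon / cell))
        freq[user][grid] += weight
        grid_total[grid] += weight
    overall = sum(grid_total.values())
    weights = {}
    for user, grids in freq.items():
        user_total = sum(grids.values())
        weights[user] = {
            # TF: the grid's share of this user's frequency.
            # IDF: log of overall frequency over the grid's total frequency.
            grid: (f / user_total) * math.log(overall / grid_total[grid])
            for grid, f in grids.items()
        }
    return weights


def top_k(scores, k):
    """Keys with the k largest values, in descending order of value."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def reciprocal_candidates(sim, target, k):
    """Keep a second identifier only if `target` is also among its top k.

    sim[first_id][second_id] holds the location information similarity
    (hypothetical layout).
    """
    kept = []
    for second in top_k(sim[target], k):
        back = {first: row.get(second, 0.0) for first, row in sim.items()}
        if target in top_k(back, k):
            kept.append(second)
    return kept
```

Under these assumptions, a grid shared by many users receives a lower weight than a grid that few users appear in, and a second identifier that ranks the target poorly is dropped even if the target ranks it highly.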
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610172168.XA CN105721629B (en) | 2016-03-24 | 2016-03-24 | User identifier matching process and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105721629A (en) | 2016-06-29 |
CN105721629B (en) | 2019-04-26 |
Family ID: 56159077
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610172168.XA, Active, CN105721629B (en) | 2016-03-24 | 2016-03-24 | User identifier matching process and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105721629B (en) |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106228187A (en) * | 2016-07-21 | 2016-12-14 | 贵州力创科技发展有限公司 | Individual recognizer model based on multiple user's detail data and treatment technology |
CN106789411B (en) * | 2016-12-07 | 2020-01-21 | 北京亚鸿世纪科技发展有限公司 | Method and device for acquiring active IP data in machine room |
US10348745B2 (en) | 2017-01-05 | 2019-07-09 | Cisco Technology, Inc. | Associating a user identifier detected from web traffic with a client address |
CN109104506B (en) * | 2017-06-20 | 2021-05-14 | 腾讯科技(深圳)有限公司 | Method and device for determining domain name resolution rule and computer readable storage medium |
CN109005513B (en) * | 2018-06-26 | 2021-03-19 | 北京酷云互动科技有限公司 | Mobile phone terminal association method and mobile phone terminal association system |
CN109447114B (en) * | 2018-09-25 | 2020-11-06 | 北京酷云互动科技有限公司 | Method and system for evaluating association degree between places |
CN110493368B (en) * | 2019-08-21 | 2022-02-25 | 北京明略软件系统有限公司 | Matching method and device of equipment identifiers |
CN111026937B (en) * | 2019-11-13 | 2021-02-19 | 百度在线网络技术(北京)有限公司 | Method, device and equipment for extracting POI name and computer storage medium |
CN111127094B (en) * | 2019-12-19 | 2023-08-25 | 秒针信息技术有限公司 | Account matching method and device, electronic equipment and storage medium |
CN117172792A (en) * | 2023-11-02 | 2023-12-05 | 赞塔(杭州)科技有限公司 | Customer information management method and device |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101409868A (en) * | 2008-12-01 | 2009-04-15 | 腾讯科技(深圳)有限公司 | Method, system and equipment for matching object in mobile terminal |
CN102056079A (en) * | 2009-10-30 | 2011-05-11 | 中国移动通信集团上海有限公司 | Method, device and system for determining information to be pushed |
CN105187237A (en) * | 2015-08-12 | 2015-12-23 | 百度在线网络技术(北京)有限公司 | Method and device for searching associated user identifications |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120174205A1 (en) * | 2010-12-31 | 2012-07-05 | International Business Machines Corporation | User profile and usage pattern based user identification prediction |
Also Published As
Publication number | Publication date |
---|---|
CN105721629A (en) | 2016-06-29 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN105721629B (en) | User identifier matching process and device | |
CN105608179B (en) | The method and apparatus for determining the relevance of user identifier | |
CN105431844B (en) | Third party for search system searches for application | |
CN108282527B (en) | Generate the distributed system and method for Service Instance | |
CN105247507B (en) | Method, system and storage medium for the influence power score for determining brand | |
CN104008139B (en) | The creation method and device of video index table, the recommendation method and apparatus of video | |
CN111046237B (en) | User behavior data processing method and device, electronic equipment and readable medium | |
WO2019062081A1 (en) | Salesman profile formation method, electronic device and computer readable storage medium | |
CN109918378A (en) | A kind of remotely-sensed data storage method and storage system based on block chain | |
CN108549909B (en) | Object classification method and object classification system based on crowdsourcing | |
CN107977678A (en) | Method and apparatus for output information | |
CN110191183A (en) | Accurate intelligent method for pushing, system, device and computer readable storage medium | |
CN109416684A (en) | The intake manager of analysis platform | |
CN109862100A (en) | Method and apparatus for pushed information | |
KR101346927B1 (en) | Search device, search method, and computer-readable memory medium for recording search program | |
CN110209658A (en) | Data cleaning method and device | |
CN110399564B (en) | Account classification method and device, storage medium and electronic device | |
CN109614549B (en) | Method and apparatus for pushed information | |
CN116263659A (en) | Data processing method, apparatus, computer program product, device and storage medium | |
CN108182180B (en) | Method and apparatus for generating information | |
CN110532254A (en) | The method and apparatus of fused data table | |
CN109902698A (en) | Information generating method and device | |
CN105849719A (en) | Augmented reality | |
CN116186119A (en) | User behavior analysis method, device, equipment and storage medium | |
CN110062112A (en) | Data processing method, device, equipment and computer readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |