CN103207901A - Method and system for obtaining internet protocol address attribution position based on search engine - Google Patents


Info

Publication number
CN103207901A
Authority
CN
China
Prior art keywords
word
address
ground
user
weighted value
Legal status
Granted
Application number
CN201310091285XA
Other languages
Chinese (zh)
Other versions
CN103207901B (en)
Inventor
阮星华
才鑫
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201310091285.XA
Publication of CN103207901A
Application granted
Publication of CN103207901B
Status: Active
Anticipated expiration

Abstract

The invention provides a method and a system for obtaining the attribution position (home location) of an internet protocol (IP) address based on a search engine. The method includes: S1, obtaining user search records over a period of time, each comprising a user identity (ID), a search term and a user IP address, and identifying the place-name terms and the region-attribute terms (terms with a regional attribute) in the search terms; S2, training on user search records whose IP home locations have been labeled in advance to obtain a confidence for each region-attribute term; and S3, determining the home location of the IP address according to the user IDs, the identified place-name terms and region-attribute terms, and the confidence of each region-attribute term. The method and system use the search engine as a basis for accurately obtaining the home location of a user's IP address.

Description

Method and apparatus for obtaining the home location of an IP address based on a search engine
[Technical Field]
The present invention relates to Internet Protocol (IP) address location technology, and in particular to a method and apparatus for obtaining the home location of an IP address based on a search engine.
[Background]
With the continuous development of search engine technology, the regional extension feature of search engines is attracting more and more attention. The "regional extension feature" means that the search engine returns search results with regional characteristics according to the user's geographic location. For example, if a user located in Beijing searches for the query "weather", the search engine can return the Beijing weather forecast. Features of this kind can meet users' needs more intelligently and accurately.
One of the keys to implementing the regional extension feature is determining the home location of an IP address. In existing approaches, usually only a network operator knows the home locations of the IP addresses under its jurisdiction, so a company that needs IP home-location information must obtain it from a third party such as a network operator through a commercial arrangement, which adds cost.
[Summary of the Invention]
In view of this, the present invention provides a method and apparatus for obtaining the home location of an IP address based on a search engine, which can accurately obtain the geographic location of an IP address.
The specific technical solution is as follows:
A method for obtaining the home location of an IP address based on a search engine, the method comprising:
S1. Obtaining user search records over a period of time, each comprising a user identity (ID), a query and an IP address, and identifying the place names and the region-attribute terms (terms with a regional attribute) in the queries of the user search records;
S2. Using user search records whose IP home locations have been labeled in advance as training samples to obtain the confidence of each region-attribute term;
S3. Determining the home location of the IP address according to the user IDs in the user search records, the place names and region-attribute terms identified in the queries, and the confidence of each region-attribute term.
According to a preferred embodiment of the present invention, identifying the place names and region-attribute terms in the queries of the user search records in step S1 specifically comprises:
S11. Segmenting the queries in the user search records into words and identifying the place names among them;
S12. Extracting the non-place-name segments of the queries, and taking as region-attribute terms those non-place-name segments whose co-occurrence rate with place names in queries is higher than a preset threshold.
According to a preferred embodiment of the present invention, the method further comprises, after step S12:
S13. Performing word-sense analysis on the region-attribute terms and extracting the region-attribute terms whose word-sense weight is higher than a preset threshold.
According to a preferred embodiment of the present invention, the method further comprises, after step S13:
S14. Normalizing the region-attribute terms extracted in step S13 according to the categories to which they belong.
According to a preferred embodiment of the present invention, step S2 specifically comprises:
Obtaining the confidence P[M] of a region-attribute term M according to the formula
$$P[M] = \frac{\sum_{i=1}^{n} R[\text{place name } i]}{\sum_{i=1}^{n} T[\text{place name } i]}$$
where T[place name i] is the number of records in the training samples in which the region-attribute term M co-occurs with place name i; R[place name i] is the number of those co-occurrence records whose pre-labeled IP home location is the region corresponding to place name i; and n is the number of place names that co-occur with M in the training samples.
According to a preferred embodiment of the present invention, determining the home location of the IP address in step S3 comprises:
Calculating, according to a preset rule, a first weight for the IP address belonging to each region corresponding to the place names, and determining the home location of the IP address according to the first weights.
According to a preferred embodiment of the present invention, calculating, according to the preset rule, the first weight for the IP address belonging to each region corresponding to the place names specifically comprises:
Obtaining the first weight Z[L] for the IP address belonging to region L according to the formula
$$Z[L] = \frac{\sum_{i=1}^{m} C[L, \text{word } i] \cdot P[\text{word } i]}{\mathrm{Cid}}$$
where Cid is the number of user IDs in this IP address's user search records that contain place names; C[L, word i] is the number of user IDs corresponding to those records in which a place name of region L co-occurs with region-attribute term i; P[word i] is the confidence of region-attribute term i; and m is the number of region-attribute terms in this IP address's user search records that contain place names.
According to a preferred embodiment of the present invention, determining the home location of the IP address according to the first weights is:
Among the first weights for the IP address belonging to each region corresponding to the place names, taking the region with the highest first weight as the home location of the IP address.
According to a preferred embodiment of the present invention, the method further comprises:
S4. Calculating, according to a preset rule, a second weight for the IP address belonging to each region, based on pre-obtained default-city settings and user IDs of users in a map search engine over a period of time;
Determining the home location of the IP address according to the first weights is then specifically:
Combining the first weight and the second weight for the IP address belonging to each region to obtain the final home location of the IP address.
According to a preferred embodiment of the present invention, calculating the second weight for the IP address belonging to each region specifically comprises:
Taking, as the second weight for the IP address belonging to a given region, the ratio of the number of user IDs whose pre-obtained default city in the map search engine belongs to that region to the total number of user IDs.
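The second weight is a simple share of users, so it can be sketched in a few lines of Python. This is a minimal illustration, assuming the default cities have already been mapped to regions and that the user IDs considered are those associated with the IP address under analysis; the function name `second_weights` is hypothetical.

```python
from collections import Counter


def second_weights(default_city_by_user):
    """Share of user IDs whose map-search default city lies in each region.

    default_city_by_user: user ID -> region of the default city that user set
    in the map search engine (assumed already mapped to regions, and assumed
    restricted to the user IDs seen on the IP address being analyzed).
    """
    total = len(default_city_by_user)
    if total == 0:
        return {}  # no map-search evidence for this IP address
    counts = Counter(default_city_by_user.values())
    return {region: n / total for region, n in counts.items()}


print(second_weights({"u1": "beijing", "u2": "beijing", "u3": "nanjing", "u4": "beijing"}))
# {'beijing': 0.75, 'nanjing': 0.25}
```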
According to a preferred embodiment of the present invention, combining the first weight and the second weight for the IP address belonging to each region to obtain the final home location of the IP address specifically comprises:
Multiplying the first weight and the second weight for the IP address belonging to each region to obtain a combined weight for each region, and taking the region with the highest combined weight as the home location of the IP address.
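The combination rule can be sketched in Python as below. It is a minimal illustration under one stated assumption: a region missing from either weight mapping is given weight 0, since the text only says that the two weights are multiplied. The function name and the sample weights are hypothetical.

```python
def combined_home_location(first, second):
    """Multiply first and second weights per region and return the argmax.

    Regions missing from either mapping get weight 0 (an assumption; the
    text only states that the two weights are multiplied).
    Returns (best_region_or_None, combined_weights).
    """
    regions = set(first) | set(second)
    combined = {r: first.get(r, 0.0) * second.get(r, 0.0) for r in regions}
    if not combined or max(combined.values()) == 0.0:
        return None, combined  # no region is supported by both signals
    return max(combined, key=combined.get), combined


best, combined = combined_home_location(
    {"beijing": 0.58, "nanjing": 0.28},  # hypothetical first weights
    {"beijing": 0.25, "nanjing": 0.75},  # hypothetical second weights
)
print(best)  # nanjing
```

Because the weights are multiplied, a region needs support from both the query evidence and the map-search evidence to win.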
An apparatus for obtaining the home location of an IP address based on a search engine, the apparatus comprising:
A preprocessing unit, configured to obtain user search records over a period of time, each comprising a user ID, a query and an IP address, and to identify the place names and region-attribute terms in the queries of the user search records;
A training unit, configured to use user search records whose IP home locations have been labeled in advance as training samples to obtain the confidence of each region-attribute term;
A determination unit, configured to determine the home location of the IP address according to the user IDs in the user search records, the place names and region-attribute terms identified in the queries, and the confidence of each region-attribute term.
According to a preferred embodiment of the present invention, when identifying the place names and region-attribute terms in the queries of the user search records, the preprocessing unit specifically performs:
S21. Segmenting the queries in the user search records into words and identifying the place names among them;
S22. Extracting the non-place-name segments of the queries, and taking as region-attribute terms those non-place-name segments whose co-occurrence rate with place names in queries is higher than a preset threshold.
According to a preferred embodiment of the present invention, after performing S22, the preprocessing unit further performs:
S23. Performing word-sense analysis on the region-attribute terms and extracting the region-attribute terms whose word-sense weight is higher than a preset threshold.
According to a preferred embodiment of the present invention, after performing S23, the preprocessing unit further performs:
S24. Normalizing the region-attribute terms extracted in step S23 according to the categories to which they belong.
According to a preferred embodiment of the present invention, the training unit specifically performs:
Obtaining the confidence P[M] of a region-attribute term M according to the formula
$$P[M] = \frac{\sum_{i=1}^{n} R[\text{place name } i]}{\sum_{i=1}^{n} T[\text{place name } i]}$$
where T[place name i] is the number of records in the training samples in which the region-attribute term M co-occurs with place name i; R[place name i] is the number of those co-occurrence records whose pre-labeled IP home location is the region corresponding to place name i; and n is the number of place names that co-occur with M in the training samples.
According to a preferred embodiment of the present invention, when determining the home location of the IP address, the determination unit specifically performs:
Calculating, according to a preset rule, a first weight for the IP address belonging to each region corresponding to the place names, and determining the home location of the IP address according to the first weights.
According to a preferred embodiment of the present invention, when calculating, according to the preset rule, the first weight for the IP address belonging to each region corresponding to the place names, the determination unit specifically performs:
Obtaining the first weight Z[L] for the IP address belonging to region L according to the formula
$$Z[L] = \frac{\sum_{i=1}^{m} C[L, \text{word } i] \cdot P[\text{word } i]}{\mathrm{Cid}}$$
where Cid is the number of user IDs in this IP address's user search records that contain place names; C[L, word i] is the number of user IDs corresponding to those records in which a place name of region L co-occurs with region-attribute term i; P[word i] is the confidence of region-attribute term i; and m is the number of region-attribute terms in this IP address's user search records that contain place names.
According to a preferred embodiment of the present invention, when determining the home location of the IP address according to the first weights, the determination unit specifically performs:
Among the first weights for the IP address belonging to each region corresponding to the place names, taking the region with the highest first weight as the home location of the IP address.
According to a preferred embodiment of the present invention, the apparatus further comprises:
A map information determination unit, configured to calculate, according to a preset rule, a second weight for the IP address belonging to each region, based on pre-obtained default-city settings and user IDs of users in a map search engine over a period of time;
When determining the home location of the IP address according to the first weights, the determination unit specifically performs:
Combining the first weight and the second weight for the IP address belonging to each region to obtain the final home location of the IP address.
According to a preferred embodiment of the present invention, when calculating the second weight for the IP address belonging to each region, the map information determination unit specifically performs:
Taking, as the second weight for the IP address belonging to a given region, the ratio of the number of user IDs whose pre-obtained default city in the map search engine belongs to that region to the total number of user IDs.
According to a preferred embodiment of the present invention, when combining the first weight and the second weight for the IP address belonging to each region to obtain the final home location of the IP address, the determination unit specifically performs:
Multiplying the first weight and the second weight for the IP address belonging to each region to obtain a combined weight for each region, and taking the region with the highest combined weight as the home location of the IP address.
As can be seen from the above technical solutions, the present invention analyzes the queries in pre-obtained user search records over a period of time, identifies the place names and region-attribute terms in them, and combines the trained confidences of the region-attribute terms with the user IDs to obtain the home location of an IP address. In addition, information such as the default city that users set when using a map search engine, together with the user IDs, can be combined to obtain the final home location of the IP address. The present invention enables Internet companies to obtain the home locations of users' IP addresses automatically through search engine analysis.
[Brief Description of the Drawings]
Fig. 1 is a flowchart of the method for obtaining the home location of an IP address based on a search engine provided by Embodiment 1 of the present invention;
Fig. 2 is a flowchart of the method for identifying the place names and region-attribute terms in a query provided by Embodiment 1 of the present invention;
Fig. 3 is an example of user search records with pre-labeled IP home locations provided by Embodiment 1 of the present invention;
Fig. 4 is an example of user search records provided by Embodiment 1 of the present invention;
Fig. 5 is an example of records of the default cities set by users in a map search engine and their user IDs, provided by Embodiment 1 of the present invention;
Fig. 6 is a schematic diagram of the apparatus for identifying the place names and region-attribute terms in a query provided by Embodiment 2 of the present invention.
[Detailed Description of Embodiments]
To make the objectives, technical solutions and advantages of the present invention clearer, the present invention is described in detail below with reference to the drawings and specific embodiments.
Analysis of how users behave when using a search engine shows that users often use it to obtain information related to where they are. Their queries therefore tend to carry implicit information about their geographic location. The present invention obtains the geographic location of an IP address by analyzing users' search records over a period of time.
Embodiment 1
Fig. 1 is a flowchart of the method for obtaining the home location of an IP address based on a search engine provided by Embodiment 1 of the present invention. As shown in Fig. 1, the method comprises:
S101. Analyzing pre-obtained user search records over a period of time, and identifying the place names and region-attribute terms in the users' queries (Query).
Information about users' visits to the search engine over a period of time can be recorded in advance, including the user ID, the user's query (Query) and the IP address; together these form one user search record. The user ID is an ID allocated to the user the first time the user visits the search engine website through a browser on a terminal (such as a PC, mobile phone or tablet). It is stored in a cookie on the user's terminal, so that on later visits the user ID can be read directly from that cookie. How long user search records are kept can be set as needed; for example, the records of the last 30 days can be kept. "00017255861E0FE2D25B26B6BDB1139A; 114.112.29.35; Beijing bus route 362" is an example of one user search record, where "00017255861E0FE2D25B26B6BDB1139A" is the user ID, "114.112.29.35" is the IP address, and "Beijing bus route 362" is the user's query.
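As an illustration of the record format described above, the following Python sketch parses one record into its three fields. It assumes records are stored as semicolon-separated lines in the order user ID, IP address, query, as in the example; the names `SearchRecord` and `parse_record` are hypothetical and not part of the patent.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchRecord:
    user_id: str  # ID issued on first visit and stored in the browser cookie
    ip: str       # client IP address seen by the search engine
    query: str    # raw query string submitted by the user


def parse_record(line: str) -> SearchRecord:
    """Parse a 'user_id; ip; query' line (field order and ';' delimiter are assumptions)."""
    # maxsplit=2 keeps any ';' inside the query text intact
    user_id, ip, query = (part.strip() for part in line.split(";", 2))
    ipaddress.ip_address(ip)  # raises ValueError for a malformed address
    return SearchRecord(user_id, ip, query)


record = parse_record(
    "00017255861E0FE2D25B26B6BDB1139A; 114.112.29.35; Beijing bus route 362"
)
print(record.ip)  # 114.112.29.35
```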
To derive the home location of an IP address from the users' queries, the queries in the pre-obtained user search records can be further analyzed to identify the place names and region-attribute terms in each Query. A region-attribute term is a term that is strongly correlated with region. For example, "public transport" and "weather" are strongly region-dependent, while "universal gravitation" is not, so "public transport" and "weather" can be regarded as region-attribute terms. As shown in Fig. 2, the place names and region-attribute terms in a Query can be identified through the following steps S1011-S1012:
S1011. Performing word segmentation on the Query and obtaining the place names in it.
The Query can first be segmented into individual words; this process is prior art and is not described further here. The segments that are place names are then identified, which can be done by matching each segment of the Query against the place names in a pre-built place-name dictionary.
Further, in this step, each identified place name can be normalized to the region it belongs to, according to the administrative hierarchy of its geographic location. For example, for the query "how to get from Pingguoyuan to Beiying by subway", "Pingguoyuan" and "Beiying" are identified as place names. Looking up these two names in the pre-built place-name dictionary shows that both are in Beijing, so both can be normalized to "Beijing"; that is, the place name determined for this Query is "Beijing".
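Step S1011 can be sketched in Python as follows. Real Chinese word segmentation is out of scope (the text treats it as prior art), so this minimal sketch assumes whitespace-tokenized input and uses greedy longest-match lookup against a small hypothetical place-name dictionary `PLACE_TO_REGION`, which also maps each place name to the region it belongs to. All names and entries here are illustrative.

```python
# Hypothetical place-name dictionary: place name -> region it belongs to.
# Region-level names map to themselves.
PLACE_TO_REGION = {
    "pingguoyuan": "beijing",
    "beiying": "beijing",
    "beijing west station": "beijing",
    "beijing": "beijing",
    "nanjing": "nanjing",
}


def find_places(query, place_dict=PLACE_TO_REGION, max_len=3):
    """Return (non_place_segments, regions) for one query.

    Greedy longest match: at each position, try the longest phrase of up to
    max_len tokens that is in the dictionary; otherwise emit one token as a
    non-place-name segment.
    """
    tokens = query.lower().split()
    others, regions = [], []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in place_dict:
                region = place_dict[phrase]  # normalize to the parent region
                if region not in regions:
                    regions.append(region)
                i += n
                break
        else:  # no dictionary phrase starts at position i
            others.append(tokens[i])
            i += 1
    return others, regions


print(find_places("how to get from Pingguoyuan to Beiying by subway"))
# (['how', 'to', 'get', 'from', 'to', 'by', 'subway'], ['beijing'])
```

Longest match matters for multi-word names: "beijing west station" is kept as one place name rather than being split into "beijing" plus two stray segments.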
S1012. Extracting the non-place-name segments of the Query, computing the co-occurrence rate of each with place names, and taking those whose co-occurrence rate with place names is higher than a preset threshold as region-attribute terms.
After the Query has been segmented and its place names identified, the segments that are not place names (hereinafter non-place-name segments) can be extracted and the co-occurrence rate of each with place names computed. The co-occurrence rate of a non-place-name segment with place names is the frequency with which that segment appears in the same Query as any place name. It can be obtained as follows: over the users' queries for the pre-obtained period, count the number N1 of Queries in which the segment appears together with any place name, and the number N2 of Queries in which the segment appears at all; the co-occurrence rate is then N1/N2. For example, if the segment "restaurant" appears in 2000 Queries of the user search records for the period, and appears together with some place name in 400 of them, its co-occurrence rate with place names is 400/2000 = 0.2. Once the co-occurrence rate of every non-place-name segment has been obtained, the segments whose rate is higher than the preset threshold are taken as region-attribute terms.
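The N1/N2 statistic of step S1012 can be sketched in Python as follows, assuming each query has already been split into non-place-name segments and place names (for example by a step like S1011). The function names and the threshold used in the usage lines are illustrative assumptions.

```python
from collections import Counter


def cooccurrence_rates(parsed_queries):
    """parsed_queries: iterable of (non_place_segments, place_names), one per query.

    Rate for segment w = N1 / N2, where N2 counts queries containing w and
    N1 counts queries containing w together with at least one place name.
    """
    n2 = Counter()
    n1 = Counter()
    for segments, places in parsed_queries:
        for w in set(segments):  # count each query at most once per segment
            n2[w] += 1
            if places:
                n1[w] += 1
    return {w: n1[w] / n2[w] for w in n2}


def region_attribute_terms(parsed_queries, threshold):
    """Keep segments whose co-occurrence rate with place names exceeds threshold."""
    rates = cooccurrence_rates(parsed_queries)
    return {w for w, rate in rates.items() if rate > threshold}


# The "restaurant" example from the text: 2000 queries, 400 with a place name.
queries = (
    [(["restaurant"], ["beijing"])] * 400
    + [(["restaurant"], [])] * 1600
    + [(["gravitation"], [])] * 50
)
print(cooccurrence_rates(queries)["restaurant"])  # 0.2
print(region_attribute_terms(queries, threshold=0.1))  # {'restaurant'}
```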
Steps S1011-S1012 yield the region-attribute terms in the users' queries. Further, core region-attribute terms can be extracted from the resulting region-attribute terms through step S1013.
S1013. Performing word-sense analysis on the resulting region-attribute terms and extracting the core region-attribute terms.
Word-sense analysis can be performed on the resulting region-attribute terms, and each term is assigned a weight according to how important its meaning is within the Query: the more important the meaning, the higher the weight. Finally, the region-attribute terms whose weight is higher than a preset threshold are extracted as core region-attribute terms. For example, suppose a Query contains two region-attribute terms, "weather" and another term. After weights are assigned by word-sense analysis, the weight of "weather" is above the preset threshold while the weight of the other term is below it, so "weather" is extracted as a core region-attribute term. Part-of-speech analysis of the segments in a Query and weighting by word sense are prior art and are not described further here.
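Because the word-sense weighting itself is prior art, the following Python sketch only shows the thresholding step, assuming the weights come from some external word-sense component passed in as a mapping. The function name, the second term "today" and the weight values are hypothetical.

```python
def core_terms(terms, sense_weight, threshold):
    """Return the region-attribute terms whose word-sense weight exceeds threshold.

    sense_weight: mapping term -> weight from a word-sense/term-weighting
    component (assumed external; the text treats it as prior art).
    Terms without a weight are treated as weight 0 and dropped.
    The result is sorted by descending weight.
    """
    kept = [t for t in terms if sense_weight.get(t, 0.0) > threshold]
    return sorted(kept, key=lambda t: sense_weight[t], reverse=True)


print(core_terms(["weather", "today"], {"weather": 0.9, "today": 0.2}, threshold=0.5))
# ['weather']
```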
Step S1013 extracts core region-attribute terms from the region-attribute terms. Further, the resulting core region-attribute terms can be normalized through step S1014 to obtain the final core region-attribute terms.
S1014 normalizes the core region-attribute terms obtained in step S1013, that is, merges terms of the same type into one. For example, "public transport", "bus" and "public bus" all belong to the "public transport" category, so all three are normalized to "public transport" among the core region-attribute terms; likewise, "restaurant", "eatery" and "diner" all belong to the "restaurant" category, so all three are normalized to "restaurant". These examples are for illustration only, and embodiments of the present invention are not limited to them. The normalization can be implemented with a pre-trained text classifier: the classifier assigns each core region-attribute term to a category, and each term is normalized to its category, yielding the final core region-attribute terms. This approach is prior art and is not described further here.
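The text implements this normalization with a pre-trained text classifier. As a minimal stand-in, the Python sketch below uses a hypothetical hand-written category table `CATEGORY`; any classifier exposed as a callable from term to category could replace the lookup.

```python
# Hypothetical category table standing in for a trained text classifier.
CATEGORY = {
    "bus": "public transport",
    "public bus": "public transport",
    "eatery": "restaurant",
    "diner": "restaurant",
}


def normalize_terms(terms, classify=CATEGORY.get):
    """Map each core term to its category and drop duplicates, keeping order.

    classify: callable returning a category or None; terms with no category
    (including category names themselves) are kept as-is.
    """
    result = []
    for term in terms:
        category = classify(term) or term
        if category not in result:
            result.append(category)
    return result


print(normalize_terms(["bus", "public transport", "diner"]))
# ['public transport', 'restaurant']
```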
Steps S1011-S1014 identify the place names and the region-attribute terms (or the core region-attribute terms, or the final normalized core region-attribute terms) in the users' queries. From this information, the home location of the IP address can be obtained through steps S102-S103.
S102. Using user search records with pre-labeled IP home locations as training samples to obtain the confidence of each region-attribute term.
To obtain the home location of an IP address accurately, the confidence of each region-attribute term in the Queries can be obtained first. The confidence of a region-attribute term characterizes how much influence that term has when determining the home location of an IP address. It can be obtained by training on user search records whose IP home locations have been labeled in advance. Specifically, the confidence of a region-attribute term can be trained as follows: obtain user search records that contain place names and have pre-labeled IP home locations; count the records whose Query contains both the term and each place name, denoted T[place name 1], T[place name 2], ..., T[place name n]; and count, among the records in which the term co-occurs with a given place name, those whose IP home location is that place, denoted R[place name 1], R[place name 2], ..., R[place name n]. Denoting the confidence of the term as P, then
$$P = \frac{\sum_{i=1}^{n} R[\text{place name } i]}{\sum_{i=1}^{n} T[\text{place name } i]}$$
For example, Fig. 3 shows user search records with pre-labeled IP home locations. To obtain the confidence of the region-attribute term "public transport" from the example in Fig. 3, count the co-occurrences of "public transport" with each place name in the Queries. For instance, "public transport" appears together with "Nanjing" in the Queries of 4 records, so T[Nanjing] = 4; 3 of those records have IP home location Nanjing, so R[Nanjing] = 3. In the same way, T[Beijing], T[Tianjin], R[Beijing], R[Tianjin] and so on can be counted for "public transport". Finally, the confidence of "public transport" is
$$P[\text{public transport}] = \frac{R[\text{Nanjing}] + R[\text{Beijing}] + R[\text{Tianjin}] + \cdots}{T[\text{Nanjing}] + T[\text{Beijing}] + T[\text{Tianjin}] + \cdots} = \frac{3 + R[\text{Beijing}] + R[\text{Tianjin}] + \cdots}{4 + T[\text{Beijing}] + T[\text{Tianjin}] + \cdots}$$
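The confidence training can be sketched in Python as follows, computing P as the ratio of the summed R counts to the summed T counts for each term. This is a minimal illustration, assuming each labeled record has already been reduced to its region-attribute terms, its normalized place names and its labeled home location. The Nanjing counts (T = 4, R = 3) follow the Fig. 3 example in the text; the Beijing counts are hypothetical, and "shanghai" is an invented label for the one mismatched record.

```python
from collections import defaultdict


def train_confidence(labeled_records):
    """Estimate P[M] = sum_i R[place i] / sum_i T[place i] for every term M.

    labeled_records: iterable of (terms, places, label), where terms are the
    region-attribute terms of one query, places its (normalized) place names,
    and label the pre-labeled home location of the record's IP address.
    """
    t_total = defaultdict(int)  # sum over places of T: term/place co-occurrence records
    r_total = defaultdict(int)  # sum over places of R: those whose label equals the place
    for terms, places, label in labeled_records:
        for term in set(terms):
            for place in set(places):
                t_total[term] += 1
                if place == label:
                    r_total[term] += 1
    return {term: r_total[term] / t_total[term] for term in t_total}


records = (
    [(["public transport"], ["nanjing"], "nanjing")] * 3   # R[Nanjing] = 3
    + [(["public transport"], ["nanjing"], "shanghai")]    # T[Nanjing] = 4
    + [(["public transport"], ["beijing"], "beijing")] * 2  # hypothetical
)
print(round(train_confidence(records)["public transport"], 4))  # 0.8333
```

Records without any place name contribute nothing, so a term that never co-occurs with a place name simply gets no confidence value.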
Note that if, in step S101, core region-attribute terms were further extracted from the region-attribute terms, or final normalized core region-attribute terms were further obtained, then the above training process yields the confidence of the core region-attribute terms or of the final normalized core region-attribute terms, respectively.
S103. Calculating, by a preset rule and from the user IDs, the users' queries and the confidence of the region-attribute terms, a first weight for the IP address belonging to each region named in the Queries, and taking the region with the highest first weight as the home location of the IP address.
After the user search records have been analyzed, the place-name words and the region-attribute words (or core region-attribute words, or normalized final core region-attribute words) in the Queries have been identified, and the confidence P of each such word has been obtained, the first weight with which the IP address belongs to the region of each place-name word in its corresponding Queries can be calculated by a preset rule, and the region with the highest first weight is taken as the home location of the IP address. One preferred implementation provided by the invention for calculating the first weight with which a given IP address belongs to each region is as follows: select the user search records of this IP address whose Query contains a place-name word; count the number of distinct user IDs in these records, denoted Cid; for each region-attribute word (or core region-attribute word, or normalized final core region-attribute word), count the number of distinct user IDs whose Queries contain both the place-name word of this region and that word, denoted C[place name, word 1], C[place name, word 2], …, C[place name, word m]; and denote the first weight with which this IP address belongs to this region by Z[place name]. Then
$$Z[\text{place name}] = \frac{1}{\mathrm{Cid}} \sum_{j=1}^{m} C[\text{place name}, \text{word } j] \cdot P[\text{word } j]$$
where P[word 1], P[word 2], …, P[word m] are the confidences of the respective region-attribute words (or core region-attribute words, or normalized final core region-attribute words). The first weight with which a given IP address belongs to each region can be calculated in this way, and the region with the highest first weight is taken as the home location of the IP address. It should be understood that this method of calculating the first weights is only one preferred implementation provided by the invention; in practice, different rules can be set as required to calculate the first weight with which an IP address belongs to each region, and the invention is not limited in this respect.
The calculation of the first weights is further illustrated below with a concrete example. Fig. 4 shows example user search records, extracted from the user search records obtained in advance over a period of time, whose IP address is "114.112.29.35" and whose Query contains a place-name word. As shown in Fig. 4, two place-name words, "Nanjing" and "Beijing", appear in the Queries of these records, so the above method can be used to calculate the first weights with which this IP address belongs to "Nanjing" and to "Beijing". Three different user IDs appear in these records, so Cid = 3. Suppose the Queries of these records contain two region-attribute words, "public transport" and "weather". "Public transport" co-occurs with "Nanjing" in the search records of 2 different user IDs, so C[Nanjing, public transport] = 2; "weather" co-occurs with "Nanjing" in the search records of 1 user ID, so C[Nanjing, weather] = 1. Likewise, C[Beijing, public transport] = 0 and C[Beijing, weather] = 1. Suppose the confidences of "public transport" and "weather" are P[public transport] = 0.6 and P[weather] = 0.75, respectively. Then the first weight with which IP address "114.112.29.35" belongs to "Nanjing" is
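$$Z[\text{Nanjing}] = \frac{C[\text{Nanjing}, \text{public transport}] \cdot P[\text{public transport}] + C[\text{Nanjing}, \text{weather}] \cdot P[\text{weather}]}{\mathrm{Cid}} = \frac{2 \times 0.6 + 1 \times 0.75}{3} = 0.65$$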
and the first weight with which it belongs to "Beijing" is
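$$Z[\text{Beijing}] = \frac{C[\text{Beijing}, \text{public transport}] \cdot P[\text{public transport}] + C[\text{Beijing}, \text{weather}] \cdot P[\text{weather}]}{\mathrm{Cid}} = \frac{0 \times 0.6 + 1 \times 0.75}{3} = 0.25$$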
Thus the first weight with which this IP address belongs to "Nanjing" is higher than the first weight with which it belongs to "Beijing", so the home location of this IP address is determined to be "Nanjing". It should be understood that this example is for illustration only, and the embodiments of the invention are not limited thereto.
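The preferred rule Z[place name] = Σ C[place name, word j] · P[word j] / Cid can be sketched as follows. This is a minimal illustration, assuming each record of one IP address has already been reduced to a tuple of (user ID, place-name words, region-attribute words); the function name `first_weights` and the tuple layout are hypothetical. The three records in the usage lines are invented solely to reproduce the counts of the worked example (Cid = 3, C[Nanjing, public transport] = 2, C[Nanjing, weather] = 1, C[Beijing, weather] = 1).

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# (user_id, place-name words in the Query, region-attribute words in the Query)
Record = Tuple[str, Set[str], Set[str]]


def first_weights(records: Iterable[Record], conf: Dict[str, float]) -> Dict[str, float]:
    """Z[place] = sum_j C[place, word_j] * P[word_j] / Cid for the records of one IP.

    Cid is the number of distinct user IDs; C[place, word] is the number of
    distinct user IDs whose Query contains both `place` and `word`.
    """
    records = list(records)
    cid = len({user_id for user_id, _, _ in records})
    if cid == 0:
        return {}
    users = defaultdict(set)  # (place, word) -> distinct user IDs
    places = set()
    for user_id, place_words, region_words in records:
        places.update(place_words)
        for place in place_words:
            for word in region_words:
                users[(place, word)].add(user_id)
    z = {place: 0.0 for place in places}  # places with no region word keep weight 0
    for (place, word), ids in users.items():
        z[place] += len(ids) * conf.get(word, 0.0)
    return {place: score / cid for place, score in z.items()}


if __name__ == "__main__":
    # Three invented records reproducing the counts of the worked example.
    demo = [
        ("u1", {"Nanjing"}, {"public transport"}),
        ("u2", {"Nanjing"}, {"public transport", "weather"}),
        ("u3", {"Beijing"}, {"weather"}),
    ]
    z = first_weights(demo, {"public transport": 0.6, "weather": 0.75})
    print(max(z, key=z.get))  # prints: Nanjing
```

Counting user IDs through sets, rather than counting records, means a single user who repeats the same search many times contributes only once to C and to Cid.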
The method of steps S101-S103 can accurately obtain the home location of an IP address by analyzing the Queries in user search records obtained in advance over a period of time, combined with user IDs. Records whose IP home locations have been obtained in this way can further be used as training samples for obtaining the confidence P of the region-attribute words (or core region-attribute words, or normalized final core region-attribute words) in the above method.
Further, the method provided by the invention may also include steps S104-S105, which bring in a map search engine to obtain the home location of a user's IP address.
S104: According to the default-city information set by users in the map search engine over a period of time, obtained in advance, and the user IDs, calculate by a preset rule a second weight with which the IP address belongs to each region.
When providing map search services, a map search engine usually lets users set a default city, so that when a user visits the map search engine website, related map information is searched directly within that default city. The default city a user sets in the map search engine is often the user's actual location; therefore, analyzing the default-city information set by users in the map search engine over a period of time, combined with user IDs, can yield the home location of an IP address.
The default city set when a user visits the map search engine website over a period of time can be saved in advance as a record, together with information such as the user ID and IP address. For example, "43179D117F6AC7BD4856744B31F4E0E8; 125.34.37.129; Beijing" is a saved record of a user-set default city together with the user ID and IP address, where "43179D117F6AC7BD4856744B31F4E0E8" is the user ID, "125.34.37.129" is the IP address, and "Beijing" is the default city set by the user.
The home location of an IP address can be obtained from the default-city information set by users in the map search engine over a period of time, obtained in advance, and the user IDs. Specifically: calculate, from the default-city information and user IDs, the second weight Z[map, place name] with which the user IP belongs to each city, and take the city with the highest second weight Z[map, place name] as the home location of the IP address. The second weight Z[map, place name] with which the user IP belongs to a given city is the ratio of the number of user IDs whose default city is that city to the total number of user IDs in the records obtained in advance. Fig. 5 shows example default-city and user-ID records for IP address "218.25.103.196". As shown in Fig. 5, the records obtained for this IP address contain 4 user IDs, of which 3 have the default city "Shenyang" and 1 has the default city "Changchun". The second weight with which this IP address belongs to "Shenyang" is therefore Z[map, Shenyang] = 3/4 = 0.75, and the second weight with which it belongs to "Changchun" is Z[map, Changchun] = 1/4 = 0.25, so the home location of this IP address is determined to be "Shenyang".
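The second-weight rule can be sketched as follows, assuming the saved map records use the semicolon-separated layout of the example above. The helper names `parse_map_record` and `second_weights` are hypothetical, and the user IDs in the usage lines are invented placeholders chosen only to reproduce the Fig. 5 counts (3 users with "Shenyang", 1 with "Changchun").

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def parse_map_record(line: str) -> Tuple[str, str, str]:
    """Split a saved record 'user_id; ip; default_city' into its three fields."""
    user_id, ip, city = (field.strip() for field in line.split(";", 2))
    return user_id, ip, city


def second_weights(records: Iterable[Tuple[str, str, str]]) -> Dict[str, float]:
    """Z[map, city] = distinct user IDs whose default city is `city`
    divided by all distinct user IDs in the records of one IP address."""
    users_by_city = defaultdict(set)
    all_users = set()
    for user_id, _ip, city in records:
        users_by_city[city].add(user_id)
        all_users.add(user_id)
    if not all_users:
        return {}
    return {city: len(ids) / len(all_users) for city, ids in users_by_city.items()}


if __name__ == "__main__":
    ip = "218.25.103.196"
    demo = [("a", ip, "Shenyang"), ("b", ip, "Shenyang"),
            ("c", ip, "Shenyang"), ("d", ip, "Changchun")]
    weights = second_weights(demo)
    print(weights, max(weights, key=weights.get))
    # prints: {'Shenyang': 0.75, 'Changchun': 0.25} Shenyang
```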
Through step S104, the home location of an IP address can be obtained from information such as the default cities set by users of the map search engine and the user IDs. Then, in step S105, the home location obtained from the Queries and user IDs in the user search records can be integrated with the home location obtained from the default cities set by users in the map search engine and the user IDs.
S105: Integrate the home location of the IP address according to the first weight and the second weight with which the IP address belongs to each region.
The home location of the IP address can be integrated according to the first and second weights with which the IP address belongs to each region, for example as follows:
Multiply the first weight Z[place name] and the second weight Z[map, place name] with which the IP address belongs to the same region to obtain the combined weight with which the IP address belongs to each region, and take the region with the highest combined weight as the final home location of the IP address. For example, suppose the first weights with which a given IP address belongs to "Nanjing" and "Beijing", obtained from the user IDs and the Queries of user searches, are Z[Nanjing] = 0.65 and Z[Beijing] = 0.25, and the second weights, obtained from the default cities set by users in the map search engine and the user IDs, are Z[map, Nanjing] = 0.45 and Z[map, Beijing] = 0.3. Then the combined weight for "Nanjing" is Z[Nanjing] × Z[map, Nanjing] = 0.2925 and the combined weight for "Beijing" is Z[Beijing] × Z[map, Beijing] = 0.075. The combined weight for "Nanjing" is higher, so the final home location of this IP address is determined to be "Nanjing".
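The multiplicative integration in S105 can be sketched as below. The function name `combine_weights` is hypothetical, and the handling of a region that appears in only one of the two weight tables (treated as weight 0 there) is an assumption the text does not address.

```python
from typing import Dict, Optional, Tuple


def combine_weights(first: Dict[str, float],
                    second: Dict[str, float]) -> Tuple[Dict[str, float], Optional[str]]:
    """Multiply the first and second weights of each region and pick the maximum.

    ASSUMPTION: a region missing from either dict gets weight 0 there.
    """
    regions = set(first) | set(second)
    combined = {r: first.get(r, 0.0) * second.get(r, 0.0) for r in regions}
    best = max(combined, key=combined.get) if combined else None
    return combined, best


if __name__ == "__main__":
    combined, best = combine_weights({"Nanjing": 0.65, "Beijing": 0.25},
                                     {"Nanjing": 0.45, "Beijing": 0.3})
    print(best)  # prints: Nanjing
```

Because the rule is a product, a region with no supporting evidence on either side is eliminated entirely; an additive or smoothed combination would behave differently, but that would be a departure from the rule described above.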
The above describes the method provided by Embodiment one of the invention. As can be seen, based on a search engine, the invention can accurately determine the home location of an IP address from the user IDs and Queries in user search records obtained in advance; it can also obtain the home location of an IP address from the default-city information set by users in a map search engine and the user IDs, and integrate the results of the two approaches to obtain a more accurate result. The method enables Internet companies to determine users' locations through search-engine analysis and thus to provide users with region-specific search services.
Embodiment two
Fig. 6 is a schematic diagram of a device for obtaining the home location of an IP address based on a search engine, provided by Embodiment two of the invention. As shown in Fig. 6, the device includes a preprocessing unit 10, a training unit 20, and a discrimination unit 30, and may further include a map information discrimination unit 40.
The preprocessing unit 10 obtains user search records over a period of time, each record including a user ID, a query word, and an IP address, and identifies the place-name words and region-attribute words in the query words of the user search records.
Information from users' visits to the search engine over a period of time can be recorded in advance, including the user ID, the Query of the user search, and the IP address, and saved together as a user search record. The user ID is an ID assigned to the user the first time the user visits the search engine website through a browser on a PC; it is stored in a cookie on the user's PC, so when the user visits the search engine website again, the user ID can be read directly from that cookie. How long user search records are kept can be set as required; for example, user search records from the past 30 days can be kept. "00017255861E0FE2D25B26B6BDB1139A; 114.112.29.35; Beijing bus route 362" is an example of a user search record, where "00017255861E0FE2D25B26B6BDB1139A" is the user ID, "114.112.29.35" is the IP address, and "Beijing bus route 362" is the Query of the user search.
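A saved search record in the semicolon-separated layout of the example above can be parsed into a small typed structure. This is a sketch under the assumption that records are stored as text lines exactly like the example; the names `SearchRecord` and `parse_search_record` are hypothetical. It uses the standard-library `ipaddress` module only to reject malformed addresses.

```python
import ipaddress
from typing import NamedTuple


class SearchRecord(NamedTuple):
    user_id: str  # ID read from the cookie on the user's PC
    ip: str       # IP address of the search request
    query: str    # Query typed by the user


def parse_search_record(line: str) -> SearchRecord:
    """Parse 'user_id; ip; query'. Splitting at most twice keeps any ';'
    that appears inside the query text."""
    user_id, ip, query = (field.strip() for field in line.split(";", 2))
    ipaddress.ip_address(ip)  # raises ValueError for a malformed address
    return SearchRecord(user_id, ip, query)


if __name__ == "__main__":
    rec = parse_search_record(
        "00017255861E0FE2D25B26B6BDB1139A; 114.112.29.35; Beijing bus route 362")
    print(rec.ip, "|", rec.query)  # prints: 114.112.29.35 | Beijing bus route 362
```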
To determine the home location of an IP address from the Queries of user searches, the preprocessing unit 10 can, after obtaining the user search records over a period of time, further analyze the Queries to identify the place-name words and region-attribute words in them. A region-attribute word is a word with strong regional relevance: for example, "public transport" and "weather" are strongly region-related, whereas "universal gravitation" is weakly region-related, so "public transport" and "weather" can be regarded as region-attribute words. The preprocessing unit 10 can perform the following operations S2011-S2012 to identify the place-name words and region-attribute words in a Query:
S2011: Segment the Query into words and obtain the place-name words in it.
The preprocessing unit 10 can first segment the Query into individual words; this process belongs to the prior art and is not described further here. It then identifies which of the segmented words are place-name words, which it can do by matching each word in the Query against the place names in a pre-built place-name dictionary.
Further, in this step the preprocessing unit 10 can also normalize each place name identified in the Query to the region it belongs to, according to the administrative containment of its geographic location. For example, for the Query "how to get from Pingguoyuan to Beiying by subway", "Pingguoyuan" and "Beiying" are identified as place names; looking them up in the pre-built place-name dictionary shows that both are located in Beijing, so both can be normalized to "Beijing", i.e. the place-name word of this Query is determined to be "Beijing".
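Dictionary matching and normalization of place names (S2011) can be sketched as a single lookup, assuming word segmentation has already produced a token list (segmentation itself is prior art and not shown). The dictionary `PLACE_DICT` is a hypothetical four-entry excerpt; a real place-name dictionary would map every known place to its containing region.

```python
from typing import Dict, Iterable, Set

# Hypothetical excerpt of a pre-built place-name dictionary:
# place name -> region (city) it belongs to.
PLACE_DICT: Dict[str, str] = {
    "Beijing": "Beijing",
    "Pingguoyuan": "Beijing",
    "Beiying": "Beijing",
    "Nanjing": "Nanjing",
}


def query_regions(tokens: Iterable[str], place_dict: Dict[str, str] = PLACE_DICT) -> Set[str]:
    """Match segmented Query tokens against the dictionary and return the
    regions that the matched place names normalize to."""
    return {place_dict[tok] for tok in tokens if tok in place_dict}


if __name__ == "__main__":
    tokens = ["how", "to", "get", "from", "Pingguoyuan", "to", "Beiying", "by", "subway"]
    print(query_regions(tokens))  # prints: {'Beijing'}
```

Returning a set means that two place names in the same city collapse into one region, which is exactly the normalization the example describes.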
S2012: Extract the non-place-name words in the Query, check each one's co-occurrence rate with place-name words, and take those whose co-occurrence rate with place-name words exceeds a preset threshold as region-attribute words.
After segmenting the Query and identifying its place-name words, the preprocessing unit 10 can extract the non-place-name words in the Query and check each one's co-occurrence rate with place-name words. The co-occurrence rate of a non-place-name word with place-name words is the frequency with which that word appears in a Query together with any place-name word. The preprocessing unit 10 can obtain it as follows: among the Queries of user searches over a period of time, obtained in advance, count the number N1 of Queries in which the word appears together with any place-name word and the number N2 of Queries in which the word appears; the co-occurrence rate of the word with place-name words is then N1/N2. For example, if "restaurant" appears in 2000 Queries of the user search records obtained in advance over a period of time, and appears together with some place-name word in 400 of them, its co-occurrence rate with place-name words is 400/2000 = 0.2. After the co-occurrence rate of each non-place-name word has been obtained, the non-place-name words whose rates exceed the preset threshold are taken as region-attribute words.
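The N1/N2 co-occurrence rate of S2012 can be computed in one pass over the segmented Queries. In this sketch, Queries are assumed to be token lists, `is_place` is any predicate that recognizes place-name words (for example, a dictionary lookup as in S2011), and the function names and the threshold are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List


def cooccurrence_rates(queries: Iterable[List[str]],
                       is_place: Callable[[str], bool]) -> Dict[str, float]:
    """For each non-place word: N1 / N2, where N2 counts Queries containing the
    word and N1 counts those that also contain at least one place-name word."""
    n1, n2 = Counter(), Counter()
    for tokens in queries:
        distinct = set(tokens)  # count each Query at most once per word
        has_place = any(is_place(t) for t in distinct)
        for tok in distinct:
            if is_place(tok):
                continue
            n2[tok] += 1
            if has_place:
                n1[tok] += 1
    return {tok: n1[tok] / n2[tok] for tok in n2}


def region_attribute_words(rates: Dict[str, float], threshold: float) -> Dict[str, float]:
    """Keep the non-place words whose co-occurrence rate exceeds the preset threshold."""
    return {tok: rate for tok, rate in rates.items() if rate > threshold}


if __name__ == "__main__":
    places = {"Nanjing", "Beijing"}
    demo = [["restaurant", "Nanjing"]] + [["restaurant", "recipe"]] * 4 + [["bus", "Beijing"]]
    rates = cooccurrence_rates(demo, lambda t: t in places)
    print(region_attribute_words(rates, 0.5))  # prints: {'bus': 1.0}
```

With the counts of the example in the text (N1 = 400, N2 = 2000), `cooccurrence_rates` would report 0.2 for "restaurant", the same value as in the invented demo above.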
By performing operations S2011-S2012, the preprocessing unit 10 obtains the region-attribute words in the Queries of user searches. Further, the preprocessing unit 10 can perform the following operation S2013 to extract core region-attribute words from the region-attribute words obtained.
S2013: Perform semantic analysis on the obtained region-attribute words and extract the core region-attribute words.
The preprocessing unit 10 can perform semantic analysis on the obtained region-attribute words and assign a weight to each according to the importance of its meaning in the Query: the more important the meaning, the higher the weight. Finally, the region-attribute words whose weights exceed a preset threshold are extracted as core region-attribute words. For example, suppose a Query contains two region-attribute words, "weather" and a second, less important word; after weights are assigned by semantic analysis, the weight of "weather" is above the preset threshold while that of the second word is below it, so "weather" is extracted as the core region-attribute word. Analyzing the words of a Query semantically and assigning weights according to their meaning belongs to the prior art and is not described further here.
After operation S2013, the preprocessing unit 10 has extracted the core region-attribute words from the region-attribute words. Further, the preprocessing unit 10 can perform the following operation S2014 to normalize the obtained core region-attribute words, yielding the final core region-attribute words.
S2014: Normalize the obtained core region-attribute words to obtain the final core region-attribute words.
The preprocessing unit 10 can normalize the core region-attribute words obtained in step S2013, i.e. map words that belong to the same category onto a single word. For example, "public transport", "bus", and "motor bus" all belong to the category "public transport", so each of them among the core region-attribute words is normalized to "public transport"; likewise, "restaurant", "dining hall", and "eatery" all belong to the category "restaurant", so each is normalized to "restaurant". It should be understood that these examples are for illustration only, and the embodiments of the invention are not limited thereto. The normalization of core region-attribute words can be implemented with a text classifier trained in advance: the classifier assigns each obtained core region-attribute word to a category, and each word is normalized to the category it belongs to, yielding the final core region-attribute words. This method belongs to the prior art and is not described further here.
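Operations S2013 and S2014 can be chained as a threshold filter followed by a category lookup. This sketch makes two loudly flagged substitutions: the per-word semantic weights are assumed to come from some external semantic analysis (the text treats this as prior art and does not specify it), and the hypothetical table `CATEGORY_OF` stands in for the pre-trained text classifier. The function name `final_core_words` is likewise hypothetical.

```python
from typing import Dict, Set

# Hypothetical synonym table standing in for the pre-trained text classifier.
CATEGORY_OF: Dict[str, str] = {
    "bus": "public transport",
    "motor bus": "public transport",
    "dining hall": "restaurant",
    "eatery": "restaurant",
}


def final_core_words(word_weights: Dict[str, float], threshold: float,
                     category_of: Dict[str, str] = CATEGORY_OF) -> Set[str]:
    """S2013: keep words whose semantic weight exceeds `threshold`;
    S2014: map each kept word to its category (unknown words stay as they are)."""
    core = [w for w, weight in word_weights.items() if weight > threshold]
    return {category_of.get(w, w) for w in core}


if __name__ == "__main__":
    # Invented weights, as if produced by a semantic analyser.
    print(sorted(final_core_words({"bus": 0.9, "eatery": 0.7, "near": 0.1}, 0.5)))
    # prints: ['public transport', 'restaurant']
```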
After operations S2011-S2014, the preprocessing unit 10 has identified the place-name words in the Queries of user searches and the region-attribute words (or core region-attribute words, or normalized final core region-attribute words).
The training unit 20 is configured to obtain the confidences of the region-attribute words by training on user search records whose IP home locations have been labeled in advance, used as samples.
To obtain the home location of an IP address accurately, the training unit 20 can obtain the confidence of each region-attribute word in the Query. The confidence of a region-attribute word characterizes how strongly that word influences the determination of an IP address's home location. It can be obtained by training on user search records whose IP home locations have been labeled in advance. Specifically, the training unit 20 can train the confidence of a given region-attribute word as follows: obtain the user search records that contain place-name words and whose IP home locations have been labeled in advance; count the number of records whose Query contains both this region-attribute word and each place-name word, denoted T[place name 1], T[place name 2], …, T[place name n]; and count, among the records in which the word co-occurs with a given place-name word, the number whose labeled IP home location is that place, denoted R[place name 1], R[place name 2], …, R[place name n]. The confidence of the word, denoted P, is then
[Formula image not reproduced in this text: P computed from the T and R counts above.]
Note that if the preprocessing unit 10 has further extracted core region-attribute words from the region-attribute words, or has further obtained normalized final core region-attribute words, then the confidences that the training unit 20 obtains in the above training process are those of the core region-attribute words or of the normalized final core region-attribute words.
The discrimination unit 30 is configured to determine the home location of the IP address according to the user IDs in the user search records, the identified place-name words and region-attribute words in the query words, and the confidences of the region-attribute words. Preferably, it calculates by a preset rule a first weight with which the IP address belongs to the region corresponding to each place-name word, and determines the home location of the IP address according to the first weights.
After the user search records have been analyzed, the place-name words and the region-attribute words (or core region-attribute words, or normalized final core region-attribute words) in the Queries have been identified, and the confidence P of each such word has been obtained, the discrimination unit 30 can calculate by a preset rule the first weight with which the IP address belongs to the region of each place-name word in its corresponding Queries, and take the region with the highest first weight as the home location of the IP address. One preferred implementation provided by the invention for calculating the first weight with which a given IP address belongs to each region is as follows: select the user search records of this IP address whose Query contains a place-name word; count the number of distinct user IDs in these records, denoted Cid; for each region-attribute word (or core region-attribute word, or normalized final core region-attribute word), count the number of distinct user IDs whose Queries contain both the place-name word of this region and that word, denoted C[place name, word 1], C[place name, word 2], …, C[place name, word m]; and denote the first weight with which this IP address belongs to this region by Z[place name]. Then
$$Z[\text{place name}] = \frac{1}{\mathrm{Cid}} \sum_{j=1}^{m} C[\text{place name}, \text{word } j] \cdot P[\text{word } j]$$
where P[word 1], P[word 2], …, P[word m] are the confidences of the respective region-attribute words (or core region-attribute words, or normalized final core region-attribute words). The first weight with which a given IP address belongs to each region can be calculated in this way, and the region with the highest first weight is taken as the home location of the IP address. It should be understood that this method of calculating the first weights is only one preferred implementation provided by the invention; in practice, different rules can be set as required to calculate the first weight with which an IP address belongs to each region, and the invention is not limited in this respect.
Using the preprocessing unit 10, training unit 20, and discrimination unit 30 described above, the home location of an IP address can be obtained accurately by analyzing the Queries in user search records obtained in advance over a period of time, combined with user IDs. Records whose IP home locations have been obtained in this way can further be used as training samples for obtaining the confidence P of the region-attribute words (or core region-attribute words, or normalized final core region-attribute words).
Further, the device provided by the invention may also include the map information discrimination unit 40, which brings in a map search engine to obtain the home location of a user's IP address.
Cartographic information judgement unit 40 for the acquiescence urban information and the user ID that arrange at the map search engine according to the user in a period of time of obtaining in advance, calculates second weighted value that the IP address belongs to each region according to preset rule.
When providing map search services, a map search engine usually lets users set a default city, so that when a user visits the map search engine website, related map information is searched directly within that default city. The default city a user sets in the map search engine is often the user's actual location; therefore, analyzing the default-city information set by users in the map search engine over a period of time, combined with user IDs, can yield the home location of an IP address.
The default city set when a user visits the map search engine website over a period of time can be saved in advance as a record, together with information such as the user ID and IP address. For example, "43179D117F6AC7BD4856744B31F4E0E8; 125.34.37.129; Beijing" is a saved record of a user-set default city together with the user ID and IP address, where "43179D117F6AC7BD4856744B31F4E0E8" is the user ID, "125.34.37.129" is the IP address, and "Beijing" is the default city set by the user.
The map information discrimination unit 40 can then obtain the home location of an IP address from the default-city information set by users in the map search engine over a period of time, obtained in advance, and the user IDs. Specifically, it can calculate, from the default-city information and user IDs, the second weight Z[map, place name] with which the user IP belongs to each city, and take the city with the highest second weight Z[map, place name] as the home location of the IP address, where the second weight Z[map, place name] for a given city is the ratio of the number of user IDs whose default city is that city to the total number of user IDs in the records obtained in advance.
The map information discrimination unit 40 can thus obtain the home location of an IP address from information such as the default cities set by users of the map search engine and the user IDs. The discrimination unit 30 can then further integrate the home location obtained from the Queries and user IDs in the user search records with the home location obtained from the default cities set by users in the map search engine and the user IDs.
Judgement unit 30 can be integrated the ownership place of IP address according to first weighted value and second weighted value that IP address belongs to each region, specifically can adopt following manner to realize:
Multiply the first weight Z[place name] and the second weight Z[map, place name] with which the IP address belongs to the same region to obtain the combined weight with which the IP address belongs to each region, and take the region with the highest combined weight as the final home location of the IP address. For example, suppose the first weights with which a given IP address belongs to "Nanjing" and "Beijing", obtained from the user IDs and the Queries of user searches, are Z[Nanjing] = 0.65 and Z[Beijing] = 0.25, and the second weights, obtained from the default cities set by users in the map search engine and the user IDs, are Z[map, Nanjing] = 0.45 and Z[map, Beijing] = 0.3. Then the combined weight for "Nanjing" is Z[Nanjing] × Z[map, Nanjing] = 0.2925 and the combined weight for "Beijing" is Z[Beijing] × Z[map, Beijing] = 0.075. The combined weight for "Nanjing" is higher, so the final home location of this IP address is determined to be "Nanjing".
The above are only preferred embodiments of the invention and are not intended to limit it. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the invention shall fall within the scope of protection of the invention.

Claims (22)

1. A method for obtaining the home location of an Internet Protocol (IP) address based on a search engine, characterized in that the method comprises:
S1: obtaining user search records over a period of time, each user search record comprising a user identity (ID), a query word, and an IP address, and identifying the place-name words and region-attribute words in the query words of the user search records;
S2: obtaining the confidences of the region-attribute words by training on user search records whose IP home locations have been labeled in advance, used as samples; and
S3: determining the home location of the IP address according to the user IDs in the user search records, the identified place-name words and region-attribute words in the query words, and the confidences of the region-attribute words.
2. The method according to claim 1, characterized in that identifying, in step S1, the place-name terms and the terms with a regional attribute in the query words of the user search records specifically comprises:
S11, performing word segmentation on the query words in the user search records and identifying the place-name terms therein;
S12, extracting the non-place-name segmented words in the query words, and taking as terms with a regional attribute those non-place-name segmented words whose co-occurrence rate with place-name terms in the query words is higher than a preset threshold.
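Step S12 can be illustrated with the Python sketch below. It rests on assumptions the claim does not state: queries are already segmented into tokens by some word segmenter (not shown), place-name terms are recognized by lookup in a hypothetical `gazetteer` set, the co-occurrence rate of a token is taken as the fraction of queries containing it that also contain at least one place-name term, and the threshold of 0.5 is illustrative only.

```python
from collections import Counter


def regional_terms(segmented_queries, gazetteer, threshold=0.5):
    """Return non-place-name tokens whose co-occurrence rate with place names exceeds threshold."""
    seen = Counter()        # number of queries containing the token
    with_place = Counter()  # ... of which also contain a place-name term
    for tokens in segmented_queries:
        has_place = any(t in gazetteer for t in tokens)
        for term in {t for t in tokens if t not in gazetteer}:
            seen[term] += 1
            if has_place:
                with_place[term] += 1
    return {term for term, n in seen.items() if with_place[term] / n > threshold}


gazetteer = {"Nanjing", "Beijing"}  # hypothetical place-name dictionary
queries = [
    ["Nanjing", "hotel"],
    ["Beijing", "hotel"],
    ["hotel", "price"],
    ["Nanjing", "weather"],
    ["movie"],
]
print(sorted(regional_terms(queries, gazetteer)))  # ['hotel', 'weather']
```

Here "hotel" appears with a place name in 2 of its 3 queries and "weather" in 1 of 1, so both pass, while "price" and "movie" never co-occur with a place name.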
3. The method according to claim 2, characterized in that the method further comprises, after step S12:
S13, performing semantic analysis on the terms with a regional attribute, and extracting those terms with a regional attribute whose semantic weight value is higher than a preset threshold.
4. The method according to claim 3, characterized in that the method further comprises, after step S13:
S14, normalizing the terms with a regional attribute extracted in step S13 according to the categories to which they belong.
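Steps S13 and S14 amount to a filter followed by a mapping. The claims do not say how the semantic weight is computed or what the categories are, so the Python sketch below takes both as given inputs: `semantic_weight` and `category_of` are hypothetical lookup tables, and replacing each term by its category label is one assumed reading of "normalization".

```python
def filter_by_semantic_weight(terms, semantic_weight, threshold=0.5):
    """S13 (sketch): keep terms whose precomputed semantic weight exceeds the threshold."""
    return {t for t in terms if semantic_weight.get(t, 0.0) > threshold}


def normalize_by_category(terms, category_of):
    """S14 (sketch): replace each term by its category label; uncategorized terms stay as-is."""
    return {category_of.get(t, t) for t in terms}


semantic_weight = {"hotel": 0.9, "inn": 0.7, "weather": 0.8, "cheap": 0.2}  # hypothetical
category_of = {"hotel": "lodging", "inn": "lodging"}                         # hypothetical
kept = filter_by_semantic_weight({"hotel", "inn", "weather", "cheap"}, semantic_weight)
print(sorted(normalize_by_category(kept, category_of)))  # ['lodging', 'weather']
```

Normalizing before training lets near-synonyms such as "hotel" and "inn" pool their counts, so their shared confidence is estimated from more records.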
5. The method according to claim 1, characterized in that step S2 specifically comprises:
obtaining the confidence P[M] of a term M with a regional attribute according to the formula
[Formula: image FDA00002945631600011 in the original publication]
where T[place name i] is the number of records in the training samples in which the term M co-occurs with place-name term i; R[place name i] is the number of those records in which M co-occurs with place-name term i and the pre-labeled IP address attribution location is the region corresponding to place-name term i; and n is the number of place-name terms that co-occur with M in the training samples.
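The confidence formula of claim 5 survives only as an image (FDA00002945631600011), so the Python sketch below assumes one plausible reading of its definitions: the pooled hit rate P[M] = ΣR[place name i] / ΣT[place name i] over the n place names that co-occur with M. This reading is not confirmed by the text, and the record layout (place names, regional terms, labeled region) and the `region_of` lookup are likewise assumptions.

```python
from collections import Counter


def term_confidence(term, records, region_of):
    """Sketch of P[M], ASSUMING P[M] = sum_i R[place name i] / sum_i T[place name i].

    records: iterable of (place_names, regional_terms, labeled_region) tuples.
    region_of: maps a place-name term to the region it denotes.
    """
    T = Counter()  # records where `term` co-occurs with place name i
    R = Counter()  # ... whose pre-labeled IP region is the region of place name i
    for place_names, terms, labeled_region in records:
        if term not in terms:
            continue
        for place in set(place_names):
            T[place] += 1
            if region_of.get(place) == labeled_region:
                R[place] += 1
    total = sum(T.values())
    return sum(R.values()) / total if total else 0.0


region_of = {"Nanjing": "Nanjing", "Beijing": "Beijing"}
records = [
    (["Nanjing"], {"hotel"}, "Nanjing"),
    (["Beijing"], {"hotel"}, "Nanjing"),
    (["Nanjing"], {"hotel"}, "Nanjing"),
    (["Beijing"], {"flight"}, "Nanjing"),
]
print(term_confidence("hotel", records, region_of))   # 0.6666666666666666
print(term_confidence("flight", records, region_of))  # 0.0
```

Under this reading, a term that users mostly pair with the city they are in scores high, while a term such as "flight", which users in Nanjing pair with a distant city, scores low.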
6. The method according to any one of claims 1 to 5, characterized in that determining the attribution location of the IP address in step S3 comprises:
calculating, according to a preset rule, first weight values indicating that the IP address belongs to each region corresponding to the place-name terms, and determining the attribution location of the IP address according to the first weight values.
7. The method according to claim 6, characterized in that calculating, according to the preset rule, the first weight values indicating that the IP address belongs to each region corresponding to the place-name terms specifically comprises:
obtaining the first weight value Z[L] indicating that the IP address belongs to region L according to the formula
[Formula: image FDA00002945631600021 in the original publication]
where Cid is the number of user IDs contained in the search records of this IP address that contain place-name terms; C[L, word i] is the number of user IDs corresponding to those records, among the search records of this IP address that contain place-name terms, in which a place-name term corresponding to region L co-occurs with term i having a regional attribute; P[word i] is the confidence of term i having a regional attribute; and m is the number of terms with a regional attribute in the search records of this IP address that contain place-name terms.
8. The method according to claim 6, characterized in that determining the attribution location of the IP address according to the first weight values comprises:
among the first weight values of the IP address for each region corresponding to the place-name terms, taking the region with the highest first weight value as the attribution location of the IP address.
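Claim 7's formula is likewise only an image (FDA00002945631600021). The Python sketch below assumes Z[L] = Σ_i C[L, word i] · P[word i] / Cid over the m terms with a regional attribute, which fits the variables the claim defines but is not confirmed by the text; it then applies claim 8 by taking the region with the highest first weight value. The record layout and function names are hypothetical.

```python
from collections import defaultdict


def first_weights(ip_records, region_of, confidence):
    """Sketch of Z[L], ASSUMING Z[L] = sum_i C[L, word i] * P[word i] / Cid.

    ip_records: (user_id, place_names, regional_terms) tuples for one IP address.
    """
    users_with_place = set()          # user IDs behind records containing a place name
    users_by_pair = defaultdict(set)  # (region, term) -> user IDs with that co-occurrence
    for user_id, place_names, terms in ip_records:
        if not place_names:
            continue
        users_with_place.add(user_id)
        for place in place_names:
            region = region_of.get(place)
            if region is None:
                continue
            for term in terms:
                users_by_pair[(region, term)].add(user_id)
    cid = len(users_with_place)
    z = defaultdict(float)
    # If cid == 0, users_by_pair is empty, so no division by zero can occur.
    for (region, term), users in users_by_pair.items():
        z[region] += len(users) * confidence.get(term, 0.0) / cid
    return dict(z)


def attribution_location(z):
    """Claim 8: the region with the highest first weight value (None if there is none)."""
    return max(z, key=z.get) if z else None


ip_records = [
    ("u1", ["Nanjing"], ["hotel"]),
    ("u2", ["Nanjing"], ["hotel"]),
    ("u3", ["Beijing"], ["flight"]),
    ("u4", [], ["movie"]),
]
z = first_weights(ip_records, {"Nanjing": "Nanjing", "Beijing": "Beijing"},
                  {"hotel": 0.8, "flight": 0.1})
print(attribution_location(z))  # Nanjing
```

Counting distinct user IDs rather than raw records keeps one very active user on a shared IP address from dominating the score.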
9. The method according to claim 6, characterized in that the method further comprises:
S4, calculating, according to a preset rule, second weight values indicating that the IP address belongs to each region, based on the user IDs and the default-city information that users set in a map search engine within a period of time, obtained in advance;
determining the attribution location of the IP address according to the first weight values is specifically:
combining the first weight value and the second weight value of the IP address for each region to obtain the final attribution location of the IP address.
10. The method according to claim 9, characterized in that calculating the second weight values indicating that the IP address belongs to each region specifically comprises:
taking the ratio of the number of user IDs whose default city set in the map search engine, obtained in advance, belongs to a given region to the total number of user IDs as the second weight value of the IP address for that region.
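Claim 10's second weight value is a simple share of users. The Python sketch below assumes the input is a mapping from each user ID seen on the IP address to the default city that user set in the map search engine, and takes "total number of user IDs" to mean the users in that mapping; `region_of_city` is an optional, hypothetical lookup for rolling cities up into larger regions.

```python
from collections import Counter


def second_weights(default_city_by_user, region_of_city=None):
    """Z[map, region] = (user IDs whose default city lies in region) / (total user IDs)."""
    total = len(default_city_by_user)
    if total == 0:
        return {}
    region_of_city = region_of_city or {}
    counts = Counter(region_of_city.get(city, city)
                     for city in default_city_by_user.values())
    return {region: n / total for region, n in counts.items()}


users = {"u1": "Nanjing", "u2": "Nanjing", "u3": "Beijing", "u4": "Nanjing"}
print(second_weights(users))  # {'Nanjing': 0.75, 'Beijing': 0.25}
```

The resulting values sum to 1 across regions, so they can be multiplied directly with the first weight values as described in claim 11.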
11. The method according to claim 9, characterized in that combining the first weight value and the second weight value of the IP address for each region to obtain the final attribution location of the IP address specifically comprises:
multiplying the first weight value and the second weight value of the IP address for each region to obtain a composite weight value for each region, and taking the region with the highest composite weight value as the attribution location of the IP address.
12. A device for obtaining the attribution location of an IP address based on a search engine, characterized in that the device comprises:
a preprocessing unit, configured to obtain user search records within a period of time, each user search record comprising a user ID, a query word, and an IP address, and to identify the place-name terms and the terms with a regional attribute in the query words of the user search records;
a training unit, configured to use user search records whose IP address attribution locations have been labeled in advance as training samples to obtain the confidence of the terms with a regional attribute;
a determination unit, configured to determine the attribution location of the IP address according to the user IDs in the user search records, the identified place-name terms and terms with a regional attribute in the query words, and the confidence of the terms with a regional attribute.
13. The device according to claim 12, characterized in that, when identifying the place-name terms and the terms with a regional attribute in the query words of the user search records, the preprocessing unit specifically performs:
S21, performing word segmentation on the query words in the user search records and identifying the place-name terms therein;
S22, extracting the non-place-name segmented words in the query words, and taking as terms with a regional attribute those non-place-name segmented words whose co-occurrence rate with place-name terms in the query words is higher than a preset threshold.
14. The device according to claim 13, characterized in that, after performing S22, the preprocessing unit further performs:
S23, performing semantic analysis on the terms with a regional attribute, and extracting those terms with a regional attribute whose semantic weight value is higher than a preset threshold.
15. The device according to claim 14, characterized in that, after performing S23, the preprocessing unit further performs:
S24, normalizing the terms with a regional attribute extracted in step S23 according to the categories to which they belong.
16. The device according to claim 12, characterized in that the training unit specifically performs:
obtaining the confidence P[M] of a term M with a regional attribute according to the formula
[Formula: image FDA00002945631600041 in the original publication]
where T[place name i] is the number of records in the training samples in which the term M co-occurs with place-name term i; R[place name i] is the number of those records in which M co-occurs with place-name term i and the pre-labeled IP address attribution location is the region corresponding to place-name term i; and n is the number of place-name terms that co-occur with M in the training samples.
17. The device according to any one of claims 12 to 16, characterized in that, when determining the attribution location of the IP address, the determination unit specifically performs:
calculating, according to a preset rule, first weight values indicating that the IP address belongs to each region corresponding to the place-name terms, and determining the attribution location of the IP address according to the first weight values.
18. The device according to claim 17, characterized in that, when calculating according to the preset rule the first weight values indicating that the IP address belongs to each region corresponding to the place-name terms, the determination unit specifically performs:
obtaining the first weight value Z[L] indicating that the IP address belongs to region L according to the formula
[Formula: image FDA00002945631600042 in the original publication]
where Cid is the number of user IDs contained in the search records of this IP address that contain place-name terms; C[L, word i] is the number of user IDs corresponding to those records, among the search records of this IP address that contain place-name terms, in which a place-name term corresponding to region L co-occurs with term i having a regional attribute; P[word i] is the confidence of term i having a regional attribute; and m is the number of terms with a regional attribute in the search records of this IP address that contain place-name terms.
19. The device according to claim 17, characterized in that, when determining the attribution location of the IP address according to the first weight values, the determination unit specifically performs:
among the first weight values of the IP address for each region corresponding to the place-name terms, taking the region with the highest first weight value as the attribution location of the IP address.
20. The device according to claim 17, characterized in that the device further comprises:
a map information determination unit, configured to calculate, according to a preset rule, second weight values indicating that the IP address belongs to each region, based on the user IDs and the default-city information that users set in a map search engine within a period of time, obtained in advance;
when determining the attribution location of the IP address according to the first weight values, the determination unit specifically performs:
combining the first weight value and the second weight value of the IP address for each region to obtain the final attribution location of the IP address.
21. The device according to claim 20, characterized in that, when calculating the second weight values indicating that the IP address belongs to each region, the map information determination unit specifically performs:
taking the ratio of the number of user IDs whose default city set in the map search engine, obtained in advance, belongs to a given region to the total number of user IDs as the second weight value of the IP address for that region.
22. The device according to claim 20, characterized in that, when combining the first weight value and the second weight value of the IP address for each region to obtain the final attribution location of the IP address, the determination unit specifically performs:
multiplying the first weight value and the second weight value of the IP address for each region to obtain a composite weight value for each region, and taking the region with the highest composite weight value as the attribution location of the IP address.
CN201310091285.XA 2013-03-21 2013-03-21 Method and apparatus for obtaining the attribution location of an IP address based on a search engine Active CN103207901B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310091285.XA CN103207901B (en) 2013-03-21 2013-03-21 Method and apparatus for obtaining the attribution location of an IP address based on a search engine

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310091285.XA CN103207901B (en) 2013-03-21 2013-03-21 Method and apparatus for obtaining the attribution location of an IP address based on a search engine

Publications (2)

Publication Number Publication Date
CN103207901A CN103207901A (en) 2013-07-17
CN103207901B CN103207901B (en) 2019-03-08

Family

ID=48755123

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310091285.XA Active CN103207901B (en) 2013-03-21 2013-03-21 Method and apparatus for obtaining the attribution location of an IP address based on a search engine

Country Status (1)

Country Link
CN (1) CN103207901B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104168163A (en) * 2014-08-27 2014-11-26 福建富士通信息软件有限公司 Intelligent network line quality detection and data analysis method
CN104780235A (en) * 2014-01-14 2015-07-15 腾讯科技(深圳)有限公司 IP attribution inquiry method and device and server
CN104780234A (en) * 2014-01-14 2015-07-15 腾讯科技(深圳)有限公司 Method, device and system for inquiring Internet protocol (IP) address location
CN105335480A (en) * 2015-10-13 2016-02-17 国家电网公司 Method for identifying the responsible entity of an Internet website
CN106096040A (en) * 2016-06-29 2016-11-09 中国人民解放军国防科学技术大学 Method and device for determining the attribution location of an organization's website based on a search engine
CN106357835A (en) * 2016-09-05 2017-01-25 百度在线网络技术(北京)有限公司 Method and device for determining subordinate region of target IP address
CN111327721A (en) * 2020-02-28 2020-06-23 加和(北京)信息科技有限公司 IP address positioning method and device, storage medium and electronic device

Citations (4)

Publication number Priority date Publication date Assignee Title
CN102012900A (en) * 2009-09-04 2011-04-13 阿里巴巴集团控股有限公司 An information retrieval method and system
CN102033947A (en) * 2010-12-22 2011-04-27 百度在线网络技术(北京)有限公司 Region recognizing device and method based on retrieval word
CN102880721A (en) * 2012-10-15 2013-01-16 瑞庭网络技术(上海)有限公司 Implementation method of vertical search engine
CN102932492A (en) * 2011-09-12 2013-02-13 微软公司 Correlation of users to ip address lease events

Non-Patent Citations (2)

Xie Xing et al.: "Understanding User Behavior Based on Geographic Information", 《HTTPS://WENKU.BAIDU.COM/VIEW/927FED7202768E73876.HTML》 *
Huang Xi'an: "Using 'Baidu' to Search Network Information Resources", Sci-Tech Information Development & Economy *

Cited By (12)

Publication number Priority date Publication date Assignee Title
CN104780235A (en) * 2014-01-14 2015-07-15 腾讯科技(深圳)有限公司 IP attribution inquiry method and device and server
CN104780234A (en) * 2014-01-14 2015-07-15 腾讯科技(深圳)有限公司 Method, device and system for inquiring Internet protocol (IP) address location
CN104780235B (en) * 2014-01-14 2019-08-06 腾讯科技(深圳)有限公司 IP attribution inquiry method, device and server
CN104780234B (en) * 2014-01-14 2019-09-17 腾讯科技(深圳)有限公司 IP attribution inquiry method, apparatus and system
CN104168163A (en) * 2014-08-27 2014-11-26 福建富士通信息软件有限公司 Intelligent network line quality detection and data analysis method
CN105335480A (en) * 2015-10-13 2016-02-17 国家电网公司 Method for identifying the responsible entity of an Internet website
CN106096040A (en) * 2016-06-29 2016-11-09 中国人民解放军国防科学技术大学 Method and device for determining the attribution location of an organization's website based on a search engine
CN106096040B (en) * 2016-06-29 2019-06-04 中国人民解放军国防科学技术大学 Method and device for determining the attribution location of an organization's website based on a search engine
CN106357835A (en) * 2016-09-05 2017-01-25 百度在线网络技术(北京)有限公司 Method and device for determining subordinate region of target IP address
CN106357835B (en) * 2016-09-05 2020-03-06 百度在线网络技术(北京)有限公司 Method and equipment for determining region of target IP address
CN111327721A (en) * 2020-02-28 2020-06-23 加和(北京)信息科技有限公司 IP address positioning method and device, storage medium and electronic device
CN111327721B (en) * 2020-02-28 2023-01-10 加和(北京)信息科技有限公司 IP address positioning method and device, storage medium and electronic device

Also Published As

Publication number Publication date
CN103207901B (en) 2019-03-08

Similar Documents

Publication Publication Date Title
CN103207901A (en) Method and system for obtaining internet protocol address attribution position based on search engine
CN109145169B (en) Address matching method based on statistical word segmentation
CN103955505B (en) A kind of event method of real-time and system based on microblogging
CN102591867B (en) Searching service method based on mobile device position
CN104143005A (en) Related searching system and method
CN103186574B (en) Method and apparatus for generating search results
CN106919641A (en) Point-of-interest search method and device, and electronic equipment
CN105244031A (en) Speaker identification method and device
CN102426603B (en) Text information regional recognition method and device
CN104537027A (en) Information recommendation method and device
CN103995837A (en) Personalized tourist track planning method based on group footprints
CN107203526B (en) Query string semantic demand analysis method and device
CN103106189B (en) Method and apparatus for mining synonymous attribute words
CN105718585B (en) Method and device for associating documents with the word senses of tag words
CN109492066B (en) Method, device, equipment and storage medium for determining branch names of points of interest
CN102541936A (en) Method and device for acquiring popularity of POI (Point of Interest)
CN106096040B (en) Method and device for determining the attribution location of an organization's website based on a search engine
CN111524353B (en) Method for traffic text data for speed prediction and travel planning
CN101814290A (en) Method for enhancing robustness of voice recognition system
CN104866623A (en) Searching method and searching server
CN114510566B (en) Method and system for mining, classifying and analyzing hotword based on worksheet
CN108984640A (en) Geographic information acquisition method based on web data mining
CN107577744A (en) Nonstandard Address automatic matching model, matching process and method for establishing model
CN111104468A (en) Method for deducing user activity based on semantic track
Geng et al. Geographically Weighted Regression model (GWR) based spatial analysis of house price in Shenzhen

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant