Summary of the invention
The invention provides a method and device for sorting electronic map data, to solve the problems of traditional manual sorting: poor sorting results, heavy labor demands, and high cost.
To solve the above technical problems, according to the specific embodiments provided herein, the invention discloses the following technical solutions:
A method for sorting electronic map data comprises:
Extracting a keyword for each item of electronic map data;
Searching with the keyword to obtain a set of search result web pages corresponding to each item of electronic map data;
For each search result web page in the set, calculating a first value representing the importance of the web page and a second value representing the degree of match between the web page and the keyword, and calculating the importance of the item of electronic map data from the first and second values of all search result web pages in the corresponding set;
Sorting the electronic map data according to the importance;
Wherein calculating the importance of the item of electronic map data from the first and second values of all search result web pages in the corresponding set specifically comprises: multiplying the first value by the second value for each search result web page in the set, and then summing the products over all search result web pages in the set to obtain the importance of the item of electronic map data.
Preferably, the first value is obtained by calculating the page rank of the web page.
Preferably, after calculating the importance of the item of electronic map data, the method further comprises: according to the different weights assigned to the categories to which electronic map data belong, multiplying the importance of the item by the weight of its category to obtain adjusted result data, which is used for sorting.
Wherein extracting the keyword of each item of electronic map data specifically comprises: extracting the name of each item of electronic map data as the keyword.
Preferably, the method further comprises: extracting the address information of each item of electronic map data and using it together with the name as the keyword.
Preferably, before extracting the keyword of each item of electronic map data, the method further comprises: preprocessing the original electronic map data, the preprocessing comprising removing irrelevant symbols, converting character encodings, and unifying the format; the preprocessing result is used for keyword extraction.
Preferably, after sorting the electronic map data according to the importance, the method further comprises: in electronic map retrieval, returning the matching retrieval results according to the query word entered by the user, and preferentially displaying the higher-ranked electronic map data among the retrieval results.
Preferably, after sorting the electronic map data according to the importance, the method further comprises: when a map layer is displayed, selecting and displaying the higher-ranked electronic map data within the display range.
Preferably, after sorting the electronic map data according to the importance, the method further comprises: preferentially updating the higher-ranked electronic map data.
The invention also provides a device for sorting electronic map data, comprising:
A keyword extraction unit, configured to extract a keyword for each item of electronic map data;
A query unit, configured to search with the keyword and obtain a set of search result web pages corresponding to each item of electronic map data;
A calculation unit, comprising: a first calculation subunit, configured to calculate, for each search result web page in the set, a first value representing the importance of the web page; a second calculation subunit, configured to calculate, for each search result web page in the set, a second value representing the degree of match between the web page and the keyword; and a combined calculation subunit, configured to calculate the importance of each item of electronic map data from the first and second values of all search result web pages in the set corresponding to that item;
A sorting unit, configured to sort the electronic map data according to the importance;
Wherein the combined calculation subunit multiplies the first value by the second value for each search result web page in the set, and then sums the products over all search result web pages in the set to obtain the importance of the item of electronic map data.
Preferably, the first calculation subunit obtains the first value by calculating the page rank of the web page.
Preferably, the device further comprises: an adjustment unit, configured to multiply the importance of an item of electronic map data by the weight of its category, according to the different weights assigned to the categories to which electronic map data belong, to obtain adjusted result data, and to output the adjusted result data to the sorting unit for sorting.
Wherein the keyword extraction unit uses the extracted name of the electronic map data as the keyword.
Preferably, the keyword extraction unit also uses the extracted address information of the electronic map data together with the name as the keyword.
Preferably, the device further comprises: a preprocessing unit, configured to preprocess the original electronic map data and output the preprocessing result to the keyword extraction unit; wherein the preprocessing comprises removing irrelevant symbols, converting character encodings, and unifying the format.
Preferably, the device further comprises: a retrieval unit, configured to return, in electronic map retrieval, the matching retrieval results according to the query word entered by the user, and to preferentially display the higher-ranked electronic map data among the retrieval results.
Preferably, the device further comprises: a map layer display unit, configured to select and display, when a map layer is displayed, the higher-ranked electronic map data within the display range.
Preferably, the device further comprises: a data update unit, configured to preferentially update the higher-ranked electronic map data.
The invention also provides a search engine system comprising the device of any of the above device embodiments.
According to the specific embodiments provided herein, the invention achieves the following technical effects:
First, the invention uses Internet technology to sort POI (point of interest) data, characterizing the importance of POI data by their network popularity on the Internet; the network popularity is calculated from the result web pages that a search engine returns for a keyword extracted from the POI data. Because this characterization reflects the perception of a large number of Internet users and of the general public, sorting POI data by network popularity gives good results and is well founded in public perception. Moreover, scoring and sorting the POI data automatically by machine greatly reduces manual labor, improves efficiency, and lowers cost.
Second, when characterizing the importance of POI data by network popularity, the invention mainly uses two indicators, the importance of a web page and the degree of match between the web page and the keyword, each of which can be calculated in different ways.
Third, the invention also takes into account the influence of the category of the POI data on POI importance: the category information is used to adjust the basic network popularity score and obtain the final POI score, which characterizes the importance of the POI data more accurately.
Detailed description of the embodiments
To make the above objects, features, and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and specific embodiments.
Embodiment one:
In contrast to the traditional manual POI sorting method, this embodiment of the invention provides a sorting method based on Internet technology. Fig. 1 is a flowchart of the method for sorting electronic map data according to Embodiment 1. In this embodiment, POI data are used as an example of electronic map data, but electronic map data include, without limitation, POI data.
S101: extract the keyword of each item of POI data;
This embodiment extracts one keyword from each item of POI data for querying an Internet search engine. Each item of POI data has several attributes, including name, category, coordinates, and other attribute information, so a word that can represent the POI is extracted from these attributes as the keyword. In this embodiment, the basic part of the keyword is the POI name, because the name is the most important part of the POI data.
Preferably, the name is processed when it is extracted, for example by removing branch-store or branch-office designations. Names of restaurants and companies often contain such designations, and because the purpose of POI sorting is to rank the head store or head office near the top, these designations can be removed. For example, for "xx Company Five Road Junction Branch", "Five Road Junction Branch" can be removed, leaving only "xx Company".
Preferably, other information such as the address or district can be added to supplement the name. Some names are too short to be meaningful on their own, such as "public toilet" or "parking lot"; in such cases the POI's address can be added and used together with the name as the keyword, which gives better results.
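The keyword construction described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `build_keyword`, the `GENERIC_NAMES` set, and the rule "append the address only when the name is generic" are assumptions introduced here.

```python
# Hypothetical set of names that carry little meaning on their own.
GENERIC_NAMES = {"public toilet", "parking lot"}


def build_keyword(name: str, address: str = "") -> str:
    """Return the search keyword for one POI record (illustrative only)."""
    name = name.strip()
    address = address.strip()
    # A generic name is combined with the address; otherwise the name is used alone.
    if name.lower() in GENERIC_NAMES and address:
        return f"{address} {name}"
    return name
```

A distinctive name such as "xx Company" is returned unchanged, while a generic name such as "Parking Lot" is prefixed with its address.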
S102: search with the keyword to obtain the set of search result web pages corresponding to each item of POI data;
The extracted keyword is submitted as a query to a search engine, which returns the result set.
S103: calculate the importance of each item of POI data from its corresponding set of search result web pages;
The invention uses network popularity on the Internet to characterize the importance of POI data; the network popularity of a POI is calculated from the set of search result web pages corresponding to that POI. Here, network popularity means how well known a name is on the network.
For each item of POI data, querying with the extracted keyword yields multiple search result web pages (a web page set), and each web page has two indicators: the importance of the web page, and the degree of match between the web page and the keyword. This embodiment mainly uses these two indicators to measure the network popularity of POI data.
Each indicator can be calculated in different ways; this embodiment adopts one commonly used method for each. For web page importance, the page rank (PageRank) is calculated. PageRank is a measure of web page importance computed from the hyperlinks between web pages, and originates from the PageRank algorithm proposed by the founders of Google. Web page importance can of course also be represented by the page's traffic. For the degree of match between a web page and the keyword (MatchRank), the usual calculation is: if the keyword appears intact in the web page, the match is higher; if the keyword appears only after being split into segments, the match is lower. The invention includes, but is not limited to, the above calculation methods.
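One possible reading of the MatchRank rule above is sketched below. The text only states that an intact occurrence scores higher than a segmented one; the concrete values 1.0, 0.5, and 0.0, the function name `match_rank`, and the assumption that the caller supplies the keyword segments (word segmentation itself is out of scope) are all illustrative choices.

```python
from typing import Sequence


def match_rank(keyword: str, page_text: str, segments: Sequence[str]) -> float:
    """Score how well a page matches a keyword (assumed scale: 1.0 / 0.5 / 0.0)."""
    # The keyword appears intact in the page: highest match.
    if keyword in page_text:
        return 1.0
    # The keyword appears only as separate segments: lower match.
    if segments and all(segment in page_text for segment in segments):
        return 0.5
    # Neither the keyword nor all of its segments appear.
    return 0.0
```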
After the PageRank and MatchRank of each web page are obtained, they are multiplied together for each web page, and the products for all web pages corresponding to the same item of POI data are added to give the result for that POI. This embodiment scores the POI data, so the result is a score that characterizes the network popularity of the POI.
It should be noted that obtaining a POI score by multiplying and then adding the PageRank and MatchRank of the web pages is only one implementation of this embodiment; the invention includes, but is not limited to, this method.
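The multiply-then-add calculation can be written in a few lines. In this sketch the function name `poi_score` and the representation of each page as a `(page_rank, match_rank)` pair are assumptions made for illustration; how the two values are obtained is left to the caller.

```python
from typing import Iterable, Tuple


def poi_score(pages: Iterable[Tuple[float, float]]) -> float:
    """Sum PageRank * MatchRank over all result pages of one POI."""
    return sum(page_rank * match_rank for page_rank, match_rank in pages)
```

For example, two result pages with values (0.5, 1.0) and (0.25, 0.5) contribute 0.5 and 0.125, giving a POI score of 0.625. A POI with no result pages scores 0.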
S104: sort the POI data according to the importance.
Once the score of each item of POI data has been obtained, all POI data can be sorted by score.
As the above process shows, the invention characterizes the importance of POI data by their network popularity on the Internet. Because this characterization reflects the perception of a large number of Internet users and of the general public, sorting POI data by network popularity gives good results and is well founded in public perception. Moreover, scoring and sorting the POI data automatically by machine greatly reduces manual labor, improves efficiency, and lowers cost.
Embodiment two:
Embodiment 2 of the invention provides a specific application example.
Fig. 2 is a schematic flowchart of the method for sorting POI data according to Embodiment 2.
S201: preprocess the original POI data;
The original POI data are cleaned and filtered so that they conform to a defined input standard. The preprocessing mainly comprises three parts: removing irrelevant symbols, converting character encodings, and unifying the format:
1) Removing irrelevant symbols: because of the data source or other problems, the data may contain irrelevant symbols that carry no meaning, such as "!" and "#", as well as garbled characters; these must be removed, which cleans and filters the data;
2) Converting character encodings: making character encoding consistent helps keep later scoring fair, for example by converting half-width characters to full-width and Traditional Chinese to Simplified Chinese;
3) Unifying the format: the input format of the data should be uniform, which simplifies programming.
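The three preprocessing parts above might look like the following sketch. Several choices here are assumptions rather than the source's method: the symbol list in `_IRRELEVANT`, the use of Unicode NFKC normalization (which folds full-width characters to their half-width equivalents, the opposite direction from the example in the text, but with the same goal of one consistent form), and whitespace collapsing as the "uniform format". Traditional-to-Simplified conversion needs a mapping table outside the standard library and is omitted.

```python
import re
import unicodedata

# Assumed set of "irrelevant symbols": "!", "#", and the Unicode replacement
# character that often marks garbled text.
_IRRELEVANT = re.compile("[!#\ufffd]")


def preprocess(raw: str) -> str:
    """Clean one raw POI field (illustrative sketch)."""
    # Fold full-width and half-width variants into a single form.
    text = unicodedata.normalize("NFKC", raw)
    # Remove symbols that carry no meaning.
    text = _IRRELEVANT.sub("", text)
    # Unify the format: trim and collapse runs of whitespace.
    return " ".join(text.split())
```

Normalization runs first so that full-width variants of "!" and "#" are also caught by the symbol filter.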
S202: for the preprocessed POI data, extract the keyword of each item of POI data;
During extraction, information such as branch-store or branch-office designations in a name can be identified using a place-name database and an alias database, and then removed. For example, in "xx Company Five Road Junction Branch", if "Five Road Junction" is an entry in the place-name database and "Branch" is an entry in a dedicated dictionary, then "Five Road Junction Branch" can be removed, leaving only "xx Company".
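A simple way to sketch the dictionary-based removal is suffix matching, shown below. The function name `strip_branch`, the in-memory sets standing in for the place-name database and the branch-word dictionary, and the suffix-only matching strategy are assumptions for illustration; a production system would likely use proper word segmentation.

```python
from typing import Iterable


def strip_branch(name: str, place_names: Iterable[str], branch_words: Iterable[str]) -> str:
    """Remove a trailing branch designation and the place name before it, if any."""
    for word in branch_words:
        if name.endswith(word):
            # Drop the branch word, e.g. "Branch".
            stem = name[: -len(word)].rstrip()
            for place in place_names:
                if stem.endswith(place):
                    # Drop the place name that qualified the branch.
                    stem = stem[: -len(place)].rstrip()
                    break
            # Keep the original name if stripping would leave nothing.
            return stem or name
    return name
```

With `place_names={"Five Road Junction"}` and `branch_words={"Branch"}`, the name "xx Company Five Road Junction Branch" reduces to "xx Company", while a name without a branch designation is returned unchanged.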
S203: search with the keyword to obtain the set of search result web pages corresponding to each item of POI data;
S204: for each item of POI data, calculate a basic score representing its importance from the corresponding set of search result web pages;
In this embodiment, the score calculated from the PageRank and MatchRank of the web pages serves as the basic score of the POI data; this basic score characterizes the network popularity of the POI.
S205: adjust the basic score according to the category information of the POI data;
POI data fall into many categories, and different categories behave differently on the network. For example, catering POIs attract more attention on the network than government agency POIs, yet government agency POIs are more important than catering POIs, because people pay more attention to them in real life. Therefore, to balance the scores of POI data of different categories, this embodiment introduces category weights: the basic POI score is adjusted by the category weight so that POIs in important categories score higher and POIs in unimportant categories score lower. Category weights can be set empirically or learned from training data. The adjustment multiplies the basic score of a POI by the weight of its category to obtain the final score.
For example, consider two POIs: The Third Affiliated Hospital of Peking University and Guo Lin Home Cooking. Because catering names appear frequently on web pages, the basic score of Guo Lin Home Cooking is 5 and that of The Third Affiliated Hospital of Peking University is 4. By common experience, however, a hospital is more important than a restaurant, so the category weight for hospitals is set higher, at 1.5, and that for catering lower, at 0.8. The final scores are therefore 4 × 1.5 = 6 for the hospital and 5 × 0.8 = 4 for Guo Lin Home Cooking. The hospital scores higher and ranks first, which matches common expectations.
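The worked example above can be reproduced with a small sketch. The weights 1.5 and 0.8 come from the text; the names `CATEGORY_WEIGHTS` and `final_score`, the category labels, and the default weight of 1.0 for unlisted categories are assumptions added here.

```python
from typing import Mapping

# Weights from the example in the text; the category labels are illustrative.
CATEGORY_WEIGHTS = {"hospital": 1.5, "catering": 0.8}


def final_score(
    base_score: float,
    category: str,
    weights: Mapping[str, float] = CATEGORY_WEIGHTS,
    default_weight: float = 1.0,  # assumption: unlisted categories are left unchanged
) -> float:
    """Multiply the basic score by the weight of the POI's category."""
    return base_score * weights.get(category, default_weight)
```

Calling `final_score(4, "hospital")` and `final_score(5, "catering")` gives final scores of about 6 and 4 respectively, so the hospital ranks ahead of the restaurant even though its basic score is lower.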
S206: sort the POI data according to the adjusted final score.
Compared with Embodiment 1, Embodiment 2 adds preprocessing and the adjustment of the basic score. Embodiment 2 also takes into account the influence of the category of the POI data on POI importance: the category information is used to adjust the basic network popularity score and obtain the final POI score, which characterizes the importance of the POI data more accurately.
Sorting the POI data of an electronic map has considerable practical value, for example:
1) Query and retrieval: when the user enters a query word in an electronic map query, many retrieval results may be returned. All of them match the query word, but they often differ in importance. Once the POIs have been sorted, important matching POIs can be displayed first and unimportant ones later, which is more convenient for the user. For example, a query for "Quanjude" returns many Quanjude branches as well as some subsidiaries and training institutions. All match the query word, but the subsidiaries and training institutions should not be displayed first, because they are generally less important; the important head store or branches should rank first. As another example, a query for Peking University returns Peking University and its affiliated institutions; Peking University itself should rank first, and its many affiliated institutions should in turn be ranked relative to one another.
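The retrieval use case can be sketched as a filter followed by a sort on the precomputed score. The function name `search`, the `(name, score)` tuple layout, and substring matching as the notion of "match" are assumptions; the scores in the usage note are made-up values for illustration only.

```python
from typing import List, Sequence, Tuple


def search(query: str, pois: Sequence[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Return the POIs whose name contains the query, highest score first."""
    hits = [poi for poi in pois if query in poi[0]]
    return sorted(hits, key=lambda poi: poi[1], reverse=True)
```

Given hypothetical records such as `("Quanjude Training Center", 1.0)` and `("Quanjude Head Store", 9.0)`, a query for "Quanjude" lists the head store before the training center, and non-matching POIs are excluded regardless of their score.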
2) Map layer display: an electronic map generally consists of many map layers. When the user views a given layer, the POIs of that layer should be displayed for the user. However, there may be many POIs around the user's point of focus in that layer; if all of them were displayed, the page would be cluttered and bloated, which makes it hard to read. Therefore a subset of POIs should be selected for display according to importance, so that the user can see the information needed and the overall display is cleaner.
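Selecting the POIs to draw in a layer can be sketched as "keep those inside the display range, then take the top few by score". The `Poi` record, the rectangular `bounds` tuple `(min_x, min_y, max_x, max_y)`, and the fixed `limit` are assumptions introduced for this illustration.

```python
import heapq
from typing import Iterable, List, NamedTuple, Tuple


class Poi(NamedTuple):
    name: str
    x: float
    y: float
    score: float


def visible_pois(
    pois: Iterable[Poi],
    bounds: Tuple[float, float, float, float],
    limit: int,
) -> List[Poi]:
    """Return at most `limit` POIs inside `bounds`, highest score first."""
    min_x, min_y, max_x, max_y = bounds
    in_view = (
        poi for poi in pois
        if min_x <= poi.x <= max_x and min_y <= poi.y <= max_y
    )
    # nlargest avoids sorting every POI in view when only a few are shown.
    return heapq.nlargest(limit, in_view, key=lambda poi: poi.score)
```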
3) Data update: POI data change quickly and in large volume, so when resources are limited, the important data can be updated first.
Corresponding to the above method embodiments, the invention also provides an embodiment of a device for sorting electronic map data. Fig. 3 is a structural diagram of the device for sorting electronic map data according to an embodiment of the invention. The device mainly comprises:
A keyword extraction unit U32, configured to extract a keyword for each item of electronic map data;
A query unit U33, configured to search with the keyword and obtain a set of search result web pages corresponding to each item of electronic map data;
A calculation unit U34, configured to calculate the importance of each item of electronic map data from its corresponding set of search result web pages;
A sorting unit U36, configured to sort the electronic map data according to the importance.
Wherein the calculation unit U34 specifically comprises:
A first calculation subunit, configured to calculate, for each search result web page in the set, a first value representing the importance of the web page; the importance of a web page can be represented by its page rank (PageRank), in which case the first value is the calculated PageRank; it can of course also be represented by the page's traffic;
A second calculation subunit, configured to calculate, for each search result web page in the set, a second value representing the degree of match between the web page and the query word; the degree of match between a web page and the query word (MatchRank) can be calculated by several methods;
A combined calculation subunit, configured to calculate, for each item of electronic map data, result data representing its importance from the first and second values of all search result web pages in the corresponding set. One way of calculating is: the combined calculation subunit multiplies the first value by the second value for each search result web page in the set, and then sums the products over all search result web pages in the set to obtain the importance value of the item of electronic map data.
Wherein the keyword extraction unit U32 uses the extracted name of the electronic map data as the keyword, or uses the extracted address information of the electronic map data together with the name as the keyword. Preferably, branch-store or branch-office designations are removed when the name is extracted.
Preferably, in another device embodiment of the invention, the device further comprises an adjustment unit U35, configured to multiply the importance of an item of electronic map data by the weight of its category, according to the different weights assigned to the categories to which electronic map data belong, to obtain adjusted result data, and to output the adjusted result data to the sorting unit U36 for sorting.
Preferably, in another device embodiment of the invention, the device further comprises a preprocessing unit U31, configured to preprocess the original electronic map data and output the preprocessing result to the keyword extraction unit U32; wherein the preprocessing comprises removing irrelevant symbols, converting character encodings, and unifying the format.
Preferably, in another device embodiment of the invention, the device further comprises a retrieval unit U37, configured to return, in electronic map retrieval, the matching retrieval results according to the query word entered by the user, and to preferentially display the higher-ranked electronic map data among the retrieval results.
Preferably, in another device embodiment of the invention, the device further comprises a map layer display unit U38, configured to select and display, when a map layer is displayed, the higher-ranked electronic map data within the display range.
Preferably, in another device embodiment of the invention, the device further comprises a data update unit U39, configured to preferentially update the higher-ranked electronic map data.
For brevity, the parts of the device shown in Fig. 3 that are not described in detail are covered by the relevant parts of the methods shown in Fig. 1 and Fig. 2 and are not repeated here.
In addition, the invention provides a search engine system comprising the device of any of the above device embodiments. The search engine system can provide higher-quality retrieval results in search applications for electronic map data.
The method and device for sorting electronic map data provided by the invention have been described in detail above. Specific examples have been used herein to explain the principles and embodiments of the invention, and the description of the above embodiments is intended only to help in understanding the method and its core idea. A person of ordinary skill in the art may, following the idea of the invention, make changes to the specific embodiments and the scope of application. In summary, the content of this description should not be construed as limiting the invention.