Summary of the invention
In view of the above problems, the present invention is proposed to provide a system for determining the validity of POI information based on address data in a network, and a corresponding method for determining the validity of POI information based on address data in a network, which overcome the above problems or at least partially solve or mitigate them.
According to one aspect of the present invention, a system for determining the validity of POI information based on address data in a network is provided, the system comprising:
a POI information acquisition unit, configured to use address data in the network, based on a search engine, to obtain multiple pieces of related POI information corresponding to the same POI name;
a statistics unit, configured to count the number of occurrences of the POI information in the address data in the network;
a POI information determination unit, configured to determine valid POI information corresponding to the same POI name according to the number of occurrences of the POI information in the address data in the network.
Preferably, the multiple pieces of related POI information are information corresponding to at least one preset attribute of the POI.
Preferably, the preset attribute is longitude and latitude, address, building name, or affiliated organization.
Preferably, the statistics unit further comprises:
a POI information source acquisition module, configured to obtain the source of the POI information;
a POI information source reliability judgment module, configured to judge whether the source is a reliable source;
a statistics module, configured to count the number of occurrences of the POI information in the address data in the network when the source is a reliable source, and otherwise not to count.
Preferably, the POI information determination unit further comprises:
a judgment subunit, configured to judge whether the number of occurrences of the POI information in the address data in the network is higher than a predetermined threshold;
a POI information determination subunit, configured to determine that the obtained POI information is valid when the judgment subunit judges yes.
Preferably, the reliable source is a source with a predetermined credibility.
Preferably, the source is a website or a web page.
According to another aspect of the present invention, a method for determining the validity of POI information based on address data in a network is provided, comprising:
using address data in the network to obtain multiple pieces of related POI information corresponding to the same POI name;
counting the number of occurrences of the POI information in the address data in the network;
determining valid POI information corresponding to the same POI name according to the number of occurrences of the POI information in the address data in the network.
Preferably, the multiple pieces of related POI information are information corresponding to at least one preset attribute of the POI.
Preferably, the preset attribute is longitude and latitude, address, building name, or affiliated organization.
Preferably, the step of counting the number of occurrences of the POI information in the address data in the network further comprises:
obtaining the source of the POI information;
judging whether the source is a reliable source, and if so, counting the number of occurrences of the POI information in the address data in the network, and otherwise not counting.
Preferably, the step of determining valid POI information corresponding to the same POI name according to the number of occurrences of the POI information in the address data in the network further comprises:
judging whether the number of occurrences of the POI information in the address data in the network is higher than a predetermined threshold;
if so, determining that the POI information is valid.
Preferably, the reliable source is a source with a predetermined credibility.
Preferably, the source is a website or a web page.
The beneficial effects of the present invention are as follows:
The present invention uses address data in the network to obtain multiple pieces of related POI information corresponding to the same POI name, and determines valid POI information corresponding to that POI name according to the number of occurrences of the POI information in the address data in the network. Users can thus quickly and accurately find the one or more POI names corresponding to a POI address at a given longitude and latitude. A network voting mechanism then filters these POI names according to their information sources and the frequency with which they occur on the Internet, and the most credible POI name is selected as the POI name for the current POI address, which improves the validity of the POI information.
The above description is only an overview of the technical solution of the present invention. So that the technical means of the present invention can be understood more clearly and implemented according to the content of the specification, and so that the above and other objects, features, and advantages of the present invention become more apparent, specific embodiments of the present invention are set out below.
Detailed description of the embodiments
Embodiments of the present invention are described in detail below. Examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals denote, throughout, the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary, serve only to explain the present invention, and are not to be construed as limiting it.
Those skilled in the art will understand that, unless expressly stated otherwise, the singular forms "a", "an", "the", and "said" used herein may also include the plural. It should be further understood that the word "comprising" used in this specification refers to the presence of the stated features, integers, steps, operations, elements, and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
Those skilled in the art will understand that, unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by those of ordinary skill in the art to which the present invention belongs. It should also be understood that terms such as those defined in general dictionaries are to be understood as having a meaning consistent with their meaning in the context of the prior art and, unless specifically defined, are not to be interpreted in an idealized or overly formal sense.
Fig. 1 schematically shows a block diagram of a system for determining the validity of POI information based on address data in a network according to one embodiment of the present invention.
With reference to Fig. 1, the system for determining the validity of POI information based on address data in a network according to this embodiment of the present invention comprises:
a POI information acquisition unit 11, configured to use address data in the network, based on a search engine, to obtain multiple pieces of related POI information corresponding to the same POI name;
In this embodiment of the present invention, the multiple pieces of related POI information are information corresponding to at least one preset attribute of the POI. Further, the preset attribute is longitude and latitude, address, building name, or affiliated organization.
a statistics unit 12, configured to count the number of occurrences of the POI information in the address data in the network;
a POI information determination unit 13, configured to determine valid POI information corresponding to the same POI name according to the number of occurrences of the POI information in the address data in the network.
In this embodiment of the present invention, address data is crawled from network data based on a search engine, and the address data comprises a name field and address information. An example of map address data mined from the Internet based on a search engine is: name: Kunming company of Heng great real estate group; address: 14th floor, North Star Fortune Center Building A, Panlong District, Kunming office building. Here "Kunming company of Heng great real estate group" is the POI name and "14th floor, North Star Fortune Center Building A, Panlong District, Kunming office building" is the address of this POI. Geocoding the address yields the longitude and latitude of the location; for example, geocoding the address "14th floor, North Star Fortune Center Building A, Panlong District, Kunming office building" yields east longitude 102.733445, north latitude 25.08108. In addition, the number of times the POI information occurs on the Internet must be counted and its source recorded.
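As an illustration only, one mined record of the kind described above might be held in a structure like the following. This is a minimal sketch: the class name `PoiRecord`, its field names, the source `example.com`, and the occurrence count of 1 are assumptions made for the example and do not come from the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PoiRecord:
    """One mined address record. All field names are hypothetical."""
    name: str          # name field (POI name)
    address: str       # address information
    longitude: float   # east longitude obtained by geocoding the address
    latitude: float    # north latitude obtained by geocoding the address
    source: str        # website or web page the record was mined from
    occurrences: int   # times this POI information was seen on the Internet


# The example record from the text; source and occurrences are placeholders.
record = PoiRecord(
    name="Kunming company of Heng great real estate group",
    address=("14th floor, North Star Fortune Center Building A, "
             "Panlong District, Kunming office building"),
    longitude=102.733445,
    latitude=25.08108,
    source="example.com",
    occurrences=1,
)
```

Keeping the source and the occurrence count on each record is what later allows counting to be restricted to reliable sources and compared against a threshold.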
The POI information from different information sources, corresponding to the address data finally mined from the Internet, therefore has the format shown in Table 1:
Table 1: Format of POI information from different information sources
As Table 1 shows, POI data obtained from different source websites for the same geographic location (identical longitude and latitude) may contain duplicate data; that is, several POI names may exist at the same address (longitude and latitude). For example, several companies exist at the same longitude and latitude in Table 1: their actual POI longitude and latitude are identical, but the POI name and POI address are described differently. It can also be seen that the same POI name may be written in several different ways, for example "Baoshan show one's high ideals sale of automobile Co., Ltd" and "Baoshan show one's high ideals sale of automobile Services Co., Ltd". Such duplicate POI data prevents users from quickly and accurately finding the POI name corresponding to the POI address at a given geographic location (longitude and latitude).
In this embodiment of the present invention, address data in the network is used, based on a search engine, to obtain multiple pieces of related POI information corresponding to the same POI name, where the multiple pieces of related POI information are information corresponding to at least one preset attribute of the POI, and the preset attribute is longitude and latitude, address, building name, or affiliated organization. Valid POI information corresponding to the same POI name is then determined according to the number of occurrences of the POI information in the address data in the network.
Further, the step of determining valid POI information corresponding to the same POI name according to the number of occurrences of the POI information in the address data in the network comprises: clustering, by keyword, the name fields corresponding to the same address information according to the preset-attribute information of the related POI information; counting the frequency with which each name field occurs within each class after clustering, as a second frequency; determining, according to the second frequency, the POI name of each class for that address information; and using an Internet "voting" mechanism to select credible, valid POI information for the same POI name.
Further, one or more keywords are determined based on the name field, the keywords corresponding to the same address information are clustered, and the clustered name fields are determined according to the clustered keywords.
Further, the name in the name field is segmented into words to generate tokens, and the keyword of the name field is obtained from the tokens.
Further, the frequency with which each token corresponding to the same address information occurs is counted as a first frequency, and the keyword of the name field is generated according to the first frequency; specifically, the token that has the lowest first frequency and is not a place name is selected as the keyword of the name field.
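The keyword selection just described can be sketched roughly as follows. This is not the specification's implementation: the whitespace tokenizer stands in for a real word segmenter, and `PLACE_NAMES`, `tokenize`, `first_frequencies`, and `keyword` are hypothetical names. The sketch also assumes the first frequency is computed over whatever list of names is supplied.

```python
from collections import Counter

# Hypothetical place-name list; a real system would consult a gazetteer.
PLACE_NAMES = {"baoshan", "kunming"}


def tokenize(name):
    """Stand-in for word segmentation: lower-case and split on whitespace."""
    return name.lower().split()


def first_frequencies(names):
    """Count how often each token occurs across the given names."""
    counts = Counter()
    for name in names:
        counts.update(tokenize(name))
    return counts


def keyword(name, freq):
    """Return the non-place-name token of `name` with the lowest first
    frequency, or None if every token is a place name.
    Ties go to the token that appears first in the name."""
    candidates = [t for t in tokenize(name) if t not in PLACE_NAMES]
    if not candidates:
        return None
    return min(candidates, key=lambda t: freq[t])
```

Because generic words such as company suffixes occur in many names, their first frequency is high, so the rarest remaining token tends to be the distinctive part of the name.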
Further, the present invention may take the name field with the highest second frequency in each class as the class identifier name and use every class identifier name as a POI name for that address information; or it may take the name field with the highest second frequency in each class as the class identifier name and use the class identifier name that occurs most often on the network as the POI name for that address information.
Specifically, the POI names in the mined address data are segmented into words, and the number of times each word occurs after segmentation is counted. Within a given POI name, the word that occurs least frequently (and therefore carries the most information) and is not a place name is marked as the keyword of that POI name. For example, Table 2 shows the data obtained by segmenting the POI names in the related POI information corresponding to the address data in Table 1 (the word frequencies are computed from about 90 million POI names); the second column of Table 2 is the extracted keyword:
Table 2: POI names after word segmentation
Clustering by keyword: POI names corresponding to the same keyword are assigned to the same class. The POI names above can be grouped into 5 classes; that is, 5 different POI names exist at this POI address, namely:
A: Bo Xin source, Baoshan automotive trade Co., Ltd;
B: Lancang River in Yunnan Province beer brewery groups Baoshan Co., Ltd; Lancang River in Yunnan Province beer brewery groups Baoshan Co., Ltd (map label);
C: Baoshan show one's high ideals sale of automobile Co., Ltd; Baoshan show one's high ideals sale of automobile Services Co., Ltd;
D: Great Wall Automobile 4S shop, the Baoshan;
E: sale Co., Ltd (Chevrolet 4S shop) that is easily open to the traffic is melted in the Baoshan.
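The grouping into classes A to E can be sketched as below, assuming each observed name already has its keyword. The function names and the input shape (one `(name, keyword)` pair per occurrence) are assumptions for illustration, not the specification's interface.

```python
from collections import Counter, defaultdict


def cluster_by_keyword(observations):
    """Group observed name fields for one address by keyword.

    `observations` is an iterable of (name, keyword) pairs, one pair per
    occurrence of the name in the mined data.
    Returns {keyword: Counter({name: second_frequency})}.
    """
    classes = defaultdict(Counter)
    for name, kw in observations:
        classes[kw][name] += 1
    return classes


def class_identifiers(classes):
    """For each class, pick the name field with the highest second frequency."""
    return {kw: counts.most_common(1)[0][0] for kw, counts in classes.items()}
```

Each key of the returned mapping corresponds to one class, and the per-name counts inside it are the second frequencies used to choose the class identifier name.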
To further illustrate the advantages of the invention, the internal structure of another embodiment of the statistics unit 12 in the system for determining the validity of POI information based on address data in a network is disclosed below. With reference to Fig. 2, the statistics unit 12 further comprises a POI information source acquisition module 121, a POI information source reliability judgment module 122, and a statistics module 123:
the POI information source acquisition module 121, configured to obtain the source of the POI information;
the POI information source reliability judgment module 122, configured to judge whether the source is a reliable source;
the statistics module 123, configured to count the number of occurrences of the POI information in the address data in the network when the source is a reliable source, and otherwise not to count.
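The cooperation of modules 121 to 123 can be sketched as a single counting function. This is a minimal sketch under stated assumptions: the set `RELIABLE_SOURCES`, its example host names, and the `(poi_info, source)` input shape are hypothetical.

```python
from collections import Counter

# Hypothetical set of sources treated as reliable.
RELIABLE_SOURCES = {"trusted.example", "official.example"}


def count_occurrences(sightings, reliable_sources=RELIABLE_SOURCES):
    """Count occurrences of POI information, skipping unreliable sources.

    `sightings` is an iterable of (poi_info, source) pairs, where the
    source has already been obtained (the role of module 121).
    """
    counts = Counter()
    for poi_info, source in sightings:
        if source in reliable_sources:   # reliability judgment (module 122)
            counts[poi_info] += 1        # counting (module 123)
        # occurrences from other sources are not counted
    return counts
```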
In this embodiment, the best POI name within a class is chosen by "voting" on the Internet. The "voting" relies mainly on the frequency with which a POI name occurs on the Internet and on the credibility of its sources: the name that occurs most frequently on the Internet and whose sources are most credible is selected as the best name. For example:
Class A contains only one name, so that name is the best one.
Class B contains two names, of which "Lancang River in Yunnan Province beer brewery groups Baoshan Co., Ltd" occurs most frequently and is taken as the best name.
Class C contains two names, of which "Baoshan show one's high ideals sale of automobile Services Co., Ltd" occurs most frequently and is taken as the best name.
Classes D and E each contain only one name, as with class A.
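The text does not say how frequency and source credibility are combined. One possible reading, shown purely as an assumption, is to let every occurrence cast a vote weighted by the credibility of its source. The function `best_name` and the 0 to 1 credibility scale are hypothetical.

```python
from collections import defaultdict


def best_name(sightings, credibility, default=0.0):
    """Return the name with the highest credibility-weighted vote total.

    `sightings`   : iterable of (name, source) pairs within one class
    `credibility` : dict mapping source -> weight in [0, 1] (assumed scale)
    `default`     : weight used for sources missing from `credibility`
    Returns None if there are no sightings.
    """
    votes = defaultdict(float)
    for name, source in sightings:
        votes[name] += credibility.get(source, default)
    if not votes:
        return None
    return max(votes, key=votes.get)
```

With equal weights for all sources this reduces to choosing the most frequent name; a class containing a single name, as in classes A, D, and E, simply returns that name.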
To further illustrate the advantages of the invention, the internal structure of another embodiment of the POI information determination unit 13 in the system for determining the validity of POI information based on address data in a network is disclosed below. With reference to Fig. 3, the POI information determination unit 13 further comprises a judgment subunit 131 and a POI information determination subunit 132:
the judgment subunit 131, configured to judge whether the number of occurrences of the POI information in the address data in the network is higher than a predetermined threshold;
the POI information determination subunit 132, configured to determine that the obtained POI information is valid when the judgment subunit judges yes.
In this embodiment of the present invention, the more frequently POI information occurs on the Internet and the more credible its sources, the more credible the POI information is. The best POI name finally selected is filtered according to the frequency with which it occurs on the Internet; a name whose frequency is above a certain threshold is the credible POI information finally mined.
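The threshold check performed by subunits 131 and 132 amounts to a simple filter. In this sketch the function name and the dictionary input are assumptions, and "higher than" is read as a strict comparison.

```python
def valid_poi_info(counts, threshold):
    """Keep only POI information whose occurrence count is strictly
    higher than the predetermined threshold.

    `counts` maps POI information to its occurrence count.
    """
    return {info: n for info, n in counts.items() if n > threshold}
```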
In this embodiment of the present invention, the reliable source is a source with a predetermined credibility, where the source is a website or a web page.
In this embodiment of the present invention, websites or web pages that are sources with a predetermined credibility include, but are not limited to: large websites such as Sina and Phoenix Net; officially certified websites; websites with high visit frequency and large data traffic; and websites that carry no malicious links or virus links and have high user satisfaction.
In this embodiment of the present invention, credibility is quantifiable: the credibility of each website or web page can be quantified according to, for example, the number of user visits and user ratings. The credibility of each website or web page also changes dynamically: if the current website is found to contain viruses or fraudulent advertisements, or is exploited by other malicious websites, its credibility decreases accordingly. By quantifying and dynamically adjusting website credibility, the present invention further ensures that the POI information obtained is reliable and valid.
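The specification gives no formula for credibility. The sketch below is one hypothetical way to quantify it from visit counts and ratings and to lower it when an incident is reported. The equal weighting, the visit cap, the 0 to 5 rating scale, and the halving penalty are all assumptions chosen only to make the example concrete.

```python
from dataclasses import dataclass


@dataclass
class SourceCredibility:
    """Hypothetical credibility model for one website or web page."""
    visits: int          # number of user visits
    rating: float        # mean user rating, assumed to lie in 0..5
    incidents: int = 0   # reported viruses, fraudulent ads, malicious exploitation

    def score(self, visit_cap=1_000_000, penalty=0.5):
        """Return a credibility score in [0, 1]."""
        traffic = min(self.visits / visit_cap, 1.0)   # saturates at the cap
        quality = self.rating / 5.0
        base = 0.5 * traffic + 0.5 * quality
        # Each reported incident multiplies the score by `penalty`.
        return base * (penalty ** self.incidents)

    def report_incident(self):
        """Record an incident; later calls to score() return a lower value."""
        self.incidents += 1
```

A source could then be treated as reliable when its score is at or above some predetermined level, and re-scoring after each incident gives the dynamic adjustment described above.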
This embodiment uses address data in the network to obtain multiple pieces of related POI information corresponding to the same POI name, and determines valid POI information corresponding to that POI name according to the number of occurrences of the POI information in the address data in the network. Users can thus quickly and accurately find the one or more POI names corresponding to a POI address at a given longitude and latitude. A network voting mechanism then filters these POI names according to their information sources and the frequency with which they occur on the Internet, and the most credible POI name is selected as the POI name for the current POI address, which improves the validity of the POI information.
Fig. 4 schematically shows a flow chart of a method for determining the validity of POI information based on address data in a network according to one embodiment of the present invention.
With reference to Fig. 4, the method for determining the validity of POI information based on address data in a network according to this embodiment of the present invention comprises the following steps:
S11, using address data in the network to obtain multiple pieces of related POI information corresponding to the same POI name;
S12, counting the number of occurrences of the POI information in the address data in the network;
S13, determining valid POI information corresponding to the same POI name according to the number of occurrences of the POI information in the address data in the network.
In this embodiment of the present invention, the multiple pieces of related POI information are information corresponding to at least one preset attribute of the POI, where the preset attribute is longitude and latitude, address, building name, or affiliated organization.
In this embodiment of the present invention, address data is crawled from network data based on a search engine, and the address data comprises a name field and address information. An example of map address data mined from the Internet based on a search engine is: name: Kunming company of Heng great real estate group; address: 14th floor, North Star Fortune Center Building A, Panlong District, Kunming office building. Here "Kunming company of Heng great real estate group" is the POI name and "14th floor, North Star Fortune Center Building A, Panlong District, Kunming office building" is the address of this POI. Geocoding the address yields the longitude and latitude of the location; for example, geocoding the address "14th floor, North Star Fortune Center Building A, Panlong District, Kunming office building" yields east longitude 102.733445, north latitude 25.08108. In addition, the number of times the POI information occurs on the Internet must be counted and its source recorded.
However, POI data obtained from different source websites for the same geographic location (identical longitude and latitude) may contain duplicate data; that is, several POI names may exist at the same address (longitude and latitude). For example, several companies may exist at the same longitude and latitude: their actual POI longitude and latitude are identical, but the POI name and POI address are described differently. The same POI name may also be written in several different ways, for example "Baoshan show one's high ideals sale of automobile Co., Ltd" and "Baoshan show one's high ideals sale of automobile Services Co., Ltd". Such duplicate POI data prevents users from quickly and accurately finding the POI name corresponding to the POI address at a given geographic location (longitude and latitude).
To address this, in this embodiment of the present invention, the POI names in the mined address data are segmented into words, and the number of times each word occurs after segmentation is counted. Within a given POI name, the word that occurs least frequently (and therefore carries the most information) and is not a place name is marked as the keyword of that POI name.
To further illustrate the advantages of the invention, the sub-steps of step S12 in the method for determining the validity of POI information based on address data in a network are disclosed below as another embodiment of this step. With reference to Fig. 5, the sub-steps of this step comprise:
S121, obtaining the source of the POI information;
S122, judging whether the source is a reliable source, and if so, performing step S123;
S123, when the source is a reliable source, counting the number of occurrences of the POI information in the address data in the network, and otherwise not counting.
In this embodiment, the best POI name within a class is chosen by "voting" on the Internet. The "voting" relies mainly on the frequency with which a POI name occurs on the Internet and on the credibility of its sources: the name that occurs most frequently on the Internet and whose sources are most credible is selected as the best name.
To further illustrate the advantages of the invention, the sub-steps of step S13 in the method for determining the validity of POI information based on address data in a network are disclosed below as another embodiment of this step. With reference to Fig. 6, the sub-steps of this step comprise:
S131, judging whether the number of occurrences of the POI information in the address data in the network is higher than a predetermined threshold, and if so, performing step S132;
S132, determining that the POI information is valid.
In this embodiment of the present invention, the more frequently POI information occurs on the Internet and the more credible its sources, the more credible the POI information is. The best POI name finally selected is filtered according to the frequency with which it occurs on the Internet; a name whose frequency is above a certain threshold is the credible POI information finally mined.
In this embodiment of the present invention, the reliable source is a source with a predetermined credibility, where the source is a website or a web page.
In this embodiment of the present invention, websites or web pages that are sources with a predetermined credibility include, but are not limited to: large websites such as Sina and Phoenix Net; officially certified websites; websites with high visit frequency and large data traffic; and websites that carry no malicious links or virus links and have high user satisfaction.
In this embodiment of the present invention, credibility is quantifiable: the credibility of each website or web page can be quantified according to, for example, the number of user visits and user ratings. The credibility of each website or web page also changes dynamically: if the current website is found to contain viruses or fraudulent advertisements, or is exploited by other malicious websites, its credibility decreases accordingly. By quantifying and dynamically adjusting website credibility, the present invention further ensures that the POI information obtained is reliable and valid.
With the method for determining the validity of POI information based on address data in a network provided by the embodiments of the present invention, keywords of POI names are mined according to word frequency after word segmentation, and clustering is performed on these keywords so that differently worded versions of the same POI name fall into one class. This solves the problem of one longitude and latitude corresponding to multiple POI names. The Internet "voting" mechanism is then used to select the best POI name and to select credible POI information.
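The overall flow summarized above (reliable-source counting, keyword clustering, voting, and threshold filtering) can be put together in one compact sketch. It is illustrative only: the function name, its parameters, the whitespace tokenizer, and the use of plain occurrence counts as votes are assumptions rather than the specification's implementation.

```python
from collections import Counter, defaultdict


def select_poi_names(sightings, token_freq, place_names, reliable, threshold):
    """Return one validated name per keyword class for a single address.

    sightings   : iterable of (name, source) pairs mined for the address
    token_freq  : Counter of token frequencies over a large corpus of names
    place_names : set of lower-case tokens to ignore when choosing keywords
    reliable    : set of sources whose sightings are counted
    threshold   : a name is kept only if its count is above this value
    """
    # Count each name, using sightings from reliable sources only.
    counts = Counter(name for name, src in sightings if src in reliable)

    # Cluster names by keyword: the rarest token that is not a place name.
    classes = defaultdict(list)
    for name in counts:
        tokens = [t for t in name.lower().split() if t not in place_names]
        if tokens:
            kw = min(tokens, key=lambda t: token_freq.get(t, 0))
            classes[kw].append(name)

    # Vote within each class, then keep winners above the threshold.
    result = []
    for names in classes.values():
        best = max(names, key=lambda n: counts[n])
        if counts[best] > threshold:
            result.append(best)
    return result
```

Tokens absent from `token_freq` are treated as having frequency 0, so an unseen word is always preferred as the keyword; a production system would likely want a more careful fallback.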
In summary, the present invention uses address data in the network to obtain multiple pieces of related POI information corresponding to the same POI name, and determines valid POI information corresponding to that POI name according to the number of occurrences of the POI information in the address data in the network. Users can thus quickly and accurately find the one or more POI names corresponding to a POI address at a given longitude and latitude. A network voting mechanism then filters these POI names according to their information sources and the frequency with which they occur on the Internet, and the most credible POI name is selected as the POI name for the current POI address, which improves the validity of the POI information.
It should be noted that the algorithms and formulas provided herein are not inherently related to any particular computer, virtual system, or other device. Various general-purpose systems may also be used with the teachings herein. From the description above, the structure required to construct such systems is apparent. In addition, the present invention is not directed to any particular programming language. It should be understood that the content of the present invention described herein may be implemented in a variety of programming languages, and that the description of a specific language above is given to disclose the best mode of the present invention.
Numerous specific details are described in the specification provided herein. It will be understood, however, that embodiments of the present invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques are not shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, in order to streamline the disclosure and aid understanding of one or more of the various inventive aspects, features of the present invention are sometimes grouped together in a single embodiment, figure, or description thereof in the above description of exemplary embodiments. However, this method of disclosure should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the claims reflect, inventive aspects lie in less than all features of a single previously disclosed embodiment. The claims following the detailed description are therefore expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the present invention.
Those skilled in the art will understand that the modules in the device of an embodiment may be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units, or components in the embodiments may be combined into one module, unit, or component, and may furthermore be divided into a plurality of sub-modules, sub-units, or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, equivalent, or similar purpose.
Furthermore, those skilled in the art will understand that although some embodiments described herein include certain features that are included in other embodiments and not others, combinations of features of different embodiments are meant to be within the scope of the present invention and to form different embodiments.
The component embodiments of the present invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that a microprocessor or digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the website security detection device according to the embodiments of the present invention. The present invention may also be implemented as device or apparatus programs (for example, computer programs and computer program products) for performing part or all of the methods described herein. Such programs implementing the present invention may be stored on a computer-readable medium, or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
The above are only some embodiments of the present invention. It should be noted that those of ordinary skill in the art may make several improvements and refinements without departing from the principles of the present invention, and these improvements and refinements shall also be regarded as falling within the protection scope of the present invention.