WO2016107352A1

WO2016107352A1 - System and method for determining POI name and for determining validity of POI information

Info

Publication number: WO2016107352A1
Application number: PCT/CN2015/095857
Authority: WO
Inventors: 王智广; 魏少俊
Original assignee: 北京奇虎科技有限公司; 奇智软件（北京）有限公司
Priority date: 2014-12-29
Filing date: 2015-11-27
Publication date: 2016-07-07

Abstract

A system and method for determining a POI name and for determining validity of POI information. The method comprises: capturing address data from network data (S11); respectively extracting a name field and address information from one or a plurality of items of captured address data (S12); on the basis of the name field, determining one or more key words (S13); clustering the key words corresponding to the same address information to generate at least one cluster (S14); and according to the clustered key word, determining a POI name corresponding to the address information (S15). The method enables a user to rapidly and accurately search for a POI name corresponding to the POI addresses at the same latitude and longitude, thereby improving the user experience.

Description

System and method for determining POI name and determining validity of POI information

Technical field

The present invention relates to the field of electronic map technology, and in particular to a system and method for determining a POI name based on clustering, a cluster-based POI name determining system and method, and a system and method for determining the validity of POI information based on address data in a network.

Background technique

The Point of Interest (POI) is generally a geographic information point marked in an electronic map, and usually includes information such as a POI identifier, a POI name, a POI type, a longitude, and a latitude. The POI can be marked on the map with latitude and longitude information, which can be used to find and calculate navigation landmarks or buildings, such as shopping malls, parking lots, schools, hospitals, hotels, restaurants, supermarkets, parks, tourist attractions, etc.

More and more users query POIs in electronic maps, and the POI data stored in a database provides the data support for these queries. At present, the POI data in the database is updated mainly through data mining: the stored POI data is updated according to data collected in the field, or POI data is obtained from various life information websites on the Internet. As long as the acquired data includes the name and address of a POI, the data can be treated as a piece of POI data. Because of the way POI data is acquired and updated, many variants of POI data inevitably exist on the Internet, so the POI data obtained from different source websites may contain duplicates. That is, multiple pieces of POI data actually describe the same POI: the actual longitude and latitude are the same, but the POI name and the POI address are described in different ways. Such duplicate POI data prevents the user from quickly and accurately finding the POI name corresponding to the POI address at a given geographic location (latitude and longitude), which degrades the user experience.

Summary of the invention

In view of the above problems, the present invention is proposed to provide a system and a corresponding method for determining a POI name based on clustering, a cluster-based POI name determining system and method, and a system and method for determining the validity of POI information based on address data in a network, which overcome the above problems or at least partially solve or alleviate them.

According to an aspect of the present invention, a system for determining a POI name based on clustering is provided, the system comprising:

An address data grabber for fetching address data from network data;

An address data parser, configured to separately extract a name field and address information from the captured one or more address data;

a keyword determiner for determining one or more keywords based on the name field;

a keyword clusterer for clustering the keywords corresponding to the same address information to generate at least one class;

The POI name generator is configured to determine a POI name corresponding to the address information according to the clustered keywords.

According to another aspect of the present invention, a method for determining a POI name based on clustering is provided, including:

Fetching address data from network data;

Extracting a name field and address information respectively from the one or more pieces of fetched address data;

Determining one or more keywords based on the name field;

Clustering the keywords corresponding to the same address information to generate at least one class;

Determining a POI name corresponding to the address information according to the clustered keywords.

According to still another aspect of the present invention, a cluster-based POI name determination system is provided, the system comprising:

An address data grabber for extracting address data from network data based on a search engine, the address data including a name field and address information;

a name field clusterer for clustering name fields corresponding to the same address information according to keywords;

A second frequency statistics unit, configured to count the frequency of occurrence of each name field in each category after clustering, as the second frequency;

The POI name determining unit is configured to determine, according to the second frequency, a POI name corresponding to the address information of the category.

According to still another aspect of the present invention, a cluster-based POI name determining method is provided, including:

Obtaining address data from network data, the address data including a name field and address information;

The name fields corresponding to the same address information are clustered according to keywords;

Counting the frequency at which each name field appears in each category after clustering, as the second frequency;

The POI name corresponding to the address information of the category is determined according to the second frequency.

According to still another aspect of the present invention, a system for determining validity of POI information based on address data in a network is provided, the system comprising:

a POI information acquiring unit, configured to acquire, according to the search engine, a plurality of related POI information corresponding to the same POI name by using address data in the network;

a statistical unit, configured to count the number of occurrences of the POI information in the address data in the network;

a POI information determining unit, configured to determine, according to the number of occurrences of the POI information in the address data in the network, the valid POI information corresponding to the same POI name.

According to still another aspect of the present invention, a method for determining validity of POI information based on address data in a network is provided, including:

Acquiring a plurality of related POI information corresponding to the same POI name by using address data in the network;

Counting the number of occurrences of the POI information in address data in the network;

The valid POI information corresponding to the same POI name is determined according to the number of occurrences of the POI information in the address data in the network.
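
The three steps above can be illustrated with a minimal Python sketch. It assumes that each crawled observation is a (POI name, POI information) pair and that the most frequently observed information is the one treated as valid; the function name `valid_poi_info`, the tuple layout, the sample data, and the tie-breaking rule are illustrative assumptions and are not taken from the disclosure.

```python
from collections import Counter, defaultdict


def valid_poi_info(observations):
    """Pick, for each POI name, the POI information observed most often.

    observations: iterable of (poi_name, poi_info) pairs, one pair per
    occurrence in the crawled address data (hypothetical layout).
    Returns {poi_name: (poi_info, occurrence_count)}.
    """
    counts = defaultdict(Counter)
    for poi_name, poi_info in observations:
        counts[poi_name][poi_info] += 1  # count occurrences per POI name
    # most_common(1) yields the entry with the highest count; on a tie the
    # entry that was seen first is returned.
    return {name: c.most_common(1)[0] for name, c in counts.items()}


# Hypothetical observations for a single POI name.
observations = [
    ("Example Cafe", "1 Example Road"),
    ("Example Cafe", "1 Example Road"),
    ("Example Cafe", "1 Example Rd, 2nd floor"),
]
result = valid_poi_info(observations)
```

A threshold on the occurrence count could be applied instead of a plain maximum; the text above only requires that the decision depend on the number of occurrences.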

According to still another aspect of the present invention, a computer program is provided, comprising computer readable code that, when executed on a computing device, causes the computing device to perform the method for determining a POI name based on clustering as described above, or the cluster-based POI name determining method as described above, or the method for determining validity of POI information based on address data in a network as described above.

According to still another aspect of the present invention, a computer readable medium is proposed, wherein the computer program described above is stored.

The beneficial effects of the invention are:

The invention fetches address data from network data, extracts the name field and the address information, determines one or more keywords based on the name field, clusters the keywords corresponding to the same address information, and determines the POI name corresponding to the address information based on the clustered keywords, so that the user can quickly and accurately search for the POI name corresponding to the POI address at the same latitude and longitude, thereby improving the user experience.

The above description is only an overview of the technical solutions of the present invention, and the above-described and other objects, features and advantages of the present invention can be more clearly understood. Specific embodiments of the invention are set forth below.

DRAWINGS

Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be construed as limiting the invention. Throughout the drawings, the same reference numerals are used to refer to the same parts. In the drawings:

FIG. 1 is a block diagram schematically showing a system for determining a POI name based on clustering according to an embodiment of the present invention;

FIG. 2 is a block diagram schematically showing a keyword determiner in a system for determining a POI name based on clustering according to another embodiment of the present invention;

FIG. 3 is a block diagram schematically showing a POI name generator in a system for determining a POI name based on clustering according to another embodiment of the present invention;

FIG. 4 is a block diagram schematically showing a POI name generator in a system for determining a POI name based on clustering according to another embodiment of the present invention;

FIG. 5 is a flowchart schematically showing a method for determining a POI name based on clustering according to an embodiment of the present invention;

FIG. 6 is a subdivided flowchart schematically showing step S13 of a method for determining a POI name based on clustering according to another embodiment of the present invention;

FIG. 7 is a subdivided flowchart schematically showing step S15 of a method for determining a POI name based on clustering according to another embodiment of the present invention;

FIG. 8 is a subdivided flowchart schematically showing step S15 of a method for determining a POI name based on clustering according to another embodiment of the present invention;

FIG. 9 is a block diagram schematically showing a cluster-based POI name determining system according to an embodiment of the present invention;

FIG. 10 is a block diagram schematically showing a name field clusterer in a cluster-based POI name determining system according to another embodiment of the present invention;

FIG. 11 is a block diagram schematically showing a second frequency statistics unit in a cluster-based POI name determining system according to another embodiment of the present invention;

FIG. 12 is a flowchart schematically showing a cluster-based POI name determining method according to an embodiment of the present invention;

FIG. 13 is a subdivided flowchart schematically showing step S122 of a cluster-based POI name determining method according to another embodiment of the present invention;

FIG. 14 is a subdivided flowchart schematically showing step S123 of a cluster-based POI name determining method according to another embodiment of the present invention;

FIG. 15 is a block diagram schematically showing a system for determining validity of POI information based on address data in a network according to an embodiment of the present invention;

FIG. 16 is a block diagram schematically showing a statistical unit in a system for determining validity of POI information based on address data in a network according to another embodiment of the present invention;

FIG. 17 is a block diagram schematically showing a POI information determining unit in a system for determining validity of POI information based on address data in a network according to another embodiment of the present invention;

FIG. 18 is a flowchart schematically showing a method for determining validity of POI information based on address data in a network according to an embodiment of the present invention;

FIG. 19 is a subdivided flowchart schematically showing step S812 of a method for determining validity of POI information based on address data in a network according to another embodiment of the present invention;

FIG. 20 is a subdivided flowchart schematically showing step S813 of a method for determining validity of POI information based on address data in a network according to another embodiment of the present invention;

FIG. 21 is a block diagram schematically showing a computing device for performing a method in accordance with the present invention;

FIG. 22 schematically shows a storage unit for holding or carrying program code implementing the method according to the present invention.

Specific embodiment

The embodiments of the present invention are described in detail below, and the examples of the embodiments are illustrated in the drawings, wherein the same or similar reference numerals are used to refer to the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are intended to be illustrative of the invention and are not to be construed as limiting.

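Those skilled in the art will understand that the singular forms "a", "an", and "the" used herein may also include the plural forms unless specifically stated otherwise. It is to be further understood that the word "comprise" used in this specification indicates the presence of the stated features, integers, steps, operations, elements, and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

Those skilled in the art will appreciate that all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the invention belongs, unless otherwise defined. It should also be understood that terms such as those defined in a general dictionary should be understood to have a meaning consistent with their meaning in the context of the prior art, and will not be interpreted in an idealized or overly formal sense unless specifically so defined.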

FIG. 1 shows a block diagram of a system for determining a POI name based on clustering in accordance with one embodiment of the present invention.

Referring to FIG. 1, a system for determining a POI name based on clustering according to an embodiment of the present invention includes:

An address data grabber 11 for fetching address data from network data;

The address data parser 12 is configured to separately extract a name field and address information from the captured one or more address data;

a keyword determiner 13 for determining one or more keywords based on the name field;

a keyword clusterer 14 configured to cluster the keywords corresponding to the same address information to generate at least one class;

The POI name generator 15 is configured to determine a POI name corresponding to the address information according to the clustered keywords.

In the embodiment of the present invention, the address data is obtained by means of a search engine, and the address data includes a name field, address information, and a plurality of related POI information. Here, the plurality of related POI information is information on at least one preset attribute of the corresponding POI. Further, the preset attribute is the latitude and longitude, the address, the building name, or the name of an organization (unit).

In the embodiment of the present invention, the address data is fetched from the network data by means of a search engine, and the address data includes a name field and address information. The map address data mined from the Internet by the search engine has a form such as: name: Some Real Estate Group ** Branch Company; address: ** City ** District 8* Fortune Center Building A, 14th Floor. Here "Some Real Estate Group ** Branch Company" is the name of the POI, and "** City ** District 8* Fortune Center Building A, 14th Floor" is the address of the POI. The latitude and longitude of the address can be obtained by resolving the address to coordinates; for example, resolving the address "** City ** District 8* Fortune Center Building A, 14th Floor" gives east longitude 102.733445, north latitude 25.08108. In addition, the number of times the POI information appears on the Internet and the source of each record are counted.

Therefore, the format of the POI information of different information sources corresponding to the address data mined from the Internet is as shown in Table 1, as follows:

Table 1 Format Table of POI Information from Different Information Sources

It can be seen from Table 1 that the POI data obtained from different source websites for the same geographic location (the same latitude and longitude) may contain duplicates. On the one hand, several POI names may exist at the same address (latitude and longitude); in Table 1, for example, several companies share one latitude and longitude. On the other hand, the same POI name may be written in different ways, such as "Baoshan Mingzhi Automobile Sales Co., Ltd." and "Baoshan Mingzhi Automobile Sales and Service Co., Ltd.": the actual longitude and latitude are the same, but the POI name and the POI address are described differently. Such duplicate POI data prevents the user from quickly and accurately finding the POI name corresponding to the POI address at a given geographic location (latitude and longitude).
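
The duplication described above can be made concrete with a small Python sketch that groups fetched records by coordinates. The `AddressRecord` fields, the rounding precision, the Baoshan coordinates, the address strings, and the `*.example` source names are hypothetical; only the two Mingzhi name variants and the Kunming example are taken from the text.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AddressRecord:
    name: str      # name field
    address: str   # address information
    lon: float     # east longitude
    lat: float     # north latitude
    source: str    # website the record was fetched from


def group_by_location(records, digits=5):
    """Group records whose coordinates agree after rounding to `digits` places."""
    groups = defaultdict(list)
    for r in records:
        groups[(round(r.lon, digits), round(r.lat, digits))].append(r)
    return dict(groups)


records = [
    # Coordinates and addresses for the two Baoshan records are made up.
    AddressRecord("Baoshan Mingzhi Automobile Sales Co., Ltd.",
                  "address as listed by site A", 99.16, 25.11, "site-a.example"),
    AddressRecord("Baoshan Mingzhi Automobile Sales and Service Co., Ltd.",
                  "address as listed by site B", 99.16, 25.11, "site-b.example"),
    AddressRecord("Evergrande Real Estate Group Kunming Company",
                  "14th Floor, Office Building, Block A, Beichen Fortune Center, "
                  "Panlong District, Kunming City",
                  102.733445, 25.08108, "site-a.example"),
]
groups = group_by_location(records)
same_spot = [r.name for r in groups[(99.16, 25.11)]]
```

Here `same_spot` holds two differently written names for one location, which is the situation the keyword clustering is meant to resolve.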

In this regard, in the embodiment of the present invention, the address data is fetched from the network data, and the name field and the address information are respectively extracted from the captured one or more address data, and one or more keywords are determined based on the name field; The keywords corresponding to the same address information are clustered to generate at least one class, and the POI name corresponding to the address information is determined according to the clustered keywords, thereby obtaining the best poi name.

The internal structure of the keyword determiner 13 in the system for determining a POI name based on clustering is described below according to another embodiment. Referring to FIG. 2, the keyword determiner 13 further includes a word segmentation unit 131 and a keyword acquisition unit 132:

The word segmentation unit 131 is configured to segment the name in the name field into word segments;

The keyword acquisition unit 132 is configured to acquire the keywords of the address data according to the word segments.

The keyword acquisition unit 132 further includes:

a first frequency statistics module, configured to count the frequency of occurrence of each word segment corresponding to the same address information, as the first frequency;

and a keyword generating module, configured to generate a keyword of the address data according to the first frequency.

The keyword generating module selects the word segment that has the lowest frequency and is not a place name as the keyword of the address data.

In the embodiment of the present invention, the names in the POI information of the mined address data are segmented, and the number of occurrences of each resulting word is counted. Within a POI name, the word that occurs least frequently carries the most information; the least frequent word that is not a place name is therefore recorded as the keyword of that POI name. For example, the POI names in the related POI information of the address data in Table 1 are segmented as shown in Table 2 (the word frequencies are computed over the names of about 90 million POIs). The second column of Table 2 gives the resulting keywords, as follows:

Table 2 Data table after the wording of the POI name

Clustering according to keywords: The POI names corresponding to the same keyword are recorded as the same class. The above POI names can be classified into five classes, that is, there are five different poi names on the POI address.
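
The keyword clustering just described amounts to grouping names by their keyword, as in the following Python sketch. Because Table 2 is not reproduced here, the keywords assigned to each name below are assumed for illustration, and the function name `cluster_by_keyword` is hypothetical.

```python
from collections import defaultdict


def cluster_by_keyword(name_keywords):
    """Group POI names that share a keyword into one class.

    name_keywords: {poi_name: keyword} for names found at one address.
    Returns {keyword: [poi_name, ...]}.
    """
    classes = defaultdict(list)
    for name, keyword in name_keywords.items():
        classes[keyword].append(name)
    return dict(classes)


# Keywords are assumed; the source lists them in Table 2.
name_keywords = {
    "Baoshan Mingzhi Automobile Sales Co., Ltd.": "Mingzhi",
    "Baoshan Mingzhi Automobile Sales and Service Co., Ltd.": "Mingzhi",
    "Baoshan Great Wall Motor 4S shop": "Great Wall",
}
classes = cluster_by_keyword(name_keywords)
```

The two spellings of the Mingzhi company end up in one class, while the Great Wall shop forms a class of its own.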

The internal structure of the POI name generator 15 in the system for determining a POI name based on clustering is described below according to another embodiment. Referring to FIG. 3, the POI name generator 15 further includes a frequency statistics unit 151, a class identifier name determining unit 152, and a POI name determining unit 153:

The frequency statistics unit 151 is configured to calculate an appearance frequency of a name field in each class;

The class identifier name determining unit 152 is configured to use a name field with the highest frequency of occurrence in each class as a class identifier name;

The POI name determining unit 153 is configured to use each class identifier name as a POI name.

In this embodiment, each class identifier name is used as a POI name. Clustering is performed according to keywords: POI names that share the same keyword are recorded as one class. The POI names above fall into five classes; that is, there are five different POI names at this POI address, namely:

A: Baoshan Bo Xinyuan Automobile Trading Co., Ltd.;

B: Yunnan Province Minjiang Beer Group Baoshan Co., Ltd.; Yunnan Province Minjiang Beer Group Baoshan Co., Ltd. (map marked);

C: Baoshan Mingzhi Automobile Sales Co., Ltd.; Baoshan Mingzhi Automobile Sales and Service Co., Ltd.;

D: Baoshan Great Wall Motor 4S shop;

E: Baoshan Rongyitong Automobile Sales Co., Ltd. (Chevrolet 4S shop).
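
The work of units 151 to 153 can be sketched in Python as follows: count how often each name field occurs within a class, take the most frequent one as the class identifier name, and report every class identifier name as a POI name. The occurrence counts, the keywords, and the function name are assumptions made for illustration.

```python
from collections import Counter


def class_identifier_names(class_occurrences):
    """Return {keyword: most frequent name field in that class}.

    class_occurrences: {keyword: [name, name, ...]}, one list entry per
    occurrence of a name field in the fetched data.
    """
    result = {}
    for keyword, names in class_occurrences.items():
        frequency = Counter(names)                       # unit 151
        result[keyword] = frequency.most_common(1)[0][0]  # unit 152
    return result                                        # unit 153: one POI name per class


# Occurrence counts below are invented for the example.
class_occurrences = {
    "Mingzhi": ["Baoshan Mingzhi Automobile Sales and Service Co., Ltd."] * 3
               + ["Baoshan Mingzhi Automobile Sales Co., Ltd."],
    "Great Wall": ["Baoshan Great Wall Motor 4S shop"],
}
poi_names = class_identifier_names(class_occurrences)
```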

The internal structure of the POI name generator 15 in the system for determining a POI name based on clustering is described below according to yet another embodiment. Referring to FIG. 4, the POI name generator 15 further includes a frequency statistics unit 151', a class identifier name determining unit 152', and a POI name determining unit 153':

a frequency statistics unit 151' for calculating an appearance frequency of a name field in each class;

a class identifier name determining unit 152', configured to use a name field having the highest frequency of occurrence in each of the classes as a class identifier name;

The POI name determining unit 153' is configured to select the class identification name having the highest frequency of occurrence as the POI name.

In this embodiment, among the POI names of the same class, the best POI name is selected by "voting" on the Internet. The "voting" is based mainly on how frequently a POI name appears on the Internet and on the reliability of its sources: the name that appears most frequently and comes from the most trusted sources is selected as the best name. For example:

Class A contains only one name, so that name is the best name.

Class B contains two names, of which "Yunjiang Beer Group Baoshan Co., Ltd." appears most frequently and is the best name.

Class C contains two names, of which "Baoshan Mingzhi Automobile Sales and Service Co., Ltd." appears most frequently and is the best name.

Classes D and E each contain only one name, as with class A.

In an embodiment of the invention, the reliable source is a source having a predetermined degree of confidence. Wherein, the source is a website or a webpage.

Reliable sources of websites or web pages include, but are not limited to, large websites such as Sina and Phoenix, officially certified websites, websites with a high access frequency and large data traffic that carry no malicious links or virus links, and websites with high customer satisfaction and a high profile.

In the embodiment of the present invention, the credibility of a website or web page is quantifiable: it can be quantified according to the number of user visits and customer evaluations. Moreover, the credibility of each website or web page changes dynamically. If a website is infected with viruses, carries fraudulent advertisements, or is used by other malicious fraudulent websites, its credibility is reduced. By quantifying and dynamically adjusting the credibility of websites, the present invention further ensures that the acquired POI information is reliable and valid.
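
One way to combine frequency and source credibility in the "voting" is to let every occurrence of a name contribute the credibility score of its source, as in the Python sketch below. The additive scoring rule, the default score for unknown sources, the score values, and all names are assumptions; the text states only that both frequency and source reliability are taken into account and that credibility is adjusted dynamically.

```python
from collections import defaultdict


def best_name(occurrences, credibility, default=0.1):
    """Return the name with the highest credibility-weighted vote.

    occurrences: iterable of (name, source) pairs, one per occurrence.
    credibility: {source: score}; sources not listed get `default`.
    """
    scores = defaultdict(float)
    for name, source in occurrences:
        scores[name] += credibility.get(source, default)
    return max(scores, key=scores.get)


occurrences = [
    ("Name variant X", "small-site.example"),
    ("Name variant X", "small-site.example"),
    ("Name variant Y", "trusted-site.example"),
]
credibility = {"small-site.example": 0.2, "trusted-site.example": 0.9}
before = best_name(occurrences, credibility)

# Dynamic adjustment: the trusted site is later found to carry malicious
# links, so its score is lowered and the vote is recomputed.
credibility["trusted-site.example"] = 0.1
after = best_name(occurrences, credibility)
```

With the initial scores the single occurrence on the trusted site outweighs two occurrences on the small site; after the score is lowered, the more frequent variant wins.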

The system for determining a POI name based on clustering according to the embodiment of the present invention finds the keyword of each POI name according to the frequency of the words obtained by segmentation, and clusters by keyword so that different ways of writing the same POI name fall into one class. This solves the problem that the same latitude and longitude corresponds to multiple POI names, and the Internet "voting" mechanism is used to select the best POI name.

FIG. 5 shows a flow chart of a method for determining a POI name based on clustering according to an embodiment of the present invention.

Referring to FIG. 5, a method for determining a POI name based on clustering according to an embodiment of the present invention includes the following steps:

S11. Obtain address data from network data.

S12. Extract name field and address information from the captured one or more address data respectively.

S13. Determine one or more keywords based on the name field;

S14. Cluster the keywords corresponding to the same address information to generate at least one class.

S15. Determine a POI name corresponding to the address information according to the clustered keyword.

In the embodiment of the present invention, the address data is fetched from the network data by means of a search engine, and the address data includes a name field and address information. The map address data mined from the Internet by the search engine has a form such as: name: Evergrande Real Estate Group Kunming Company; address: 14th Floor, Office Building, Block A, Beichen Fortune Center, Panlong District, Kunming City. Here "Evergrande Real Estate Group Kunming Company" is the name of the POI, and "14th Floor, Office Building, Block A, Beichen Fortune Center, Panlong District, Kunming City" is the address of the POI. The latitude and longitude of the address can be obtained by resolving the address to coordinates; for example, resolving the address above gives east longitude 102.733445, north latitude 25.08108. In addition, the number of times the POI information appears on the Internet and the source of each record are counted. The POI data obtained from different source websites for the same geographic location (the same latitude and longitude) may contain duplicates. On the one hand, several POI names may exist at the same address (latitude and longitude), for example when several companies share one latitude and longitude. On the other hand, the same POI name may be written in different ways, such as "Baoshan Mingzhi Automobile Sales Co., Ltd." and "Baoshan Mingzhi Automobile Sales and Service Co., Ltd.": the actual longitude and latitude are the same, but the POI name and the POI address are described differently. Such duplicate POI data prevents the user from quickly and accurately finding the POI name corresponding to the POI address at a given geographic location (latitude and longitude).

The subdivided steps of step S13 in the method for determining a POI name based on clustering are described below according to another embodiment. Referring to FIG. 6, step S13 includes:

S131. Segment the name in the name field into word segments;

S132. Acquire a keyword of the address data according to the word segments.

Step S132 further includes:

Counting the frequency of occurrence of each word segment corresponding to the same address information, as the first frequency;

Generating a keyword of the address data according to the first frequency.

The step of generating a keyword of the address data according to the first frequency is specifically:

Selecting the word segment that has the lowest frequency and is not a place name as the keyword of the address data.
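
Steps S131 and S132 can be sketched in Python as follows. The sketch assumes whitespace splitting as a stand-in for a real Chinese word segmenter, a precomputed word frequency table (the description mentions frequencies computed over about 90 million POI names), and a given set of place names; the frequency values, the place-name set, and the function name are all hypothetical.

```python
def select_keyword(name, word_freq, place_names, segment=str.split):
    """Return the least frequent word segment of `name` that is not a place name.

    word_freq: {word: frequency}; words missing from the table count as 0.
    place_names: set of words to exclude.
    segment: function turning a name into word segments (S131).
    """
    tokens = segment(name)                                # S131
    candidates = [t for t in tokens if t not in place_names]
    if not candidates:
        return None
    # S132: lowest first frequency among the non-place-name segments.
    return min(candidates, key=lambda t: word_freq.get(t, 0))


# Invented frequencies: generic business words are common, "Mingzhi" is rare.
word_freq = {
    "Baoshan": 5_000, "Mingzhi": 120, "Automobile": 800_000,
    "Sales": 900_000, "and": 5_000_000, "Service": 700_000,
    "Co.,": 9_000_000, "Ltd.": 9_000_000,
}
place_names = {"Baoshan"}

kw1 = select_keyword("Baoshan Mingzhi Automobile Sales Co., Ltd.",
                     word_freq, place_names)
kw2 = select_keyword("Baoshan Mingzhi Automobile Sales and Service Co., Ltd.",
                     word_freq, place_names)
```

Both spellings yield the same keyword, which is what allows them to be clustered into one class in step S14.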

The subdivided steps of step S15 in the method for determining a POI name based on clustering are described below according to another embodiment. Referring to FIG. 7, step S15 includes:

S151. Calculate an appearance frequency of a name field in each class.

S152. The name field with the highest frequency of occurrence in each class is used as a class identifier name.

S153. Each class identifier name is taken as a POI name.

In this embodiment, each class identifier name is used as a POI name corresponding to the address information. Clustering is performed according to keywords: POI names that share the same keyword are recorded as one class. The POI names above fall into five classes; that is, there are five different POI names at this POI address, namely:

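A: Baoshan Bo Xinyuan Automobile Trading Co., Ltd.;

B: Yunnan Province Minjiang Beer Group Baoshan Co., Ltd.; Yunnan Province Minjiang Beer Group Baoshan Co., Ltd. (map marked);

C: Baoshan Mingzhi Automobile Sales Co., Ltd.; Baoshan Mingzhi Automobile Sales and Service Co., Ltd.;

D: Baoshan Great Wall Motor 4S shop;

E: Baoshan Rongyitong Automobile Sales Co., Ltd. (Chevrolet 4S shop).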

The subdivided steps of step S15 in the method for determining a POI name based on clustering are described below according to yet another embodiment. Referring to FIG. 8, step S15 includes:

S151', calculating the frequency of occurrence of the name field in each class;

S152', the name field with the highest frequency of occurrence in each class is used as a class identifier name;

S153', selecting the class identification name with the highest frequency of occurrence as the POI name.

There is only one name in class A, and the best one is this one.

There is only one name in class D and class E, similar to A.

The method for determining a POI name based on clustering according to an embodiment of the present invention finds the keyword of each POI name according to the frequency of the words obtained by segmentation, and clusters by keyword so that different ways of writing the same POI name fall into one class. This solves the problem that the same latitude and longitude corresponds to multiple POI names, and the Internet "voting" mechanism is used to select the best POI name.

In summary, in the foregoing embodiment of the present invention, the name field and the address information are extracted by fetching the address data from the network data, the keyword is determined based on the name field, and the keywords corresponding to the same address information are clustered. The POI name corresponding to the address information is determined based on the clustered keywords, so that the user can quickly and accurately search for the POI name corresponding to the POI address of the same latitude and longitude, thereby improving the user experience.

Figure 9 is a block diagram showing a cluster-based POI name determination system in accordance with one embodiment of the present invention.

Referring to FIG. 9, a cluster-based POI name determining system according to an embodiment of the present invention includes:

An address data grabber 91, configured to fetch address data from network data based on a search engine, where the address data includes a name field and address information;

a name field clusterer 92, configured to cluster the name fields corresponding to the same address information according to keywords;

a second frequency statistics unit 93, configured to count, after clustering, the frequency of occurrence of the name field in each class, as the second frequency;

The POI name determining unit 94 is configured to determine, according to the second frequency, a POI name corresponding to the address information of the category.

In the embodiment of the present invention, the address data is captured by the search engine, and the address data includes a name field, address information, and a plurality of related POI information. The plurality of related POI information is information corresponding to at least one preset attribute of the POI. Further, the preset attribute is a latitude and longitude, an address, a building name, or a unit name.

It can be seen from Table 1 above that the POI data obtained from different source websites for the same geographical position (the same latitude and longitude) may contain repetitive data; it can also be seen that the same POI name may be expressed in many different ways.

In this regard, in the embodiment of the present invention, the address data is captured from the network data based on the search engine, the address data including a name field and address information. The name fields corresponding to the same address information are clustered according to keywords, the frequency at which each name field appears in each class after clustering is counted as the second frequency, and the POI name corresponding to the address information of the class is determined according to the second frequency, thereby obtaining the best POI name.

In order to further embody the superiority of the invention, the internal structure of the name field clusterer 92 in the cluster-based POI name determining system of the present invention is further disclosed as follows, giving details of another embodiment implemented by the name field clusterer 92. Referring to FIG. 10, the name field clusterer 92 further includes a keyword determining unit 921, a keyword clustering unit 922, and a name field cluster determining unit 923:

The keyword determining unit 921 is configured to determine one or more keywords based on the name field;

The keyword clustering unit 922 is configured to cluster the keywords corresponding to the same address information;

The name field cluster determining unit 923 is configured to determine the clustered name field according to the clustered keywords.

Further, the keyword determining unit 921 further includes a word cutting module and a keyword obtaining module: the word cutting module is configured to perform word segmentation on the name in the name field to generate word segments; the keyword obtaining module is configured to acquire a keyword of the name field according to the word segments.

Further, the keyword obtaining module further includes a first frequency statistics sub-module and a keyword generation sub-module: the first frequency statistics sub-module is configured to count the frequency of occurrence of each word segment corresponding to the same address information, as a first frequency; the keyword generation sub-module is configured to generate a keyword of the name field according to the first frequency.

The keyword generation sub-module selects the word segment that has the lowest first frequency and is not a place name as the keyword of the name field.
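The keyword rule above (lowest first frequency, not a place name) can be illustrated with a short sketch. It assumes the word cutting module has already produced the segments; the function name `select_keyword`, the frequency values, and the place-name set are hypothetical and are not taken from Table 2.

```python
def select_keyword(segments, first_frequency, place_names):
    """Pick the rarest word segment that is not a place name.

    `segments` is the output of word segmentation for one name field,
    `first_frequency` maps a segment to its occurrence count, and
    `place_names` is a set of segments known to be place names.
    """
    candidates = [s for s in segments if s not in place_names]
    if not candidates:
        return None  # every segment is a place name
    # A segment missing from the table is treated as frequency 0 (assumption).
    return min(candidates, key=lambda s: first_frequency.get(s, 0))


# Hypothetical segmentation and frequencies for one name field.
segments = ["Baoshan", "Great Wall", "Motor", "4S shop"]
first_frequency = {"Baoshan": 5, "Great Wall": 120, "Motor": 90000, "4S shop": 40000}
keyword = select_keyword(segments, first_frequency, place_names={"Baoshan"})
print(keyword)
```

In this made-up data "Baoshan" is the rarest segment, but it is excluded as a place name, so the next rarest segment is chosen.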

In the embodiment of the present invention, the name in the POI information in the mined address data is segmented into words, and the number of occurrences of each word after segmentation is counted. Within the same POI name, the word that occurs least frequently, that is, carries the largest amount of information, and is not a place name is recorded as the keyword of the POI name. For example, the keywords of the POI names of the related POI information corresponding to the address data in Table 1 above are as shown in Table 2 above (the word frequencies are counted over the names of about 90 million POIs).

In order to further embody the superiority of the invention, the internal structure of the second frequency statistics unit 93 in the cluster-based POI name determining system of the present invention is further disclosed as follows, giving details of another embodiment implemented by the second frequency statistics unit 93. Referring to FIG. 11, the second frequency statistics unit 93 further includes a name field source obtaining unit 931, a source reliability determining unit 932, and a second frequency counting unit 933:

The name field source obtaining unit 931 is configured to obtain a source of the name field;

The source reliability determining unit 932 is configured to determine whether the source is a reliable source;

The second frequency counting unit 933 is configured to count the frequency of occurrence of the name field as the second frequency when the source is determined to be reliable; otherwise the name field is not counted.
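The cooperation of units 931 to 933 can be sketched as a filtered count. The sketch assumes each occurrence is available as a (name field, source) pair and that reliable sources are given as a set; the function name `second_frequency` and the example host names are hypothetical.

```python
from collections import Counter


def second_frequency(occurrences, reliable_sources):
    """Count name fields, ignoring occurrences from unreliable sources."""
    counts = Counter()
    for name, source in occurrences:
        # Unit 932: only occurrences from a reliable source are counted.
        if source in reliable_sources:
            counts[name] += 1  # unit 933: accumulate the second frequency
    return counts


# Hypothetical occurrences; the host names are placeholders.
occurrences = [
    ("Baoshan Great Wall Motor 4S shop", "trusted-a.example"),
    ("Baoshan Great Wall Motor 4S shop", "trusted-b.example"),
    ("Baoshan Great Wall Motor 4S shop", "unknown.example"),
]
freq = second_frequency(occurrences, {"trusted-a.example", "trusted-b.example"})
print(freq["Baoshan Great Wall Motor 4S shop"])
```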

In order to further embody the superiority of the invention, the internal structure of the POI name determining unit 94 in the cluster-based POI name determining system of the present invention is further disclosed as follows, giving details of another embodiment implemented by the POI name determining unit 94. The POI name determining unit 94 further includes a first class identifier name determining module and a first POI name determining module:

The first class identifier name determining module is configured to use the name field with the highest frequency in each class as the class identifier name;

The first POI name determining module is configured to use each class identifier name as a POI name corresponding to the address information.

In this embodiment, each class identifier name is used as a POI name corresponding to the address information. The names are clustered according to keywords: POI names corresponding to the same keyword are recorded as the same class. Referring to Table 1 and Table 2, the above POI names can be classified into five classes, which means that there are five different POI names at this POI address, which are:

A: Baoshan Bo Xinyuan Automobile Trading Co., Ltd.;

D: Baoshan Great Wall Motor 4S shop;

E: Baoshan Rongyitong Automobile Sales Co., Ltd. (Chevrolet 4S shop).

In order to further embody the superiority of the invention, the internal structure of the POI name determining unit 94 in the cluster-based POI name determining system of the present invention is further disclosed as follows, giving details of yet another embodiment implemented by the POI name determining unit 94. The POI name determining unit 94 further includes a second class identifier name determining module and a second POI name determining module:

The second class identifier name determining module is configured to use the name field with the highest frequency in each class as the class identifier name;

The second POI name determining module is configured to use the class identifier name that has the most occurrences on the network as the POI name corresponding to the address information.

There is only one name in class A, and the best one is this one.

There is only one name in class D and class E, similar to A.

The cluster-based POI name determination system provided by the embodiment of the present invention mines the keywords of POI names according to word frequency after word segmentation, clusters the names by keyword, and aggregates different expressions of the same POI name into one class. This solves the problem that the same latitude and longitude corresponds to multiple POI names, and uses the Internet "voting" mechanism to select the best POI name.

FIG. 12 is a flow chart showing a cluster-based POI name determination method according to an embodiment of the present invention.

Referring to FIG. 12, a cluster-based POI name determining method according to an embodiment of the present invention includes the following steps:

S121. Obtain address data from network data, where the address data includes a name field and address information.

S122: Cluster the name fields corresponding to the same address information according to keywords;

S123. Count the frequency of occurrence of the name field in each class after clustering, as the second frequency;

S124. Determine, according to the second frequency, a POI name corresponding to the address information of the category.

The address data is captured by the search engine, and the address data includes a name field, address information, and a plurality of related POI information. In the embodiment of the present invention, the plurality of related POI information is information corresponding to at least one preset attribute of the POI. Further, the preset attribute is a latitude and longitude, an address, a building name, or an included unit name.

In the embodiment of the present invention, the address data is captured from the network data based on the search engine, and the address data includes a name field and address information. The map address data is mined by the search engine from the Internet, for example, name: Evergrande Real Estate Group Kunming Company; address: 14th Floor, Office Building, Block A, Beichen Fortune Center, Panlong District, Kunming City. Here “Evergrande Real Estate Group Kunming Company” is the name of the POI, and “14th Floor, Office Building, Block A, Beichen Fortune Center, Panlong District, Kunming City” is the address of the POI. The latitude and longitude of the address can be obtained by resolving the address. For example, resolving the address “14th Floor, Office Building, Block A, Beichen Fortune Center, Panlong District, Kunming City” gives east longitude 102.733445, north latitude 25.08108. In addition, it is necessary to record the number of times the POI information appears on the Internet and the sources in which it appears.

However, the POI data obtained from different source websites for the same geographical location (same latitude and longitude) may contain repetitive data. That is, there may be multiple POI names at the same address (latitude and longitude), for example when several companies are located at one latitude and longitude: the actual POI latitude and longitude are the same, but the POI name and the POI address are described differently. It can also be seen that the same POI name may have different expressions, such as “Baoshan Mingzhi Automobile Sales Co., Ltd.” and “Baoshan Mingzhi Automobile Sales Service Co., Ltd.”. Such repetitive POI data prevents the user from quickly and accurately searching for the POI name corresponding to the POI address of the same geographic location (latitude and longitude).

In order to further embody the advantages of the invention, the subdivision step of step S122 in the cluster-based POI name determining method of the present invention is further disclosed as follows to embody another embodiment implemented according to this step. Referring to Figure 13, the subdivision steps of this step include:

S1221: Determine one or more keywords based on the name field;

S1222: clustering the keywords corresponding to the same address information;

S1223. Determine a clustered name field according to the clustered keywords.
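Steps S1221 to S1223 can be sketched as grouping name fields by the pair (address information, keyword). The sketch is an illustration only: `cluster_by_keyword` and `toy_keyword` are hypothetical names, the address key "addr-1" is a placeholder, and `toy_keyword` is a stand-in for the real keyword determination, which relies on word segmentation and word frequency.

```python
from collections import defaultdict


def cluster_by_keyword(records, keyword_of):
    """Group name fields that share both address information and keyword.

    `records` is an iterable of (address, name field) pairs and
    `keyword_of` is a callable returning the keyword of a name field.
    """
    clusters = defaultdict(list)
    for address, name in records:
        # S1221: determine the keyword; S1222: cluster by address and keyword.
        clusters[(address, keyword_of(name))].append(name)
    # S1223: each value is the list of name fields clustered together.
    return dict(clusters)


def toy_keyword(name):
    """Stand-in keyword lookup; a real system derives keywords from word frequency."""
    for keyword in ("Mingzhi", "Great Wall"):
        if keyword in name:
            return keyword
    return name


records = [
    ("addr-1", "Baoshan Mingzhi Automobile Sales Co., Ltd."),
    ("addr-1", "Baoshan Mingzhi Automobile Sales Service Co., Ltd."),
    ("addr-1", "Baoshan Great Wall Motor 4S shop"),
]
clusters = cluster_by_keyword(records, toy_keyword)
print(len(clusters))
```

With this data the two expressions of the Mingzhi name fall into one class, and the Great Wall name forms a class of its own.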

Further, step S1221, determining one or more keywords based on the name field, further comprises: performing word segmentation on the name field to generate word segments; and acquiring a keyword of the name field according to the word segments.

Further, the step of acquiring the keyword of the name field according to the word segments further comprises: counting the frequency of occurrence of each word segment corresponding to the same address information as the first frequency; and determining the keyword of the name field according to the first frequency.

Further, the step of determining the keyword of the name field according to the first frequency is specifically: selecting the word segment that has the lowest first frequency and is not a place name as the keyword of the name field.

In the embodiment of the present invention, the name in the POI information in the mined address data is segmented into words, and the number of occurrences of each word after segmentation is counted. Within the same POI name, the word that occurs least frequently, that is, carries the largest amount of information, and is not a place name is recorded as the keyword of the POI name. The names are then clustered according to the keyword: POI names corresponding to the same keyword are recorded as the same class.

In order to further embody the superiority of the invention, the subdivision step of step S123 in the cluster-based POI name determining method of the present invention is further disclosed as follows to embody another embodiment implemented according to this step. Referring to Figure 14, the subdivision steps of this step include:

S1231: Obtain a source of the name field;

S1232, determining whether the source is a reliable source, and if so, executing S1233;

S1233: Count the frequency of occurrence of the name field as the second frequency.

In order to further embody the superiority of the invention, the subdivision step of step S124 in the cluster-based POI name determining method of the present invention is further disclosed as follows to embody another embodiment implemented according to this step. The subdivision steps of this step include:

The name field with the highest second frequency in each class is used as the class identifier name, and each class identifier name is taken as a POI name corresponding to the address information.

Alternatively, the name field with the highest second frequency in each class is used as the class identifier name, and the class identifier name with the most occurrences on the network is used as the POI name corresponding to the address information.

The clustering-based POI name determining method provided by the embodiment of the present invention mines the keywords of POI names according to word frequency after word segmentation, clusters the names by keyword, and aggregates different expressions of the same POI name into one class. This solves the problem that the same latitude and longitude corresponds to multiple POI names, and uses the Internet "voting" mechanism to select the best POI name.

In summary, the above embodiment of the present invention captures address data from network data, extracts a name field and address information, determines keywords based on the name field, clusters the keywords corresponding to the same address information, and determines the POI name corresponding to the address information based on the clustered keywords, so that the user can quickly and accurately search for the POI name corresponding to the POI address of the same latitude and longitude, thereby improving the user experience.

Figure 15 is a block diagram schematically showing a system for determining the validity of POI information based on address data in a network, in accordance with one embodiment of the present invention.

Referring to FIG. 15, a system for determining validity of POI information based on address data in a network according to an embodiment of the present invention includes:

The POI information obtaining unit 511 is configured to acquire, according to the search engine, the plurality of related POI information corresponding to the same POI name by using the address data in the network;

In the embodiment of the present invention, the multiple related POI information is information corresponding to at least one preset attribute of the POI. Further, the preset attribute is a latitude and longitude, an address, a building name, or a unit name included.

The statistics unit 512 is configured to count the number of occurrences of the POI information in the address data in the network;

The POI information determining unit 513 is configured to determine valid POI information corresponding to the same POI name according to the number of occurrences of the POI information in the address data in the network.

It can be seen from the foregoing Table 1 that the POI data obtained from different source websites for the same geographical location (the same latitude and longitude) may contain repetitive data, that is, there may be multiple POI names at the same address (latitude and longitude); it can also be seen that the same POI name may be expressed in many different ways.

In the embodiment of the present invention, the search engine uses the address data in the network to obtain a plurality of related POI information corresponding to the same POI name, where the plurality of related POI information is information corresponding to at least one preset attribute of the POI, and the preset attribute is a latitude and longitude, an address, a building name, or an included unit name. Valid POI information corresponding to the same POI name is then determined according to the number of occurrences of the POI information in the address data in the network.

Further, determining valid POI information corresponding to the same POI name according to the number of occurrences of the POI information in the address data in the network includes: clustering, according to keywords, the name fields corresponding to the same address information in the information of the preset attributes of the related POI information; counting the frequency of occurrence of the name field in each class after clustering as the second frequency; and determining the POI name corresponding to the address information of the class according to the second frequency. That is, the Internet "voting" mechanism is used to select the trusted POI information for the same POI name.

Further, one or more keywords are determined based on the name field, the keywords corresponding to the same address information are clustered, and the clustered name field is determined according to the clustered keywords.

Further, word segmentation is performed on the name in the name field to generate word segments, and the keyword of the name field is obtained according to the word segments.

Further, the frequency of occurrence of each word segment corresponding to the same address information is counted as the first frequency, and the keyword of the name field is generated according to the first frequency; specifically, the word segment that has the lowest first frequency and is not a place name is selected as the keyword of the name field.

Further, the present invention may use the name field with the highest frequency in each class as the class identifier name, and use each class identifier name as a POI name corresponding to the address information; or it may use the name field with the highest frequency in each class as the class identifier name, and take the class identifier name with the most occurrences on the network as the POI name corresponding to the address information.

Here, the name in the POI information in the mined address data is segmented into words, and the number of occurrences of each word after segmentation is counted. Within the same POI name, the word that occurs least frequently, that is, carries the largest amount of information, and is not a place name is taken as the keyword of the POI name. For example, the keywords of the POI names of the related POI information corresponding to the address data in Table 1 are as shown in Table 2 above (the word frequencies are counted over about 90 million POI names).

In order to further embody the superiority of the invention, the internal structure of the statistics unit 512 in the system for determining the validity of the POI information based on the address data in the network is further disclosed as follows, giving details of another embodiment implemented by the statistics unit 512. Referring to FIG. 16, the statistics unit 512 further includes a POI information source obtaining module 5121, a POI information source reliability determining module 5122, and a statistics module 5123:

The POI information source obtaining module 5121 is configured to obtain a source of the POI information;

The POI information source reliability determining module 5122 is configured to determine whether the source is a reliable source;

The statistic module 5123 is configured to count the number of occurrences of the POI information in the address data in the network if the source belongs to a reliable source; otherwise, it is not counted.

There is only one name in class A, and the best one is this one.

There is only one name in class D and class E, similar to A.

In order to further demonstrate the superiority of the invention, the internal structure of the POI information determining unit 513 in the system for determining the validity of the POI information based on the address data in the network is further disclosed as follows, giving details of another embodiment implemented by the POI information determining unit 513. Referring to FIG. 17, the POI information determining unit 513 further includes a judging subunit 5131 and an information point information determining subunit 5132:

The judging subunit 5131 is configured to determine whether the number of occurrences of the POI information in the address data in the network is higher than a predetermined threshold;

The information point information determining subunit 5132 is configured to determine that the acquired POI information is valid if the judging subunit 5131 determines that it is.

In the embodiment of the present invention, the higher the frequency with which POI information appears on the Internet and the more credible its source, the more credible the POI information is. The selected best POI name is filtered according to its frequency of occurrence on the Internet and its source; POI information above a certain threshold is taken as the final POI information.

In the embodiment of the present invention, websites or webpages that are sources of predetermined credibility include, but are not limited to, large websites such as Sina and Fenghuang.com, officially certified websites, websites with relatively high access frequency and large data traffic, websites without malicious links or virus links, and websites with high customer satisfaction.

In the embodiment of the present invention, the credibility is quantifiable: the credibility of each website or webpage can be quantified according to the number of user visits and customer evaluations. Moreover, the credibility of each website or webpage changes dynamically. If a website is infected with viruses, carries fraudulent advertisements, or is exploited by other malicious fraudulent websites, its credibility is reduced. The present invention quantifies and dynamically adjusts the credibility of websites to further ensure that the acquired POI information is reliable and valid.

In this embodiment, a plurality of related POI information corresponding to the same POI name is obtained by using address data in the network, and valid POI information corresponding to the same POI name is determined according to the number of occurrences of the POI information in the address data in the network. The user can thereby quickly and accurately search for one or more POI names corresponding to the POI address of the same latitude and longitude; the network voting mechanism is then used to filter the one or more POI names according to information source and frequency of occurrence on the Internet, and a highly credible POI name is selected as the POI name corresponding to the current POI address, improving the validity of the POI information.
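The description does not give a formula for credibility, so the following is only one possible sketch of a score that grows with user visits and customer evaluation and drops when a site is flagged as compromised. The function name `credibility`, the logarithmic scaling, the weight, and the penalty factor are all assumptions made for illustration.

```python
import math


def credibility(visits, rating, flagged=False, visit_weight=0.5, penalty=0.2):
    """Return a hypothetical credibility score in the range [0, 1].

    `visits` is a non-negative visit count, `rating` is a customer
    evaluation already normalised to [0, 1], and `flagged` marks a site
    currently affected by viruses or fraudulent content.
    """
    # Log-scale the visit count and cap it at 1.0 (reached at about 10**9 visits).
    visit_score = min(math.log10(visits + 1) / 9.0, 1.0)
    score = visit_weight * visit_score + (1.0 - visit_weight) * rating
    # Dynamic adjustment: a flagged site keeps only a fraction of its score.
    return score * penalty if flagged else score


clean = credibility(visits=1_000_000, rating=0.8)
infected = credibility(visits=1_000_000, rating=0.8, flagged=True)
print(clean > infected)
```

Recomputing the score whenever the visit count, the rating, or the flag changes gives the dynamic behaviour described above; a source could then be treated as reliable when its score exceeds a chosen cut-off.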

FIG. 18 is a flow chart schematically showing a method of determining validity of POI information based on address data in a network according to an embodiment of the present invention.

Referring to FIG. 18, a method for determining validity of POI information based on address data in a network according to an embodiment of the present invention includes the following steps:

S811. Acquire, by using address data in the network, multiple related POI information corresponding to the same POI name.

S812. Count the number of occurrences of the POI information in the address data in the network.

S813. Determine valid POI information corresponding to the same POI name according to the number of occurrences of the POI information in the address data in the network.

In the embodiment of the present invention, the multiple related POI information is information corresponding to at least one preset attribute of the POI. The preset attribute is a latitude and longitude, an address, a building name, or a unit name.

The POI data obtained from different source websites for the same geographical location (same latitude and longitude) may contain repetitive data, and the same POI name may be expressed differently.

In this regard, in the embodiment of the present invention, the name in the POI information in the mined address data is segmented into words, and the number of occurrences of each word after segmentation is counted. Within the same POI name, the word that occurs least frequently, that is, carries the largest amount of information, and is not a place name is recorded as the keyword of the POI name.

In order to further embody the advantages of the invention, the subdivision step of step S812 in the method for determining the validity of the POI information based on the address data in the network is further disclosed as follows to embody another embodiment implemented according to this step. Referring to Figure 19, the subdivision steps of this step include:

S8121: Obtain a source of the POI information;

S8122, determining whether the source is a reliable source, and if so, executing step S8123;

S8123: When the source belongs to a reliable source, count the number of occurrences of the POI information in the address data in the network, otherwise it is not counted.

In this embodiment, among the POI names of the same class, the best POI name is selected according to "voting" on the Internet. The so-called "voting" is mainly based on the frequency with which the POI name appears on the Internet and the reliability of its source: the name with the highest frequency and the most trusted source on the Internet is selected as the best name.

In order to further embody the advantages of the invention, the subdivision step of step S813 in the method for determining the validity of the POI information based on the address data in the network is further disclosed as follows to embody another embodiment implemented according to this step. Referring to Figure 20, the subdivision steps of this step include:

S8131, determining whether the number of occurrences of the POI information in the address data in the network is higher than a predetermined threshold; if yes, executing step S8132;

S8132: Determine that the POI information is valid.
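Steps S8131 and S8132 amount to a threshold test on the occurrence count. A minimal sketch follows, assuming the counts from step S812 are available as a mapping; the function name `valid_poi_info`, the counts, and the threshold value are hypothetical.

```python
def valid_poi_info(occurrences, threshold):
    """Keep the POI information whose occurrence count exceeds the threshold.

    `occurrences` maps a piece of POI information to the number of times
    it appears in the address data in the network.
    """
    # S8131: strictly higher than the threshold; S8132: such entries are valid.
    return {info: n for info, n in occurrences.items() if n > threshold}


# Hypothetical counts and threshold.
occurrences = {
    "Baoshan Great Wall Motor 4S shop": 12,
    "Baoshan Mingzhi Automobile Sales Co., Ltd.": 3,
}
valid = valid_poi_info(occurrences, threshold=3)
print(sorted(valid))
```

A count equal to the threshold is not treated as valid here, following the wording "higher than a predetermined threshold".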

By using the method for determining the validity of the POI information based on the address data in the network provided by the embodiment of the present invention, the keywords of POI names are mined according to word frequency after word segmentation, the names are clustered by keyword, and different expressions of the same POI name are grouped together, which solves the problem of multiple POI names corresponding to one latitude and longitude. The Internet "voting" mechanism is used to select the best POI name and to select trusted POI information.

In summary, the foregoing embodiment of the present invention acquires a plurality of related POI information corresponding to the same POI name by using address data in the network, and determines, according to the number of occurrences of the POI information in the address data in the network, valid POI information corresponding to the same POI name, enabling users to quickly and accurately search for one or more POI names corresponding to the POI address of the same latitude and longitude. The network voting mechanism is then used to filter the one or more POI names according to information source and frequency of occurrence on the Internet, and a highly credible POI name is selected as the POI name corresponding to the current POI address, improving the validity of the POI information.

It should be noted that the algorithms and formulas provided herein are not inherently related to any particular computer, virtual system, or other device. Various general purpose systems can also be used with the teachings herein. The structure required to construct such a system is apparent from the above description. Moreover, the invention is not directed to any particular programming language. It is to be understood that the invention may be implemented in a variety of programming languages, and the above description of a specific language is given in order to disclose the preferred embodiments of the invention.

In the description provided herein, numerous specific details are set forth. However, it is understood that the embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques are not shown in detail so as not to obscure the understanding of the description.

Similarly, the various features of the present invention are sometimes grouped together into a single embodiment in the above description of the exemplary embodiments of the present invention in order to the , diagram, or description of it. However, the method and apparatus disclosed are not to be interpreted as reflecting the invention that the claimed invention is claimed to have more features than those recited in the claims. Rather, as the following claims reflect, inventive aspects reside in less than all features of the single embodiments disclosed herein. Therefore, the claims following the specific embodiments are hereby explicitly incorporated into the embodiments, and each of the claims as a separate embodiment of the invention.

Those skilled in the art will appreciate that the modules in the devices of the embodiments can be adaptively changed and placed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and further they may be divided into a plurality of sub-modules or sub-units or sub-components. In addition to such features and/or at least some of the processes or units being mutually exclusive, any combination of the features disclosed in the specification, including the accompanying claims, the abstract and the drawings, and any methods so disclosed, or All processes or units of the device are combined. Each feature disclosed in this specification (including the accompanying claims, the abstract and the drawings) may be replaced by alternative features that provide the same, equivalent or similar purpose.

In addition, those skilled in the art will appreciate that, although some embodiments described herein include certain features that are included in other embodiments and not others, combinations of features of different embodiments are intended to be within the scope of the present invention and to form different embodiments.

The various component embodiments of the present invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the cluster-based POI name determination system in accordance with embodiments of the present invention. The invention can also be implemented as a device or device program (e.g., a computer program and a computer program product) for performing some or all of the methods described herein. Such a program implementing the invention may be stored on a computer readable medium or may be in the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.

For example, Figure 21 schematically illustrates a block diagram of a computing device for performing the method in accordance with the present invention. The computing device conventionally includes a processor 2110 and a computer program product or computer readable medium in the form of a memory 2120. The memory 2120 may be an electronic memory such as a flash memory, an EEPROM (Electrically Erasable Programmable Read Only Memory), an EPROM, a hard disk, or a ROM. The memory 2120 has a storage space 2130 for program code 2131 for performing any of the method steps described above. For example, the storage space 2130 for program code may include individual pieces of program code 2131, each implementing one of the steps of the above methods. The program code can be read from or written to one or more computer program products. These computer program products include program code carriers such as hard disks, compact disks (CDs), memory cards or floppy disks. Such a computer program product is typically a portable or fixed storage unit as described with reference to FIG. The storage unit may have storage segments, storage space, and the like arranged similarly to the memory 2120 in the computing device of FIG. The program code can, for example, be compressed in an appropriate form. Typically, the storage unit comprises computer readable code 2131' for performing the steps of the method according to the invention, i.e., code that can be read by a processor such as the processor 2110 and that, when executed by the computing device, causes the computing device to perform the various steps of the methods described above.

It is to be noted that the above-described embodiments illustrate rather than limit the invention, and that those skilled in the art may devise alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps that are not recited in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention can be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means can be embodied by one and the same item of hardware. The use of the words first, second, and third does not indicate any order; these words can be interpreted as names.

In addition, it should be noted that the language used in the specification has been selected principally for readability and instructional purposes, and not to delineate or restrict the inventive subject matter. Therefore, many modifications and changes will be apparent to those skilled in the art without departing from the scope of the invention. The disclosure of the present invention is illustrative, and not restrictive, of the scope of the invention, which is defined by the appended claims.

The present invention is applicable to computer systems/servers that can operate with numerous other general purpose or special purpose computing system environments or configurations. Examples of well-known computing systems, environments, and/or configurations suitable for use with computer systems/servers include, but are not limited to, personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, networked personal computers, small computer systems, mainframe computer systems, and distributed cloud computing environments that include any of the above.

The computer system/server can be described in the general context of computer system executable instructions (such as program modules) being executed by a computer system. Generally, program modules may include routines, programs, object programs, components, logic, data structures, and the like that perform particular tasks or implement particular abstract data types. The computer system/server can be implemented in a distributed cloud computing environment where tasks are performed by remote processing devices that are linked through a communication network. In a distributed cloud computing environment, program modules may be located on local or remote computing system storage media, including storage devices.

References to "one embodiment," "an embodiment," or "one or more embodiments" in the context of the present invention mean that the particular features, structures, or characteristics described in connection with the embodiments are included in at least one embodiment of the invention. In addition, it is noted that instances of the phrase "in one embodiment" do not necessarily all refer to the same embodiment.

The above is only a part of the embodiments of the present invention. It should be noted that those skilled in the art can also make several improvements and refinements without departing from the principles of the present invention, and these improvements and refinements should also be considered to fall within the scope of protection of the present invention.

Claims

A system for determining a POI name based on clustering, the system comprising:

an address data grabber, configured to capture address data from network data;

an address data parser, configured to separately extract a name field and address information from one or more items of the captured address data;

a keyword determiner, configured to determine one or more keywords based on the name field;

a keyword clusterer, configured to cluster the keywords corresponding to the same address information to generate at least one class;

a POI name generator, configured to determine a POI name corresponding to the address information according to the clustered keywords.
The system of claim 1 wherein said keyword determiner further comprises:

a word segmentation unit, configured to perform word segmentation on the name in the name field to generate word segments;

and a keyword acquiring unit, configured to acquire a keyword of the address data according to the word segments.
The system of any of claims 1-2, wherein the keyword acquiring unit further comprises:

a first frequency statistics module, configured to count the frequency of occurrence of each word segment corresponding to the same address information, as the first frequency;

and a keyword generating module, configured to generate a keyword of the address data according to the first frequency.
The system according to any one of claims 1 to 3, wherein the keyword generating module selects a word segment that has the smallest first frequency and is not a place name as a keyword of the address data.
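As an illustration of the keyword selection recited in the claims above, the following Python sketch counts the first frequency of each word segment across all name fields sharing one address and keeps, for each name field, the non-place-name segment with the smallest first frequency. The whitespace segmenter, the `PLACE_NAMES` set, and the function names are assumptions made for this sketch; the claims do not specify a segmentation algorithm or how place names are recognised.

```python
from collections import Counter

# Hypothetical place-name list; the claims do not say how place names are recognised.
PLACE_NAMES = {"beijing", "chaoyang"}


def segment(name):
    """Stand-in word segmenter (assumption): lower-case and split on whitespace."""
    return name.lower().split()


def select_keywords(names, place_names=PLACE_NAMES):
    """Return one keyword per name field for records that share one address."""
    segmented = [segment(n) for n in names]
    # First frequency: occurrences of each word segment across all names at this address.
    first_freq = Counter(tok for toks in segmented for tok in toks)
    keywords = []
    for toks in segmented:
        candidates = [t for t in toks if t not in place_names]
        if candidates:
            # Smallest first frequency wins; min() keeps the earliest segment on ties.
            keywords.append(min(candidates, key=lambda t: first_freq[t]))
        else:
            # Every segment was a place name, so no keyword can be chosen.
            keywords.append(None)
    return keywords
```

Tie-breaking by position in the name is a design choice of this sketch only; a real system might prefer another rule.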
The system of any of claims 1-4, the POI name generator further comprising:

a frequency statistics unit for calculating the frequency of occurrence of the name field in each class;

a class identifier name determining unit, configured to use a name field with the highest frequency of occurrence in each of the classes as a class identifier name;

a POI name determining unit, configured to use each class identifier name as a POI name.
The system of any of claims 1-4, the POI name generator further comprising:

a frequency statistics unit for calculating the frequency of occurrence of the name field in each class;

a class identifier name determining unit, configured to use a name field with the highest frequency of occurrence in each of the classes as a class identifier name;

a POI name determining unit, configured to select the class identifier name with the highest frequency of occurrence as the POI name.
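The two naming variants recited above can be sketched as follows, assuming the clusters for one address are already available as a mapping from a cluster key to the list of name fields in that class. The data layout and the function names are hypothetical and chosen only for illustration.

```python
from collections import Counter


def class_identifier_names(clusters):
    """Map each class to its most frequent name field (the class identifier name)."""
    return {
        key: Counter(names).most_common(1)[0][0]
        for key, names in clusters.items()
        if names
    }


def single_poi_name(clusters):
    """Pick the class identifier name whose name field occurs most often overall."""
    best_name, best_count = None, 0
    for names in clusters.values():
        if not names:
            continue
        name, count = Counter(names).most_common(1)[0]
        if count > best_count:
            best_name, best_count = name, count
    return best_name
```

The first function corresponds to treating every class identifier name as a POI name, which suits an address that hosts several distinct POIs; the second keeps only the single most frequent one.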
A method for determining a POI name based on clustering, comprising:

capturing address data from network data;

extracting a name field and address information from one or more items of the captured address data;

determining one or more keywords based on the name field;

clustering the keywords corresponding to the same address information to generate at least one class;

and determining the POI name corresponding to the address information according to the clustered keywords.
The method of claim 7, wherein the step of determining one or more keywords based on the name field further comprises:

performing word segmentation on the name in the name field to generate word segments;

obtaining keywords of the address data according to the word segments.
The method according to any one of claims 7 to 8, wherein the step of obtaining the keywords of the address data according to the word segments further comprises:

counting the frequency of occurrence of each word segment corresponding to the same address information, as the first frequency;

generating keywords of the address data according to the first frequency.
The method according to any one of claims 7-9, wherein the step of generating keywords of the address data according to the first frequency is specifically:

selecting a word segment that has the smallest first frequency and is not a place name as a keyword of the address data.
The method according to any one of claims 7 to 10, wherein the step of determining the POI name corresponding to the address information according to the clustered keywords further comprises:

calculating the frequency of occurrence of the name field in each class;

using the name field with the highest frequency of occurrence in each class as the class identifier name;

using each class identifier name as a POI name.
The method according to any one of claims 7 to 11, wherein the step of determining the POI name corresponding to the address information according to the clustered keywords further comprises:

calculating the frequency of occurrence of the name field in each class;

using the name field with the highest frequency of occurrence in each class as the class identifier name;

selecting the class identifier name with the highest frequency of occurrence as the POI name.
A cluster-based POI name determination system, comprising:

an address data grabber, configured to extract address data from network data based on a search engine, the address data including a name field and address information;

a name field clusterer, configured to cluster name fields corresponding to the same address information according to keywords;

a second frequency statistic, configured to count the frequency of occurrence of the name field in each class after clustering, as the second frequency;

a POI name determining unit, configured to determine, according to the second frequency, a POI name corresponding to the address information of the class.
The system of claim 13 wherein said name field clusterer further comprises:

a keyword determining unit, configured to determine one or more keywords based on the name field;

a keyword clustering unit, configured to cluster the keywords corresponding to the same address information;

a name field cluster determining unit, configured to determine the clustered name fields according to the clustered keywords.
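A minimal sketch of this clustering step is given below. It assumes that a keyword has already been determined for each name field and that two name fields fall into the same class exactly when they share an address and a keyword. The claims do not prescribe a particular clustering algorithm, so this exact-match grouping, the record layout, and the function name are assumptions.

```python
from collections import defaultdict


def cluster_name_fields(records):
    """Group name fields by address, then by keyword.

    records: iterable of (address, keyword, name_field) tuples.
    Returns {address: {keyword: [name_field, ...]}}.
    """
    clusters = defaultdict(lambda: defaultdict(list))
    for address, keyword, name in records:
        # Name fields with the same keyword at the same address form one class.
        clusters[address][keyword].append(name)
    # Convert to plain dicts so that later lookups cannot silently create entries.
    return {address: dict(by_keyword) for address, by_keyword in clusters.items()}
```

A fuzzier grouping (for example, merging near-identical keywords) could be substituted without changing the surrounding steps.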
The system of any of claims 13-14, wherein the keyword determining unit further comprises:

a word segmentation module, configured to perform word segmentation on the name in the name field to generate word segments;

and a keyword obtaining module, configured to obtain a keyword of the name field according to the word segments.
The system of any of claims 13-15, wherein the keyword obtaining module further comprises:

a first frequency statistics sub-module, configured to count the frequency of occurrence of each word segment corresponding to the same address information, as the first frequency;

and a keyword generating sub-module, configured to generate a keyword of the name field according to the first frequency.
The system according to any one of claims 13 to 16, wherein the keyword generating sub-module selects the word segment that has the smallest first frequency and is not a place name as a keyword of the name field.
The system of any of claims 13-17, the second frequency statistic further comprising:

a name field source obtaining unit, configured to obtain a source of the name field;

a source reliability determining unit, configured to determine whether the source is a reliable source;

a second frequency statistics unit, configured to count the frequency of occurrence of the name field as the second frequency when the source is determined to be reliable, and not to count it otherwise.
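The source-filtered counting described above might look like the following sketch, which counts a name field toward the second frequency only when it was observed on a reliable source. Representing reliability as a whitelist of host names is an assumption of this sketch, and the hosts shown are placeholders; the claims leave the reliability test open.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical whitelist of reliable hosts (placeholder values).
RELIABLE_HOSTS = {"maps.example.com", "directory.example.org"}


def second_frequency(observations, reliable_hosts=RELIABLE_HOSTS):
    """Count name fields in one class, skipping observations from unreliable sources.

    observations: iterable of (name_field, source_url) pairs.
    """
    counts = Counter()
    for name, url in observations:
        host = urlparse(url).hostname
        if host in reliable_hosts:
            counts[name] += 1
    return counts
```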
The system of any of claims 13-18, the POI name determining unit further comprising:

a class identifier name determining module, configured to use the name field with the highest frequency in the respective classes as the class identifier name;

a first POI name determining module, configured to use each class identifier name as a POI name corresponding to the address information.
The system of any one of claims 13 to 19, wherein the POI name determining unit further comprises:

a class identifier name determining module, configured to use the name field with the highest second frequency in each class as the class identifier name;

a second POI name determining module, configured to use the class identifier name that occurs most frequently on the network as the POI name corresponding to the address information.
A method for determining a POI name based on clustering, comprising:

obtaining address data from network data, the address data including a name field and address information;

clustering the name fields corresponding to the same address information according to keywords;

counting the frequency of occurrence of the name field in each class after clustering, as the second frequency;

determining the POI name corresponding to the address information of the class according to the second frequency.
The method of claim 21, wherein the clustering of the name fields corresponding to the same address information by keywords further comprises:

Determining one or more keywords based on the name field;

Clustering the keywords corresponding to the same address information;

The clustered name field is determined according to the clustered keywords.
The method of any one of claims 21 to 22, wherein determining one or more keywords based on the name field further comprises:

performing word segmentation on the name field to generate word segments;

obtaining the keyword of the name field according to the word segments.
The method according to any one of claims 21 to 23, wherein obtaining the keyword of the name field according to the word segments further comprises:

counting the frequency of occurrence of each word segment corresponding to the same address information, as the first frequency;

determining a keyword of the name field according to the first frequency.
The method according to any one of claims 21 to 24, wherein determining the keyword of the name field according to the first frequency is specifically:

selecting a word segment that has the smallest first frequency and is not a place name as the keyword of the name field.
The method according to any one of claims 21 to 25, wherein the frequency of occurrence of the name field in each category after the statistical clustering, as the second frequency, further comprises:

Get the source of the name field;

It is determined whether the source is a reliable source, and if so, the frequency at which the name field appears is counted as the second frequency.
The method according to any one of claims 21 to 26, wherein the determining, according to the second frequency, the POI name corresponding to the address information of the category, further comprising:

Name the name of the second highest frequency in each of the classes as a class identifier name;

Each type of identification name is taken as the POI name corresponding to the address information.
The method according to any one of claims 21 to 27, wherein the determining, according to the second frequency, the POI name corresponding to the address information of the category, further comprising:

Name the name of the second highest frequency in each of the classes as a class identifier name;

The class identifier name that appears most frequently on the network is taken as the POI name corresponding to the address information.
A system for determining validity of POI information based on address data in a network, the system comprising:

a POI information acquiring unit, configured to acquire, based on a search engine, a plurality of items of related POI information corresponding to the same POI name from address data in the network;

a statistical unit, configured to count the number of occurrences of the POI information in the address data in the network;

a POI information determining unit, configured to determine valid POI information corresponding to the same POI name according to the number of occurrences of the POI information in the address data in the network.
The system of claim 29, wherein the plurality of related POI information is information corresponding to at least one preset attribute of the POI.
The system of any one of claims 29-30, wherein the preset attribute is a latitude and longitude, an address, a building name, or a unit name.
The system of any of claims 29-31, the statistical unit further comprising:

a POI information source obtaining module, configured to obtain a source of the POI information;

The POI information source reliability judging module is configured to judge whether the source is a reliable source;

The statistics module is configured to count the number of occurrences of the POI information in the address data in the network if the source belongs to a reliable source; otherwise, it is not counted.
The system of any of claims 29-32, the POI information determining unit further comprising:

a determining subunit, configured to determine whether the number of occurrences of the POI information in the address data in the network is higher than a predetermined threshold;

a POI information determining subunit, configured to determine that the acquired POI information is valid if the result of the determining subunit is affirmative.
A system according to any of claims 29-33, said reliable source being a source having a predetermined degree of confidence.
A system according to any of claims 29-34, the source being a website or a web page.
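The validity determination recited in the system claims above can be illustrated with the sketch below. It counts how often each candidate value of one preset attribute (for example, an address) appears for a given POI name on sources whose confidence reaches a minimum, and treats a value as valid when that count is higher than a threshold. The confidence table, the default values of `min_confidence` and `threshold`, and the function name are assumptions made for illustration.

```python
from collections import Counter


def valid_poi_info(observations, source_confidence, min_confidence=0.8, threshold=2):
    """Return the set of POI information values judged valid.

    observations: iterable of (info_value, source) pairs for one POI name
        and one preset attribute.
    source_confidence: mapping from source (website or web page) to a
        confidence score; unknown sources are treated as unreliable.
    """
    counts = Counter(
        value
        for value, source in observations
        if source_confidence.get(source, 0.0) >= min_confidence
    )
    # Valid means the occurrence count is strictly higher than the threshold.
    return {value for value, n in counts.items() if n > threshold}
```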
A method for determining validity of POI information based on address data in a network, the method comprising:

Acquiring a plurality of related POI information corresponding to the same POI name by using address data in the network;

Counting the number of occurrences of the POI information in address data in the network;

The valid POI information corresponding to the same POI name is determined according to the number of occurrences of the POI information in the address data in the network.
The method of claim 36, wherein the plurality of related POI information is information corresponding to at least one preset attribute of the POI.
The method according to any one of claims 36 to 37, wherein the preset attribute is a latitude and longitude, an address, a building name, or a unit name.
The method of any of claims 36-38, the step of: counting the number of occurrences of the POI information in the address data in the network, further comprising:

Obtaining the source of the POI information;

Determining whether the source is a reliable source, and if so, counting the number of occurrences of the POI information in the address data in the network, otherwise it is not counted.
The method according to any one of claims 36 to 39, wherein the step of: determining valid POI information corresponding to the same POI name according to the number of occurrences of the POI information in the address data in the network, further comprising:

Determining whether the number of occurrences of the POI information in the address data in the network is higher than a predetermined threshold;

If so, it is determined that the POI information is valid.
A method according to any one of claims 36 to 40, wherein the reliable source is a source having a predetermined degree of confidence.
A method according to any one of claims 36 to 41, wherein the source is a website or a web page.
A computer program comprising computer readable code which, when run on a computing device, causes the computing device to perform the method for determining a POI name based on clustering according to any one of claims 7-12, the cluster-based POI name determination method according to any one of claims 21-28, or the method for determining validity of POI information based on address data in a network according to any one of claims 36-42.
A computer readable medium storing the computer program of claim 43.