WO2016155386A1 - Method and device for determining whether a web page comprises point of interest (POI) data - Google Patents

Method and device for determining whether a web page comprises point of interest (POI) data

Info

Publication number
WO2016155386A1
WO2016155386A1 PCT/CN2015/099580 CN2015099580W WO2016155386A1 WO 2016155386 A1 WO2016155386 A1 WO 2016155386A1 CN 2015099580 W CN2015099580 W CN 2015099580W WO 2016155386 A1 WO2016155386 A1 WO 2016155386A1
Authority
WO
WIPO (PCT)
Prior art keywords
poi data
poi
webpage
information
name
Prior art date
Application number
PCT/CN2015/099580
Other languages
English (en)
Chinese (zh)
Inventor
王智广
魏少俊
Original Assignee
北京奇虎科技有限公司
奇智软件(北京)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京奇虎科技有限公司, 奇智软件(北京)有限公司
Publication of WO2016155386A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/955: Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/953: Querying, e.g. by the use of web search engines
    • G06F16/9537: Spatial or temporal dependent retrieval, e.g. spatiotemporal queries

Definitions

  • The present invention relates to the field of computer technology, and in particular to a method and apparatus for determining that a web page includes point of interest (POI) data.
  • POI stands for Point Of Interest.
  • POI data includes address information and a POI name.
  • The traditional POI data acquisition method requires a technician to use precise surveying and mapping instruments to obtain the latitude and longitude of each POI and then record it. This method is time-consuming and laborious, so only a small amount of POI data is acquired, and it is difficult for a geographic information system to provide a high level of service based on so little POI data.
  • If web pages containing POI data can be collected from the Internet and the POI data extracted from them for use by a geographic information system, labor and time are greatly saved.
  • However, the Internet is full of fake POI data.
  • For example, the content of a blog page contains “original address: http://xxx.xxx.xxx/xxx”. Although it contains the word “address”, that address is a network address, or URL (Uniform Resource Locator), not the geographical address information of POI data. As a result, the proportion of fake POI data among the collected POI data is high.
  • To address these disadvantages of the prior art, the present invention provides a method and apparatus for determining that a web page includes POI data, so as to solve the prior-art problem that much of the collected POI data is fake.
  • The present invention provides a method for determining that a web page includes POI data of a point of interest, including:
  • searching the web page according to the POI name corresponding to the POI data, to determine whether the web page includes the POI name of the POI data;
  • when the web page includes the POI name of the POI data, determining that the web page includes the POI data of the POI.
  • The present invention further provides an apparatus for determining that a web page includes POI data of a point of interest, including:
  • a POI data acquisition module, configured to acquire multiple POI data from the Internet;
  • a web page crawling module, configured to crawl a plurality of web pages including address information;
  • a latitude and longitude information normalization module, configured to normalize the address information in the plurality of POI data and the address information included in the plurality of web pages into latitude and longitude information, respectively;
  • a latitude and longitude information matching module, configured to match the latitude and longitude information of the plurality of POI data against the latitude and longitude information of the plurality of web pages on the basis of identical latitude and longitude information;
  • a web page POI name determining module, configured to, for POI data and a web page having the same latitude and longitude information, search the web page according to the POI name corresponding to the POI data to determine whether the web page includes the POI name of the POI data;
  • a web page POI data determining module, configured to determine, when the web page includes the POI name of the POI data, that the web page includes the POI data of the POI.
  • A computer program comprising computer readable code that, when executed on a computing device, causes the computing device to perform the method described above for determining that a web page includes POI data of a point of interest.
  • A computer readable medium in which the computer program described above is stored.
  • The address information is normalized into latitude and longitude information, so that address information that is not a geographical address can be filtered out. Because latitude and longitude are unique, the accuracy of matching based on latitude and longitude information is much higher than that of existing text-based matching, which helps avoid subsequently collecting POI data with fake address information. On the basis of matching the latitude and longitude information of the POI data with the latitude and longitude information of the web page, it is further determined whether the web page includes the POI name of the POI data, so as to accurately determine whether the POI data is included in that web page. This makes it possible to subsequently judge the accuracy of the collected POI data according to the authority and accuracy of the content recorded on the web page, thereby providing a reliable guarantee for collecting large quantities of highly accurate POI data from the Internet.
  • FIG. 1a is a schematic flowchart of a method for determining that a web page includes POI data of a point of interest according to an embodiment of the present invention.
  • FIG. 1b is a schematic diagram of a web page including multiple POI data according to an embodiment of the present invention.
  • FIG. 2 is a schematic diagram of the internal structure of an apparatus for determining that a web page includes POI data of a point of interest according to an embodiment of the present invention.
  • FIG. 3 is a schematic diagram of the internal structure of a POI data acquisition module according to an embodiment of the present invention.
  • FIG. 4 schematically shows a block diagram of a computing device for performing the method according to the invention.
  • FIG. 5 schematically shows a storage unit for holding or carrying program code implementing the method according to the invention.
  • As shown in FIG. 1a, the method for determining that a web page includes POI data according to the present invention includes the following steps:
  • S101: Acquire multiple POI data from the Internet.
  • S102: Crawl a plurality of web pages including address information.
  • S103: Normalize the address information in the plurality of POI data and the address information included in the plurality of web pages into latitude and longitude information, respectively.
  • S104: Match the latitude and longitude information of the plurality of POI data against the latitude and longitude information of the plurality of web pages on the basis of identical latitude and longitude information.
  • S105: For POI data and a web page having the same latitude and longitude information, search the web page according to the POI name corresponding to the POI data to determine whether the web page includes the POI name of the POI data.
  • S106: When the web page includes the POI name of the POI data, determine that the web page includes the POI data of the POI.
  • The method of the present invention normalizes the address information into latitude and longitude information, so that address information that is not a geographical address can be filtered out. Because latitude and longitude are unique, the accuracy of matching based on latitude and longitude information is much higher than that of existing text-based matching, which helps avoid subsequently collecting data with fake address information. On the basis of matching the latitude and longitude information of the POI data with the latitude and longitude information of the web page, the method further determines whether the web page includes the POI name of the POI data, so as to accurately determine whether the POI data is included in that web page. The accuracy of the collected POI data can then be judged according to the authority and accuracy of the content recorded on the web page, which provides a reliable guarantee for collecting large quantities of highly accurate POI data from the Internet.
  • S101: Acquire multiple POI data from the Internet.
  • Specifically, a web crawler program is used to crawl a plurality of web pages including POI data from the Internet, and a plurality of POI data are then extracted from those web pages.
  • The POI data includes address information and a POI name; preferably, the POI data may also include a contact, a zip code, a network label, and the like.
  • The inventors of the present invention have found that there are web pages on the Internet whose content contains one or more POI data, in which the address information of the POI data includes an address keyword such as "address". Moreover, the page structure features of these web pages, such as the URL format and the location and format of the POI data within the page, are regular. In other words, POI data can be extracted from these web pages quickly and in a uniform way.
  • Therefore, a plurality of URLs (Uniform Resource Locators) corresponding to a plurality of web pages including an address keyword such as “address” can be crawled from the Internet, and pattern clustering is performed on the crawled URLs so that URLs with the same structural features are clustered into the same pattern set.
  • For example, for web pages that each include only one POI data, the URLs of all such web pages are obtained; pattern clustering is performed on the obtained URLs, and URLs with the same structural features are clustered into the same pattern set.
  • For instance, the web page with the URL http://www.aibang.com/detail/1537772035-1606201508 includes only the POI data of "Epson (China) Co., Ltd.", and the web page with the URL http://www.aibang.com/detail/152928073-419169481 includes only the POI data of “Beijing Wangfu Chinese Medicine and Western Medicine Hospital”. These two URLs have the same structural feature www.aibang.com/detail/*, where * is a wildcard for any characters; the two URLs can therefore be clustered into the same pattern set, in which all URLs have the structural feature www.aibang.com/detail/*.
  • As another example, for web pages that each include a plurality of POI data, the URLs of all such web pages are obtained; pattern clustering is performed on the obtained URLs, and URLs with the same structural features are clustered into the same pattern set.
  • For instance, the web page with the URL www.dianping.com/topic/s_c_2_120_r14_x540/p7 includes multiple POI data with POI names such as "boy london", “COACH” and "Milan shop (Sanlitun)"; the web page with the URL www.dianping.com/topic/s_c_2_120_r14_x540/p6 also includes multiple POI data. All URLs whose structural features conform to www.dianping.com/topic/*, where * is a wildcard for any characters, are obtained; pattern clustering is performed on them, and the URLs in the resulting pattern set share the structural feature www.dianping.com/topic/*.
  • Then, a pattern set whose web pages include POI data is selected from the plurality of pattern sets, and the plurality of web pages including POI data are obtained from that pattern set.
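The URL pattern clustering described above can be sketched as follows. This is a minimal illustration, not the patent's algorithm: it assumes that the "structural feature" of a URL is its host plus its first path segment followed by a wildcard, which reproduces the two patterns quoted in the text. The function names `url_pattern` and `cluster_urls` are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def url_pattern(url: str) -> str:
    """Reduce a URL to 'host/first-path-segment/*' (an assumed notion of structure)."""
    # urlsplit only fills in netloc when '//' is present, so add it if missing.
    parts = urlsplit(url if "//" in url else "//" + url)
    segments = [s for s in parts.path.split("/") if s]
    head = segments[0] if segments else ""
    return f"{parts.netloc}/{head}/*"


def cluster_urls(urls):
    """Group URLs that share the same structural pattern into one pattern set."""
    pattern_sets = defaultdict(list)
    for url in urls:
        pattern_sets[url_pattern(url)].append(url)
    return dict(pattern_sets)


urls = [
    "http://www.aibang.com/detail/1537772035-1606201508",
    "http://www.aibang.com/detail/152928073-419169481",
    "www.dianping.com/topic/s_c_2_120_r14_x540/p7",
    "www.dianping.com/topic/s_c_2_120_r14_x540/p6",
]
pattern_sets = cluster_urls(urls)
for pattern, members in pattern_sets.items():
    print(pattern, len(members))
```

A real crawler would likely need a more careful definition of structure (for example, generalizing numeric path segments), which the text does not specify.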
  • Extracting the plurality of POI data from the plurality of web pages including POI data may include:
  • Generating a POI data extraction template corresponding to a pattern set, based on the page structure features of the web pages corresponding to the URLs belonging to that pattern set. Specifically, for the URLs belonging to the same pattern set, the template is generated according to the format and location of the POI data in the web pages corresponding to those URLs.
  • Extracting a plurality of POI data from the web pages including POI data, based on the POI data extraction template. Specifically, for the web page corresponding to each URL in the same pattern set, the POI data are extracted from the web page according to the format and location of POI data recorded in the generated POI data extraction template.
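Template-based extraction can be illustrated with a small sketch. The text does not say how a template is represented, so the example assumes a template is a regular expression with named groups for the POI name and the address. The markup, the sample records, and the names `DETAIL_TEMPLATE` and `extract_poi_data` are all invented for illustration.

```python
import re

# Hypothetical template for one pattern set: it records where the POI name
# and the address sit in pages of that set (assumed markup, not from the source).
DETAIL_TEMPLATE = re.compile(
    r"<h1>(?P<name>[^<]+)</h1>.*?Address:\s*(?P<address>[^<]+)<",
    re.DOTALL,
)


def extract_poi_data(html: str, template: re.Pattern) -> list[dict]:
    """Apply a pattern set's template to one page and return every POI record found."""
    return [
        {"name": m.group("name").strip(), "address": m.group("address").strip()}
        for m in template.finditer(html)
    ]


page = (
    "<div><h1>Example Cafe</h1><p>Address: 6 Jiuxianqiao Road, Chaoyang District</p></div>"
    "<div><h1>Example Bookshop</h1><p>Address: 8 Example Street</p></div>"
)
records = extract_poi_data(page, DETAIL_TEMPLATE)
for record in records:
    print(record)
```

Because every page in a pattern set is assumed to share the same layout, one template per pattern set is enough to handle both single-POI and multi-POI pages.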
  • S102: Crawl a plurality of web pages including address information.
  • Specifically, a web crawler program is used to crawl a plurality of web pages including address keywords from the Internet.
  • For each web page, the text content is extracted and searched for an address keyword that may introduce address information, such as “address”, “located” or “located in”; a text fragment near the address keyword is then extracted.
  • The text fragment is segmented according to set separators and a segment length: for example, it is cut where the text length from the address keyword exceeds a set threshold, and/or at a separator (such as a space, a comma or a period). In the segmentation result, the text between the address keyword and the cut point (for example, the separator) is used as the text information associated with the address keyword in the web page.
  • The address information is extracted from that text information as the address information of the web page.
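The keyword search and segmentation can be sketched as below. The example assumes English text, so it cuts only at periods, semicolons and newlines rather than at spaces or commas (which the text lists as possible separators and which suit unsegmented Chinese text better). The threshold of 40 characters and the function name `text_after_keyword` are arbitrary, hypothetical choices.

```python
import re
from typing import Optional

# Address keywords named in the text; the optional colon handling is an assumption.
KEYWORD_RE = re.compile(r"(address|located in|located)\s*[:：]?\s*", re.IGNORECASE)
# Separators used for this English demo only.
SEPARATOR_RE = re.compile(r"[.;\n]")


def text_after_keyword(text: str, max_len: int = 40) -> Optional[str]:
    """Return the text fragment that follows the first address keyword, or None."""
    match = KEYWORD_RE.search(text)
    if match is None:
        return None
    # Length threshold: never look further than max_len characters past the keyword.
    fragment = text[match.end(): match.end() + max_len]
    # Separator rule: cut at the first separator inside the fragment, if any.
    separator = SEPARATOR_RE.search(fragment)
    if separator:
        fragment = fragment[: separator.start()]
    return fragment.strip()


good = text_after_keyword(
    "Opening hours 9-18. Address: 6 Jiuxianqiao Road, Chaoyang District. Tel 000"
)
fake = text_after_keyword("original address: http://xxx.xxx.xxx/xxx")
print(good)
print(fake)
```

The second call shows why keyword search alone is not enough: the fragment after "address" is part of a URL, and it is the later normalization step that weeds such candidates out.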
  • S103: Normalize the address information in the plurality of POI data and the address information included in the plurality of web pages into latitude and longitude information, respectively.
  • Specifically, a geographic information base is obtained in advance. It contains the address information and latitude and longitude information of the provinces, cities, counties (districts), towns and roads in the country, together with the correspondence between the address information and the latitude and longitude information.
  • The geographic information base may hold several expressions of the same geographical address; for example, "6 Jiuxianqiao Road, Chaoyang District", "Beijing Chaoyang Jiuxianqiao No. 6" and "Chaoyang District Jiuxianqiao No. 6" all denote the same geographical address.
  • The address information in the plurality of POI data is normalized into the latitude and longitude information of the plurality of POI data.
  • Specifically, for each POI data, the latitude and longitude information corresponding to its address information is looked up in the pre-acquired geographic information base, and the latitude and longitude information found is taken as the latitude and longitude information of the POI data.
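A minimal sketch of the lookup follows, assuming the geographic information base can be modelled as an in-memory dictionary from address expressions to coordinates. The coordinate values are made-up placeholders, not real survey data, and `GEO_BASE`, `normalize_key` and `normalize_address` are hypothetical names.

```python
from typing import Optional

Coordinate = tuple[float, float]  # (latitude, longitude)

# Toy geographic information base: several expressions of one address map to
# the same coordinate. The numbers are placeholders invented for this example.
GEO_BASE: dict[str, Coordinate] = {
    "6 Jiuxianqiao Road, Chaoyang District": (39.0, 116.0),
    "Beijing Chaoyang Jiuxianqiao No. 6": (39.0, 116.0),
    "Chaoyang District Jiuxianqiao No. 6": (39.0, 116.0),
}


def normalize_key(address: str) -> str:
    """Lower-case and collapse whitespace so trivial variations still match."""
    return " ".join(address.lower().split())


GEO_INDEX = {normalize_key(addr): coord for addr, coord in GEO_BASE.items()}


def normalize_address(address: str) -> Optional[Coordinate]:
    """Return the coordinate for an address, or None if it is not a known geographic address."""
    return GEO_INDEX.get(normalize_key(address))


print(normalize_address("Beijing Chaoyang Jiuxianqiao No. 6"))
print(normalize_address("http://xxx.xxx.xxx/xxx"))
```

A candidate such as a URL has no entry in the base, so it yields no coordinate and drops out before matching.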
  • S104: Match the latitude and longitude information of the plurality of POI data against the latitude and longitude information of the plurality of web pages on the basis of identical latitude and longitude information.
  • Specifically, for each POI data, it is determined whether any of the web pages has latitude and longitude information identical to that of the POI data. If so, the POI data is determined to match that web page, that is, the POI data and the web page have the same latitude and longitude information; otherwise, the POI data is ignored.
  • The accuracy of the matching result based on the latitude and longitude information is much higher than the accuracy of the existing text-based matching result, so that more accurate POI data can be collected according to the more accurate matching result.
  • Moreover, matching on latitude and longitude information is equivalent to matching all the geographic address expressions that correspond to that latitude and longitude, which widens the matching range and facilitates subsequently collecting more POI data.
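The matching step can be sketched as a join on the coordinate. The record layout (dictionaries with a `coord` key) and the function name `match_by_coordinate` are assumptions for illustration; indexing the pages by coordinate first is an implementation choice that avoids comparing every POI with every page, and is not prescribed by the text.

```python
from collections import defaultdict


def match_by_coordinate(pois, pages):
    """Pair each POI with every page whose coordinate is exactly the same."""
    pages_by_coord = defaultdict(list)
    for page in pages:
        if page["coord"] is not None:  # pages without a geographic address are skipped
            pages_by_coord[page["coord"]].append(page)

    matches = []
    for poi in pois:
        # POIs with no page at the same coordinate are simply ignored.
        for page in pages_by_coord.get(poi["coord"], []):
            matches.append((poi, page))
    return matches


# Invented sample data; the coordinates are placeholders.
pois = [
    {"name": "Example Cafe", "coord": (39.0, 116.0)},
    {"name": "Example Bookshop", "coord": (40.0, 117.0)},
]
pages = [
    {"url": "http://example.com/a", "coord": (39.0, 116.0)},
    {"url": "http://example.com/blog", "coord": None},
]
matches = match_by_coordinate(pois, pages)
print([(poi["name"], page["url"]) for poi, page in matches])
```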
  • S105: For POI data and a web page having the same latitude and longitude information, search the web page according to the POI name corresponding to the POI data to determine whether the web page includes the POI name of the POI data.
  • Specifically, all name information is found in the web page, and for each piece of name information found it is judged whether it matches the POI name in the POI data. If so, it is determined that the web page includes the POI name of the POI data; otherwise, the POI data is ignored.
  • When the POI name in the POI data matches the name information in the web page, it is determined that the web page includes the POI name of the POI data.
  • For example, if the POI name in the POI data is “Qihu 360” and the name information in the web page is “Beijing Qihoo Technology Co., Ltd.”, it can be confirmed that the web page includes the POI name of the POI data.
  • When the web page includes a plurality of name information, the text distance between each piece of name information and the address information of the web page is calculated separately.
  • The name information with the minimum text distance is determined to be the name information corresponding to the address information in the web page.
  • The text distance may be the number of characters between the name information and the address information.
  • The POI name corresponding to the POI data is compared with the name information corresponding to the address information in the web page. When they are consistent, it is determined that the web page includes the POI name of the POI data.
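The text-distance rule can be sketched as follows, taking the text distance to be the number of characters between a name and the address, as the text suggests. The sketch only considers the first occurrence of each string and uses exact string equality for the final comparison, both of which are simplifying assumptions. The function names `nearest_name` and `page_includes_poi_name` and the sample page text are hypothetical.

```python
from typing import Optional


def nearest_name(page_text: str, names: list[str], address: str) -> Optional[str]:
    """Return the candidate name with the fewest characters between it and the address."""
    addr_start = page_text.find(address)
    if addr_start == -1:
        return None
    addr_end = addr_start + len(address)

    best_name, best_distance = None, None
    for name in names:
        start = page_text.find(name)
        if start == -1:
            continue
        end = start + len(name)
        if end <= addr_start:        # name appears before the address
            distance = addr_start - end
        elif start >= addr_end:      # name appears after the address
            distance = start - addr_end
        else:                        # name and address overlap
            distance = 0
        if best_distance is None or distance < best_distance:
            best_name, best_distance = name, distance
    return best_name


def page_includes_poi_name(page_text, names, address, poi_name) -> bool:
    """Compare the POI name with the name information tied to the page's address."""
    return nearest_name(page_text, names, address) == poi_name


text = (
    "Example Cafe. Address: 6 Jiuxianqiao Road. "
    "A few streets away you can also visit Example Bookshop."
)
names = ["Example Cafe", "Example Bookshop"]
closest = nearest_name(text, names, "6 Jiuxianqiao Road")
print(closest)
```

Matching aliases such as “Qihu 360” against a full company name would need extra knowledge (for example an alias table) that this sketch does not attempt.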
  • S106: When the web page includes the POI name of the POI data, it is determined that the web page includes the POI data of the POI.
  • That is, the web page is determined to include the POI name and address information of the POI.
  • The schematic framework of the internal structure of the apparatus for determining that a web page includes POI data of a point of interest includes: a POI data acquisition module 201, a web page crawling module 202, a latitude and longitude information normalization module 203, a latitude and longitude information matching module 204, a web page POI name determining module 205, and a web page POI data determining module 206.
  • The POI data acquisition module 201 is configured to acquire a plurality of POI data from the Internet.
  • The web page crawling module 202 is configured to crawl a plurality of web pages including address information.
  • Specifically, the web page crawling module 202 crawls a plurality of web pages including address keywords from the Internet, extracts the text information associated with the address keywords in those web pages, and extracts the address information of each web page from the corresponding text information.
  • The latitude and longitude information normalization module 203 is configured to normalize the address information in the plurality of POI data and the address information included in the plurality of web pages into latitude and longitude information, respectively.
  • Specifically, the latitude and longitude information normalization module 203 normalizes the address information in the plurality of POI data into the latitude and longitude information of the plurality of POI data.
  • For each POI data, the latitude and longitude information corresponding to its address information is looked up in the pre-acquired geographic information base, and the latitude and longitude information found is taken as the latitude and longitude information of the POI data.
  • The pre-acquired geographic information base contains the address information and latitude and longitude information of the provinces, cities, counties (districts), towns and roads in the country, together with the correspondence between the address information and the latitude and longitude information.
  • The latitude and longitude information normalization module 203 likewise normalizes the address information included in the plurality of web pages into the latitude and longitude information of the plurality of web pages.
  • For each web page, the latitude and longitude information corresponding to its address information is looked up in the pre-acquired geographic information base, and the latitude and longitude information found is taken as the latitude and longitude information of the web page.
  • The latitude and longitude information matching module 204 is configured to match the latitude and longitude information of the plurality of POI data against the latitude and longitude information of the plurality of web pages on the basis of identical latitude and longitude information. Specifically, for each POI data, the module 204 determines whether any of the web pages has latitude and longitude information identical to that of the POI data. If so, it determines that the POI data matches that web page, that is, the POI data and the web page have the same latitude and longitude information; otherwise, the POI data is ignored.
  • The web page POI name determining module 205 is configured to, for POI data and a web page having the same latitude and longitude information, search the web page according to the POI name corresponding to the POI data to determine whether the web page includes the POI name of the POI data.
  • Specifically, for POI data and a web page having the same latitude and longitude information, the web page POI name determining module 205 finds all name information in the web page and, for each piece of name information found, judges whether it matches the POI name in the POI data. If so, it determines that the web page includes the POI name of the POI data; otherwise, the POI data is ignored.
  • The web page POI data determining module 206 is configured to determine that the web page includes the POI data of the POI when the web page includes the POI name of the POI data.
  • The internal structure of the POI data acquisition module 201 is shown in FIG. 3 and includes a web page crawling unit 301 and a POI data extracting unit 302.
  • The web page crawling unit 301 is configured to crawl a plurality of web pages including POI data from the Internet.
  • Specifically, the web page crawling unit 301 crawls from the Internet a plurality of URLs corresponding to a plurality of web pages including the address keyword; performs pattern clustering on the URLs, clustering URLs with the same structural features into the same pattern set; selects, from the plurality of pattern sets, a pattern set whose web pages include POI data; and obtains the plurality of web pages including POI data from that pattern set.
  • The POI data extracting unit 302 is configured to extract a plurality of POI data from the plurality of web pages including POI data.
  • Specifically, the POI data extracting unit 302 generates a POI data extraction template corresponding to a pattern set based on the page structure features of the web pages corresponding to the URLs in that pattern set, and extracts a plurality of POI data from the web pages including POI data based on the POI data extraction template.
  • Preferably, the apparatus of the present invention further includes a web page name information determining module 207.
  • The web page name information determining module 207 is configured to, when the web page includes a plurality of name information, separately calculate the text distance between each piece of name information and the address information of the web page, and to determine the name information with the minimum text distance to be the name information corresponding to the address information in the web page.
  • The web page POI name determining module 205 is further configured to compare the POI name corresponding to the POI data with the name information corresponding to the address information in the web page and, when they are consistent, to determine that the web page includes the POI name of the POI data.
  • In the present invention, the address information is normalized into latitude and longitude information, so that address information that is not a geographical address can be filtered out, which helps avoid subsequently collecting POI data with fake address information. Furthermore, determining whether the web page includes the POI name of the POI data, on the basis that the latitude and longitude information of the POI data matches that of the web page, helps avoid collecting POI data whose latitude and longitude information or POI name cannot be matched. Such unmatched POI data is often less accurate, so more accurate POI data can be collected from the web pages determined to include POI data of a POI.
  • The present invention includes apparatus for performing one or more of the operations described herein. These devices may be specially designed and manufactured for the required purposes, or may include known devices in a general purpose computer. These devices have computer programs stored therein that are selectively activated or reconfigured.
  • Such computer programs may be stored in a device-readable (e.g., computer-readable) medium or in any type of medium suitable for storing electronic instructions and coupled to a bus, including but not limited to any type of disk (including floppy disks, hard disks, optical disks, CD-ROMs, and magneto-optical disks), ROM (Read-Only Memory), RAM (Random Access Memory), EPROM (Erasable Programmable Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), flash memory, magnetic cards, or optical cards.
  • That is, a readable medium includes any medium in which information is stored or transmitted by a device (e.g., a computer) in a readable form.
  • Each block of the structural diagrams and/or block diagrams and/or flow diagrams, and combinations of blocks in those diagrams, can be implemented by computer program instructions.
  • These computer program instructions can be provided to a processor of a general purpose computer, a special purpose computer, or another programmable data processing apparatus, so that they are executed by the processor of the computer or other programmable data processing apparatus.
  • The steps, measures, and solutions in the various operations, methods, and processes that have been discussed in the present invention may be interchanged, changed, combined, or deleted. Further, other steps, measures, and solutions in the various operations, methods, and processes that have been discussed in the present invention may also be interchanged, changed, rearranged, decomposed, combined, or deleted. Further, steps, measures, and solutions in the prior art that correspond to the various operations, methods, and processes disclosed in the present invention may also be interchanged, changed, rearranged, decomposed, combined, or deleted.
  • The various component embodiments of the present invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof.
  • A microprocessor or digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the apparatus for determining that a web page includes POI data in accordance with an embodiment of the present invention.
  • The invention can also be implemented as a device or apparatus program (e.g., a computer program and a computer program product) for performing some or all of the methods described herein.
  • Such a program implementing the invention may be stored on a computer readable medium or may be in the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
  • Figure 4 schematically illustrates a block diagram of a computing device for performing the method in accordance with the present invention.
  • The computing device conventionally includes a processor 410 and a computer program product or computer readable medium in the form of a memory 420.
  • The memory 420 may be an electronic memory such as a flash memory, an EEPROM (Electrically Erasable Programmable Read-Only Memory), an EPROM, a hard disk, or a ROM.
  • The memory 420 has a storage space 430 for program code 431 for performing any of the method steps described above.
  • For example, the storage space 430 for program code may include respective program code 431 for implementing the various steps of the above method.
  • The program code can be read from or written to one or more computer program products.
  • These computer program products include program code carriers such as hard disks, compact disks (CDs), memory cards or floppy disks.
  • Such computer program products are typically portable or fixed storage units as described with reference to FIG.
  • the storage unit may have storage segments, storage spaces, and the like that are similarly arranged to memory 420 in the computing device of FIG.
  • The program code can, for example, be compressed in an appropriate form.
  • Typically, the storage unit comprises computer readable code 431' for performing the steps of the method according to the invention, i.e. code that can be read by a processor such as 410 and that, when executed by the computing device, causes the computing device to perform the various steps of the method described above.
  • The present invention is applicable to computer systems/servers, which can operate with numerous other general-purpose or special-purpose computing system environments or configurations.
  • Examples of well-known computing systems, environments, and/or configurations suitable for use with computer systems/servers include, but are not limited to, personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, networked personal computers, small computer systems, mainframe computer systems, and distributed cloud computing environments that include any of the above systems.
  • The computer system/server can be described in the general context of computer-system-executable instructions (such as program modules) executed by a computer system.
  • Program modules may include routines, programs, object programs, components, logic, data structures, and the like that perform particular tasks or implement particular abstract data types.
  • The computer system/server can be implemented in a distributed cloud computing environment in which tasks are performed by remote processing devices linked through a communication network.
  • Program modules may be located on local or remote computing system storage media, including storage devices.
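
As a purely illustrative sketch of the storage arrangement described above, the following Python fragment keeps program code in compressed form and has it read back and executed. The names `PROGRAM_SOURCE`, `store`, `load_and_run`, and `page_includes_poi` are hypothetical and do not appear in the description; the use of zlib and of source text rather than machine code is likewise an assumption made only for the example.

```python
import zlib

# Hypothetical stand-in for program code 431, kept as source text for the sketch.
PROGRAM_SOURCE = '''
def page_includes_poi(page_text, poi_name):
    return poi_name in page_text
'''


def store(source: str) -> bytes:
    """Compress program code before it is written to a storage unit."""
    return zlib.compress(source.encode("utf-8"))


def load_and_run(stored: bytes) -> dict:
    """Read the stored code back, decompress it, and execute it."""
    source = zlib.decompress(stored).decode("utf-8")
    namespace: dict = {}
    # Executing the code makes its functions available to the caller.
    exec(compile(source, "<stored program>", "exec"), namespace)
    return namespace


stored = store(PROGRAM_SOURCE)
program = load_and_run(stored)
```

After loading, `program["page_includes_poi"]` can be called like any other function. A real device would more plausibly store compiled code, so this is only a rough analogy.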

Abstract

Disclosed are a method and a device for determining whether a webpage includes point of interest (POI) data. The method comprises: acquiring multiple pieces of POI data from the Internet (S101); crawling multiple webpages that include address information (S102); separately normalizing the address information in the multiple pieces of POI data and the address information included in the multiple webpages into longitude and latitude information (S103); matching the longitude and latitude information of the multiple pieces of POI data against that of the multiple webpages (S104); for POI data and webpages having the same longitude and latitude information, searching the webpages by the POI names corresponding to the POI data to determine whether the webpages include the POI names of the POI data (S105); and, when the webpages include the POI names of the POI data, determining that the webpages include the POI data (S106). The method and device help to subsequently determine the accuracy of the collected POI data from the accuracy of the content recorded by the webpages, which in turn helps to reliably collect accurate POI data from the Internet on a large scale.
PCT/CN2015/099580 2015-03-31 2015-12-29 Method and device for determining whether a webpage includes point of interest (POI) data WO2016155386A1 (fr)
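
Steps S103 to S106 of the abstract can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: the `Poi` and `WebPage` types, the `normalize` and `pages_containing_poi` functions, the rounding precision, and the lookup-table geocoder are all hypothetical, and the acquisition and crawling steps (S101, S102) are represented only by in-memory lists.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional, Tuple

LonLat = Tuple[float, float]
Geocoder = Callable[[str], Optional[LonLat]]


@dataclass(frozen=True)
class Poi:
    name: str
    address: str


@dataclass(frozen=True)
class WebPage:
    url: str
    address: str  # address information found on the page
    text: str     # page text to be searched for POI names


def normalize(address: str, geocode: Geocoder, precision: int = 5) -> Optional[LonLat]:
    """S103: turn an address into rounded (longitude, latitude), or None."""
    lonlat = geocode(address.strip())
    if lonlat is None:
        return None
    return (round(lonlat[0], precision), round(lonlat[1], precision))


def pages_containing_poi(pois: Iterable[Poi], pages: Iterable[WebPage],
                         geocode: Geocoder) -> List[Tuple[Poi, WebPage]]:
    """Return (poi, page) pairs where the page is judged to include the POI."""
    # S103/S104: index the pages by their normalized coordinates.
    pages_by_lonlat: Dict[LonLat, List[WebPage]] = defaultdict(list)
    for page in pages:
        key = normalize(page.address, geocode)
        if key is not None:
            pages_by_lonlat[key].append(page)

    matches: List[Tuple[Poi, WebPage]] = []
    for poi in pois:
        key = normalize(poi.address, geocode)
        if key is None:
            continue
        # S105: only pages with the same coordinates are searched for the name.
        for page in pages_by_lonlat.get(key, []):
            if poi.name in page.text:
                # S106: same coordinates and the POI name appears on the page.
                matches.append((poi, page))
    return matches


# Toy lookup table standing in for a real geocoding service (assumption).
GEOCODE_TABLE: Dict[str, LonLat] = {
    "1 Example Road": (116.4, 39.9),
    "No. 1 Example Rd": (116.4, 39.9),
    "9 Other Street": (121.47, 31.23),
}

pois = [Poi("Blue Lotus Cafe", "1 Example Road"),
        Poi("Red Door Hotel", "9 Other Street")]
pages = [WebPage("http://example.com/a", "No. 1 Example Rd", "Welcome to Blue Lotus Cafe"),
         WebPage("http://example.com/b", "9 Other Street", "A page about something else")]
result = pages_containing_poi(pois, pages, GEOCODE_TABLE.get)
```

With the toy table, the two spellings of the first address normalize to the same coordinates, so only the page at `http://example.com/a` is reported for "Blue Lotus Cafe". The second page shares coordinates with "Red Door Hotel" but does not mention its name, so it is not reported. Normalizing to coordinates first is what lets differently written addresses be compared before any name search is attempted.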

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510148638.4 2015-03-31
CN201510148638.4A CN104699835B (zh) 2015-03-31 2015-03-31 Method and device for determining whether a webpage includes point of interest (POI) data

Publications (1)

Publication Number Publication Date
WO2016155386A1 (fr) 2016-10-06

Family

ID=53346955

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/099580 WO2016155386A1 (fr) 2015-03-31 2015-12-29 Method and device for determining whether a webpage includes point of interest (POI) data

Country Status (2)

Country Link
CN (1) CN104699835B (fr)
WO (1) WO2016155386A1 (fr)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
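CN104699835B (zh) * 2015-03-31 2016-09-28 北京奇虎科技有限公司 Method and device for determining whether a webpage includes point of interest (POI) data
CN104933171B (zh) * 2015-06-30 2019-06-18 百度在线网络技术(北京)有限公司 Point of interest data association method and device
CN105117425B (zh) * 2015-07-31 2022-03-08 北京奇虎科技有限公司 Method and device for selecting point of interest (POI) data
CN105160032B (zh) * 2015-09-30 2019-05-31 北京奇虎科技有限公司 Method and device for determining the confidence of point of interest data in a website
CN105138708A (zh) * 2015-09-30 2015-12-09 北京奇虎科技有限公司 Method and device for identifying point of interest names
CN105320752B (zh) * 2015-09-30 2018-12-07 北京奇虎科技有限公司 Method and device for mining point of interest data
CN105159885A (zh) * 2015-09-30 2015-12-16 北京奇虎科技有限公司 Method and device for identifying point of interest names
CN105279246A (zh) * 2015-09-30 2016-01-27 北京奇虎科技有限公司 Method and device for judging whether a webpage contains a specified point of interest (POI)
CN105243136B (zh) * 2015-09-30 2019-02-19 北京奇虎科技有限公司 Method and device for mining point of interest (POI) data on the Internet
CN105279249B (zh) * 2015-09-30 2019-06-21 北京奇虎科技有限公司 Method and device for determining the confidence of point of interest data in a website
CN105160031A (zh) * 2015-09-30 2015-12-16 北京奇虎科技有限公司 Method and device for mining map point of interest (POI) data
CN105608112A (zh) * 2015-12-10 2016-05-25 北京奇虎科技有限公司 Method and device for measuring the quality of map POI data
CN105550169A (zh) * 2015-12-11 2016-05-04 北京奇虎科技有限公司 Method and device for identifying point of interest names based on character length
CN105550330B (zh) * 2015-12-21 2020-09-11 北京奇虎科技有限公司 Method and system for ranking point of interest (POI) information
CN106708952B (zh) 2016-11-25 2019-11-19 北京神州绿盟信息安全科技股份有限公司 Webpage clustering method and device
CN112000495B (zh) * 2020-10-27 2021-02-12 博泰车联网(南京)有限公司 Method, electronic device, and storage medium for point of interest information management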

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080040684A1 (en) * 2006-08-14 2008-02-14 Richard Crump Intelligent Pop-Up Window Method and Apparatus
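CN101963962A (zh) * 2009-07-23 2011-02-02 高德软件有限公司 Point of interest data association method and device
CN102142003A (zh) * 2010-07-30 2011-08-03 华为软件技术有限公司 Method and device for providing point of interest information
CN103514234A (zh) * 2012-06-30 2014-01-15 北京百度网讯科技有限公司 Page information extraction method and device
CN104699835A (zh) * 2015-03-31 2015-06-10 北京奇虎科技有限公司 Method and device for determining whether a webpage includes point of interest (POI) data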

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
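CN102591867B (zh) * 2011-01-07 2015-05-27 清华大学 Search service method based on mobile device location
CN102841920B (zh) * 2012-06-30 2017-05-10 北京百度网讯科技有限公司 Page information extraction method and device
CN103678629B (zh) * 2013-12-19 2016-09-28 北京大学 Geographic-location-sensitive search engine method and system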

Also Published As

Publication number Publication date
CN104699835A (zh) 2015-06-10
CN104699835B (zh) 2016-09-28

Similar Documents

Publication Publication Date Title
WO2016155386A1 (fr) Method and device for determining whether a webpage includes point of interest (POI) data
US11698261B2 (en) Method, apparatus, computer device and storage medium for determining POI alias
Schulz et al. A multi-indicator approach for geolocalization of tweets
Lieberman et al. STEWARD: architecture of a spatio-textual search engine
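CN111767407B (zh) Encoding knowledge graph entries with searchable geotemporal values to evaluate transitive geotemporal proximity of entity mentions
CN103049575A (zh) Topic-adaptive academic conference search system
CN108304423A (zh) Information identification method and device
CN102549571A (zh) Landmarks from digital photo collections
CN105069076A (zh) Method and device for determining address information on an official website home page
WO2020052312A1 (fr) Positioning method and apparatus, electronic device, and readable storage medium
CN103514234A (zh) Page information extraction method and device
CN104899243A (zh) Method and device for detecting the accuracy of point of interest (POI) data
WO2019227581A1 (fr) Point of interest recognition method, apparatus, terminal, and storage medium
CN102646124A (zh) Method for automatically identifying address information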
US8954438B1 (en) Structured metadata extraction
Srivastava et al. A geocoding framework powered by delivery data
Intagorn et al. Learning boundaries of vague places from noisy annotations
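WO2016107352A1 (fr) System and method for determining a POI name and the validity of POI information
CN111460054B (zh) Address data processing method and apparatus, device, and storage medium
CN105117425B (zh) Method and device for selecting point of interest (POI) data
CN106095808B (zh) Method and device for recovering MDB file fragments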
US20150269268A1 (en) Search server and search method
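JP5637073B2 (ja) Information processing device, information processing method, and program
KR20090085135A (ko) Aggregation syndication platform
CN108595453B (zh) Method and device for obtaining URL identifier mappings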

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 15887335; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 15887335; Country of ref document: EP; Kind code of ref document: A1)