CN105117425B - Method and device for selecting point of interest (POI) data - Google Patents

Publication number
CN105117425B
Authority
CN
China
Prior art keywords
poi data
webpage
user attention
attention
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510463031.5A
Other languages
Chinese (zh)
Other versions
CN105117425A (en)
Inventor
王智广
魏少俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Qihoo Technology Co Ltd
Original Assignee
Beijing Qihoo Technology Co Ltd
Qizhi Software Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Qihoo Technology Co Ltd and Qizhi Software Beijing Co Ltd
Priority to CN201510463031.5A
Publication of CN105117425A
Application granted
Publication of CN105117425B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/951Indexing; Web crawling techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29Geographical information databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Remote Sensing (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention provides a method for selecting POI data, comprising: acquiring a plurality of web pages that include different POI data having the same name information; extracting user attention information of the web pages; determining, according to the user attention information of each web page, the user attention corresponding to one or more pieces of POI data in that web page; ranking, based on the user attention, the POI data having the same name information included in the web pages; and selecting, based on the ranking, one or more pieces of POI data as the trusted POI data corresponding to the same name information. POI data with high reliability can thus be selected, according to user attention, from different POI data sharing the same name information. This solves the prior-art difficulty of telling which POI data carries the accurate address information for a given name, and improves the accuracy of POI data collection.

Description

Method and device for selecting point of interest (POI) data
Technical Field
The invention relates to the technical field of computers, in particular to a method and a device for selecting POI (point of interest) data.
Background
In a geographic information system, a POI (point of interest) may be a house, a shop, a mailbox, a bus stop, and the like. POI data includes address information and a POI name.
In the traditional POI data acquisition method, technicians use precise surveying and mapping instruments to obtain the longitude and latitude of each POI and then mark the POI. This method is time-consuming and labor-intensive, and it yields only a small amount of POI data, from which a geographic information system can hardly provide high-quality services.
A large amount of POI data exists on the Internet. If web pages containing POI data can be collected from the Internet and the POI data extracted from them for use by a geographic information system, considerable labor and time can be saved. However, extracting highly accurate POI data from the Internet is difficult. For example, multiple pieces of POI data obtained from the Internet may have the same name information but different address information, and the prior art has difficulty identifying which address information is accurate for that name, which hinders the collection of accurate POI data.
Disclosure of Invention
In view of the defects of the prior art, the invention provides a method and a device for selecting POI data, to solve the problem that the prior art has difficulty judging the accuracy of multiple pieces of POI data having the same name information.
According to one aspect, the invention provides a method for selecting point of interest (POI) data, comprising the following steps:
acquiring a plurality of web pages that include different POI data having the same name information;
extracting user attention information of the plurality of web pages;
determining, according to the user attention information of each web page, the user attention corresponding to one or more pieces of POI data in that web page;
ranking, based on the user attention, the POI data having the same name information included in the plurality of web pages;
and selecting, based on the ranking of the POI data, one or more pieces of POI data as the trusted POI data corresponding to the same name information.
Preferably, the step of extracting the user attention information of the plurality of web pages further includes:
acquiring the user attention information of each web page according to the number of user visits to the web page and/or the average browsing time of the web page within a first time period.
Optionally, when a web page includes only one piece of POI data, the step of determining the user attention corresponding to one or more pieces of POI data included in each web page according to the user attention information of each web page further includes:
taking the user attention information of the web page as the user attention of the one piece of POI data included in the web page.
Optionally, when a web page includes multiple pieces of POI data, the step of determining the user attention corresponding to one or more pieces of POI data included in each web page according to the user attention information of each web page further includes:
capturing the page content of each web page at a preset frequency within a second duration;
extracting the POI data from the page content captured each time;
judging whether the POI data in the page content of each web page changed within the second duration;
selecting a corresponding attention allocation rule based on the judgment result;
and determining, based on the selected attention allocation rule, the user attention of the multiple pieces of POI data included in each web page according to the user attention information of the web page and the number of pieces of POI data it includes.
Selecting the corresponding attention allocation rule based on the judgment result covers the following cases:
when the POI data has not changed, selecting the allocation rule under which the user attention information of the web page is taken as the user attention of each piece of POI data included in the web page; or
when the POI data has changed, selecting the allocation rule under which the user attention information of the web page is divided evenly among the pieces of POI data included in the web page.
Preferably, the step of ranking, based on the user attention, the POI data having the same name information included in the plurality of web pages further includes:
extracting, from the POI data, at least two pieces of POI data whose user attention is greater than an attention threshold;
ranking the at least two pieces of POI data based on user attention.
According to another aspect, the invention also provides an apparatus for selecting POI data, comprising:
an acquisition module, configured to acquire a plurality of web pages that include different POI data having the same name information;
an extraction module, configured to extract user attention information of the plurality of web pages;
a determining module, configured to determine, according to the user attention information of each web page, the user attention corresponding to one or more pieces of POI data in that web page;
a ranking module, configured to rank, based on the user attention, the POI data having the same name information included in the plurality of web pages;
and a selection module, configured to select, based on the ranking of the POI data, one or more pieces of POI data as the trusted POI data corresponding to the same name information.
Preferably, the extraction module is specifically configured to acquire the user attention information of each web page according to the number of user visits to the web page and/or the average browsing time of the web page within the first time period.
Optionally, when a web page includes only one piece of POI data, the determining module is specifically configured to take the user attention information of the web page as the user attention of the one piece of POI data included in the web page.
Optionally, when a web page includes multiple pieces of POI data, the determining module specifically includes:
a grabbing unit, configured to capture the page content of each web page at a preset frequency within a second duration;
an extraction unit, configured to extract the POI data from the page content captured each time;
a judging unit, configured to judge whether the POI data in the page content of each web page changed within the second duration;
a selection unit, configured to select a corresponding attention allocation rule based on the judgment result;
and a determining unit, configured to determine, based on the selected attention allocation rule, the user attention of the multiple pieces of POI data included in each web page according to the user attention information of the web page and the number of pieces of POI data it includes.
Selecting the corresponding attention allocation rule based on the judgment result covers the following cases:
when the POI data has not changed, selecting the allocation rule under which the user attention information of the web page is taken as the user attention of each piece of POI data included in the web page; or
when the POI data has changed, selecting the allocation rule under which the user attention information of the web page is divided evenly among the pieces of POI data included in the web page.
Preferably, the ranking module is specifically configured to extract, from the POI data, at least two pieces of POI data whose user attention is greater than the attention threshold, and to rank the at least two pieces of POI data based on user attention.
According to the technical scheme, for a plurality of web pages that include different POI data having the same name information, the user attention corresponding to the POI data in each web page is determined from the user attention information of that web page; the POI data are ranked based on the user attention; and the trusted POI data corresponding to the same name information is then selected according to the ranking result. POI data with higher user attention has higher information accuracy. The user attention of a web page directly reflects the reliability and value of each item of information the page contains, and for the POI data included in the page it also largely reflects the user attention of that POI data. Therefore, POI data with high reliability can be selected, according to user attention, from different POI data having the same name information. This solves the prior-art difficulty of telling which POI data carries the accurate address information for a given name, and improves the accuracy of POI data collection.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a schematic flow chart diagram illustrating a method for selecting POI data according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a method for selecting POI data according to a preferred embodiment of the present invention;
FIG. 3 is a block diagram of an internal structure of an apparatus for selecting POI data according to another embodiment of the present invention;
FIG. 4 is a schematic block diagram of the internal structure of an apparatus for selecting point of interest (POI) data in another preferred embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative only and should not be construed as limiting the invention.
As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes any one of, and all combinations of, one or more of the associated listed items.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
Fig. 1 is a flowchart illustrating a method for selecting POI data according to an embodiment of the present invention.
Step S110: acquiring a plurality of web pages that include different POI data having the same name information;
Step S120: extracting user attention information of the plurality of web pages;
Step S130: determining, according to the user attention information of each web page, the user attention corresponding to one or more pieces of POI data included in that web page;
Step S140: ranking, based on the user attention, the POI data having the same name information included in the plurality of web pages;
Step S150: selecting, based on the ranking of the POI data, one or more pieces of POI data as the trusted POI data corresponding to the same name information.
Step S110: a plurality of web pages that include different POI data having the same name information are acquired.
Specifically, web pages that include POI data are obtained; the POI data included in each web page is matched against a given piece of name information to determine the web pages that include that name information; and it is judged whether the address information in the POI data of those web pages is the same, and the web pages whose address information differs are extracted.
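The grouping described for step S110 can be illustrated with a minimal Python sketch. The page representation (a dictionary with hypothetical `url` and `pois` fields) and the function name are assumptions made for illustration; the patent does not prescribe a data layout.

```python
from collections import defaultdict


def pages_with_conflicting_addresses(pages, name):
    """Group the pages that mention `name` by the address they give for it.

    Returns {address: [page urls]} when at least two distinct addresses
    are found, otherwise an empty dict (nothing to disambiguate).
    The `url` and `pois` field names are illustrative assumptions.
    """
    by_address = defaultdict(list)
    for page in pages:
        for poi in page["pois"]:
            if poi["name"] == name:
                by_address[poi["address"]].append(page["url"])
    # Only a name with two or more distinct addresses needs disambiguation.
    if len(by_address) < 2:
        return {}
    return dict(by_address)
```

Only names that map to more than one address are passed on to the later steps, since a name with a single address has nothing to disambiguate.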
The step of acquiring the web pages that include POI data specifically includes:
acquiring a plurality of POI data from the Internet; crawling a plurality of web pages that contain address information; normalizing the address information in the POI data and the address information contained in the web pages into longitude and latitude information; matching the POI data with the web pages that have the same longitude and latitude information; for each POI data and web page matched in this way, searching the web page for the POI name of the POI data to determine whether the web page includes that POI name; and, when the web page includes the POI name of the POI data, determining that the web page includes the POI data.
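As a rough sketch of this matching procedure, the Python function below first compares normalized coordinates and then checks for the POI name in the page text. The `geocode` callable (address to latitude/longitude), the rounding `precision`, and the page fields `addresses`, `text`, and `url` are all hypothetical; the patent does not say how addresses are normalized or how close two coordinates must be to count as the same.

```python
def pages_containing_poi(poi, pages, geocode, precision=4):
    """Return URLs of pages that share the POI's coordinates and mention its name.

    `geocode` is an assumed callable mapping an address string to a
    (latitude, longitude) pair; rounding to `precision` decimals is an
    illustrative way of treating two coordinates as "the same".
    """
    def key(address):
        lat, lon = geocode(address)
        return (round(lat, precision), round(lon, precision))

    target = key(poi["address"])
    matches = []
    for page in pages:
        same_place = any(key(a) == target for a in page["addresses"])
        # The page counts as including the POI only if the name also appears.
        if same_place and poi["name"] in page["text"]:
            matches.append(page["url"])
    return matches
```

Comparing coordinates before names lets differently written addresses for the same place match, while the name check filters out other businesses at the same location.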
Step S120: user attention information of the plurality of web pages is extracted.
Specifically, the step of extracting the user attention information of the plurality of web pages includes:
acquiring the user attention information of each web page according to the number of user visits to the web page and/or the average browsing time of the web page within a first time period.
The number of user visits may be the number of user clicks on links pointing to the web page.
For example, the user history access records are queried by the page identifier of each web page to determine the number of user clicks on the corresponding link of each web page and/or the average browsing time per visit within a first time period, such as about 30 days. Then, based on an attention calculation formula, the number of user clicks and/or the average browsing time per visit are given corresponding weights and combined in a weighted calculation to determine the user attention information of each web page. The user history access records include records of user clicks on the corresponding links of the web pages and/or records of users browsing the web pages.
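The weighted calculation can be sketched as follows. The patent does not disclose the attention calculation formula or its weights, so the linear combination, the default weights `w_clicks` and `w_dwell`, and the record format (a list of `(timestamp, dwell_seconds)` visits) are assumptions chosen only to make the idea concrete.

```python
from datetime import datetime, timedelta


def page_attention(records, now, window_days=30, w_clicks=1.0, w_dwell=0.1):
    """Score one page from its visit records within the first time period.

    records: list of (timestamp, dwell_seconds) tuples, one per visit.
    The linear formula and the default weights are illustrative
    assumptions, not values taken from the patent.
    """
    cutoff = now - timedelta(days=window_days)
    recent = [dwell for ts, dwell in records if ts >= cutoff]
    if not recent:
        return 0.0
    clicks = len(recent)
    avg_dwell = sum(recent) / clicks
    return w_clicks * clicks + w_dwell * avg_dwell
```

Setting either weight to zero reproduces the "and/or" in the text: the score can depend on click count alone, on average browsing time alone, or on both.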
Step S130: the user attention corresponding to one or more pieces of POI data included in each web page is determined according to the user attention information of that web page.
Optionally, when a web page includes only one piece of POI data, the user attention is determined as follows: the user attention information of the web page is taken as the user attention of the one piece of POI data included in the web page.
Optionally, as shown in FIG. 2, when a web page includes multiple pieces of POI data, the step of determining the user attention corresponding to one or more pieces of POI data included in each web page according to the user attention information of each web page further includes step S231 (not shown in the figure), step S232 (not shown in the figure), step S233 (not shown in the figure), step S234 (not shown in the figure), and step S235 (not shown in the figure).
Step S231: capturing the page content of each web page at a preset frequency within a second duration;
Step S232: extracting the POI data from the page content captured each time;
Step S233: judging whether the POI data in the page content of each web page changed within the second duration;
Step S234: selecting a corresponding attention allocation rule based on the judgment result;
Step S235: determining, based on the selected attention allocation rule, the user attention of the multiple pieces of POI data included in each web page according to the user attention of the web page and the number of pieces of POI data it includes.
Step S231: the page content of each web page is captured at a preset frequency within the second duration.
Specifically, for a web page that includes multiple pieces of POI data, a web crawler or similar program may be used to capture the page content of the web page from the Internet at a predetermined frequency within the second duration, for example once per day for 180 days.
Step S232: the POI data is extracted from the page content captured each time.
For example, for the first captured page content, the text content is extracted and searched for address keywords, such as "address" or "located", that may introduce address information; the text segment near each address keyword is extracted; the text segment is segmented according to a set separator and segment length, for example, the segment is cut if its length measured from the address keyword exceeds a set threshold and/or it contains a set separator (such as a space, comma, or period); the text between the address keyword and the cut position (such as the separator position) is taken as the text information associated with that address keyword in the web page; then address information is extracted from each piece of text information, and the name closest to each piece of address information in the text is extracted as the POI name, which yields each piece of POI data. For page content captured later, the POI data can be extracted directly from the page positions determined during the first extraction.
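The keyword-and-separator segmentation in step S232 might look like the sketch below. It covers only the cutting of the text that follows an address keyword; recognizing the address within the segment and finding the nearest POI name are left out. The English keywords, the separator set, and the `max_len` threshold are stand-ins chosen for illustration, since the original describes Chinese-language pages and does not give concrete values.

```python
def address_segments(text, keywords=("Address:", "located at"),
                     separators=";.\n", max_len=40):
    """Return the text segments that follow each address keyword.

    A segment ends at the first separator or after `max_len` characters,
    whichever comes first. Keywords, separators, and `max_len` are
    illustrative assumptions.
    """
    segments = []
    for kw in keywords:
        start = text.find(kw)
        while start != -1:
            begin = start + len(kw)
            window = text[begin:begin + max_len]
            cut = len(window)
            for i, ch in enumerate(window):
                if ch in separators:
                    cut = i
                    break
            segments.append(window[:cut].strip())
            start = text.find(kw, begin)
    return segments
```

For English text a comma or space would split most addresses, so this sketch uses a narrower separator set than the one mentioned in the description.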
Step S233: it is judged whether the POI data in the page content of each web page changed within the second duration.
Specifically, the page content of the web page is captured once per day for 180 days, giving 180 captures of the page content, and the POI data found in the 180 captures is compared; if it is the same in every capture, the POI data is determined not to have changed. For example, a web page includes three pieces of POI data, denoted P1, P2, and P3; the POI name of P1 is large-board roast duck store (reunion lake store), the POI name of P2 is large-board roast duck store (east forty shops), and the POI name of P3 is large-board roast duck store (brillouin lake store). The page is captured 180 times within 180 days at a frequency of once per day, and the POI name and address information corresponding to P1, P2, and P3 are extracted from each capture. The 180 extracted copies of P1 are then compared with one another, and likewise the 180 copies of P2 and the 180 copies of P3, to check whether the POI name and address information remain the same.
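A minimal sketch of the comparison in step S233, assuming each capture has already been reduced to a collection of hypothetical `(name, address)` pairs:

```python
def poi_data_changed(snapshots):
    """Return True if any capture's POI data differs from the first capture.

    snapshots: list of per-capture collections of (name, address) pairs.
    Order within a capture is ignored; this representation is an
    illustrative assumption.
    """
    if not snapshots:
        return False
    baseline = frozenset(snapshots[0])
    return any(frozenset(s) != baseline for s in snapshots[1:])
```

With one capture per day for 180 days, `snapshots` would hold 180 entries, and a single differing name or address in any of them marks the page as changed.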
Step S234: a corresponding attention allocation rule is selected based on the judgment result.
The attention allocation rules are as follows: when the POI data has not changed, the rule under which the user attention information of the web page is taken as the user attention of each piece of POI data included in the web page is selected; when the POI data has changed, the rule under which the user attention information of the web page is divided evenly among the pieces of POI data included in the web page is selected.
Step S235: based on the selected attention allocation rule, the user attention of the multiple pieces of POI data included in each web page is determined according to the user attention of the web page and the number of pieces of POI data it includes.
When the POI data has not changed, the user attention of the web page is taken as the user attention of each piece of POI data included in the web page.
When the POI data has changed, the user attention of the web page is divided evenly among the pieces of POI data in the web page according to their number.
Specifically, when any POI data in the web page has changed, the user attention of each piece of POI data is n/m, where n is the user attention of the web page and m is the number of pieces of POI data contained in the web page.
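The two allocation rules of steps S234 and S235 reduce to a few lines of Python. The function name and the dictionary return type are illustrative choices; the arithmetic (the full value n when unchanged, n/m when changed) follows the description.

```python
def allocate_attention(page_attention, pois, changed):
    """Assign user attention to each piece of POI data on one page.

    Unchanged POI data: every piece receives the page's full attention n.
    Changed POI data: every piece receives n / m, where m = len(pois).
    """
    if not pois:
        return {}
    share = page_attention / len(pois) if changed else page_attention
    return {poi: share for poi in pois}
```

For a page with attention 90 and three pieces of POI data, each piece receives 90 if the data was stable over the observation window and 30 if any of it changed.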
Referring to FIG. 1, step S140: the POI data having the same name information included in the plurality of web pages are ranked based on the user attention.
Specifically, the POI data having the same name information included in the plurality of web pages are sorted by their corresponding user attention.
Preferably, step S140 includes step S141 (not shown in the figure) and step S142 (not shown in the figure); step S141: extracting, from the POI data, at least two pieces of POI data whose user attention is greater than an attention threshold; step S142: ranking the at least two pieces of POI data based on the user attention.
Specifically, at least two pieces of POI data whose user attention is greater than the attention threshold are extracted from the POI data, and they are sorted by their corresponding user attention.
Step S150: one or more pieces of POI data are selected, based on the ranking of the POI data, as the trusted POI data corresponding to the same name information.
Specifically, the one or more top-ranked pieces of POI data are selected from the ranked POI data as the trusted POI data corresponding to the same name information.
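Steps S140 and S150 can be combined in a short sketch: filter by the attention threshold, sort in descending order, and keep the top entries. The mapping from address to attention, the default `threshold`, and the `top_k` parameter are assumptions for illustration; the patent leaves the threshold value and the number of selected items open.

```python
def select_trusted(candidates, threshold=0.0, top_k=1):
    """Pick the trusted address(es) for one POI name.

    candidates: dict mapping each candidate address to its user attention.
    Entries at or below `threshold` are dropped, the rest are sorted by
    attention in descending order, and the first `top_k` are returned.
    """
    kept = [(addr, score) for addr, score in candidates.items() if score > threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return [addr for addr, _ in kept[:top_k]]
```

Choosing `top_k` greater than one keeps several candidates, which matches the "one or more" wording of step S150.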
Fig. 3 is a schematic block diagram of the internal structure of an apparatus for selecting POI data according to another embodiment of the present invention.
The acquisition module 310 acquires a plurality of web pages that include different POI data having the same name information; the extraction module 320 extracts user attention information of the plurality of web pages; the determining module 330 determines, according to the user attention information of each web page, the user attention corresponding to one or more pieces of POI data included in that web page; the ranking module 340 ranks, based on the user attention, the POI data having the same name information included in the plurality of web pages; the selection module 350 selects, based on the ranking of the POI data, one or more pieces of POI data as the trusted POI data corresponding to the same name information.
The acquisition module 310 acquires a plurality of web pages that include different POI data having the same name information.
Specifically, web pages that include POI data are obtained; the POI data included in each web page is matched against a given piece of name information to determine the web pages that include that name information; and it is judged whether the address information in the POI data of those web pages is the same, and the web pages whose address information differs are extracted.
Acquiring the web pages that include POI data specifically includes:
acquiring a plurality of POI data from the Internet; crawling a plurality of web pages that contain address information; normalizing the address information in the POI data and the address information contained in the web pages into longitude and latitude information; matching the POI data with the web pages that have the same longitude and latitude information; for each POI data and web page matched in this way, searching the web page for the POI name of the POI data to determine whether the web page includes that POI name; and, when the web page includes the POI name of the POI data, determining that the web page includes the POI data.
The extraction module 320 extracts user attention information in a plurality of web pages.
Specifically, the step of extracting the user attention information of the multiple web pages includes:
acquiring the user attention information of each web page according to the number of user visits to the page and/or the average browsing time per visit within a first duration.
The number of user visits may be the number of user clicks on a link pointing to the web page.
For example, the user history access records are queried by the page identifier of each web page to determine, within a first duration (for example, about 30 days), the number of user clicks on the link corresponding to each web page and/or the average browsing time per visit. An attention calculation formula then assigns corresponding weights to the click count and/or the average browsing time and computes a weighted result as the user attention information of the web page. The user history access records comprise users' clicks on the links corresponding to the web pages and/or users' browsing records for the web pages.
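A minimal sketch of such a weighted calculation follows. The patent does not disclose the attention calculation formula or its weights, so the linear form, the function name `page_attention` and the default weights `w_clicks` and `w_time` are assumptions chosen only to illustrate the "and/or" combination of the two signals.

```python
from typing import Optional


def page_attention(
    clicks: Optional[int] = None,
    avg_browse_seconds: Optional[float] = None,
    w_clicks: float = 1.0,   # assumed weight, not taken from the patent
    w_time: float = 0.1,     # assumed weight, not taken from the patent
) -> float:
    """Weighted user attention of one web page within the first duration.

    Either signal may be omitted, mirroring the "and/or" wording.
    """
    if clicks is None and avg_browse_seconds is None:
        raise ValueError("at least one attention signal is required")
    score = 0.0
    if clicks is not None:
        score += w_clicks * clicks
    if avg_browse_seconds is not None:
        score += w_time * avg_browse_seconds
    return score


both_signals = page_attention(clicks=120, avg_browse_seconds=45.0)
clicks_only = page_attention(clicks=120)
```

In practice the two signals would probably be normalized to comparable scales before weighting; that choice is left open here.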
The determining module 330 determines the user attention corresponding to one or more POI data included in each webpage according to the user attention information of the webpage.
Optionally, when a web page includes only one POI data, the user attention information of the web page is taken as the user attention of that POI data.
Optionally, as shown in fig. 4, when the web page includes a plurality of POI data, the determining module specifically includes a capturing unit 431, an extraction unit 432 (not shown in the figure), a judging unit 433 (not shown in the figure), a selection unit 434 (not shown in the figure), and a determining unit 435 (not shown in the figure).
The capturing unit 431 captures the page content in each webpage within the second duration according to a predetermined frequency; the extracting unit 432 extracts POI data in the page content captured each time; the judging unit 433 judges whether the POI data in the page content of each web page within the second duration is changed; the selection unit 434 selects a corresponding attention degree allocation rule based on the determination result; the determining unit 435 determines the user attention of the multiple pieces of POI data included in each web page according to the user attention of each web page and by combining the number of the POI data included in each web page based on the corresponding attention distribution rule.
The capturing unit 431 captures the page content in each web page within the second duration according to a predetermined frequency.
Specifically, for a web page including a plurality of POI data, a web crawler-like program may be utilized to crawl the page content of the web page within the second duration from the internet according to a predetermined frequency, for example, crawl the page content of the web page within 180 days at a frequency of 1 time/day.
The extraction unit 432 extracts POI data in the page content captured each time.
For example, for the first captured page content, the text content is extracted and searched for address keywords that may introduce address information, such as "address", "located at" or "situated in". Text segments near each address keyword are extracted and segmented according to a set separator and segment length: for example, a text segment is cut when its length measured from the address keyword exceeds a set threshold and/or when it contains a set separator (such as a space, a comma or a period). The text between the address keyword and the segmentation position (such as the separator position) is taken as the text information associated with that address keyword in the web page. Address information is then extracted from each piece of text information and, for each piece of address information, the name closest to it in the text is extracted as the POI name; in this way each piece of POI data is extracted. For page content captured later, the POI data can be extracted directly from the page positions determined during the first extraction.
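The keyword-and-separator segmentation can be sketched roughly as below. This is an illustration only: the English keywords, the separator set, the `max_len` threshold and the function name `address_segments` are all assumptions, and the later steps (recognizing the address and finding the nearest name) are not covered.

```python
def address_segments(text, keywords=("address:", "located at"),
                     separators=".;\n", max_len=40):
    """Return the text segment that follows each address keyword.

    A segment ends at the first separator or after max_len characters,
    whichever comes first. Keywords, separators and max_len are
    illustrative assumptions, not values from the patent.
    """
    segments = []
    lowered = text.lower()
    for keyword in keywords:
        search_from = 0
        while True:
            idx = lowered.find(keyword.lower(), search_from)
            if idx == -1:
                break
            begin = idx + len(keyword)
            end = min(len(text), begin + max_len)  # length threshold
            for i in range(begin, end):
                if text[i] in separators:          # separator cut
                    end = i
                    break
            segments.append(text[begin:end].strip())
            search_from = begin
    return segments


by_separator = address_segments("Grand Cafe. Address: 12 River Road; open daily.")
by_length = address_segments("located at 5 Hill Street", max_len=8)
```

The first call stops at the semicolon, the second at the length threshold. Unlike the patent's example, a comma or space is not used as a separator here, since both occur inside typical English-language addresses.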
The judging unit 433 judges whether the POI data in the page content of each web page within the second duration is changed.
Specifically, the page content of the web page is captured at a frequency of once per day for 180 days, giving 180 captured page contents, and the POI data mentioned in the 180 page contents are compared; if they are all the same, the POI data are determined to be unchanged. For example, one web page includes three POI data, denoted P1, P2 and P3; the POI name of P1 is large-board roast duck store (reunion lake store), the POI name of P2 is large-board roast duck store (east forty shops), and the POI name of P3 is large-board roast duck store (brillouin lake store). The 180 page contents of the web page are captured over 180 days at a frequency of once per day, and the POI names and address information corresponding to P1, P2 and P3 are extracted from each of them; the 180 extracted POI names and addresses of P1 are then compared with one another, and likewise those of P2 and those of P3.
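The comparison across captures can be sketched as follows, assuming each capture has already been reduced to a set of `(name, address)` pairs. The function name `poi_data_changed` and the shortened sample data are hypothetical.

```python
def poi_data_changed(snapshots):
    """Return True if any capture differs from the first one.

    snapshots: list of iterables of (poi_name, address) pairs, one per
               capture of the same web page (assumed representation).
    """
    if not snapshots:
        return False
    reference = frozenset(snapshots[0])
    return any(frozenset(s) != reference for s in snapshots[1:])


stable = [{("P1", "addr 1"), ("P2", "addr 2")}] * 3
edited = stable + [{("P1", "addr 1"), ("P2", "new addr")}]
stable_changed = poi_data_changed(stable)
edited_changed = poi_data_changed(edited)
```

With daily captures over 180 days, `snapshots` would hold 180 such sets.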
The selection unit 434 selects a corresponding attention degree allocation rule based on the determination result.
The attention distribution rules include: when the POI data have not changed, the rule that takes the user attention information of the web page as the user attention of each POI data included in the web page is selected; when the POI data have changed, the rule that distributes the user attention information of the web page evenly among the POI data included in the web page is selected.
The determining unit 435 determines the user attention of the multiple pieces of POI data included in each web page according to the user attention of each web page and by combining the number of the POI data included in each web page based on the corresponding attention distribution rule.
When the POI data have not changed, the user attention of the web page is taken as the user attention of each POI data included in the web page.
When the POI data have changed, the user attention of the web page is distributed evenly among the POI data in the web page, based on the user attention of the web page and the number of POI data it contains.
Specifically, when any POI data in the web page has changed, the user attention of each POI data is calculated as n/m, where n is the user attention of the web page and m is the number of POI data contained in the web page.
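The two distribution rules reduce to a short function. The sketch below uses the symbols n and m from the text; the function name `allocate_attention` and the sample values are assumptions.

```python
def allocate_attention(n, m, changed):
    """User attention assigned to each POI data on one web page.

    n: user attention of the web page
    m: number of POI data contained in the web page
    changed: whether the POI data changed within the second duration
    """
    if m <= 0:
        raise ValueError("a page must contain at least one POI data")
    if not changed:
        return n       # unchanged: every POI data inherits the page value
    return n / m       # changed: the page value is split evenly


unchanged_share = allocate_attention(90.0, 3, changed=False)
changed_share = allocate_attention(90.0, 3, changed=True)
```

For a page with attention 90 and three POI data, each POI data receives 90 when the page is stable and 30 when its POI data have changed.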
Referring to fig. 3, the ranking module 340 ranks a plurality of POI data having the same name information included in a plurality of web pages based on the user attention.
Specifically, the user attention degrees corresponding to a plurality of POI data having the same name information included in a plurality of web pages are ranked.
Preferably, the ranking module 340 extracts, from the plurality of POI data, at least two POI data whose user attention is greater than an attention threshold, and ranks the at least two POI data based on user attention.
Specifically, the user attention values corresponding to the at least two extracted POI data are sorted.
The selection module 350 selects one or more POI data as the trusted POI data corresponding to the same name information based on the ranking of the POI data.
Specifically, the one or more top-ranked POI data are selected from the ranked POI data as the trusted POI data corresponding to the same name information.
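Thresholding, ranking and selection can be combined in one sketch. The function name `select_trusted`, the `(address, attention)` candidate layout, the threshold value and `top_k` are illustrative assumptions; the patent leaves the threshold and the number of selected POI data open.

```python
def select_trusted(candidates, threshold=0.0, top_k=1):
    """Pick the top-ranked POI data for one name.

    candidates: list of (address, user_attention) pairs that share the
                same name information (assumed representation).
    Keeps candidates whose attention exceeds the threshold, sorts them
    in descending order of attention and returns the first top_k.
    """
    kept = [c for c in candidates if c[1] > threshold]
    kept.sort(key=lambda c: c[1], reverse=True)
    return kept[:top_k]


candidates = [("1 Main St", 30.0), ("9 Side Rd", 124.5), ("2 Old Ln", 4.0)]
trusted = select_trusted(candidates, threshold=10.0, top_k=1)
```

With these sample values the candidate at "2 Old Ln" falls below the threshold, and "9 Side Rd" is selected as the trusted entry.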
Those skilled in the art will appreciate that the present invention includes apparatus for performing one or more of the operations described in the present application. These devices may be specially designed and manufactured for the required purposes, or they may comprise known devices in general-purpose computers. These devices have stored therein computer programs that are selectively activated or reconfigured. Such a computer program may be stored in a device-readable (e.g., computer-readable) medium, including, but not limited to, any type of disk including floppy disks, hard disks, optical disks, CD-ROMs, and magneto-optical disks, ROMs (Read-Only Memories), RAMs (Random Access Memories), EPROMs (Erasable Programmable Read-Only Memories), EEPROMs (Electrically Erasable Programmable Read-Only Memories), flash memories, magnetic cards, or optical cards, or in any type of medium suitable for storing electronic instructions, each coupled to a bus. That is, a readable medium includes any medium that stores or transmits information in a form readable by a device (e.g., a computer).
It will be understood by those within the art that each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by computer program instructions. Those skilled in the art will appreciate that the computer program instructions may be implemented by a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, implement the features specified in the block or blocks of the block diagrams and/or flowchart illustrations of the present disclosure.
Those of skill in the art will appreciate that various operations, methods, steps in the processes, acts, or solutions discussed in the present application may be alternated, modified, combined, or deleted. Further, various operations, methods, steps in the flows, which have been discussed in the present application, may be interchanged, modified, rearranged, decomposed, combined, or eliminated. Further, steps, measures, schemes in the various operations, methods, procedures disclosed in the prior art and the present invention can also be alternated, changed, rearranged, decomposed, combined, or deleted.
The foregoing describes only some embodiments of the present invention. It should be noted that those skilled in the art can make various modifications and refinements without departing from the principle of the present invention, and these modifications and refinements should also be regarded as falling within the protection scope of the present invention.

Claims (8)

1. A method of selecting point of interest (POI) data, comprising:
acquiring a plurality of webpage pages comprising different POI data with the same name information;
extracting user attention information of the multiple webpage pages;
determining user attention corresponding to one or more POI data in each webpage according to the user attention information of each webpage;
ranking a plurality of POI data with the same name information included in the plurality of web pages based on the user attention;
selecting one or more POI data based on the ranking of the POI data as credible POI data corresponding to the same name information;
when the web page includes a plurality of POI data, determining a user attention degree corresponding to one or more POI data included in each web page according to the user attention degree information of each web page, further including:
capturing page contents in each webpage within a second time length according to a preset frequency;
extracting POI data in the page content captured each time;
judging whether POI data in the page content of each webpage are changed within the second duration;
selecting a corresponding attention degree distribution rule based on the judgment result;
and determining the user attention of a plurality of POI data included in each webpage page according to the user attention of each webpage page and by combining the number of the POI data included in each webpage page based on the corresponding attention distribution rule.
2. The method of selecting point of interest (POI) data as claimed in claim 1, wherein the step of extracting user attention information for said plurality of web pages further comprises:
acquiring the user attention information of each web page according to the number of user visits and/or the average browsing time of each web page within a first duration.
3. The method of selecting point of interest POI data according to claim 1, wherein selecting the corresponding attention allocation rule based on the determination result comprises:
when the POI data is not changed, selecting the user attention information of the webpage as the attention distribution rule of the user attention of each POI data included in the webpage; or
when the POI data is changed, selecting an attention degree distribution rule which averagely distributes the user attention degree information of the webpage to the user attention degree of each POI data included in the webpage.
4. The method of selecting point-of-interest POI data according to claim 1, wherein the step of ranking the POI data having the same name information included in the plurality of web pages based on user attention further comprises:
extracting at least two POI data of which the attention degree is greater than an attention degree threshold value in the POI data;
ranking the at least two POI data based on user attention.
5. An apparatus for selecting point of interest (POI) data, comprising:
an acquisition module, configured to acquire a plurality of web pages comprising different POI data with the same name information;
the extraction module is used for extracting the user attention information of the plurality of webpage pages;
the determining module is used for determining the user attention corresponding to one or more POI data in each webpage according to the user attention information of each webpage;
the ranking module is used for ranking the POI data with the same name information in the webpage pages based on the attention of the user;
a selecting module, configured to select one or more POI data based on the ranking of the POI data, as trusted POI data corresponding to the same name information;
when the webpage page includes a plurality of POI data, the determining module specifically includes:
the grabbing unit is used for grabbing page content in each webpage within a second duration according to a preset frequency;
the extraction unit is used for extracting POI data in the page content captured each time;
the judging unit is used for judging whether POI data in the page content of each webpage within the second duration are changed or not;
a selection unit configured to select a corresponding attention degree allocation rule based on the determination result;
and the determining unit is used for determining the user attention of a plurality of POI data included in each webpage page according to the user attention of each webpage page and by combining the number of the POI data included in each webpage page based on the corresponding attention distribution rule.
6. The device for selecting point of interest (POI) data according to claim 5, wherein the extraction module is specifically configured to obtain the user attention information of each webpage according to the user access times and/or the average browsing time length of each webpage within the first time length.
7. The apparatus for selecting POI data according to claim 5, wherein selecting the corresponding attention degree distribution rule based on the determination result comprises:
when the POI data is not changed, selecting the user attention information of the webpage as the attention distribution rule of the user attention of each POI data included in the webpage; or
when the POI data is changed, selecting an attention degree distribution rule which averagely distributes the user attention degree information of the webpage to the user attention degree of each POI data included in the webpage.
8. The apparatus for selecting POI data according to any one of claims 5-7, wherein the ranking module is specifically configured to extract at least two POI data of the plurality of POI data whose attention degree is greater than a threshold attention degree; ranking the at least two POI data based on user attention.
CN201510463031.5A 2015-07-31 2015-07-31 Method and device for selecting point of interest (POI) data Active CN105117425B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510463031.5A CN105117425B (en) 2015-07-31 2015-07-31 Method and device for selecting point of interest (POI) data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510463031.5A CN105117425B (en) 2015-07-31 2015-07-31 Method and device for selecting point of interest (POI) data

Publications (2)

Publication Number Publication Date
CN105117425A CN105117425A (en) 2015-12-02
CN105117425B true CN105117425B (en) 2022-03-08

Family

ID=54665415

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510463031.5A Active CN105117425B (en) 2015-07-31 2015-07-31 Method and device for selecting point of interest (POI) data

Country Status (1)

Country Link
CN (1) CN105117425B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105550169A (en) * 2015-12-11 2016-05-04 北京奇虎科技有限公司 Method and device for identifying point of interest names based on character length
CN107729368A (en) * 2017-09-08 2018-02-23 百度在线网络技术(北京)有限公司 A kind of method and apparatus for POI data verification
CN109783740A (en) * 2019-01-24 2019-05-21 北京字节跳动网络技术有限公司 Pay close attention to the sort method and device of the page

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104050196B (en) * 2013-03-15 2017-09-15 阿里巴巴集团控股有限公司 A kind of interest point data redundant detecting method and device
CN104699835B (en) * 2015-03-31 2016-09-28 北京奇虎科技有限公司 For determining that Webpage includes the method and device of point of interest POI data

Also Published As

Publication number Publication date
CN105117425A (en) 2015-12-02

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20240104

Address after: 100088 room 112, block D, 28 new street, new street, Xicheng District, Beijing (Desheng Park)

Patentee after: BEIJING QIHOO TECHNOLOGY Co.,Ltd.

Address before: 100088 room 112, block D, 28 new street, new street, Xicheng District, Beijing (Desheng Park)

Patentee before: BEIJING QIHOO TECHNOLOGY Co.,Ltd.

Patentee before: Qizhi software (Beijing) Co.,Ltd.