CN111782699A - Intelligent interest point searching method based on user history tile browsing records - Google Patents

Intelligent interest point searching method based on user history tile browsing records

Info

Publication number
CN111782699A
Authority
CN
China
Legal status
Pending
Application number
CN202010688185.5A
Other languages
Chinese (zh)
Inventor
丛杨
张明远
刘庆彬
张伟
Current Assignee
Shandong Ruizhi Flight Control Technology Co ltd
Original Assignee
Shandong Ruizhi Flight Control Technology Co ltd
Priority date
2020-07-16
Filing date
2020-07-16
Publication date
2020-10-16
Application filed by Shandong Ruizhi Flight Control Technology Co ltd
Priority to CN202010688185.5A
Publication of CN111782699A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29 Geographical information databases

Abstract

The invention discloses an intelligent point-of-interest (POI) search method based on a user's historical tile browsing records, in the technical field of artificial intelligence. The user's historical tile call records are obtained and used to calculate the spatial heat of the areas the user attends to. A string-based word segmentation matching method searches for matching documents with a Boolean model, the practical scoring function is optimized with a spatial heat influence factor, and relevance is calculated with the optimized practical scoring function. As the user keeps using the system, the most relevant results can be pushed according to the hot-spot areas the user attends to, so that search becomes personalized and intelligent. By obtaining and analyzing the user's historical tile browsing records, calculating the spatial heat the user attends to, and using that heat as an influence factor to optimize the scoring algorithm, the method improves the accuracy and precision of POI search.

Description

Intelligent interest point searching method based on user history tile browsing records
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to an intelligent interest point searching method based on user history tile browsing records.
Background
Existing point-of-interest search methods fall into two categories: string-based word segmentation matching and full-text-retrieval-based matching.
In string-based word segmentation matching, the search keywords are first segmented as addresses: obtaining points of interest requires splitting the keywords and normalizing them semantically. The split strings are then matched against the entries of a machine dictionary according to a specific strategy, and a relevance score is computed for each search result from the degree of matching. Finally, search results are obtained from the relevance scores, and the coordinates of the corresponding location points are matched to those results.
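The patent does not name the dictionary-matching strategy. Forward maximum matching is one common choice for Chinese text, and the minimal sketch below assumes it. The function name `forward_max_match`, the `max_len` window, and the toy dictionary are hypothetical; characters with no dictionary match fall back to single-character tokens.

```python
from typing import List, Set


def forward_max_match(text: str, dictionary: Set[str], max_len: int = 6) -> List[str]:
    """Split text by forward maximum matching against a dictionary (assumed strategy)."""
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary entry matches;
        # a single character is always accepted so the loop always advances.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


if __name__ == "__main__":
    toy_dictionary = {"滕州市", "人民医院", "滕州"}
    print(forward_max_match("滕州市人民医院", toy_dictionary))
```

With this toy dictionary, "滕州市人民医院" splits into "滕州市" and "人民医院": the longer entry "滕州市" wins over "滕州" because longer candidates are tried first.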
String-based word segmentation matching queries the database using only the segmentation results, without considering the user's interests, and returns one uniform result set: different people who use the same search term get the same results. Understanding user needs promptly, managing them, and actively pushing relevant results therefore place higher demands on the search method.
In full-text-retrieval-based matching, an indexing program scans every word in an article and builds an index. At query time, the retrieval program searches this previously built index and returns the results to the user. An address matching model based on full-text retrieval treats the address database as the text library and the address to be matched as the retrieval input, and thereby performs address matching and query.
Geocoding methods built on full-text retrieval use a word segmentation algorithm and a search engine to index the database and to query and match addresses. For example, Zhangman et al. used Lucene, an open-source full-text retrieval engine toolkit, to design a city address matching engine that handles fuzzy retrieval. While storing the data source sequentially, the engine builds an ordered keyword index list and stores the mapping between keywords and records, including the one-to-one correspondence between keywords and record numbers and the number of times, frequency, and positions at which each keyword appears in the records. In this approach, full-text retrieval implements address positioning in a GIS: Lucene builds an index over the address data stored in a database, the segmented Chinese address is searched and scored, and the output is sorted by score.
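The keyword-to-record mapping described above is an inverted index. The sketch below is a simplified in-memory illustration, not Lucene's actual on-disk format; it assumes the records are already segmented into keyword lists, and the names `build_inverted_index` and `term_frequency` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# keyword -> {record number -> positions of the keyword in that record}
InvertedIndex = Dict[str, Dict[int, List[int]]]


def build_inverted_index(records: Dict[int, List[str]]) -> InvertedIndex:
    """Map each keyword to the records containing it and its positions there."""
    index: Dict[str, Dict[int, List[int]]] = defaultdict(dict)
    for record_id, tokens in records.items():
        for position, token in enumerate(tokens):
            index[token].setdefault(record_id, []).append(position)
    return dict(index)


def term_frequency(index: InvertedIndex, keyword: str, record_id: int) -> int:
    """Number of times keyword occurs in one record (0 if absent)."""
    return len(index.get(keyword, {}).get(record_id, []))
```

Keeping positions rather than only counts preserves both the frequency (the length of the position list) and where the keyword occurs, which mirrors the "times, frequency and positions" mapping in the text.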
Full-text-retrieval-based matching requires a search engine and is limited to matching keywords to their correspondences. Although matching is fast and efficient, its accuracy is relatively low.
Disclosure of Invention
The invention provides an intelligent interest point searching method based on user historical tile browsing records.
The specific technical scheme provided by the invention is as follows:
the invention provides an intelligent interest point searching method based on user history tile browsing records, which comprises the following steps:
acquiring the user's historical tile call records and calculating from them the spatial heat the user attends to, where spatial heat represents the user's degree of attention to a given area;
searching for matching documents with a Boolean model using string-based word segmentation matching, optimizing the practical scoring function with a spatial heat influence factor, and calculating relevance with the optimized practical scoring function.
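In this step the Boolean model only decides which documents match; the ordering by relevance comes from the scoring function. A minimal sketch, assuming segmented query terms and a postings map from each term to the set of document IDs containing it. The name `boolean_search` and the AND/OR `mode` switch are hypothetical.

```python
from typing import Dict, List, Set


def boolean_search(index: Dict[str, Set[int]], terms: List[str], mode: str = "AND") -> Set[int]:
    """Return IDs of documents that satisfy a Boolean AND/OR over the query terms."""
    postings = [index.get(term, set()) for term in terms]
    if not postings:
        return set()
    if mode == "AND":
        # Every term must occur in the document.
        return set.intersection(*postings)
    if mode == "OR":
        # At least one term must occur; ranking is left to the scoring function.
        return set.union(*postings)
    raise ValueError(f"unsupported mode: {mode}")
```

Both set operations return new sets, so the postings stored in the index are never modified by a query.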
Optionally, calculating the spatial heat the user attends to from the user's historical tile call records specifically includes:
acquiring log data from a log database and filtering it to obtain the web tile service paths called by the user and the number of calls;
parsing the web tile service paths in the tile browsing records with Elasticsearch to obtain each tile's position, service resolution, extent, origin coordinates, tile name, path, zoom level, row and column numbers, and call count, and calculating the geographic coordinates of the corresponding tile from this information;
and matching the geographic coordinates against the database to obtain the region, and obtaining the spatial heat the user attends to from the call counts and the names of the places where the called tiles are located.
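The final aggregation step can be sketched as follows. The sketch assumes the database match has already produced a lookup from each tile to a place name; the tile key layout, the function name `spatial_heat`, and the place names in the test are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical tile key: (zoom level, tile row, tile column).
TileKey = Tuple[int, int, int]


def spatial_heat(calls: Iterable[Tuple[TileKey, int]],
                 tile_to_place: Dict[TileKey, str]) -> Counter:
    """Sum call counts per place name; tiles with no matching place are skipped."""
    heat: Counter = Counter()
    for tile, count in calls:
        place = tile_to_place.get(tile)
        if place is not None:
            heat[place] += count
    return heat
```

`heat.most_common()` then lists the places the user attends to most, which is the ranking a hot-spot area is read from.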
Optionally, the optimized practical scoring function is calculated as follows:
[Equation image: optimized practical scoring function]
where $coord(q, d)$ is the coordination factor, representing how many of the query terms the document contains; $queryNorm$ is the normalization value for each query, derived from the sum of the squares of the weights of the query terms; $idf(t)$ is the inverse document frequency, which measures how distinctive a term is based on how often the keyword occurs across all documents in the collection; $tf(t\ \mathrm{in}\ d)$ is the frequency of term $t$ in document $d$; $num_k$ is the number of calls; $\lambda_k$ is the degree of match between the search keyword and the spatial heat; $B_k$ is the spatial heat weight; the boost of $t$ is the weight applied to query term $t$ at query time; and $norm(t, d)$ is a length-dependent weighting factor.
Optionally, the inverse document frequency idf (t) is calculated by the following formula:
$$idf(t) = 1 + \log\left(\frac{numDocs}{docFreq + 1}\right)$$
where numDocs is the number of documents in the index and docFreq is the number of all documents containing the word.
The invention has the following beneficial effects:
In the intelligent point-of-interest search method provided by the invention, the user's historical tile call records are obtained and used to calculate the spatial heat the user attends to; matching documents are searched with a Boolean model using string-based word segmentation matching; the practical scoring function is optimized with a spatial heat influence factor; and relevance is calculated with the optimized function. As the user keeps using the system, the most relevant results can be pushed according to the hot-spot areas the user attends to, so search becomes personalized and intelligent. Using the spatial heat derived from the user's historical tile browsing records as an influence factor in the scoring algorithm improves the accuracy and precision of point-of-interest search.
Drawings
Fig. 1 is a schematic flowchart of the intelligent point-of-interest search method based on user history tile browsing records according to an embodiment of the present invention.
Detailed Description
An intelligent interest point searching method based on user history tile browsing records according to an embodiment of the present invention will be described in detail below with reference to fig. 1.
Referring to fig. 1, an intelligent interest point searching method based on user history tile browsing records according to an embodiment of the present invention includes the following steps:
Step 100: obtain the user's historical tile call records and calculate from them the spatial heat the user attends to, where spatial heat represents the user's degree of attention to a given area.
Specifically, log data are obtained from a log database and filtered to obtain the web tile service paths called by the user and the number of calls. The web tile service paths in the tile browsing records are parsed with Elasticsearch to obtain each tile's position, service resolution, extent, origin coordinates, tile name, path, zoom level, row and column numbers, and call count, from which the geographic coordinates of the corresponding tile are calculated. The coordinates are then matched against the database to obtain the region, and the spatial heat the user attends to is obtained from the call counts and the names of the places where the called tiles are located.
Spatial heat is the degree of attention a user pays to a given area; an area with high spatial heat is one the user is most interested in. First, log data are obtained from the log database and filtered to obtain the Web Map Tile Service (WMTS) paths called by the user and the number of calls. Logs are collected with Filebeat. Filebeat connects easily to the Elasticsearch analysis and search system, has a smaller resource footprint than Logstash, and has a small code base that is easy to optimize, so it performs better than traditional log collection. Because the data collected by Filebeat contain redundant information, the results are screened a second time with Elasticsearch to remove invalid data.
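The log format is not specified. The sketch below assumes each log entry carries a WMTS GetTile request in key-value-pair (KVP) form with the standard TileMatrix, TileRow, and TileCol parameters, and counts calls per tile; RESTful WMTS URLs would need a different parser. The names `parse_gettile` and `count_tile_calls` are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple
from urllib.parse import parse_qs, urlparse

TileId = Tuple[str, int, int]  # (TileMatrix, TileRow, TileCol)


def parse_gettile(url: str) -> Optional[TileId]:
    """Extract (TileMatrix, TileRow, TileCol) from a KVP GetTile URL, else None."""
    # KVP parameter names are case-insensitive, so normalize keys to upper case.
    params = {k.upper(): v[0] for k, v in parse_qs(urlparse(url).query).items()}
    # Compare the request value leniently; non-GetTile requests carry no tile.
    if params.get("REQUEST", "").lower() != "gettile":
        return None
    try:
        return params["TILEMATRIX"], int(params["TILEROW"]), int(params["TILECOL"])
    except (KeyError, ValueError):
        return None  # incomplete or malformed entries are treated as invalid data


def count_tile_calls(urls: Iterable[str]) -> Counter:
    """Count how many times each tile was requested."""
    counts: Counter = Counter()
    for url in urls:
        tile = parse_gettile(url)
        if tile is not None:
            counts[tile] += 1
    return counts
```

TileMatrix is kept as a string because tile-matrix identifiers are not always plain integers.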
The service path is parsed with Elasticsearch to obtain the tile's position, service resolution, extent, and origin coordinates. The tile browsing records provide the tile name, path, zoom level, row and column numbers, and call count. From this information the geographic coordinates of each tile are calculated; the coordinates are matched against the database to obtain the region, and the spatial heat of the user's interest is obtained from the call counts and the names of the places where the called tiles are located. The geographic coordinates are calculated as follows:
$$lon = x \times (res \times twidth) + XOrigin + \frac{twidth}{2} \times res$$
$$lat = YOrigin - y \times (res \times theight) - \frac{theight}{2} \times res$$
where lon is the longitude of the tile's center point, lat is the latitude of the center point, x is the tile column number, y is the tile row number, res is the resolution (map units per pixel) at the tile's zoom level, twidth is the tile width in pixels, theight is the tile height in pixels, XOrigin is the x-axis origin, and YOrigin is the y-axis origin.
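The center-point calculation is sketched below under these stated conventions: x is the tile column (TileCol), y is the tile row (TileRow), the origin is the grid's top-left corner with rows counted downward, and res is in map units per pixel. Treating x as the column is what multiplying x by the tile width implies. Because rows run south from YOrigin, reaching the center from a tile's top edge means moving half a tile further south, so the half-tile term is subtracted for latitude. The function name and the example grid (an EPSG:4326-style grid with a (-180, 90) origin and 256-pixel tiles) are illustrative assumptions.

```python
from typing import Tuple


def tile_center(col: int, row: int, res: float, twidth: int, theight: int,
                x_origin: float, y_origin: float) -> Tuple[float, float]:
    """Center (lon, lat) of a tile in a grid whose origin is the top-left corner."""
    # Columns run east from XOrigin; one tile spans res * twidth map units.
    lon = x_origin + col * res * twidth + res * twidth / 2
    # Rows run south from YOrigin, so the row offset and the half tile are subtracted.
    lat = y_origin - row * res * theight - res * theight / 2
    return lon, lat
```

With res = 0.703125 degrees per pixel, each 256-pixel tile spans 180 degrees, so tile (column 0, row 0) is centered at (-90, 0) and tile (column 1, row 0) at (90, 0).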
Step 200: using string-based word segmentation matching, search for matching documents with a Boolean model, optimize the practical scoring function with a spatial heat influence factor, and calculate relevance with the optimized practical scoring function.
The optimized practical scoring function is calculated as follows:
[Equation image: optimized practical scoring function]
where $coord(q, d)$ is the coordination factor, representing how many of the query terms the document contains; it rewards documents that contain more of the search terms, giving an AND-like weighting. $queryNorm$ is the normalization value for each query, derived from the sum of the squares of the weights of the query terms; $idf(t)$ is the inverse document frequency, which measures how distinctive a term is based on how often the keyword occurs across all documents in the collection; $tf(t\ \mathrm{in}\ d)$ is the frequency of term $t$ in document $d$; $num_k$ is the number of calls; $\lambda_k$ is the degree of match between the search keyword and the spatial heat; $B_k$ is the spatial heat weight; the boost of $t$ is the weight applied to query term $t$ at query time; and $norm(t, d)$ is a length-dependent weighting factor.
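The equation image for the optimized function is not reproduced, so how it combines $num_k$, $\lambda_k$, and $B_k$ cannot be read from the text. The sketch below implements only the classic Lucene-style practical scoring function built from the listed terms, using Lucene's classic forms queryNorm = 1/sqrt(sum of squared term weights), tf = sqrt(frequency), and norm = 1/sqrt(field length). It then applies a HYPOTHETICAL multiplicative spatial heat factor $1 + \sum_k \lambda_k B_k$; that factor is an illustrative assumption, not the patent's formula. All function names are hypothetical.

```python
import math
from typing import Dict, Iterable, List, Optional, Tuple


def classic_score(query_terms: List[str], term_freqs: Dict[str, int], field_length: int,
                  idf: Dict[str, float], boosts: Optional[Dict[str, float]] = None) -> float:
    """Lucene-style practical scoring function (classic TF-IDF form)."""
    boosts = boosts or {}
    matched = [t for t in query_terms if term_freqs.get(t, 0) > 0]
    if not matched:
        return 0.0
    # coord(q, d): fraction of query terms found in the document.
    coord = len(matched) / len(query_terms)
    # queryNorm: 1 / sqrt(sum of squared query-term weights), weight = idf * boost.
    query_norm = 1.0 / math.sqrt(sum((idf[t] * boosts.get(t, 1.0)) ** 2 for t in query_terms))
    # norm(t, d): shorter fields get larger weights.
    norm = 1.0 / math.sqrt(field_length)
    # Sum over matched terms of tf * idf^2 * boost * norm, with tf = sqrt(frequency).
    total = sum(math.sqrt(term_freqs[t]) * idf[t] ** 2 * boosts.get(t, 1.0) * norm
                for t in matched)
    return coord * query_norm * total


def heat_adjusted_score(base: float, heat_terms: Iterable[Tuple[float, float]]) -> float:
    """HYPOTHETICAL heat factor: scale the base score by 1 + sum(lambda_k * B_k)."""
    return base * (1.0 + sum(lam * weight for lam, weight in heat_terms))
```

A multiplicative factor was chosen for the sketch only because it leaves the ranking unchanged when there is no call history (all $B_k = 0$).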
Specifically, queryNorm is the normalization value for each query, derived from the sum of the squares of the weights of the query terms. The length-dependent factor norm(t, d) favors short fields: the shorter the field, the higher its weight, so a word that appears in a short field such as the title is more relevant than the same word in a long field such as the body. This field-length norm is the inverse of the square root of the number of words in the field.
idf(t) is the inverse document frequency, which measures how distinctive a term is from how often the keyword appears across all documents in the collection. The more frequently a term occurs, the lower its idf and therefore its weight; the rarer the term, the higher its idf. The inverse document frequency is obtained by dividing the number of documents in the index by the number of documents containing the term and taking the logarithm:
$$idf(t) = 1 + \log\left(\frac{numDocs}{docFreq + 1}\right)$$
where numDocs is the number of documents in the index and docFreq is the number of documents containing the term.
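The formula translates directly into code. The sketch assumes the natural logarithm, as in Lucene's classic TF-IDF similarity; the patent does not state the base. The function name is hypothetical.

```python
import math


def idf(num_docs: int, doc_freq: int) -> float:
    """Inverse document frequency: 1 + ln(numDocs / (docFreq + 1))."""
    # Adding 1 to docFreq avoids division by zero for terms that occur in no document.
    return 1.0 + math.log(num_docs / (doc_freq + 1))
```

For example, in a 10-document index a term found in 9 documents gets idf 1.0, while a term found in a single document gets 1 + ln 5, about 2.61.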
tf(t in d) is the frequency of the term in the document: the more often it occurs, the higher the weight, because a field that mentions a term several times is more relevant than one that mentions it only once. The term frequency of term t in document d is the square root of the number of times the term appears in the document. $num_k$ is the number of calls, $\lambda_k$ is the degree of match between the search keyword and the spatial heat, and $B_k$ is the spatial heat weight, calculated as follows:
[Equation image: spatial heat weight $B_k$]
where $T_k$ is the number of calls for tile k, $T_j$ is the total number of calls for all tiles, and $B_k$ is the normalized weight.
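The $B_k$ equation image is not reproduced. One reading consistent with the definitions (the calls for tile k normalized by the total calls for all tiles) is $B_k = T_k / \sum_j T_j$; the sketch below ASSUMES that reading, and the function name is hypothetical.

```python
from typing import Dict, Hashable


def heat_weights(tile_calls: Dict[Hashable, int]) -> Dict[Hashable, float]:
    """ASSUMED normalization: B_k = T_k / (sum of T_j over all tiles)."""
    total = sum(tile_calls.values())
    if total == 0:
        # No call history yet (empty logs): every tile gets zero heat.
        return {tile: 0.0 for tile in tile_calls}
    return {tile: calls / total for tile, calls in tile_calls.items()}
```

Under this reading the weights of all tiles sum to 1, so $B_k$ expresses each tile's share of the user's attention rather than a raw count.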
In one example, the experimental data are point-of-interest data for Tengzhou City, Zaozhuang, Shandong Province. An index database is built with Elasticsearch and a search front end with Web front-end technology, and the intelligent point-of-interest search method based on user history tile browsing records is verified experimentally on these data. The verification proceeds as follows. The user's tile call log data are retrieved with Elasticsearch. The search results are then screened by filtering out tile call records whose TileMatrix (zoom level) is below 13. When looking for a suitable point of interest, users often zoom in to view its attribute information, so a point of interest viewed at a high zoom level is very likely one the user is interested in; repeated experiments showed that zoom level 13 is the most effective filtering threshold. The TileMatrix, TileCol, and TileRow of each called tile are obtained, and the tile's center coordinates and extent are calculated. The optimized point-of-interest search method is then compared with string-based word segmentation matching using 6 different search terms: each term is searched 50 times, the returned results are compared, and the corresponding precision is calculated.
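The screening rule and the evaluation metric can be sketched as below. The sketch assumes each call record exposes an integer zoom level (TileMatrix identifiers can also be strings that would need parsing first) and computes precision as the share of returned results judged relevant. How relevance was judged in the experiment is not stated, so the relevant set is an input; all names are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

CallRecord = Tuple[int, int, int]  # hypothetical layout: (zoom level, TileRow, TileCol)


def keep_detailed_views(records: Iterable[CallRecord], min_zoom: int = 13) -> List[CallRecord]:
    """Drop tile calls below min_zoom; zoomed-in views suggest genuine interest."""
    return [record for record in records if record[0] >= min_zoom]


def precision(returned: List[str], relevant: Set[str]) -> float:
    """Share of returned results that are relevant (0.0 for an empty result list)."""
    if not returned:
        return 0.0
    return sum(1 for item in returned if item in relevant) / len(returned)
```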
Following the experiment and calculations above, the same data were tested with both string-based word segmentation matching and the algorithm of the invention; the resulting precision comparison is shown in Table 1 below.
Table 1 Comparison of experimental results between the search method of the invention and string-based word segmentation matching
[Table 1 image not reproduced]
Comparing the two methods, the precision of the search method is nearly the same as that of string-based word segmentation matching when the log data are empty. As the user's historical tile browsing records accumulate, however, the precision of the optimized search method of this embodiment improves markedly, while that of string-based word segmentation matching stays essentially unchanged. The method can therefore push the most relevant results according to the hot-spot areas the user attends to as the user keeps using it, making search personalized and intelligent.
It will be apparent to those skilled in the art that various modifications and variations can be made in the embodiments of the present invention without departing from the spirit or scope of the embodiments of the invention. Thus, if such modifications and variations of the embodiments of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to encompass such modifications and variations.

Claims (4)

1. An intelligent interest point searching method based on user history tile browsing records is characterized by comprising the following steps:
acquiring the user's historical tile call records and calculating from them the spatial heat the user attends to, wherein the spatial heat represents the user's degree of attention to a given area;
searching for matching documents with a Boolean model using string-based word segmentation matching, optimizing the practical scoring function with a spatial heat influence factor, and calculating relevance with the optimized practical scoring function.
2. The intelligent point of interest searching method according to claim 1, wherein the spatial heat of interest of the user is calculated based on the historical tile call records of the user, specifically:
acquiring log data from a log database and filtering it to obtain the web tile service paths called by the user and the number of calls;
parsing the web tile service paths in the tile browsing records with Elasticsearch to obtain each tile's position, service resolution, extent, origin coordinates, tile name, path, zoom level, row and column numbers, and call count, and calculating the geographic coordinates of the corresponding tile from this information;
and matching the geographic coordinates against the database to obtain the region, and obtaining the spatial heat the user attends to from the call counts and the names of the places where the called tiles are located.
3. The intelligent interest point searching method according to claim 1 or 2, wherein the optimized practical scoring function is calculated as follows:
[Equation image: optimized practical scoring function]
where $coord(q, d)$ is the coordination factor, representing how many of the query terms the document contains; $queryNorm$ is the normalization value for each query, derived from the sum of the squares of the weights of the query terms; $idf(t)$ is the inverse document frequency, which measures how distinctive a term is based on how often the keyword occurs across all documents in the collection; $tf(t\ \mathrm{in}\ d)$ is the frequency of term $t$ in document $d$; $num_k$ is the number of calls; $\lambda_k$ is the degree of match between the search keyword and the spatial heat; $B_k$ is the spatial heat weight; the boost of $t$ is the weight applied to query term $t$ at query time; and $norm(t, d)$ is a length-dependent weighting factor.
4. The intelligent point-of-interest searching method according to claim 3, wherein the inverse document frequency idf (t) is calculated by the following formula:
$$idf(t) = 1 + \log\left(\frac{numDocs}{docFreq + 1}\right)$$
where numDocs is the number of documents in the index and docFreq is the number of all documents containing the word.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010688185.5A CN111782699A (en) 2020-07-16 2020-07-16 Intelligent interest point searching method based on user history tile browsing records

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010688185.5A CN111782699A (en) 2020-07-16 2020-07-16 Intelligent interest point searching method based on user history tile browsing records

Publications (1)

Publication Number Publication Date
CN111782699A (en) 2020-10-16

Family

ID=72764208

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010688185.5A Pending CN111782699A (en) 2020-07-16 2020-07-16 Intelligent interest point searching method based on user history tile browsing records

Country Status (1)

Country Link
CN (1) CN111782699A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112800160A (en) * 2021-01-29 2021-05-14 上海钧正网络科技有限公司 Search ranking optimization method, system and device based on map scene and readable storage medium
CN115098804A (en) * 2022-06-24 2022-09-23 武汉楷瀚文化传媒有限公司 Webpage search history record intelligent management system based on big data analysis
CN115098804B (en) * 2022-06-24 2023-11-03 上海上班族数字科技有限公司 Webpage search history record intelligent management system based on big data analysis

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Cong Yang (丛杨) et al.: "Research on optimizing an intelligent point-of-interest search method based on users' historical tile browsing records", Geomatics World (地理信息世界), vol. 26, no. 2, pp. 92-95 *

Similar Documents

Publication Publication Date Title
US8341159B2 (en) Creating taxonomies and training data for document categorization
CA2845194C (en) Classification of ambiguous geographic references
US8452766B1 (en) Detecting query-specific duplicate documents
US6167397A (en) Method of clustering electronic documents in response to a search query
US8756245B2 (en) Systems and methods for answering user questions
CN108846029B (en) Information correlation analysis method based on knowledge graph
US20070162448A1 (en) Adaptive hierarchy structure ranking algorithm
US8732165B1 (en) Automatic determination of whether a document includes an image gallery
CN1818908A (en) Feedbakc information use of searcher in search engine
CN101261629A (en) Specific information searching method based on automatic classification technology
CN105426529A (en) Image retrieval method and system based on user search intention positioning
CN110543595A (en) in-station search system and method
CN111782699A (en) Intelligent interest point searching method based on user history tile browsing records
CN112149422A (en) Enterprise news dynamic monitoring method based on natural language
Li Internet tourism resource retrieval using PageRank search ranking algorithm
JP4426041B2 (en) Information retrieval method by category factor
Ajoudanian et al. Deep web content mining
Jin et al. Tise: A temporal search engine for web contents
Sallaberry et al. Towards an IE and IR System Dealing with Spatial Information in Digital Libraries-Evaluation Case Study.
CN112199461B (en) Document retrieval method, device, medium and equipment based on block index structure
Qiu et al. Detection and optimized disposal of near-duplicate pages
Sun et al. A Point of Interest Intelligent Search Method based on Browsing History.
KR100434718B1 (en) Method and system for indexing document
Jin et al. Indexing temporal information for web pages
CN116414946A (en) Biomedical literature long query content retrieval method and device and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination