CN111782699A

CN111782699A - Intelligent interest point searching method based on user history tile browsing records

Info

Publication number: CN111782699A
Application number: CN202010688185.5A
Authority: CN
Inventors: 丛杨; 张明远; 刘庆彬; 张伟
Original assignee: Shandong Ruizhi Flight Control Technology Co ltd
Current assignee: Shandong Ruizhi Flight Control Technology Co ltd
Priority date: 2020-07-16
Filing date: 2020-07-16
Publication date: 2020-10-16

Abstract

The invention discloses an intelligent interest point searching method based on a user's historical tile browsing records, belonging to the technical field of artificial intelligence. The user's historical tile call records are obtained, and the spatial heat of interest to the user is calculated from them. A string-based word segmentation matching method uses a Boolean model to find matching documents, the practical scoring function is optimized with a spatial heat influence factor, and relevance is calculated with the optimized practical scoring function. As the user continues to use the system, the most relevant results can be pushed according to the hot-spot areas the user pays attention to, so that search becomes personalized and intelligent. By acquiring and analyzing the user's historical tile browsing records, calculating the spatial heat of interest to the user, and using that spatial heat as an influence factor to optimize the scoring algorithm, the method improves the accuracy and precision of interest point searching.

Description

Intelligent interest point searching method based on user history tile browsing records

Technical Field

The invention relates to the technical field of artificial intelligence, in particular to an intelligent interest point searching method based on user history tile browsing records.

Background

The prior-art interest point searching methods are the string-based word segmentation matching method and the matching method based on full-text retrieval.

In the string-based method, address word segmentation is first performed on the search keywords: obtaining interest points requires splitting the search keywords and standardizing them semantically. The character string to be split is then matched against each entry in the machine dictionary according to a specific strategy, and a relevance score is calculated for each search result according to the degree of match. Finally, the search results are obtained according to the relevance scores, and the coordinates of the corresponding location points are matched to them.
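The dictionary matching step can be illustrated with a minimal sketch. The text does not say which matching strategy is used, so forward maximum matching is assumed here purely for illustration; the function name, the sample dictionary, and the `max_len` parameter are hypothetical.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Greedy forward maximum matching (an assumed strategy, not stated in the source).

    At each position the longest dictionary entry is taken; if none matches,
    a single character is emitted.
    """
    tokens = []
    i = 0
    while i < len(text):
        match = text[i]  # fallback: one character
        # Try the longest candidate first, down to length 2.
        for size in range(min(max_len, len(text) - i), 1, -1):
            candidate = text[i:i + size]
            if candidate in dictionary:
                match = candidate
                break
        tokens.append(match)
        i += len(match)
    return tokens


# Hypothetical machine dictionary: province, city, "people", "hospital", "people's hospital".
dictionary = {"山东省", "滕州市", "人民", "医院", "人民医院"}
tokens = forward_max_match("山东省滕州市人民医院", dictionary)
print(tokens)
```

Because the longest entry wins, the compound entry for "people's hospital" is kept whole instead of being split into its two shorter entries.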

The string-based word segmentation matching method matches against the database using only the word segmentation results, without considering the user's interests, and returns a single fixed set of results, so different people using the same search term obtain the same search results. Understanding user needs in time, managing those needs, and actively pushing results therefore place higher demands on the searching method.

The matching method based on full-text retrieval is a retrieval mode in which a computer indexing program scans every word in an article and builds an index. When the user issues a query, the retrieval program searches the previously built index and returns the results to the user. An address matching model based on full-text retrieval takes the address database as the text library and the address to be matched as the retrieval input, thereby realizing address matching and query.

A geocoding method implemented with full-text retrieval technology uses a word segmentation algorithm and a search engine to build an index over a database and to query and match addresses. Zhangman et al., using the open-source full-text retrieval engine toolkit Lucene, designed a city address matching engine to solve the problem of fuzzy retrieval. While storing the data sources sequentially, the engine builds an ordered keyword index list and stores the mapping between keywords and records, including the one-to-one correspondence between keywords and record numbers and the number of times, frequency, and positions at which each keyword appears in the records. Full-text retrieval technology has also been used to realize address positioning in GIS: an index over the address data stored in a database is built with Lucene, the Chinese address word segmentation results are then searched and scored, and the output is sorted by score.
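The keyword-to-record mapping described above is essentially an inverted index. The following is a minimal in-memory sketch, not Lucene's actual data structure; the record contents and function name are made up for illustration.

```python
from collections import defaultdict


def build_inverted_index(records):
    """Map each keyword to {record_id: [positions where it occurs]}."""
    index = defaultdict(dict)
    for record_id, words in records.items():
        for position, word in enumerate(words):
            index[word].setdefault(record_id, []).append(position)
    return index


# Hypothetical, already-segmented address records.
records = {
    1: ["tengzhou", "people", "hospital"],
    2: ["tengzhou", "railway", "station"],
    3: ["people", "park", "people", "road"],
}
index = build_inverted_index(records)

postings = index["people"]            # which records contain the keyword, and where
doc_freq = len(postings)              # number of records containing the keyword
count_in_record_3 = len(postings[3])  # number of occurrences inside record 3
```

Storing positions alongside record numbers is what lets an engine derive both the occurrence count and the document frequency from a single lookup.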

The matching method based on full-text retrieval requires a search engine but is limited to matching keywords against one another; although its matching speed and efficiency are high, its matching accuracy is relatively low.

Disclosure of Invention

The invention provides an intelligent interest point searching method based on user historical tile browsing records.

The specific technical scheme provided by the invention is as follows:

the invention provides an intelligent interest point searching method based on user history tile browsing records, which comprises the following steps:

acquiring the historical tile call records of a user, and calculating the spatial heat of interest to the user based on those records, wherein the spatial heat of interest to the user represents the user's degree of attention to a given area;

using a string-based word segmentation matching method with a Boolean model to find matching documents, optimizing the practical scoring function with a spatial heat influence factor, and calculating relevance with the optimized practical scoring function.

Optionally, calculating the spatial heat of interest to the user based on the user's historical tile call records specifically includes:

acquiring log data from a log database, and filtering and screening the log data to obtain the network tile service paths called by the user and the number of calls;

parsing the network tile service path with Elasticsearch to obtain the specific tile position, service resolution, extent, and origin coordinates, obtaining the tile name, path, zoom level, row and column numbers, and number of calls from the tile browsing records, and calculating the geographic coordinates of the corresponding tile from this information;

and matching the database according to the geographic coordinates to obtain the region, and obtaining the spatial heat of interest to the user from the number of calls and the name of the place where the called tile is located.

Optionally, the calculation formula of the optimized practical scoring function is as follows:

wherein coord(q, d) is a coordination factor representing the number of query terms contained in the document; queryNorm is the normalized value for each query, derived from the sum of the squares of the weights of the query terms; idf(t) is the inverse document frequency, which measures the uniqueness of a term and reflects how frequently the keyword appears across all documents in the collection; tf(t in d) is the frequency with which a word appears in a document; num_k is the number of calls; λ_k is the degree of match between the search keyword and the spatial heat; B_k is the spatial heat weight; the boost of term t is the weight applied to the query term at query time; and norm(t, d) is a length-dependent weighting factor.

Optionally, the inverse document frequency idf (t) is calculated by the following formula:

idf(t) = 1 + log[numDocs / (docFreq + 1)]

where numDocs is the number of documents in the index and docFreq is the number of all documents containing the word.

The invention has the following beneficial effects:

According to the intelligent interest point searching method based on user history tile browsing records, the user's historical tile call records are obtained and the spatial heat of interest to the user is calculated from them. A string-based word segmentation matching method uses a Boolean model to find matching documents, the practical scoring function is optimized with a spatial heat influence factor, and relevance is calculated with the optimized practical scoring function. As the user continues to use the system, the most relevant results can be pushed according to the hot-spot areas the user pays attention to, so that search becomes personalized and intelligent. By acquiring and analyzing the user's historical tile browsing records, calculating the spatial heat of interest to the user, and using that spatial heat as an influence factor to optimize the scoring algorithm, the method improves the accuracy and precision of interest point searching.

Drawings

Fig. 1 is a schematic flowchart of a point of interest intelligent search method based on a user history tile browsing record according to an embodiment of the present invention.

Detailed Description

An intelligent interest point searching method based on user history tile browsing records according to an embodiment of the present invention will be described in detail below with reference to fig. 1.

Referring to fig. 1, an intelligent interest point searching method based on user history tile browsing records according to an embodiment of the present invention includes the following steps:

Step 100: Obtain the user's historical tile call records and calculate the spatial heat of interest to the user from them, where the spatial heat of interest to the user represents the user's degree of attention to a given area.

Specifically, log data are obtained from a log database and are filtered and screened to obtain the network tile service paths called by the user and the number of calls. The network tile service path is parsed with Elasticsearch to obtain the specific tile position, service resolution, extent, and origin coordinates; the tile name, path, zoom level, row and column numbers, and number of calls are obtained from the tile browsing records; and the geographic coordinates of the corresponding tile are calculated from this information. The database is then matched on the geographic coordinates to obtain the region, and the spatial heat of interest to the user is obtained from the number of calls and the name of the place where the called tile is located.

Spatial heat refers to the user's degree of attention to a given area; if the spatial heat of an area is high, that area is among those the user pays the most attention to. First, log data are obtained from the log database and are filtered and screened to obtain the Web Map Tile Service (WMTS) paths called by the user and the number of calls. Logs are collected with Filebeat. Filebeat connects easily to the analysis and search system, has a lower resource overhead than Logstash, and has a small code base that is easy to optimize, so it performs better than traditional log collection. Because the data collected by Filebeat contain redundant information, the search results must be screened a second time with Elasticsearch to remove invalid data.
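As a rough illustration of extracting tile requests and call counts from logged service paths, the sketch below parses WMTS GetTile requests in key-value form using only the Python standard library. It does not use Filebeat or Elasticsearch; the URLs, the layer name `vec`, and the function names are hypothetical, and a real log would first need the request path pulled out of each log line.

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs


def parse_tile_request(url):
    """Return (layer, zoom, row, col) for a WMTS GetTile KVP URL, else None."""
    # WMTS parameter names are matched case-insensitively here.
    query = {k.lower(): v[0] for k, v in parse_qs(urlparse(url).query).items()}
    if query.get("request", "").lower() != "gettile":
        return None  # e.g. GetCapabilities: not a tile call
    try:
        return (query["layer"], int(query["tilematrix"]),
                int(query["tilerow"]), int(query["tilecol"]))
    except (KeyError, ValueError):
        return None  # missing or non-numeric parameters count as invalid data


def count_tile_calls(urls):
    """Count how many times each tile was requested."""
    counts = Counter()
    for url in urls:
        tile = parse_tile_request(url)
        if tile is not None:
            counts[tile] += 1
    return counts


# Hypothetical request paths taken from a log.
log_urls = [
    "http://example.com/wmts?SERVICE=WMTS&REQUEST=GetTile&LAYER=vec&TILEMATRIX=14&TILEROW=3301&TILECOL=13500",
    "http://example.com/wmts?service=WMTS&request=GetTile&layer=vec&tilematrix=14&tilerow=3301&tilecol=13500",
    "http://example.com/wmts?SERVICE=WMTS&REQUEST=GetCapabilities",
    "http://example.com/wmts?SERVICE=WMTS&REQUEST=GetTile&LAYER=vec&TILEMATRIX=abc&TILEROW=1&TILECOL=1",
]
counts = count_tile_calls(log_urls)
```

Dropping requests that fail to parse plays the role of the secondary screening step: only well-formed tile calls contribute to the counts.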

The service path is parsed with Elasticsearch to obtain the specific tile position, service resolution, extent, and origin coordinates. From the tile browsing records, the tile name, path, zoom level, row and column numbers, and number of calls can be obtained. The geographic coordinates of the corresponding tile are calculated from this information, the database is matched on those coordinates to obtain the region, and the spatial heat of interest to the user is obtained from the number of calls and the name of the place where the called tile is located. The geographic coordinates are calculated as follows:

lon＝x×(res×twidth)+XOrigin+(twidth/2×res)

lat＝YOrigin-y×(res×theight)+(res×theight/2)

In the formulas, lon is the longitude of the tile center point, lat is the latitude of the tile center point, x is the tile column number, y is the tile row number, res is the resolution, twidth is the tile width, theight is the tile height, XOrigin is the x-axis origin, and YOrigin is the y-axis origin.
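The two coordinate formulas can be transcribed directly into code. The sketch below keeps the terms and signs exactly as printed above; the origin, resolution, and tile size in the example are hypothetical values chosen for easy arithmetic. Note that under the common top-left tile origin convention the half-tile latitude offset would normally be subtracted rather than added, so the sign should be checked against the tile service actually in use.

```python
def tile_center(x, y, res, twidth, theight, x_origin, y_origin):
    """Tile center point, transcribed term by term from the printed formulas.

    x, y      -- tile indices as used in the formulas
    res       -- map units per pixel at the tile's zoom level
    twidth    -- tile width in pixels
    theight   -- tile height in pixels
    x_origin, y_origin -- origin of the tile grid (XOrigin, YOrigin)
    """
    lon = x * (res * twidth) + x_origin + (twidth / 2 * res)
    lat = y_origin - y * (res * theight) + (res * theight / 2)
    return lon, lat


# Hypothetical grid: origin (-180, 90), 256-pixel tiles, 0.25 degrees per tile.
lon, lat = tile_center(x=4, y=2, res=2 ** -10, twidth=256, theight=256,
                       x_origin=-180.0, y_origin=90.0)
print(lon, lat)
```

With these values each tile spans 0.25 degrees, so the longitude is the origin plus four whole tiles plus half a tile.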

Step 200: Using a string-based word segmentation matching method, find matching documents with a Boolean model, optimize the practical scoring function with a spatial heat influence factor, and calculate relevance with the optimized practical scoring function.

The calculation formula of the optimized practical scoring function is as follows:

wherein coord(q, d) is a coordination factor based on the number of query terms the document contains; it applies an AND-like weighting that favors documents containing more of the search terms; queryNorm is the normalized value for each query, derived from the sum of the squares of the weights of the query terms; idf(t) is the inverse document frequency, which measures the uniqueness of a term and reflects how frequently the keyword appears across all documents in the collection; tf(t in d) is the frequency with which a word appears in a document; num_k is the number of calls; λ_k is the degree of match between the search keyword and the spatial heat; B_k is the spatial heat weight; the boost of term t is the weight applied to the query term at query time; and norm(t, d) is a length-dependent weighting factor.
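For orientation, the factors listed above correspond to those of Lucene's classic practical scoring function, which in its unmodified form is usually written as shown below (t.getBoost() is Lucene's notation for the query-time boost of term t). This is only the standard baseline; how the spatial heat terms num_k, λ_k, and B_k enter the optimized function is not shown here and is not assumed.

```latex
\mathrm{score}(q,d) \;=\; \mathrm{queryNorm}(q)\,\cdot\,\mathrm{coord}(q,d)\,\cdot
\sum_{t \in q} \Big( \mathrm{tf}(t \in d)\,\cdot\,\mathrm{idf}(t)^{2}\,\cdot\,t.\mathrm{getBoost}()\,\cdot\,\mathrm{norm}(t,d) \Big)
```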

Specifically, queryNorm is the normalized value for each query and is derived from the sum of the squares of the weights of the query terms. For the length-dependent factor, the shorter the field, the higher its weight: a word appearing in a short field such as the title is more relevant than the same word appearing in a long field such as the content body. The normalization value for field length is the inverse of the square root of the number of words in the field.

idf(t) is the inverse document frequency. It measures the uniqueness of a term and reflects how frequently the keyword appears across all documents in the collection: the higher the frequency, the lower the weight. A term that occurs in many documents has a lower idf, and a term that occurs in few documents has a higher idf. The inverse document frequency is obtained by dividing the number of documents in the index by the number of documents containing the word and then taking the logarithm. It is calculated by the following formula:

idf(t) = 1 + log[numDocs / (docFreq + 1)], where numDocs is the number of documents in the index and docFreq is the number of documents containing the word.
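A short numeric sketch of this formula follows. The base of the logarithm is not stated in the text; the natural logarithm is assumed here, and the document counts are hypothetical.

```python
import math


def idf(num_docs, doc_freq):
    """Inverse document frequency: 1 + log(numDocs / (docFreq + 1)).

    The natural logarithm is an assumption; the text does not give the base.
    """
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical index of 1000 documents.
rare_term = idf(num_docs=1000, doc_freq=9)      # word found in 9 documents
common_term = idf(num_docs=1000, doc_freq=999)  # word found in 999 documents
print(rare_term, common_term)
```

The rarer term receives the larger value, which is the "higher frequency, lower weight" behavior described above; the +1 in the denominator also avoids division by zero for a word that appears in no document.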

tf(t in d) is the frequency with which a word appears in a document: the higher the frequency, the higher the weight. A field that mentions the same word several times is more relevant than one that mentions it only once. The term frequency tf of term t in document d is the square root of the number of times the term appears in the document. num_k is the number of calls. λ_k is the degree of match between the search keyword and the spatial heat, and B_k is the spatial heat weight, calculated by the following formula:

wherein T_k is the number of calls for tile k, T_j is the total number of calls for all tiles, and B_k represents a normalized weight.
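The term frequency rule and the normalized weight can be sketched as follows. The square-root rule for tf is taken from the text. The formula for B_k itself is not reproduced in this text, so the sketch assumes one reading of the definitions, namely B_k = T_k divided by the sum of T_j over all tiles; the function names and call counts are hypothetical.

```python
import math


def term_frequency(count):
    """tf of a term in a document: square root of its occurrence count."""
    return math.sqrt(count)


def heat_weights(tile_calls):
    """Normalized weight per tile.

    ASSUMPTION: B_k = T_k / sum_j T_j. This is inferred from the symbol
    definitions only; the original formula is not available here.
    """
    total = sum(tile_calls.values())
    if total == 0:
        return {k: 0.0 for k in tile_calls}  # no history yet: no heat
    return {k: n / total for k, n in tile_calls.items()}


# Hypothetical call counts T_k for three tiles.
weights = heat_weights({"A": 30, "B": 10, "C": 10})
tf_value = term_frequency(4)
```

Under this assumed reading the weights sum to 1, and an empty log yields zero weights, which is consistent with the method behaving like plain string matching before any browsing history has accumulated.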

In an example, the experimental data are interest point data for Tengzhou City, Zaozhuang, Shandong Province. An index database is built with Elasticsearch, a search interface is built with Web front-end technology, and the intelligent interest point searching method based on user history tile browsing records is verified experimentally on these data. The verification proceeds as follows. The user's tile call log data are retrieved with Elasticsearch. The search results are then screened: tile usage records whose TileMatrix is smaller than 13 are filtered out. (When a user looks for an interest point that meets their needs, they usually zoom in on the map to view its attribute information, so interest points seen at a high tile zoom level are very likely to be of interest to the user; repeated experiments showed that filtering works best with zoom level 13 as the boundary.) The TileMatrix, TileCol, and TileRow of each called tile are obtained, and the tile's center-point coordinates and extent are calculated. A comparison experiment is then run between the optimized interest point searching method and the string-based word segmentation matching method: 6 different search terms are used, each search term is searched 50 times with both methods and the returned results are compared, and the corresponding precision ratio is calculated.
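The screening rule and the precision ratio can be sketched as below. The zoom threshold of 13 comes from the text. The text does not define the precision ratio, so the usual definition (relevant results returned divided by all results returned) is assumed; the record layout and all sample values are hypothetical.

```python
MIN_ZOOM = 13  # boundary zoom level reported in the text


def keep_high_zoom(records, min_zoom=MIN_ZOOM):
    """Drop tile usage records whose TileMatrix is below the threshold."""
    return [r for r in records if r["TileMatrix"] >= min_zoom]


def precision(returned, relevant):
    """Assumed definition: share of returned results that are relevant."""
    if not returned:
        return 0.0
    hits = sum(1 for item in returned if item in relevant)
    return hits / len(returned)


# Hypothetical tile usage records and search results.
records = [
    {"TileMatrix": 12, "TileCol": 3300, "TileRow": 800},
    {"TileMatrix": 13, "TileCol": 6601, "TileRow": 1602},
    {"TileMatrix": 15, "TileCol": 26405, "TileRow": 6410},
]
kept = keep_high_zoom(records)
p = precision(returned=["a", "b", "c", "d"], relevant={"a", "c"})
```

Keeping only the zoomed-in records means the heat calculation is driven by tiles the user deliberately inspected rather than by overview tiles loaded while panning.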

Following the above experiment and calculation, the same data were tested with both the string-based word segmentation matching method and the algorithm of the present invention to obtain a precision ratio comparison; the results are shown in Table 1 below.

Table 1. Comparison of experimental data between the present search method and the string-based word segmentation matching method

Comparing the results of the two methods shows that, when the log data are empty, the precision ratio of the present search method is almost the same as that of the string-based word segmentation matching method. As user history tile browsing records accumulate, however, the results of the search method optimized by the embodiment of the invention improve markedly, while those of the string-based word segmentation matching method remain essentially unchanged. As the user continues to use it, the method provided by the invention can push the most relevant results according to the hot-spot areas the user pays attention to, so that search becomes personalized and intelligent.

It will be apparent to those skilled in the art that various modifications and variations can be made in the embodiments of the present invention without departing from the spirit or scope of the embodiments of the invention. Thus, if such modifications and variations of the embodiments of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to encompass such modifications and variations.

Claims

1. An intelligent interest point searching method based on user history tile browsing records is characterized by comprising the following steps:

2. The intelligent point of interest searching method according to claim 1, wherein the spatial heat of interest of the user is calculated based on the historical tile call records of the user, specifically:

3. The intelligent interest point searching method according to claim 1 or 2, wherein the calculation formula of the optimized practical scoring function is as follows:

wherein coord(q, d) is a coordination factor representing the number of query terms contained in the document; queryNorm is the normalized value for each query, derived from the sum of the squares of the weights of the query terms; idf(t) is the inverse document frequency, which measures the uniqueness of a term and reflects how frequently the keyword appears across all documents in the collection; tf(t in d) is the frequency with which a word appears in a document; num_k is the number of calls; λ_k is the degree of match between the search keyword and the spatial heat; B_k is the spatial heat weight; the boost of term t is the weight applied to the query term at query time; and norm(t, d) is a length-dependent weighting factor.

4. The intelligent point-of-interest searching method according to claim 3, wherein the inverse document frequency idf (t) is calculated by the following formula:

idf(t) = 1 + log[numDocs / (docFreq + 1)]