WO2020037794A1

WO2020037794A1 - Index building method for English geographical name, and query method and apparatus therefor

Info

Publication number: WO2020037794A1
Application number: PCT/CN2018/109938
Authority: WO
Inventors: 张雪英; 叶鹏; 杜咪
Original assignee: 南京师范大学
Priority date: 2018-08-20
Filing date: 2018-10-12
Publication date: 2020-02-27
Also published as: CN109165331A; AU2018102145A4

Abstract

An English gazetteer query method, belonging to the field of natural language processing. The method performs geographical name queries using text features of a geographical name, namely the total number of letters, the number of letter radicals, the total number of words, and the initial-letter codes, following the main line of "multidimensional feature statistics, inverted index generation, candidate geographical name query, similarity ranking". The result is an English gazetteer query method based on feature statistics and an inverted index. The method maintains high operating efficiency in a large-scale data environment and can accurately find the target geographical name even when the query expresses it inaccurately, giving the user a better experience.

Description

Index establishment method for English place names and query method and device thereof

Technical field

The invention relates to the field of natural language processing, and in particular, to a method for querying English place name dictionaries for large-scale place name data.

Background technique

Geographical name dictionary query is a basic operation that provides geographical name knowledge to applications such as geographical name spell checking, fuzzy matching, and optical identification. As global integration accelerates, international geographical name information is transmitted ever faster and used ever more frequently. As one of the most widely used languages in the world, English often serves as the standard for translating, storing and managing place names across languages. At the same time, the explosive growth of data and the rapid development of information storage technology have made large-scale geographical name data collections increasingly common. Efficient English place name dictionary query in a large-scale data environment has therefore become an important technical challenge for improving many place name services and applications.

Conventional dictionary query methods generally use sequential traversal or binary search to obtain query records, but their running time grows with the size of the data (linearly, in the case of sequential traversal), and they struggle to meet practical needs on massive data. Inverted files, a simple and efficient way of indexing document data, are a basic technology of modern search engine retrieval systems and have gradually been introduced into dictionary lookup. The word-level index is a common organization of inverted files that supports phrase and proximity queries, and the N-gram index is the most commonly used word-level index structure. Although the N-gram structure improves query recall to a certain extent, the vocabulary it generates usually increases the space occupied by the index and slows down both index construction and query processing. In addition, index items formed from morphemes require similarity calculation in fuzzy queries, and each index item has to be compared with the query conditions. This query mode greatly increases the complexity of the mechanism and is hard to adapt to the requirements of large-scale data environments.

Therefore, to meet the application requirements of different scenarios, how to efficiently return exactly matching or closest query results when the place name entered in an English place name query is inaccurate or incomplete is a problem that those skilled in the art currently need to study and solve.

Summary of the Invention

Technical problem

The technical problems to be solved by the present invention include how to efficiently return exactly matching or closest query results when the place name entered in an English place name query is inaccurate or incomplete, together with the technical problems subordinate to it.

Summary of the Invention

Summary of technical contributions: discovering the text features contained in English place names and combining them with a dictionary query mechanism is the key to improving query performance. The present invention uses the total number of letters in the place name, the number of letter radicals, the total number of words, and the first letters of the words as text features. Place name query is performed along the main line of "multi-dimensional feature statistics, inverted index generation, candidate place name query, similarity ranking", and an English place name dictionary query method based on a feature statistics inverted index is proposed.

Technical solutions

First aspect

The invention provides a method for establishing an English place name index, which is applied to user equipment. The method includes: S1) counting a plurality of feature values of all English place noun groups stored in the English place name dictionary text, the feature values including the total number of letters, the number of letter radicals, the total number of words, and the first-letter codes of the words; S2) generating a corresponding multi-dimensional feature statistical vector according to the feature values of each English place noun group; S3) establishing an inverted index file by using the multi-dimensional feature statistical vector of each English place noun group and its position mapping information in the inverted table as an index entry, wherein each index entry corresponds to one inverted chain.

The process and principle of the above English place name index establishment method are described in detail below.

First, regarding the statistics of the feature values: for every place noun group stored in the English place name dictionary, the following feature values are counted in order: the total number of letters, the number of letter radicals, the total number of words, and the first-letter codes of the words. (1) The total number of letters is the count of all letters in the place noun group. (2) The letter radicals are based on the idea of Chinese pictographic characters: each English letter is regarded as composed of some of the radicals "|", "-", "/", "\", "(" and ")". The radical expressions of the individual letters are shown in Table 1 below. Clearly, the more identical characters two strings share, the more similar they are considered to be. However, the number of English letters is large, and recording the occurrence frequency of every letter in the index entry would take too much storage space and would not be convenient for comparing strings. Expressing each letter with a fixed set of radicals simplifies comparison during the query while still implicitly recording letter frequencies. (3) The total number of words is the count of all words in the place noun group. (4) The first-letter code converts the first letter of a word in the place noun group into a numeric code. The conversion rule maps A to Z, in order, to the codes "01" to "26"; that is, A is coded as "01", B as "02", and so on. The first letter is converted to uppercase before coding.

Table 1

Among them, the radical "|" is represented by the number 1, the radical "-" by the number 2, the radical "/" by the number 3, the radical "\" by the number 4, the radical "(" by the number 5, and the radical ")" by the number 6.

Secondly, regarding the composition of the index entries: in the index dictionary, each index entry records the total number of letters, the number of letter radicals, the total number of words, the first-letter codes of the words, and the position in the inverted table. Here f_cn represents the total number of letters and is recorded as a 1-dimensional vector. f_ar represents the number of letter radicals; there are 6 radicals in total, so it is recorded as a 6-dimensional vector. f_wn represents the total number of words and is recorded as a 1-dimensional vector. f_iw represents the first-letter codes: the method records the first-letter codes of the first 4 words in the place noun group and pads with the code "00" when there are fewer than 4 words, giving a 4-dimensional vector. Combined according to equations (1), (2) and (3), these vectors constitute the 12-dimensional vector d_i. As an index entry, d_i fully characterizes the textual features of the English place name string and serves as the entry point for English place name query.

$d_i = [f_{cn}, f_{ar}, f_{wn}, f_{iw}]$ (1)

$f_{ar} = [f_{ar1}, f_{ar2}, \ldots, f_{ar6}]$ (2)

$f_{iw} = [f_{iw1}, f_{iw2}, \ldots, f_{iw4}]$ (3)
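The construction of the 12-dimensional vector in equations (1) to (3) can be sketched as follows. This is only an illustration under stated assumptions: Table 1 is not reproduced in this text, so the radical table is passed in as a parameter and the demo table TOY_RADICALS is a hypothetical stand-in containing only the one decomposition the text states ("L" is "12"). The function names are also hypothetical, the codes "01" to "26" and "00" are stored as integers, and every word is assumed to start with a letter A to Z.

```python
from typing import Dict, List

# Radical digits as stated in the text: 1="|", 2="-", 3="/", 4="\", 5="(", 6=")".
# HYPOTHETICAL stand-in for Table 1: only "L" -> "12" comes from the text.
# A real table would need an entry for every uppercase and lowercase letter.
TOY_RADICALS: Dict[str, str] = {"L": "12"}


def initial_codes(name: str, slots: int = 4) -> List[int]:
    """First-letter codes (A=1 ... Z=26) of the first `slots` words, padded with 0."""
    codes = [ord(word[0].upper()) - ord("A") + 1 for word in name.split()[:slots]]
    return codes + [0] * (slots - len(codes))


def feature_vector(name: str, radicals: Dict[str, str]) -> List[int]:
    """Flatten d_i = [f_cn, f_ar (6 values), f_wn, f_iw (4 values)] into 12 integers."""
    letters = [ch for ch in name if ch.isalpha()]
    f_ar = [0] * 6
    for ch in letters:
        # Letters missing from the (toy) table contribute no radicals.
        for digit in radicals.get(ch, ""):
            f_ar[int(digit) - 1] += 1
    return [len(letters), *f_ar, len(name.split()), *initial_codes(name)]


print(feature_vector("Aalders Lang Brook", TOY_RADICALS))
# [16, 1, 1, 0, 0, 0, 0, 3, 1, 12, 2, 0]
```

With the toy table only the "L" of "Lang" contributes radicals, so the six radical counts are not the real ones; the total number of letters (16), the total number of words (3) and the first-letter codes (1, 12, 2, 0) do not depend on Table 1.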

Furthermore, regarding the construction of the inverted chain file: each index entry appearing in the dictionary corresponds to one inverted chain, and the inverted chain uses the data structure (tf, <p1, p2, ..., pf>) to record the hits of that index entry in the place name dictionary. Among them, tf represents the number of occurrences of the index entry in the place name dictionary, and pi represents the position offset of the i-th occurrence in the place name dictionary. All hits are arranged in order to form the corresponding inverted chain.
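A minimal sketch of the inverted chain structure (tf, <p1, ..., pf>) follows. It assumes that the place name dictionary is an in-memory list and that a "position offset" is simply the list index; the names build_inverted_index and toy_featurize are hypothetical, and the two-value featurizer is a deliberately reduced stand-in for the 12-dimensional vector.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Tuple[int, ...]


def build_inverted_index(
    names: Sequence[str],
    featurize: Callable[[str], Sequence[int]],
) -> Dict[Vector, Tuple[int, List[int]]]:
    """Map each distinct feature vector to its inverted chain (tf, [p1, ..., pf])."""
    postings: Dict[Vector, List[int]] = defaultdict(list)
    for position, name in enumerate(names):
        # Positions are appended in dictionary order, so each chain stays sorted.
        postings[tuple(featurize(name))].append(position)
    return {vector: (len(chain), chain) for vector, chain in postings.items()}


def toy_featurize(name: str) -> Tuple[int, int]:
    """HYPOTHETICAL reduced feature vector: (total letters, total words)."""
    return (sum(ch.isalpha() for ch in name), len(name.split()))


index = build_inverted_index(
    ["Aalders Lang Brook", "Lang Aalders Brook", "Nanjing"], toy_featurize
)
print(index[(16, 3)])  # (2, [0, 1])
```

Names that share a feature vector share one index entry, which is why tf can exceed 1 and why a later similarity ranking step is needed to separate them.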

Second aspect

The invention also provides an English place name query method, which is applied to user equipment. The English place name query method includes: obtaining a search keyword entered by the user on the user equipment; finding, according to a pre-built index file in the English place name database, the candidate place name set related to the search keyword, wherein the index file stored on the user equipment is constructed according to the English place name index establishment method described in the first aspect; and returning the candidate place name set to the user equipment for display.

The process and principle of the above English place name query method are described in detail below.

The selection process of the candidate place name collection is:

First, the submitted query place name is normalized; that is, each word in the place noun group is converted to initial-capital form.

Second, according to the feature statistics rules used when constructing the index, the feature values of the query place name are counted and organized into the vector Q = [qf_cn, qf_ar, qf_wn, qf_iw].

Third, Q is compared with the index entries in the index dictionary; when formula (4) is satisfied, the index entry d_i is a candidate.

In the formula, f_cn represents the total number of letters, f_ar the number of letter radicals, f_wn the total number of words, and f_iw the first-letter codes. k_cn represents the threshold for the total number of letters, k_ar the threshold for the number of letter radicals, k_wn the threshold for the total number of words, and k_iw the threshold for the first-letter code dimension.

Fourth, the index information of d_i is parsed in reverse: according to the position offset information <p_1, p_2, ..., p_f> in the corresponding inverted chain, the place name data at the relevant storage locations in the place name dictionary are retrieved. All retrieved place names are merged to form the candidate place name set.
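Formula (4) itself is not reproduced in this text, so the following sketch only illustrates one plausible reading of the threshold comparison. All four conditions are assumptions: absolute difference for the letter and word totals, summed absolute difference over the six radical counts, and the number of differing positions for the first-letter codes. The function names and the threshold values in the usage lines are hypothetical; the two vectors are the ones given in the worked example of the description.

```python
from typing import Iterable, List, Sequence


def is_candidate(d: Sequence[int], q: Sequence[int],
                 k_cn: int, k_ar: int, k_wn: int, k_iw: int) -> bool:
    """ASSUMED reading of formula (4) on flattened 12-dimensional vectors
    laid out as [f_cn, f_ar1..f_ar6, f_wn, f_iw1..f_iw4]."""
    cn_ok = abs(d[0] - q[0]) <= k_cn
    ar_ok = sum(abs(a - b) for a, b in zip(d[1:7], q[1:7])) <= k_ar
    wn_ok = abs(d[7] - q[7]) <= k_wn
    iw_ok = sum(a != b for a, b in zip(d[8:12], q[8:12])) <= k_iw
    return cn_ok and ar_ok and wn_ok and iw_ok


def select_candidates(index_vectors: Iterable[Sequence[int]], q: Sequence[int],
                      k_cn: int, k_ar: int, k_wn: int, k_iw: int) -> List[Sequence[int]]:
    """Scan the index entries and keep those within every threshold."""
    return [d for d in index_vectors if is_candidate(d, q, k_cn, k_ar, k_wn, k_iw)]


d_i = [16, 9, 8, 3, 2, 12, 9, 3, 1, 12, 2, 0]  # index entry from the worked example
Q = [15, 9, 8, 3, 2, 10, 9, 3, 1, 12, 2, 0]    # query vector from the worked example
print(is_candidate(d_i, Q, k_cn=1, k_ar=2, k_wn=0, k_iw=0))  # True
```

Under this reading the example entry differs from the query by one letter and two radicals, so it is kept with thresholds of 1 and 2 and dropped if either is tightened.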

In some preferred solutions, the above English place name query method may further include the following steps: after the candidate place name set is obtained, calculating a similarity value between each English place name in the candidate place name set and the search keyword; sorting the English place names in the candidate place name set in descending order of similarity value, and returning the sorted result to the user equipment for display.

About the order similarity ranking process of the candidate place name set:

First: order similarity calculation. For every place noun group in the candidate place name set, the order similarity with the query place name is calculated. Suppose there are two place name strings, P = p_1 p_2 … p_n and W = w_1 w_2 … w_m, and let N denote the sequence of characters that P and W share in the same order. N is determined according to two principles. (1) Identical local order. N is composed of local similar items ls_i, and there may be several ls_i between P and W. If P contains a substring q_i = p_j p_{j+1} … p_k that is exactly the same as a substring w_s w_{s+1} … w_t of W, then the substring conforms to the principle of identical local order and is taken as a local similar item ls_i. (2) Identical overall order. The ls_i that appear in the same order in both P and W constitute N. The formula for the place name similarity of P and W is given as formula (5).

In the formula, sim(P, W) is the order similarity value between P and W, and len(N), len(P) and len(W) represent the string lengths of N, P and W, respectively.

Second: order similarity ranking. The candidate place names are sorted by order similarity from high to low, and the sorted result is returned to the user as the final query result.

Technical effect

The present invention uses multi-dimensional text statistical features of place names, such as the numbers of words and letters, to perform place name query along the main line of "multi-dimensional feature statistics, inverted index generation, candidate place name query, similarity ranking". During index generation, features such as the total number of letters, the number of letter radicals, the total number of words, and the first-letter codes of the words are extracted for each place name record, and the vector composed of these features is used as an index entry to construct the corresponding inverted index structure. During candidate place name search and order similarity ranking, the query request is normalized and its multi-dimensional features are extracted. The generated feature vector is used to look up the candidate place name set in the inverted index, and the candidate set is sorted by similarity from high to low and returned to the user. Experiments show that the proposed query method for English place names based on a feature statistics inverted index not only maintains high operating efficiency in a large-scale data environment, but also accurately finds the target place name when the place name is expressed inaccurately, giving users a better experience.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a flowchart of a method for establishing an English place name index according to the present invention.

FIG. 2 is a flowchart of an English place name query method according to the present invention.

FIG. 3 is a flowchart of a method for querying English place names in a preferred embodiment of the present invention.

FIG. 4 is a schematic diagram of an English place name query device according to the present invention.

FIG. 5 is a schematic diagram of an English place name query device in a preferred embodiment of the present invention.

FIG. 6 is a graphic flowchart of a query method for an English place name dictionary according to the present invention.

FIG. 7 is a schematic diagram of an inverted index structure in the method for establishing an English place name index according to the present invention.

Detailed description

The embodiments of the present invention are described below by way of specific examples. Those skilled in the art can easily understand other advantages and effects of the present invention from the content disclosed in this specification.

Explanation of technical terms

Letter radicals refer to the use of six characters "|", "-", "/", "\", "(" and ")" (i.e., radicals) to describe the composition of uppercase and lowercase letters; that is, any uppercase or lowercase letter can be composed of some of the 6 characters. If the numbers 1 to 6 are used to represent "|", "-", "/", "\", "(" and ")", then the radicals of any letter can be represented by a string of digits. For example, "L" can be composed of "|" and "-", so the radical number of "L" is "12".

The number of letter radicals refers to the count of each radical over all the letters of an English place name, determined from the radical numbers of those letters (the first letter of each word in the English place name is required to be capitalized).

Letter encoding refers to converting the first letter of a word in a place noun group into a numeric code. The conversion rule maps A to Z, in order, to the codes "01" to "26"; that is, A is coded as "01", B as "02", and so on.

Example 1

As shown in FIG. 1, this embodiment provides a method for establishing an English place name index, which is applied to user equipment. The method includes the following steps:

S11. Count multiple feature values of all English place noun groups stored in the English place name dictionary text. The feature values include the total number of letters, the number of radicals, the total number of words, and the initial code of the words;

S12. Generate a corresponding multi-dimensional feature statistical vector according to the feature values of each English place noun group;

S13. Establish an inverted index file by using the multi-dimensional feature statistical vector of each English place noun group and its position mapping information in the inverted table as an index entry, where each index entry corresponds to one inverted chain.

Specifically, the multi-dimensional feature statistical vector is d_i = [f_cn, f_ar, f_wn, f_iw], where d_i represents the multi-dimensional feature statistical vector of an English place name, f_cn represents the total number of letters, f_ar represents the number of letter radicals, f_wn represents the total number of words, and f_iw represents the first-letter codes; f_ar contains the counts of the 6 radicals, and f_iw contains the first-letter codes of the first 4 words in the English place noun group.

Specifically, the method for establishing an English place name index may further include: when querying an English place name according to the index entries, comparing the search keyword with each index entry, and taking an index entry as a candidate for the query when it satisfies the following conditions; the conditions include:

Among them, qf_cn indicates the total number of letters in the search keyword, qf_ar the number of letter radicals in the search keyword, qf_wn the total number of words in the search keyword, and qf_iw the first-letter codes in the search keyword; k_cn represents the threshold for the total number of letters, k_ar the threshold for the number of letter radicals, k_wn the threshold for the total number of words, and k_iw the threshold for the first-letter code dimension.

Example 2

As shown in FIG. 2, this embodiment provides an English place name query method, which is applied to user equipment. The English place name query method includes the following steps:

S21. Acquire a search keyword entered by the user on the user equipment;

S22. Find a candidate place name set related to the search keyword according to a pre-established index file in the English place name database, wherein the index file stored on the user equipment is constructed according to the English place name index establishment method described in Embodiment 1;

S23. Return the set of candidate place names to the user equipment for display.

As a preferred embodiment, as shown in FIG. 3, after obtaining a set of candidate place names, the English place name query method may further include:

S31. Calculate a similarity value between each English place name and the search keyword in the candidate place name set;

S32. Sort the English place names in the candidate place name set in descending order of similarity value, and return the sorted result to the user equipment for display.

Specifically, a method for calculating a similarity value between each English place name and a search keyword in the candidate place name set is:

Among them, P is the character string of the search keyword, W is the character string of the English place name, sim(P, W) is the order similarity value between P and W, len(N), len(P) and len(W) represent the string lengths of N, P and W, respectively, and N represents the sequence of characters that P and W share in the same order.

Example 3

As shown in FIG. 4, this embodiment provides an English place name query device 300, which is applied to user equipment and specifically includes a receiving module 310, a search module 320, and a display module 330. The receiving module 310 is configured to obtain the search keyword entered by the user on the user equipment; the search module 320 is configured to find a candidate place name set related to the search keyword according to a pre-established index file in an English place name database, wherein the index file stored on the user equipment is constructed according to the English place name index establishing method of claim 1 or 2; the display module 330 is configured to display the candidate place name set returned to the user equipment.

In a preferred solution, as shown in FIG. 5, the English place name query device further includes a similarity calculation module 410 and a sorting module 420. The similarity calculation module 410 is configured to calculate, after the candidate place name set is obtained, the similarity value between each English place name in the candidate place name set and the search keyword; the sorting module 420 is configured to sort the English place names in the candidate place name set in descending order of similarity value and return them to the user equipment; the display module displays the sorted result.

Specifically, the formula for calculating the similarity value of each English place name and the search keyword in the candidate place name set in the similarity calculation module includes:

To enable those skilled in the art to understand the present invention more clearly, the place name "Aalders Lang Brook" is taken as an example here, and the principle of the above embodiments is described in detail in conjunction with FIG. 1. For ease of explanation and understanding, the description follows the logical order of index generation process, candidate place name search process, and order similarity ranking process.

(I) Index generation process:

Step 11: The feature values of all the place noun groups stored in the English place name dictionary are counted in turn, including the total number of letters, the number of letter radicals, the total number of words, and the first-letter codes of the words. Taking the place name "Aalders Lang Brook" as an example, the total number of letters is 16. The counts of the six radicals "|", "-", "/", "\", "(" and ")" are 9, 8, 3, 2, 12 and 9, respectively. The total number of words is 3. The first-letter codes of the words are 1, 12, 2 and 0.

Step 12: Build the index dictionary file. In the index dictionary, each index entry records the total number of letters, the number of letter radicals, the total number of words, the first-letter codes of the words, and the position in the inverted table. Taking the place name "Aalders Lang Brook" as an example, the total number of letters is 16, the radical counts are 9, 8, 3, 2, 12, 9, the total number of words is 3, and the first-letter codes are 1, 12, 2, 0, so the multi-dimensional feature vector is expressed as [16, [9, 8, 3, 2, 12, 9], 3, [1, 12, 2, 0]]. Together with its position mapping information <1001> in the inverted table, the index entry in the index dictionary file is ([16, [9, 8, 3, 2, 12, 9], 3, [1, 12, 2, 0]], <1001>).

Step 13: Build the inverted file. Each index entry that appears in the dictionary corresponds to one inverted chain. The inverted chain uses the data structure (tf, <p_1, p_2, ..., p_f>) to record the hits of the index entry in the place name dictionary. Taking the multi-dimensional feature vector [16, [9, 8, 3, 2, 12, 9], 3, [1, 12, 2, 0]] as an example, the corresponding inverted table position mapping information is <1001>; that is, position 1001 in the inverted chain file stores the storage positions of all place noun groups in the English place name dictionary that have this multi-dimensional feature vector. For example, the record at position 1001 of the inverted chain file is (<5>, <7>, ..., <125>, ...), which indicates that the related place noun groups are stored at positions 5, 7, ..., 125 and so on in the English place name dictionary.

(II) Candidate place name search process:

Step 21: The submitted query place name is first normalized; that is, each word in the place noun group is converted to initial-capital form. Taking the query place name "Alders lang brook" as an example, it is converted to "Alders Lang Brook".
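The normalization step can be sketched in a few lines. This is an illustration only: the function name normalize is hypothetical, and leaving the letters after the first one unchanged (rather than lowercasing them) is an assumption, since the text only requires the initial-capital form.

```python
def normalize(name: str) -> str:
    """Capitalize the first letter of every word; other letters stay as typed."""
    return " ".join(word[:1].upper() + word[1:] for word in name.split())


print(normalize("alders lang brook"))  # Alders Lang Brook
```

Because the radical table distinguishes uppercase and lowercase letters, normalizing before feature extraction keeps the radical counts of the query comparable with those stored in the index.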

Step 22: According to the feature statistics rules used when constructing the index, the feature values of the query place name are counted and organized into the vector Q = [qf_cn, qf_ar, qf_wn, qf_iw]. Taking the place name "Alders Lang Brook" as an example, the total number of letters is 15, the radical counts are 9, 8, 3, 2, 10, 9, the total number of words is 3, and the first-letter codes of the words are 1, 12, 2, 0, so the multi-dimensional statistical vector is [15, [9, 8, 3, 2, 10, 9], 3, [1, 12, 2, 0]].

Step 23: Q is compared with the index entries in the index dictionary. When formula (4) is satisfied, the index entry d_i is a candidate qd_i.

Step 24: The index information in the candidate qd_i is parsed in reverse, and the place name data at the relevant storage locations in the place name dictionary are retrieved according to the position offset information <p_1, p_2, ..., p_f> in the corresponding inverted chain. All retrieved place names are merged to form the candidate place name set. Taking the query place name "Alders Lang Brook" as an example, the index entry ([16, [9, 8, 3, 2, 12, 9], 3, [1, 12, 2, 0]], <1001>) is a candidate qd_i. All inverted chain mapping positions in qd_i are parsed, and the related record <1001> is found in the inverted chain file. The dictionary storage locations (<5>, <7>, ..., <125>, ...) contained in the <1001> record are then used to look up the related place noun groups in the English place name dictionary file, and all the place names found form the candidate place name set C.

(III) Order similarity ranking process:

Step 31: The degree of similarity between place names is determined from the proportion of characters that the two strings share in the same order. Suppose there are two place name strings, P = p_1 p_2 … p_n and W = w_1 w_2 … w_m, and let N denote the sequence of characters that P and W share in the same order. N is determined according to two principles. (1) Identical local order. N is composed of local similar items ls_i, and there may be several ls_i between P and W. If P contains a substring q_i = p_j p_{j+1} … p_k that is exactly the same as a substring w_s w_{s+1} … w_t of W, then the substring conforms to the principle of identical local order and is taken as a local similar item ls_i. (2) Identical overall order. The ls_i that appear in the same order in both P and W constitute N. For example, let P = "Aalders Lang Brook" and W = "Lang Aalders Brook". By the principle of identical local order, "Aalders", "Lang" and "Brook" are the local similar items ls_1, ls_2 and ls_3, respectively. Their order in P is ls_1 ls_2 ls_3, and their order in W is ls_2 ls_1 ls_3. Taking the order in P as the reference, the items that satisfy the principle of identical overall order are ls_1 ls_3, so N = ls_1 ls_3. The formula for the place name similarity of P and W is given as formula (5).

In the formula, sim(P, W) is the order similarity value between P and W, and len(N), len(P) and len(W) represent the string lengths of N, P and W, respectively. That is, the similarity between "Aalders Lang Brook" and "Lang Aalders Brook" is 12/16 = 0.75.
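The worked example can be checked with a short sketch. Formula (5) is not reproduced in this text, so the normalization used below, 2·len(N) / (len(P) + len(W)), is an assumption chosen because it reproduces 12/16 = 0.75; len(N) / max(len(P), len(W)) would give the same value for this example. Two further simplifications are assumed: local similar items are whole words rather than arbitrary substrings, and lengths count characters without spaces. The function names are hypothetical.

```python
from typing import List


def common_order_length(p_words: List[str], w_words: List[str]) -> int:
    """len(N): largest total length of identical words occurring in the same order in both names."""
    rows, cols = len(p_words), len(w_words)
    best = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            # Either skip a word on one side, or match two identical words.
            best[i][j] = max(best[i - 1][j], best[i][j - 1])
            if p_words[i - 1] == w_words[j - 1]:
                matched = best[i - 1][j - 1] + len(p_words[i - 1])
                best[i][j] = max(best[i][j], matched)
    return best[rows][cols]


def order_similarity(p: str, w: str) -> float:
    """ASSUMED form of formula (5): 2 * len(N) / (len(P) + len(W))."""
    p_words, w_words = p.split(), w.split()
    total = sum(map(len, p_words)) + sum(map(len, w_words))
    if total == 0:
        return 0.0
    return 2 * common_order_length(p_words, w_words) / total


print(order_similarity("Aalders Lang Brook", "Lang Aalders Brook"))  # 0.75
```

The dynamic program is a weighted longest common subsequence over words, so it picks "Aalders" plus "Brook" (12 letters) over "Lang" plus "Brook" (9 letters), matching N = ls_1 ls_3 in the example.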

Step 32: Sort by similarity. Based on the similarity results of step 31, the place names C_q in the candidate place name set C are sorted by similarity from high to low, and the top n C_q are used as the query result.

Experimental analysis

To verify the technical effect of the present invention, this embodiment builds an English place name dictionary from 115,000 English place name records and extracts 5,409 place names as standard place names. A test set is constructed from the standard place names by artificially adding errors. The error types cover several kinds of inaccurate description (such as extra letters, missing letters, wrong letters, and transposed letters). The test place names are divided into 5 levels according to their accuracy (as shown in Table 2). Accuracy is defined in Equation 6:

In the formula, A represents the number of characters in the query place name P that are correct with respect to the target place name C, N represents the number of characters in the query place name P, and accu(P, C) represents the accuracy of P.
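Equation 6 itself is not reproduced in this text. Read literally, the definitions of A and N suggest a simple ratio; the form below is an assumption based only on those definitions, not a confirmed transcription of the original equation.

```latex
\mathrm{accu}(P, C) = \frac{A}{N}
```

Under this reading, a query of 10 characters of which 8 are correct with respect to the target place name would have an accuracy of 0.8.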

Table 2 Details of test set division in the examples

Note: The content in brackets is the target place name corresponding to the test place name, which is the standard place name form.

In addition, in the experiment, the query effect of the present invention on querying place names with different degrees of accuracy is shown in Table 3:

Table 3 Evaluation statistics of experimental results

The experimental results show that the English place name dictionary query method based on the feature statistics inverted index not only maintains high operating efficiency in a large-scale data environment, but also finds the target place name more accurately when the place name is expressed inaccurately.

The above-mentioned embodiments merely illustrate the principle of the present invention and its effects, but are not intended to limit the present invention. Anyone familiar with this technology can modify or change the above embodiments without departing from the spirit and scope of the present invention. Therefore, all equivalent modifications or changes made by those with ordinary knowledge in the technical field to which they belong without departing from the spirit and technical ideas disclosed by the present invention should still be covered by the claims of the present invention.

Claims

1. An English place name index establishment method applied to user equipment, characterized in that the method comprises:

counting and storing a plurality of feature values of every English place name phrase in the English place name dictionary text, the feature values including the total number of letters, the number of letter radicals, the total number of words, and the first-letter code of the words;

generating a corresponding set of multidimensional feature statistical vectors from the feature values of the English place name phrases;

establishing an inverted index file by using the multidimensional feature statistical vector of each English place name phrase and its position mapping information in the inverted table as index entries, wherein each index entry corresponds to one inverted chain.
2. The method for establishing an English place name index according to claim 1, wherein the multidimensional feature statistical vector is:

$d_i = [f_{cn}, f_{ar}, f_{wn}, f_{iw}]$,

where $d_i$ denotes the multidimensional feature statistical vector of an English place name, $f_{cn}$ denotes the total number of letters, $f_{ar}$ denotes the number of letter radicals, $f_{wn}$ denotes the total number of words, and $f_{iw}$ denotes the first-letter code; $f_{ar}$ comprises 6 count values, and $f_{iw}$ comprises the first-letter codes of the first 4 words in the English place name phrase.
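
The index construction of claims 1 and 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names `first_letter_code`, `feature_vector` and `build_inverted_index` are hypothetical, the first-letter code is assumed to be the lower-cased initials of the first 4 words padded with `_`, and the letter-radical dimension $f_{ar}$ is left out because its definition is not given in this section.

```python
from collections import defaultdict


def first_letter_code(name, max_words=4):
    """Initials of the first `max_words` words, lower-cased and padded with '_'.

    The padding scheme is an assumption made for this sketch.
    """
    initials = [word[0].lower() for word in name.split()[:max_words]]
    return "".join(initials).ljust(max_words, "_")


def feature_vector(name):
    """Return (f_cn, f_wn, f_iw) for one place name.

    The letter-radical counts f_ar are omitted in this sketch.
    """
    f_cn = sum(ch.isalpha() for ch in name)  # total number of letters
    f_wn = len(name.split())                 # total number of words
    f_iw = first_letter_code(name)           # first-letter code
    return (f_cn, f_wn, f_iw)


def build_inverted_index(names):
    """Map each feature vector (index entry) to its posting list of positions."""
    index = defaultdict(list)
    for position, name in enumerate(names):
        index[feature_vector(name)].append(position)
    return index


if __name__ == "__main__":
    gazetteer = ["New York", "New Yorl", "London"]
    for entry, postings in build_inverted_index(gazetteer).items():
        print(entry, postings)
```

Place names with identical statistics share one index entry, so each posting list plays the role of the inverted chain described in claim 1.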
3. The method for establishing an English place name index according to claim 2, further comprising:

when an English place name is queried against the index entries, comparing the search keyword with each index entry and, when the index entry satisfies the following conditions, taking that index entry as a candidate for the query;

The conditions include:

where $qf_{cn}$ denotes the total number of letters in the search keyword, $qf_{ar}$ denotes the number of letter radicals in the search keyword, $qf_{wn}$ denotes the total number of words in the search keyword, $qf_{iw}$ denotes the first-letter code of the search keyword, $k_{cn}$ denotes the threshold for the total number of letters, $k_{ar}$ denotes the threshold for the number of letter radicals, $k_{wn}$ denotes the threshold for the total number of words, and $k_{iw}$ denotes the threshold for the first-letter code dimension.
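
The candidate filtering of claim 3 can be sketched as below. The conditions themselves are not reproduced in this text, so the sketch assumes the simplest reading: each dimension of the search keyword may differ from the index entry by at most its threshold, with the radical gap taken as the sum of absolute differences over the 6 counts and the first-letter gap as the number of differing positions. The function names and the default threshold values are hypothetical.

```python
def is_candidate(query, entry, k_cn=2, k_ar=3, k_wn=1, k_iw=1):
    """Check one index entry against the query vector.

    Both vectors have the layout (f_cn, f_ar, f_wn, f_iw), where f_ar is a
    tuple of 6 counts and f_iw is a 4-character string. The per-dimension
    distance measures and the default thresholds are assumptions.
    """
    qf_cn, qf_ar, qf_wn, qf_iw = query
    f_cn, f_ar, f_wn, f_iw = entry
    radical_gap = sum(abs(q - f) for q, f in zip(qf_ar, f_ar))
    initial_gap = sum(q != f for q, f in zip(qf_iw, f_iw))
    return (
        abs(qf_cn - f_cn) <= k_cn
        and radical_gap <= k_ar
        and abs(qf_wn - f_wn) <= k_wn
        and initial_gap <= k_iw
    )


def find_candidates(query, index):
    """Collect the postings of every index entry that passes the filter."""
    hits = []
    for entry, postings in index.items():
        if is_candidate(query, entry):
            hits.extend(postings)
    return sorted(hits)


if __name__ == "__main__":
    # Illustrative vectors only; the radical counts are made up.
    index = {
        (7, (1, 2, 0, 3, 1, 0), 2, "ny__"): [0],
        (6, (2, 1, 1, 0, 2, 0), 1, "l___"): [1],
    }
    query = (8, (1, 2, 1, 3, 1, 0), 2, "ny__")
    print(find_candidates(query, index))
```

This sketch scans every index entry; a production index would more likely look up only the entries whose values fall inside the threshold windows.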
4. An English place name query method applied to user equipment, characterized in that the method comprises:

obtaining the search keyword entered by the user on the user device;

finding a candidate place name set related to the search keyword according to a pre-established index file in the English place name database, wherein the index file stored on the user equipment is constructed according to the English place name index establishment method of claim 1;

returning the candidate place name set to the user device for display.
5. The method for querying English place names according to claim 4, wherein after the candidate place name set is obtained, the method further comprises:

calculating a similarity value between the search keyword and each English place name in the candidate place name set;

sorting the English place names in the candidate place name set in descending order of similarity value, and returning the sorted result to the user device for display.
6. The method for querying English place names according to claim 4 or 5, wherein the similarity value between the search keyword and each English place name in the candidate place name set is calculated as:

where P is the character string of the search keyword, W is the character string of the English place name, sim(P, W) is the order similarity value between P and W, len(N), len(P) and len(W) denote the string lengths of N, P and W, respectively, and N denotes the sequence of characters common to P and W.
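
The similarity ranking of claims 5 and 6 can be sketched as follows. The formula is not reproduced in this text, so the sketch makes two assumptions: N is taken to be the longest common subsequence of P and W, and the similarity is normalized as 2·len(N) / (len(P) + len(W)). Both are common choices and may differ from the formula in the original filing. The function names are hypothetical.

```python
def lcs_length(p, w):
    """Length of the longest common subsequence of p and w (row-wise DP)."""
    prev = [0] * (len(w) + 1)
    for p_char in p:
        curr = [0]
        for j, w_char in enumerate(w, start=1):
            if p_char == w_char:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def sim(p, w):
    """Assumed order similarity: 2 * len(N) / (len(P) + len(W))."""
    if not p and not w:
        return 1.0
    return 2 * lcs_length(p, w) / (len(p) + len(w))


def rank_candidates(keyword, candidates):
    """Sort candidate place names by similarity to the keyword, best first."""
    return sorted(candidates, key=lambda name: sim(keyword, name), reverse=True)


if __name__ == "__main__":
    print(rank_candidates("Londn", ["Lisbon", "London", "Leeds"]))
```

Because the subsequence preserves character order, a misspelled keyword with a missing or swapped letter still scores close to 1 against its intended target.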
7. An English place name query device applied to user equipment, characterized in that it comprises:

a receiving module, configured to obtain the search keyword entered by a user on the user device;

a search module, configured to find a candidate place name set related to the search keyword according to a pre-established index file in the English place name database, wherein the index file stored on the user equipment is constructed according to the English place name index establishment method of claim 1;

a display module, configured to display the candidate place name set returned to the user device.
8. The English place name query device according to claim 7, further comprising:

a similarity calculation module, configured to calculate, after the candidate place name set is obtained, a similarity value between the search keyword and each English place name in the candidate place name set;

a sorting module, configured to sort the English place names in the candidate place name set in descending order of similarity value and return the sorted result to the user device;

wherein the display module displays the sorted result.
9. The English place name query device according to claim 7 or 8, wherein the formula used in the similarity calculation module to calculate the similarity value between the search keyword and each English place name in the candidate place name set comprises:

where P is the character string of the search keyword, W is the character string of the English place name, sim(P, W) is the order similarity value between P and W, len(N), len(P) and len(W) denote the string lengths of N, P and W, respectively, and N denotes the sequence of characters common to P and W.