CN112256821A - Method, device, equipment and storage medium for complementing Chinese address - Google Patents
- Publication number
- CN112256821A (application number CN202011013241.1A)
- Authority
- CN
- China
- Prior art keywords
- address
- name information
- place name
- complete
- level
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/31—Indexing; Data structures therefor; Storage structures
- G06F16/316—Indexing structures
- G06F16/322—Trees
- G06F16/38—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/383—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/387—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using geographical or spatial information, e.g. location
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Library & Information Science (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The embodiments of the application provide a method, a device, equipment and a storage medium for completing a Chinese address, aiming to search for and complete Chinese addresses quickly without degrading system performance as the database grows. The method comprises: storing the complete addresses in an address library in a Trie tree structure; marking the address level of each path on the last node of that path in the Trie tree; searching for an input address keyword in the Trie tree to obtain the address keyword labeled with address levels; and parsing the label information of the address keyword to obtain the complete address corresponding to the address keyword.
Description
Technical Field
The embodiment of the application relates to the technical field of information processing, in particular to a method, a device, equipment and a storage medium for complementing a Chinese address.
Background
Chinese address completion is a technology closely tied to daily life. It aims to complete an incomplete address entered by a user into a complete address, and it is widely used in scenarios such as online shopping and filling in an address when applying for certificates. One existing address completion method stores a large amount of address information in a database and then queries it by specific fields; another optimizes the query with an inverted index.
The prior art has the following defects: when the database is too large, address query efficiency drops and query time grows almost linearly; and an inverted index requires a large amount of database design in advance, which is time-consuming.
Disclosure of Invention
The embodiment of the application provides a method, a device, equipment and a storage medium for completing a Chinese address, aiming at realizing the quick search and completion of the Chinese address.
A first aspect of the embodiments of the present application provides a method for completing a Chinese address, where the method includes:
storing the complete address in the address library according to the structure of the Trie tree;
marking the address level of each path on the last node of each path in the Trie tree;
searching the input address keywords in the Trie tree to obtain the address keywords marked with the address level;
and analyzing the labeled information in the address keywords to obtain the complete address corresponding to the address keywords.
Optionally, before storing the complete address in the address library according to the structure of the Trie tree, the method further includes:
indexing each of the complete addresses in the address library;
storing the place name information of each complete address in the address library by level;
generating a mapping between the place name information of each level and the indexes of the complete addresses to which it belongs.
Optionally, storing the complete address in the address library according to the structure of the Trie tree, including:
sequentially inserting the complete address into a path of the Trie tree from a first character by taking the character as a unit;
after the first place name information in the complete address is inserted, sequentially inserting the second place name information in the complete address into another path in the Trie tree by taking characters as units from a first character;
and after the second place name information is inserted, sequentially inserting all the place name information into the Trie tree according to the rule for storage.
Optionally, the method further comprises:
and when the first character of the current place name information is the same as the first character in the previous path, taking the node corresponding to the first character in the previous path as the root node of the path corresponding to the current place name information, and inserting the current place name information into the Trie tree.
Optionally, searching the input address keywords in the Trie tree to obtain the address keywords labeled with the address level, including:
searching each character of the address keyword from a root node of the Trie tree to obtain a plurality of paths corresponding to the address keyword;
and determining the address level of each place name information in the address keywords according to the address level marked on the last node of each path in the paths.
Optionally, the method further comprises:
and when the shortest path matching the place name information cannot be found in the Trie tree, marking the place name information with a mark other than an address level.
Optionally, analyzing the label information of the address keyword to obtain the complete address corresponding to the address keyword, including:
determining an address level of each of the place name information in the address keyword;
determining the complete addresses containing the place name information according to the mapping between each piece of place name information and the indexes of the complete addresses to which it belongs, so as to obtain a plurality of complete addresses containing the place name information;
when the address keywords contain at least two pieces of place name information, calculating the intersection between the complete addresses to obtain the complete addresses corresponding to the address keywords;
and when the address keyword only contains one piece of place name information, selecting the complete address with the highest use frequency as the complete address of the address keyword.
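The resolution step above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `resolve`, the `level_maps` layout (level code, then place name, then list of complete-address indexes) and the `usage_count` dictionary are hypothetical, and the text does not say how usage frequency is recorded.

```python
def resolve(labelled_names, level_maps, usage_count):
    """Return the indexes of the complete addresses matching the keyword.

    labelled_names: (place_name, level) pairs found in the address keyword.
    level_maps:     hypothetical layout {level: {place_name: [index, ...]}}.
    usage_count:    hypothetical {index: times the address was used}.
    """
    candidate_sets = [
        set(level_maps.get(level, {}).get(name, ()))
        for name, level in labelled_names
    ]
    if not candidate_sets:
        return []
    if len(candidate_sets) >= 2:
        # Two or more place names: intersect their candidate index sets.
        return sorted(set.intersection(*candidate_sets))
    # Exactly one place name: pick the most frequently used candidate.
    candidates = sorted(candidate_sets[0])
    if not candidates:
        return []
    return [max(candidates, key=lambda idx: usage_count.get(idx, 0))]


# Illustrative data only (romanized place names, made-up usage counts).
level_maps = {
    "C": {"Shenzhen City": [0, 1]},
    "D": {"Nanshan District": [0], "Longgang District": [1]},
}
usage_count = {0: 3, 1: 7}

two_names = resolve([("Shenzhen City", "C"), ("Nanshan District", "D")],
                    level_maps, usage_count)
one_name = resolve([("Shenzhen City", "C")], level_maps, usage_count)
print(two_names, one_name)
```

With two place names the candidate sets narrow each other down; with a single name the sketch falls back to the usage counts, which here are invented values.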
A second aspect of the embodiments of the present application provides a device for completing a Chinese address, where the device includes:
the first address storage module is used for storing the complete address in the address library according to the structure of the Trie tree;
an address level marking module, configured to mark an address level of each path in the Trie tree on a last node of the path;
an address level lookup module, configured to search the input address keywords in the Trie tree to obtain the address keywords marked with the address level;
and a complete address acquisition module, configured to analyze the labeled information in the address keywords to obtain the complete address corresponding to the address keywords.
Optionally, the apparatus further comprises:
the index establishing module is used for establishing an index for each complete address in the address library;
the second storage module is used for storing the place name information of each complete address in the address library by level;
and the mapping generation module is used for generating a mapping between the place name information of each level and the indexes of the complete addresses to which it belongs.
Optionally, the first address storage module includes:
the first character insertion sub-module is used for sequentially inserting the complete address into one path of the Trie tree from a first character by taking the character as a unit;
the second character insertion sub-module is used for inserting second place name information in the complete address into another path in the Trie tree in sequence by taking characters as units from the first character after the first place name information in the complete address is inserted;
and the third character insertion sub-module is used for sequentially inserting all the place name information into the Trie tree for storage according to the rule after the second place name information is inserted.
Optionally, the first address storage module further includes:
and the fourth character insertion sub-module is used for inserting the current place name information into the Trie tree by taking the node corresponding to the first character in the previous path as the root node of the path corresponding to the current place name information when the first character of the current place name information is the same as the first character in the previous path.
Optionally, the address level lookup module includes:
the path searching submodule is used for searching each character of the address keyword from a root node of the Trie tree to obtain a plurality of paths corresponding to the address keyword;
and the first address level obtaining submodule is used for determining the address level of each place name information in the address key words according to the address level marked on the last node of each path in the paths.
Optionally, the address level lookup module further includes:
and the second address level obtaining submodule is used for marking the place name information with a mark other than an address level when the shortest path matching the place name information cannot be found in the Trie tree.
Optionally, the complete address obtaining module includes:
an address level determination submodule, configured to determine an address level of each of the location name information in the address keyword;
a complete address obtaining submodule, configured to determine the complete address where the place name information is located according to mapping between each piece of place name information and an index of the complete address to which the place name information belongs, so as to obtain a plurality of complete addresses including the place name information;
the first complete address determining submodule is used for calculating the intersection among the complete addresses to obtain the complete address corresponding to the address keyword when the address keyword contains at least two pieces of place name information;
and the second complete address obtaining submodule is used for selecting the complete address with the highest use frequency as the complete address of the address keyword when the address keyword only contains one piece of place name information.
A third aspect of embodiments of the present application provides a readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the steps in the method according to the first aspect of the present application.
A fourth aspect of the embodiments of the present application provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the computer program to implement the steps of the method according to the first aspect of the present application.
With the Chinese address completion method of the present application, an index of the complete addresses in the address library is established in advance, the place name information of each level contained in each complete address is stored by level, and a mapping is generated between the place name information of each level and the indexes of the complete addresses containing it. After this preprocessing, the complete addresses in the address library are stored in a Trie tree structure: all place name information is stored in the nodes of the Trie tree character by character, so that each piece of place name information forms a complete path, and pieces of place name information that share the same leading characters can share one or more nodes. The address level of each path is marked on the last node of that path. When a user inputs an address keyword, the place name information in the keyword is looked up in the Trie tree to obtain the level of each piece of place name information, and the complete addresses corresponding to all the place name information are then found through the mapping stored for the corresponding level in the address library, thereby completing the Chinese address.
Storing the complete addresses of the address library in a Trie tree structure exploits the prefix-sharing structure of the Trie tree to reduce query time, so that the address library can be searched quickly, Chinese addresses can be completed quickly, and performance does not degrade noticeably as the address library grows.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed to be used in the description of the embodiments of the present application will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without inventive exercise.
FIG. 1 is a flowchart of an address library data storage method according to an embodiment of the present application;
FIG. 2 is a flowchart of a method for Chinese address completion according to an embodiment of the present application;
FIG. 3 is a structural diagram of a Trie tree according to an embodiment of the present application;
FIG. 4 is a diagram of an apparatus for Chinese address completion according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Existing Chinese address completion methods query the database for entries matching the input address, so they depend heavily on database performance and on how well the query conditions are optimized; when the database is too large, the query speed drops. Another scheme optimizes the query with an inverted index, which requires designing the database and creating the index in advance and involves a large workload.
In the present application, the data in the address library is stored in a Trie tree structure, which greatly saves storage space, and Chinese addresses are completed according to the mapping between place name information of different levels and the indexes of the corresponding complete addresses. The search is fast, and query speed is not noticeably affected as the database grows.
Referring to fig. 1, fig. 1 is a flowchart of an address repository data storage method according to an embodiment of the present application. As shown in fig. 1, the method comprises the steps of:
S11: indexing each of the complete addresses in the address library.
In this embodiment, the address library is a database that stores complete addresses, where a complete address is a Chinese address that runs from the highest-level place name information down to the lowest-level place name information. The index is a list of pointers to the complete addresses in the address library; it acts as a directory of the address library, so the position of a complete address in the library can be determined quickly by querying the index.
An index is built for each complete address so that the complete addresses in the address library can be queried quickly: at the cost of extra storage space, a separate list of pointers to the complete addresses is created, and at query time a complete address is looked up through its index.
In this embodiment, for example, "Guangdong Province, Shenzhen City, Nanshan District, Taoyuan Road" and "Beijing City, Haidian District, Zhongguancun Road" are each a complete address; "Guangdong Province" and "Beijing City" are highest-level place name information, while "Taoyuan Road" and "Zhongguancun Road" are lowest-level place name information. If the index of "Guangdong Province, Shenzhen City, Nanshan District, Taoyuan Road" is set to "A", the position of that complete address in the address library can be determined through index A, which enables fast queries; "Beijing City, Haidian District, Zhongguancun Road" is indexed in the same way.
S12: storing the place name information of each complete address in the address library by level.
In this embodiment, a complete address is composed of several pieces of place name information at different levels, and place name information of the same level is stored together, so that a query can search quickly according to the level of the place name information.
In this embodiment, for example, all address information is divided into four levels, "province", "city", "district/county" and "street/residential community", denoted "P", "C", "D" and "S" respectively. Municipalities directly under the central government, autonomous regions and special administrative regions may also be treated as "P"-level place name information. The highest address level is "P" and the lowest is "S". For example, "Zhejiang Province, Guangdong Province, Jiangsu Province, Beijing City, Inner Mongolia Autonomous Region" are stored as P-level address information.
S13: generating a mapping between the place name information of each level and the indexes of the complete addresses to which it belongs.
In this embodiment, each piece of place name information belongs to a complete address, and each complete address was indexed in S11. The mapping is generated from this membership relationship between each piece of place name information and the complete address indexes. Because place name information is stored separately, a single piece of place name information may belong to several complete addresses. For example, one piece of C-level place name information has several pieces of D-level place name information under it, so the index set mapped to the C-level place name information contains the indexes mapped to those D-level pieces. S-level place name information may share a name with other entries of the same level, in which case the index set mapped to that name contains the indexes of all the same-named entries. Once the mapping is generated, the set of indexes of the complete addresses to which a piece of place name information belongs can be looked up from the place name information alone.
In this embodiment, for example, if the index of "Guangdong Province, Shenzhen City, Nanshan District, Taoyuan Road" is "0" and the index of "Guangdong Province, Shenzhen City, Longgang District, Longfei Avenue" is "1", then the P-level address mapping is {Guangdong Province: [0, 1]}, the C-level address mapping is {Shenzhen City: [0, 1]}, the D-level address mapping is {Nanshan District: [0]; Longgang District: [1]}, and the S-level address mapping is {Taoyuan Road: [0]; Longfei Avenue: [1]}.
Through steps S11 to S13, an index of the complete addresses in the address library is established, each piece of place name information is stored separately by level, and a mapping is built between each piece of place name information and the indexes of the complete addresses to which it belongs, so that the indexes of the complete addresses can be queried from place name information alone.
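Steps S11 to S13 could be sketched along the following lines. This is a minimal illustration, not the patented implementation: the function name `build_level_maps`, the tuple representation of a complete address and the romanized place names are assumptions made for the example.

```python
from collections import defaultdict

# Level codes follow the text: P (province), C (city), D (district), S (street).
LEVELS = ("P", "C", "D", "S")


def build_level_maps(address_library):
    """Build the index (S11) and the per-level mappings (S12, S13).

    Assumption: every complete address is a 4-tuple ordered from the
    highest level (P) down to the lowest level (S).
    """
    # S11: give every complete address an integer index.
    index = dict(enumerate(address_library))
    # S12: one container per level, so names of the same level sit together.
    level_maps = {level: defaultdict(list) for level in LEVELS}
    for idx, address in index.items():
        for level, name in zip(LEVELS, address):
            # S13: map each place name to the indexes of its complete addresses.
            level_maps[level][name].append(idx)
    return index, level_maps


library = [
    ("Guangdong Province", "Shenzhen City", "Nanshan District", "Taoyuan Road"),
    ("Guangdong Province", "Shenzhen City", "Longgang District", "Longfei Avenue"),
]
index, level_maps = build_level_maps(library)
for level in LEVELS:
    print(level, dict(level_maps[level]))
```

A place name shared by both addresses, such as the province, maps to both indexes, while each district or street maps only to the address that contains it.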
Referring to fig. 2, fig. 2 is a flowchart of a method for completing a Chinese address according to an embodiment of the present application. As shown in fig. 2, the method comprises the following steps:
S21: storing the complete addresses in the address library in a Trie tree structure.
In this embodiment, the Trie tree, also called a dictionary tree, is a tree structure described here as a variant of a hash tree. By sharing common prefixes, a Trie tree reduces storage space, and it performs well on tasks such as text word-frequency statistics.
In this embodiment, the root node of the Trie tree holds no character, every other node holds exactly one character, the characters along a path form a character string, and the child nodes of any given node all hold different characters.
In this embodiment, storing the complete addresses in the address library in a Trie tree structure comprises the following steps:
S21-1: inserting the complete address into a path of the Trie tree character by character, starting from the first character.
In this embodiment, the character string on each path of the Trie tree represents one piece of place name information of a complete address.
In this embodiment, insertion of a complete address into the Trie tree starts from the first character of its highest-level place name information; all characters of the highest-level place name information are inserted in order into one path, and the last character of that place name information is the last node of the path.
S21-2: after the first place name information of the complete address has been inserted, inserting the second place name information of the complete address into another path of the Trie tree character by character, starting from its first character.
In this embodiment, after the highest-level place name information has been inserted into the Trie tree to form a path, the first character of the second-level place name information is inserted as the root node of a new path, and the remaining characters of the second-level place name information are inserted in order into that path; likewise, the last character of the second-level place name information is the last node of the new path.
S21-3: after the second place name information has been inserted, inserting all remaining place name information into the Trie tree for storage according to the same rule.
In this embodiment, each piece of place name information is inserted into the Trie tree character by character, and the character string on each path is one piece of place name information. Once all the place name information in the address library has been stored in the Trie tree, data storage is complete.
For example, fig. 3 is a structural diagram of a Trie tree according to an embodiment of the present application. As shown in fig. 3, "Jiangsu Province, Nanjing City, Jianye District, Xinglong Avenue" is stored in the Trie tree structure, where each node holds only one character and each path represents one piece of place name information.
In another embodiment of the present application, when the first character of the current place name information is the same as the first character of a previous path, the node corresponding to that first character in the previous path is taken as the root node of the path for the current place name information, and the current place name information is inserted into the Trie tree.
The first character of place name information can be shared. When the first character of the place name information being inserted already appears as the first character of a previous path, that node is used directly as the root node for the current place name information. During subsequent insertion, if the second character is already a child node on the current path, the node corresponding to the second character is likewise reused as the second node of the current place name information. This continues until a character of the current place name information can no longer be matched on the current path; from that point on, the remaining characters are inserted as new nodes of the path, regardless of whether those characters have appeared elsewhere in the tree, which completes the insertion of the current place name information. Place name information of different address levels may also share the same root node.
As shown in fig. 3, when the complete address "Jiangxi Province, Nanchang City, Donghu District, Jiande Street" is inserted into the Trie tree, the character "Jiang" of the place name information "Jiangxi Province" has already appeared, so the node corresponding to "Jiang" is used directly as the root node for "Jiangxi Province", and a new child node is created under it to form a new path. "Nanchang City" is inserted in the same way. The first character "Dong" of "Donghu District" does not appear among the previous root nodes, so a new path is created for "Donghu District". The character "Jian" of "Jiande Street" already appears among the previous root nodes, so the node corresponding to "Jian" is used directly as the root node and a new path is created under it.
Through steps S21-1 to S21-3, the complete addresses are stored in a Trie tree structure; sharing prefix characters saves space, and the required address information can be queried quickly.
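The insertion rule of S21-1 to S21-3, including the sharing of leading characters, can be illustrated with a small sketch. The class names `TrieNode` and `Trie` and the `node_count` counter are hypothetical, and the Chinese strings are assumed renderings of the translated example addresses rather than text taken from the figure.

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # maps one character to the child node holding it


class Trie:
    def __init__(self):
        self.root = TrieNode()  # the root node holds no character
        self.node_count = 0     # number of character nodes created so far

    def insert(self, place_name):
        """Insert one piece of place name information character by character."""
        node = self.root
        for ch in place_name:
            if ch not in node.children:
                # No matching child on the current path: create a new node.
                node.children[ch] = TrieNode()
                self.node_count += 1
            # Otherwise the existing node is shared (common prefix).
            node = node.children[ch]
        return node  # last node of the path

    def insert_address(self, place_names):
        """Insert every place name of a complete address, highest level first."""
        for name in place_names:
            self.insert(name)


trie = Trie()
# Assumed renderings of the two example addresses discussed in the text.
trie.insert_address(["江苏省", "南京市", "建邺区", "兴隆大街"])
trie.insert_address(["江西省", "南昌市", "东湖区", "建德街"])
print(trie.node_count, sorted(trie.root.children))
```

The two addresses contain 25 characters in total, but three leading characters are shared between them, so the sketch creates fewer nodes than characters.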
S22: marking the address level of each path in the Trie tree on the last node of that path.
In this embodiment, after all the complete addresses in the address library have been inserted into the Trie tree, each path needs to be marked with its address level, that is, the address level of the place name information it represents. The level marked is the one assigned to the place name information when it was stored by level.
In this embodiment, the address level of a path is marked on the last node of that path, so the address level of the place name information contained in a path can be obtained by reading the information on its last node.
As shown in fig. 3, the last node of each path in the figure is labeled with an address level: the P-level addresses are "Jiangsu Province" and "Jiangxi Province", the C-level addresses are "Nanchang City" and "Nanjing City", the D-level addresses are "Jianye District" and "Donghu District", and the S-level addresses are "Zhongnan Avenue" and "Jiande Street".
In another embodiment of the present application, a path may be duplicated, that is, place name information for different actual places has the same name. In this case, the higher address level is used when labeling the address level.
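One way to picture the labeling rule of S22, including the preference for the higher level when the same name occurs at two levels, is the sketch below. The names `Node`, `LEVEL_RANK` and `insert_labelled` are hypothetical, and "XY" is a placeholder name rather than a real place.

```python
# Smaller rank means higher address level (P is the highest, S the lowest).
LEVEL_RANK = {"P": 0, "C": 1, "D": 2, "S": 3}


class Node:
    def __init__(self):
        self.children = {}
        self.level = None  # address level, set only on the last node of a path


def insert_labelled(root, name, level):
    """Insert a place name and mark its level on the last node of its path."""
    node = root
    for ch in name:
        if ch not in node.children:
            node.children[ch] = Node()
        node = node.children[ch]
    # Duplicate path: keep the higher of the existing and the new level.
    if node.level is None or LEVEL_RANK[level] < LEVEL_RANK[node.level]:
        node.level = level
    return node


root = Node()
insert_labelled(root, "XY", "S")         # placeholder name, first seen as a street
last = insert_labelled(root, "XY", "C")  # same name later seen at city level
print(last.level)
```

Only the last node of the path carries a label; the intermediate node for "X" stays unlabeled.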
S23: Search for the input address keyword in the Trie tree to obtain the address keyword labeled with address levels.
In this embodiment, the address keyword is an incomplete address containing several pieces of place name information. The address keyword labeled with address levels is the keyword together with the address level of each piece of place name information, as found by the search.
S23-1: Search for each character of the address keyword, starting from the root node of the Trie tree, to obtain the paths corresponding to the address keyword.
In this embodiment, the address keyword is searched character by character, from the root node of the Trie tree downward. After the address keyword is received, the first character of the first piece of place name information is looked up starting from the root node. Once the root node corresponding to that character is determined, the search moves to the next-level subtree of that node and looks up the second character, then moves one level down again for the third character, and so on. When a node has no next-level subtree, the address level information on that node is read, and each character of the place name information is labeled with that address level. The search then restarts from the root node of the Trie tree for the path of the next piece of place name information in the address keyword, in the same way as for the first, and continues until the last character of the address keyword has been processed. When the search is finished, the paths corresponding to all the place name information contained in the address keyword have been obtained.
For example, suppose the input keyword is "Jianye District, Xinglong Avenue" (建邺区, 兴隆大街). As shown in the figure, the search first finds the root node holding the character "Jian" (建), moves to its next-level subtree and finds the node holding "Ye" (邺), then moves one level down again and finds the node holding "Qu" (区, "District"). Because that node has no next-level subtree, the path corresponding to "Jianye District" is obtained. The search then finds the root node holding the character "Xing" (兴), and the path of "Xinglong Avenue" is determined in the same way.
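The character-by-character search can be sketched as below, under some assumptions that are not in the source: the Trie is held in nested dictionaries, the label of a path is stored under a hypothetical key `LEVEL_KEY`, and characters that start no path (such as a separating comma) are simply skipped. The names `build_trie` and `find_paths` are illustrative only.

```python
from typing import Dict, List, Tuple

LEVEL_KEY = "#level"  # hypothetical key holding the label on the last node


def build_trie(names: Dict[str, str]) -> dict:
    """Build a nested-dict Trie from {place name: address level}."""
    trie: dict = {}
    for name, level in names.items():
        node = trie
        for ch in name:
            node = node.setdefault(ch, {})
        node[LEVEL_KEY] = level
    return trie


def find_paths(trie: dict, keyword: str) -> List[Tuple[str, str]]:
    """Return (place name, address level) for every path found in the keyword."""
    found: List[Tuple[str, str]] = []
    i = 0
    while i < len(keyword):
        node, j = trie, i
        # Walk down from the root, one character per level.
        while j < len(keyword) and keyword[j] in node:
            node = node[keyword[j]]
            j += 1
            if LEVEL_KEY in node and len(node) == 1:
                break  # no next-level subtree: this is the last node
        if j > i and LEVEL_KEY in node:
            found.append((keyword[i:j], node[LEVEL_KEY]))
            i = j  # restart from the root for the next place name
        else:
            i += 1  # separator or unmatched character: skip it
    return found


trie = build_trie({"建邺区": "D", "兴隆大街": "S"})
paths = find_paths(trie, "建邺区，兴隆大街")
```

With the two sample entries, `paths` holds one entry per place name in the keyword, each paired with the level read from its last node.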
S23-2: Determine the address level of each piece of place name information in the address keyword according to the address level labeled on the last node of each of the paths.
In this embodiment, after the paths contained in the input address keyword have been found in the previous step, the address level of each path is determined by reading the address level information on its last node.
In this embodiment, the pieces of place name information in one address keyword have different address levels. After the address level information on the last node of a path is read, all nodes on that path are labeled with that address level.
For example, after "Jianye District" is determined in S23-1, if the address level read on the node of the character "Qu" (District) is D, "Jianye District" is labeled DDD. In the same way, "Xinglong Avenue" is labeled SSSS, and the labeled address keyword is "Jianye District, Xinglong Avenue; (DDD, SSSS)".
In another embodiment of the present application, when no shortest path matching a piece of place name information in the input address keyword can be found in the Trie tree, that place name information is labeled with a mark other than an address level.
Failing to find a matching shortest path means that no path for the place name information exists in the Trie tree. This can happen, for example, when the input address keyword contains a wrongly written character, or when a county has been abolished and turned into a renamed district while the address library has not yet been updated. In such cases the place name information is labeled with a mark other than the existing address levels.
For example, suppose a county of XX City has been turned into a district and renamed XX District, and the address library has not been updated. When the input keyword is "XX City, XX District", "XX City" is labeled CCC, whereas no shortest path is matched for "XX District", so it is labeled "" and address completion can no longer be performed for it.
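The per-character labeling, including the fallback for unmatched place names, might look like the sketch below. A plain dictionary stands in for the Trie lookup here, and the mark `UNMATCHED` is hypothetical: the source does not say which mark is used. The function name `label_keyword` and the output layout are likewise assumptions modeled on the "(DDD, SSSS)" example.

```python
from typing import Dict, List

# Hypothetical mark for place names with no matching path in the Trie;
# the source only says it is a mark other than an address level.
UNMATCHED = "?"


def label_keyword(parts: List[str], levels: Dict[str, str]) -> str:
    """Label every character of each place name with its address level."""
    labels = [levels.get(part, UNMATCHED) * len(part) for part in parts]
    return f"{', '.join(parts)}; ({', '.join(labels)})"


labeled = label_keyword(["建邺区", "兴隆大街"], {"建邺区": "D", "兴隆大街": "S"})
# A place name missing from the library receives the fallback mark instead.
stale = label_keyword(["XX市", "YY区"], {"XX市": "C"})
```

Repeating the level once per character mirrors the DDD and SSSS labels in the example, so the label string has the same length as the place name it describes.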
S24: Analyze the label information of the address keyword to obtain the complete address corresponding to the address keyword.
In this embodiment, once the address keyword labeled with address levels has been obtained, the complete address corresponding to it can be obtained by analyzing the labels, which completes the Chinese address. Specifically, this comprises the following steps:
S24-1: Determine the address level of each piece of place name information in the address keyword.
In this embodiment, S23 yields an address keyword that carries label information. Before the place name information in the keyword can be processed further, the label information must be parsed to obtain the address level of each piece of place name information.
For example, a regular expression is used to extract the address level of each piece of place name information. From the keyword labeled in S23, "Jianye District, Xinglong Avenue; (DDD, SSSS)", the extraction yields "Jianye District (DDD)" and "Xinglong Avenue (SSSS)".
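One way to do this extraction with a regular expression is sketched below. The exact textual layout of the labeled keyword is an assumption based on the "(DDD, SSSS)" example, and the pattern `LABELED` and the function `extract_levels` are illustrative names rather than anything defined by the application.

```python
import re
from typing import List, Tuple

# Assumed layout: "<name>, <name>; (<labels>, <labels>)".
LABELED = re.compile(r"^(?P<names>.+?);\s*\((?P<labels>[^)]*)\)$")


def extract_levels(labeled: str) -> List[Tuple[str, str]]:
    """Split a labeled keyword into (place name, address level) pairs."""
    match = LABELED.match(labeled.strip())
    if match is None:
        raise ValueError(f"unexpected label format: {labeled!r}")
    # Accept both ASCII and full-width commas between place names.
    names = re.split(r"\s*[,，]\s*", match.group("names"))
    labels = re.split(r"\s*,\s*", match.group("labels"))
    if len(names) != len(labels):
        raise ValueError("place names and labels do not line up")
    # Every character carries the same label, so the first one is enough.
    return [(name, label[:1]) for name, label in zip(names, labels)]


pairs = extract_levels("建邺区, 兴隆大街; (DDD, SSSS)")
```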
S24-2: Determine the complete addresses in which each piece of place name information appears, according to the mapping between the place name information and the indexes of the complete addresses to which it belongs, to obtain a plurality of complete addresses containing the place name information.
In this embodiment, the address level of each piece of place name information is obtained by parsing the label information in the address keyword. Using that level, the indexes of the complete addresses to which the place name information belongs can be looked up directly in the address library's mapping for place name information of that level, so that all the complete addresses containing the place name information are determined.
For example, assume that the complete addresses "Jiangsu Province, Nanjing City, Jianye District, XXX Road" (where XXX Road stands for the multiple roads in Jianye District) have the indexes "3, 4, 5, ……, 15" in the address library, and that the complete address "Jiangsu Province, Nanjing City, Jianye District, Xinglong Avenue" has the index "5". The mapping between the D-level place name information "Jianye District" and the indexes of the complete addresses to which it belongs is then {Jianye District: [3, 4, 5, ……, 15]}. If the S-level place name information "Xinglong Avenue" appears in several different complete addresses, for example one "XX Province, XX City, XX District, Xinglong Avenue" with index "16" and another "XX Province, XX City, XX District, Xinglong Avenue" with index "23", the mapping between "Xinglong Avenue" and the indexes of the complete addresses to which it belongs is {Xinglong Avenue: [5, 16, 23]}.
After the two pieces of place name information "Jianye District (DDD)" and "Xinglong Avenue (SSSS)" are extracted from the address keyword, "Jianye District" is looked up in the D-level mapping, which gives the index set "3, 4, 5, ……, 15", and all the complete addresses containing "Jianye District" are found through these indexes. "Xinglong Avenue" is looked up in the S-level mapping, which gives the index set "5, 16, 23", and all the complete addresses containing "Xinglong Avenue" are found through these indexes.
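The per-level mapping from place name information to complete-address indexes could be built as in the following sketch. It assumes that each complete address is a four-part tuple ordered P, C, D, S and that the index is simply the position in the library; the tiny sample library uses placeholder names, so its index values differ from the 5, 16, 23 of the example. `build_level_maps` is an illustrative name.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

LEVELS = ("P", "C", "D", "S")  # assumed order of the parts of an address

Address = Tuple[str, str, str, str]


def build_level_maps(library: List[Address]) -> Dict[str, Dict[str, Set[int]]]:
    """For each level, map a place name to the indexes of its complete addresses."""
    maps: Dict[str, Dict[str, Set[int]]] = {lv: defaultdict(set) for lv in LEVELS}
    for index, address in enumerate(library):
        for level, name in zip(LEVELS, address):
            maps[level][name].add(index)
    return maps


library: List[Address] = [
    ("江苏省", "南京市", "建邺区", "兴隆大街"),  # index 0
    ("江苏省", "南京市", "建邺区", "XX路"),      # index 1 (placeholder road)
    ("XX省", "XX市", "XX区", "兴隆大街"),        # index 2 (placeholder region)
]
maps = build_level_maps(library)
district_hits = maps["D"]["建邺区"]
street_hits = maps["S"]["兴隆大街"]
```

Keeping one mapping per level means a lookup only touches names of the level that the label indicated, which is how the description uses the D-level and S-level mappings.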
S24-3: When the address keyword contains at least two pieces of place name information, compute the intersection of the complete addresses to obtain the complete address corresponding to the address keyword.
In this embodiment, when the address keyword contains at least two pieces of place name information, each piece may correspond to multiple complete addresses. The intersection of these sets of complete addresses is computed to obtain the complete address corresponding to the address keyword.
For example, the indexes found in S24-2 for "Jianye District" and "Xinglong Avenue" each lead to several complete addresses. Taking the intersection of these complete addresses gives the one that contains both "Jianye District" and "Xinglong Avenue", namely "Jiangsu Province, Nanjing City, Jianye District, Xinglong Avenue".
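Using the index sets from the example, the intersection step reduces to a set operation. The sketch below assumes the candidate addresses are represented by their indexes; `intersect_indexes` is an illustrative helper name.

```python
from typing import Iterable, Set


def intersect_indexes(index_sets: Iterable[Set[int]]) -> Set[int]:
    """Return the indexes that appear in every per-place-name index set."""
    sets = list(index_sets)
    if not sets:
        return set()
    return set.intersection(*sets)


jianye_district = set(range(3, 16))  # indexes 3, 4, 5, ..., 15
xinglong_avenue = {5, 16, 23}
common = intersect_indexes([jianye_district, xinglong_avenue])
```

Only index 5 appears in both sets, which corresponds to the single complete address named in the example.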
S24-4: When the address keyword contains only one piece of place name information, select the most frequently used complete address as the complete address of the address keyword.
In this embodiment, when the address keyword contains only one piece of place name information, that piece may correspond to multiple complete addresses. In this case, the complete address that has been used most frequently among the addresses generated in the past is selected as the complete address of the current address keyword.
For example, if the input address keyword is "Chang'an Road", the query returns several complete addresses containing Chang'an Road. If "Shanghai City, Jing'an District, Chang'an Road" has been generated most often in the past, it is selected as the complete address corresponding to Chang'an Road.
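The frequency-based choice can be sketched with a counter over previously generated addresses. How the history is actually recorded is not described in the source, so the list-of-strings history, the tie-breaking by candidate order and the name `pick_most_used` are assumptions; the district names are placeholders.

```python
from collections import Counter
from typing import List, Optional


def pick_most_used(candidates: List[str], history: List[str]) -> Optional[str]:
    """Pick the candidate that occurs most often in the generation history."""
    if not candidates:
        return None
    counts = Counter(history)
    # max() keeps the first candidate when several share the top count.
    return max(candidates, key=lambda address: counts[address])


candidates = [
    "XX City, XX District, Chang'an Road",
    "Shanghai City, XX District, Chang'an Road",
]
history = [
    "Shanghai City, XX District, Chang'an Road",
    "XX City, XX District, Chang'an Road",
    "Shanghai City, XX District, Chang'an Road",
]
chosen = pick_most_used(candidates, history)
```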
In this embodiment, there is also the case in which the input address keyword contains no place name information of a lower level. Only the levels above the given level are then completed.
For example, if the input address keyword is "Jianye District", all the complete addresses containing "Jianye District" are obtained and their intersection is computed, which gives the complete address "Jiangsu Province, Nanjing City, Jianye District". Because the address keyword contains no place name information of the level below Jianye District, the address of that lower level cannot be completed.
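One possible reading of this case is that the "intersection" keeps the leading levels on which all candidate addresses agree. The sketch below implements that reading only; it is an interpretation, not something the source spells out, and `common_levels` and the tuple representation of addresses are assumed.

```python
from typing import List, Tuple


def common_levels(addresses: List[Tuple[str, ...]]) -> Tuple[str, ...]:
    """Keep the leading address levels on which all candidates agree."""
    shared = []
    for parts in zip(*addresses):
        if len(set(parts)) != 1:
            break  # candidates diverge from this level downward
        shared.append(parts[0])
    return tuple(shared)


candidates = [
    ("江苏省", "南京市", "建邺区", "兴隆大街"),
    ("江苏省", "南京市", "建邺区", "XX路"),  # placeholder road name
]
completed = common_levels(candidates)
```

Here the candidates agree down to the district and differ at the street level, so only the province, city and district are returned.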
Through S21 to S24, the complete addresses are stored in the structure of a Trie tree, each path stores one piece of place name information, and the last node of each path is labeled with the address level of that place name information. When an address keyword is input, each piece of place name information in the keyword is searched from the root node, and the label on the last node of its path is read to obtain its address level. Using the previously generated mapping between place name information and the indexes of the complete addresses to which it belongs, the complete addresses corresponding to the place name information of each level in the keyword are looked up, and the intersection of these complete addresses is computed, which completes the Chinese address. Storing the address library in a Trie tree saves space, and the level-by-level search quickly finds the complete address corresponding to the address keyword. The search is efficient and fast, and performance is not affected when the amount of data in the address library grows.
Based on the same inventive concept, an embodiment of the present application provides an apparatus for Chinese address completion. Referring to fig. 4, fig. 4 is a schematic diagram of an apparatus for Chinese address completion according to an embodiment of the present application. As shown in fig. 4, the apparatus includes:
a first address storage module 401, configured to store the complete addresses in an address library according to the structure of a Trie tree;
an address level labeling module 402, configured to label the address level of each path in the Trie tree on the last node of that path;
an address level lookup module 403, configured to search for the input address keyword in the Trie tree to obtain the address keyword labeled with address levels;
a complete address acquisition module 404, configured to analyze the label information in the address keyword to obtain the complete address corresponding to the address keyword.
Optionally, the apparatus further comprises:
an index establishing module, configured to establish an index for each complete address in the address library;
a second storage module, configured to store the place name information in each complete address in the address library by level;
a mapping generation module, configured to generate a mapping between the place name information of each level and the indexes of the complete addresses to which it belongs.
Optionally, the first address storage module includes:
a first character insertion sub-module, configured to insert the complete address, character by character and starting from its first character, into one path of the Trie tree;
a second character insertion sub-module, configured to insert, after the first place name information in the complete address has been inserted, the second place name information in the complete address into another path of the Trie tree, character by character and starting from its first character;
a third character insertion sub-module, configured to insert, after the second place name information has been inserted, all remaining place name information into the Trie tree for storage according to the same rule.
Optionally, the first address storage module further includes:
a fourth character insertion sub-module, configured to, when the first character of the current place name information is the same as the first character of a previous path, take the node corresponding to that first character in the previous path as the root node of the path corresponding to the current place name information, and insert the current place name information into the Trie tree.
Optionally, the address level lookup module includes:
a path searching sub-module, configured to search for each character of the address keyword starting from the root node of the Trie tree to obtain the paths corresponding to the address keyword;
a first address level obtaining sub-module, configured to determine the address level of each piece of place name information in the address keyword according to the address level labeled on the last node of each of the paths.
Optionally, the address level lookup module further includes:
a second address level obtaining sub-module, configured to label the place name information with a mark other than an address level when no shortest path matching the place name information can be found in the Trie tree.
Optionally, the complete address acquisition module includes:
an address level determination sub-module, configured to determine the address level of each piece of place name information in the address keyword;
a complete address obtaining sub-module, configured to determine the complete addresses in which the place name information appears according to the mapping between each piece of place name information and the indexes of the complete addresses to which it belongs, so as to obtain a plurality of complete addresses containing the place name information;
a first complete address determining sub-module, configured to compute the intersection of the complete addresses to obtain the complete address corresponding to the address keyword when the address keyword contains at least two pieces of place name information;
a second complete address obtaining sub-module, configured to select the most frequently used complete address as the complete address of the address keyword when the address keyword contains only one piece of place name information.
Based on the same inventive concept, another embodiment of the present application provides a readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements the steps of the method for Chinese address completion according to any of the above embodiments of the present application.
Based on the same inventive concept, another embodiment of the present application provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, the method for Chinese address completion according to any of the above embodiments of the present application is implemented.
Because the apparatus embodiment is basically similar to the method embodiment, it is described only briefly; for the relevant points, refer to the corresponding description of the method embodiment.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
As will be appreciated by one of skill in the art, embodiments of the present application may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Embodiments of the present application are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present application have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including the preferred embodiment and all such alterations and modifications as fall within the true scope of the embodiments of the application.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or terminal that comprises the element.
The method, apparatus, device and storage medium for Chinese address completion provided by the present application have been described in detail above. Specific examples are used herein to explain the principle and implementation of the application, and the description of the embodiments is intended only to help in understanding the method and its core idea. For a person skilled in the art, the specific embodiments and the scope of application may vary according to the idea of the present application. In summary, the content of this specification should not be construed as limiting the present application.
Claims (10)
1. A method of Chinese address completion, the method comprising:
storing the complete address in the address base according to the structure of the Trie tree;
marking the address level of each path on the last node of each path in the Trie tree;
searching the input address keywords in the Trie tree to obtain the address keywords marked with the address level;
and analyzing the labeled information in the address keywords to obtain the complete address corresponding to the address keywords.
2. The method of claim 1, wherein before storing the complete addresses in the address repository in the structure of the Trie, the method further comprises:
indexing each of the complete addresses in the address base;
storing the place name information in each complete address in the address library in a grading way;
generating a mapping of the place name information of each level to an index of the full address to which it belongs.
3. The method of claim 1, wherein storing the complete address in the address repository according to the structure of the Trie comprises:
sequentially inserting the complete address into a path of the Trie tree from a first character by taking the character as a unit;
after the first place name information in the complete address is inserted, sequentially inserting the second place name information in the complete address into another path in the Trie tree by taking characters as units from a first character;
and after the second place name information is inserted, sequentially inserting all the place name information into the Trie tree according to the rule for storage.
4. The method of claim 3, further comprising:
and when the first character of the current place name information is the same as the first character in the previous path, taking the node corresponding to the first character in the previous path as the root node of the path corresponding to the current place name information, and inserting the current place name information into the Trie tree.
5. The method of claim 1, wherein searching the input place name information in the Trie tree to obtain the place name information labeled with the address level comprises:
searching each character of the address keyword from a root node of the Trie tree to obtain a plurality of paths corresponding to the address keyword;
and determining the address level of each place name information in the address keywords according to the address level marked on the last node of each path in the paths.
6. The method of claim 5, further comprising:
and when the shortest path matched with the place name information cannot be found in the Trie tree, marking the place name information by using marks except the address level.
7. The method of claim 1, wherein analyzing the label information of the address keyword to obtain the complete address corresponding to the address keyword comprises:
determining an address level of each of the place name information in the address keyword;
determining the complete address of the place name information according to the mapping of the index of each place name information and the complete address to which the place name information belongs, and obtaining a plurality of complete addresses containing the place name information;
when the address keywords contain at least two pieces of place name information, calculating the intersection between the complete addresses to obtain the complete addresses corresponding to the address keywords;
and when the address keyword only contains one piece of place name information, selecting the complete address with the highest use frequency as the complete address of the address keyword.
8. An apparatus for Chinese address completion, the apparatus comprising:
the first address storage module is used for storing the complete address in the address base according to the structure of the Trie tree;
an address level marking module, configured to mark an address level of each path in the Trie on a last node of the path;
an address level lookup module: searching the input address keywords in the Trie tree to obtain the address keywords marked with the address level;
a complete address acquisition module: and analyzing the labeled information in the address keywords to obtain the complete address corresponding to the address keywords.
9. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method according to any of claims 1 to 7 are implemented when the computer program is executed by the processor.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011013241.1A CN112256821B (en) | 2020-09-23 | 2020-09-23 | Chinese address completion method, device, equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112256821A true CN112256821A (en) | 2021-01-22 |
CN112256821B CN112256821B (en) | 2024-05-17 |
Family
ID=74231990
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011013241.1A Active CN112256821B (en) | 2020-09-23 | 2020-09-23 | Chinese address completion method, device, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112256821B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113139033A (en) * | 2021-05-13 | 2021-07-20 | 平安国际智慧城市科技股份有限公司 | Text processing method, device, equipment and storage medium |
CN113204613A (en) * | 2021-04-26 | 2021-08-03 | 北京百度网讯科技有限公司 | Address generation method, device, equipment and storage medium |
CN113656450A (en) * | 2021-07-12 | 2021-11-16 | 大箴(杭州)科技有限公司 | Address processing method and device, electronic equipment and storage medium |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030158667A1 (en) * | 2002-02-15 | 2003-08-21 | International Business Machines Corporation | Programmatically deriving street geometry from address data |
KR101132150B1 (en) * | 2010-10-12 | 2012-07-11 | (주)수지원넷소프트 | Address processing for formalizing addresses |
CN108369582A (en) * | 2018-03-02 | 2018-08-03 | 福建联迪商用设备有限公司 | A kind of address error correction method and terminal |
CN109815498A (en) * | 2019-01-25 | 2019-05-28 | 深圳市小赢信息技术有限责任公司 | A kind of Chinese address standardized method, device and electronic equipment |
CN110147420A (en) * | 2019-05-07 | 2019-08-20 | 武大吉奥信息技术有限公司 | A kind of place name address matching querying method and system based on spectrum model |
CN110442603A (en) * | 2019-07-03 | 2019-11-12 | 平安科技(深圳)有限公司 | Address matching method, apparatus, computer equipment and storage medium |
CN110750704A (en) * | 2019-10-23 | 2020-02-04 | 深圳计算科学研究院 | Method and device for automatically completing query |
Also Published As
Publication number | Publication date |
---|---|
CN112256821B (en) | 2024-05-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112256821A (en) | Method, device, equipment and storage medium for complementing Chinese address | |
CN108959244B (en) | Address word segmentation method and device | |
CN109213844B (en) | Text processing method and device and related equipment | |
CN110019647B (en) | Keyword searching method and device and search engine | |
CN105528372B (en) | A kind of address search method and equipment | |
CN108228657B (en) | Method and device for realizing keyword retrieval | |
CN107862026B (en) | Data storage method and device, data query method and device, and electronic equipment | |
CN108846016B (en) | Chinese word segmentation oriented search algorithm | |
CN103123650B (en) | Full-text indexing method for XML databases based on integer mapping | |
CN104199860B (en) | Dataset fragmentation method based on two-dimensional geographic position information | |
CN103186524A (en) | Address name identification method and device | |
CN107748778B (en) | Method and device for extracting address | |
CN101794307A (en) | Vehicle navigation POI (Point of Interest) search engine based on internetwork word segmentation idea | |
CN104866502A (en) | Data matching method and device | |
CN112069276A (en) | Address coding method and device, computer equipment and computer readable storage medium | |
CN104679801A (en) | Point of interest searching method and point of interest searching device | |
CN105701133B (en) | Address input method and equipment | |
CN112083812A (en) | Associative word determining method and device, storage medium and electronic equipment | |
CN107239549A (en) | Method, device and terminal for database terminology retrieval | |
CN112528174A (en) | Address finishing and complementing method based on knowledge graph and multiple matching and application | |
CN106021556A (en) | Address information processing method and device | |
CN115563409A (en) | Address administrative division identification method, device, equipment and medium | |
CN111475511A (en) | Data storage method, data access method, data storage device, data access device and data access equipment based on tree structure | |
CN106294784B (en) | resource searching method and device | |
CN105025013A (en) | A dynamic IP coupling model based on a priority Trie tree |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||