CN110990520A - Address coding method and device, electronic equipment and storage medium - Google Patents

Info

Publication number
CN110990520A
Authority
CN
China
Prior art keywords
address
poi
fragment
word segmentation
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911194130.2A
Other languages
Chinese (zh)
Other versions
CN110990520B (en)
Inventor
张海攀
汤益嘉
刘强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Construction Bank Corp
Original Assignee
China Construction Bank Corp
CCB Finetech Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Construction Bank Corp and CCB Finetech Co Ltd
Priority to CN201911194130.2A
Publication of CN110990520A
Application granted
Publication of CN110990520B
Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31: Indexing; Data structures therefor; Storage structures
    • G06F16/316: Indexing structures
    • G06F16/322: Trees
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29: Geographical information databases
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33: Querying
    • G06F16/3331: Query processing
    • G06F16/334: Query execution
    • G06F16/3344: Query execution using natural language analysis


Abstract

The embodiment of the invention discloses an address coding method and device, electronic equipment and a storage medium. The method comprises the following steps: acquiring an address to be coded, and segmenting the address to be coded based on word segmentation keywords to obtain at least one word segmentation fragment; comparing each word segmentation fragment with an administrative division comparison table in sequence, and taking each word segmentation fragment that exists in the administrative division comparison table as an address fragment; matching each address fragment in sequence against the nodes of a pre-constructed POI tree, and determining a target POI matched with the address to be coded according to the matching result; and coding the address to be coded according to the information of the target POI. Because address fragments are determined by combining word segmentation keywords with the administrative division comparison, the accuracy of determining address fragments is improved. Because the target POI is determined by matching the address fragments against the POI tree, the mutual interference between Chinese address fragments that arises in text-based matching tools such as Lucene is avoided.

Description

Address coding method and device, electronic equipment and storage medium
Technical Field
The embodiment of the invention relates to the technical field of internet, in particular to an address coding method, an address coding device, electronic equipment and a storage medium.
Background
Address coding is a coding method for spatial positioning. It converts descriptive address information into geographic longitude and latitude and is widely used in map applications. The method should associate a descriptive Chinese address with a point of interest (POI) wherever possible, so that the longitude and latitude of the POI can be used as the conversion result.
Existing methods are mainly of two types: (1) segmenting the Chinese address and the POI address into words according to a reference lexicon, and calculating the similarity between the word segmentation fragments of the two addresses, so as to obtain the longitude and latitude of the most similar POI; (2) calculating the similarity between the Chinese address and the POI address with a text similarity matching tool such as Lucene, so as to obtain the longitude and latitude of the most similar POI.
However, both methods have disadvantages. The first depends heavily on the richness of the lexicon, but address information such as street names and residential community names changes frequently and is very large in volume, so correct word segmentation of Chinese addresses is hard to achieve. The second relies on text similarity between the Chinese address and the POI address and is prone to deviation: for example, the two characters "Chongqing" in "Chongqing South Road, Shanghai" are textually similar to "Chongqing City", so the address may be mistaken for one in Chongqing. In addition, Chinese addresses are often written without the province, city, district or county, which is another cause of deviation.
Disclosure of Invention
The embodiment of the invention provides an address coding method and device, electronic equipment and a storage medium, and aims to improve the precision of address word segmentation and the precision of similarity matching between a Chinese address and a POI address.
In a first aspect, an embodiment of the present invention provides an address coding method, where the method includes:
acquiring an address to be coded, and segmenting the address to be coded based on word segmentation keywords to obtain at least one word segmentation fragment, wherein the word segmentation keywords comprise words representing an address hierarchical structure;
comparing each word segmentation fragment with an administrative division comparison table in sequence, and taking each word segmentation fragment that exists in the administrative division comparison table as an address fragment;
matching each address fragment in sequence against the nodes of a pre-constructed POI tree, and determining a target POI matched with the address to be coded according to the matching result, wherein the POI tree is composed of nodes representing an address hierarchy;
and coding the address to be coded according to the information of the target POI.
In a second aspect, an embodiment of the present invention further provides an address encoding apparatus, where the apparatus includes:
the word segmentation module is used for acquiring an address to be coded, and segmenting the address to be coded based on word segmentation keywords to obtain at least one word segmentation fragment, wherein the word segmentation keywords comprise words representing an address hierarchical structure;
the comparison module is used for comparing each word segmentation fragment with an administrative division comparison table in sequence, and taking each word segmentation fragment that exists in the administrative division comparison table as an address fragment;
the searching module is used for matching each address fragment in sequence against the nodes of a pre-constructed POI tree, and determining a target POI matched with the address to be coded according to the matching result, wherein the POI tree is composed of nodes representing an address hierarchy;
and the coding module is used for coding the address to be coded according to the information of the target POI.
In a third aspect, an embodiment of the present invention further provides an electronic device, including:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the address coding method according to any embodiment of the invention.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the address encoding method according to any embodiment of the present invention.
The embodiment of the invention segments the address to be coded based on the word segmentation keywords and compares the resulting word segmentation fragments with the administrative division comparison table to obtain the address fragments, which improves the accuracy of address word segmentation. Moreover, the address fragments are matched in sequence against the nodes of the POI tree, and the target POI matched with the address to be coded is determined according to the matching result. This improves the accuracy of similarity matching between the address to be coded and the POI address, and avoids the mutual interference between Chinese address fragments that arises in text-based matching tools such as Lucene.
Drawings
Fig. 1a is a schematic flowchart of an address coding method according to a first embodiment of the present invention;
Fig. 1b is a schematic structural diagram of a POI tree according to the first embodiment of the present invention;
Fig. 2 is a schematic flowchart of an address coding method according to a second embodiment of the present invention;
Fig. 3 is a schematic flowchart of an address coding method according to a third embodiment of the present invention;
Fig. 4 is a schematic structural diagram of an address coding apparatus according to a fourth embodiment of the present invention;
Fig. 5 is a schematic structural diagram of an electronic device according to a fifth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example one
Fig. 1a is a flowchart of an address coding method according to a first embodiment of the present invention. This embodiment is applicable to coding a Chinese address input by a user. The method may be executed by an address coding apparatus, which may be implemented in software and/or hardware and may be integrated in an electronic device such as a vehicle-mounted device or a server.
As shown in fig. 1a, the address encoding method specifically includes:
S101, acquiring an address to be coded, and segmenting the address to be coded based on word segmentation keywords to obtain at least one word segmentation fragment, wherein the word segmentation keywords comprise words representing an address hierarchical structure.
The address to be coded may be a Chinese query address input by a user. Chinese addresses commonly have the following characteristics: (1) omitted keywords, for example: Sichuan Chengdu Shuangliu Northwest Street; (2) omitted levels, for example: "Min river in Yue Jiang Wei City in Sichuan province"; (3) changes in administrative structure, for example an address whose correct current form is "Tou-Bao-Bo-Bao-Lu in Luo-Bao, Guangzhou, Guangdong province"; (4) a multi-level structure, for example: Sichuan Province, Chengdu City, Shuangliu County, "Huayang guangdong, 138 south yang flourishing age", Unit 4, 2502; (5) hierarchical interference, for example: No. 15 Chongqing South Road, Shanghai.
Therefore, in the embodiment of the invention, a 9-level address structure is defined in advance according to the hierarchical characteristics of Chinese addresses. Specifically:
Level 1: province, municipality directly under the central government, autonomous region;
Level 2: provincial city, prefecture-level city, directly administered city, etc.;
Level 3: county-level city, county or district of a prefecture-level city, banner, etc.;
Level 4: town, township, subdistrict;
Level 5: village and equivalent units;
Level 6: road, street, avenue, lane, bridge, alley;
Level 7: segment (a minor road under a major road);
Level 8: number (a modifier of the road);
Level 9: a named residential community, building or company.
On the basis of the 9-level address structure, words representing the address hierarchical structure, such as "province, city, district, county, prefecture, league", are used as word segmentation keywords. After the address to be coded is acquired, it is segmented based on the word segmentation keywords to obtain at least one word segmentation fragment; optionally, regular-expression greedy matching is performed on the address to be coded using the word segmentation keywords to obtain the word segmentation fragments. For example, an address to be coded located in Jinjiang District, Chengdu City, Sichuan Province and ending in "No. 88" is segmented into 4 word segmentation fragments: "Sichuan Province", "Chengdu City", "Jinjiang District", and the remaining fragment ending in "No. 88".
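The keyword-based segmentation of S101 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function name `segment_address`, the keyword list and the sample address are hypothetical, and a lazy regular-expression quantifier is used for simplicity, whereas the text only mentions regular greedy matching without giving a pattern.

```python
import re

# Assumed keyword list for illustration; the text only gives examples
# such as province, city, district and county.
SEGMENTATION_KEYWORDS = ["自治区", "省", "市", "区", "县", "州", "盟"]


def segment_address(address, keywords=SEGMENTATION_KEYWORDS):
    """Split an address after each hierarchy keyword (hypothetical helper)."""
    # Try longer keywords first so that "自治区" is tested before "区".
    alternation = "|".join(
        re.escape(k) for k in sorted(keywords, key=len, reverse=True)
    )
    pattern = re.compile(rf".+?(?:{alternation})")
    fragments = []
    end = 0
    for match in pattern.finditer(address):
        fragments.append(match.group())
        end = match.end()
    # Whatever follows the last keyword (street, number, ...) is kept whole.
    if end < len(address):
        fragments.append(address[end:])
    return fragments


# Hypothetical sample address, not taken from the patent.
print(segment_address("四川省成都市锦江区春熙路88号"))
```

A pure keyword split of this kind can cut inside a place name that itself contains a keyword character, which is one reason the fragments are subsequently checked against the administrative division comparison table.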
S102, comparing each word segmentation fragment with an administrative division comparison table in sequence, and taking each word segmentation fragment that exists in the administrative division comparison table as an address fragment.
Optionally, administrative division recognition is performed on each word segmentation fragment: each word segmentation fragment is compared in sequence with the administrative division comparison table, and if a target word segmentation fragment exists in the table, the segmentation of that fragment is considered correct. For example, if the word segmentation fragments "Sichuan", "Chengdu" and "Jinjiang" are all found in the administrative division comparison table, these three fragments are administrative-division address fragments.
Further, because regions with the same name exist in China, if a target word segmentation fragment exists in the administrative division comparison table, it is determined whether there are several administrative regions with the same name as the target word segmentation fragment. If so, the address fragment is determined according to the upper-level administrative region to which each same-named region belongs; if not, the target word segmentation fragment is used directly as an address fragment.
Further, after the address fragments are determined, if the administrative division corresponding to an address fragment is detected to have been changed, the address fragment is replaced with the name of the changed administrative division. For example, if the address fragments are Guangdong Province, Guangzhou City, Luogang District and Tourette division, then because Luogang District has been renamed Huangpu District, the final address fragments are Guangdong Province, Guangzhou City, Huangpu District and Tourette division.
Furthermore, because the address to be coded may omit administrative division keywords such as province, city, district or county, if it is determined from the hierarchical structure information of the administrative divisions that a fragment is missing between address fragments, the missing fragment is supplemented to ensure the accuracy of the subsequent address query.
It should be noted that after the address fragments are determined according to the administrative division comparison table, the fragments for lower levels such as town, township and village can be obtained by recognizing the corresponding hierarchy keywords.
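The comparison, replacement and completion steps of S102 might look roughly like the sketch below. The table contents and the names `ADMIN_TABLE`, `RENAMED` and `to_address_fragments` are assumptions made for illustration; the patent does not specify the layout of the administrative division comparison table, and the disambiguation of same-named regions is left out to keep the example short.

```python
# Hypothetical comparison table: division name -> (level, parent name).
ADMIN_TABLE = {
    "四川省": (1, None),
    "成都市": (2, "四川省"),
    "锦江区": (3, "成都市"),
    "广东省": (1, None),
    "广州市": (2, "广东省"),
    "黄埔区": (3, "广州市"),
}

# Hypothetical table of changed divisions: old name -> current name.
RENAMED = {"萝岗区": "黄埔区"}


def to_address_fragments(word_fragments):
    """Keep fragments found in the table, apply renames, fill missing levels."""
    result = []
    for fragment in word_fragments:
        # Replace a changed division with its current name.
        fragment = RENAMED.get(fragment, fragment)
        if fragment not in ADMIN_TABLE:
            continue  # not an administrative division; handled elsewhere
        # Collect upper-level divisions that the writer omitted.
        missing = []
        parent = ADMIN_TABLE[fragment][1]
        while parent is not None and parent not in result:
            missing.append(parent)
            parent = ADMIN_TABLE[parent][1]
        result.extend(reversed(missing))
        result.append(fragment)
    return result


# The province is omitted in the input and is supplemented from the table.
print(to_address_fragments(["成都市", "锦江区", "春熙路88号"]))
# A changed district name is replaced by its current name.
print(to_address_fragments(["广东省", "广州市", "萝岗区"]))
```

Storing the parent of each division in the table is one simple way to support both the completion of omitted levels and, with an extension, the disambiguation of same-named regions by their upper-level region.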
S103, matching each address fragment in sequence against the nodes of a pre-constructed POI tree, and determining a target POI matched with the address to be coded according to the matching result.
The POI tree is composed of nodes representing an address hierarchy. Optionally, constructing the POI tree in advance comprises the following. First, word segmentation is performed on a POI file in a manner similar to S101-S102: a POI address in the POI file is acquired and segmented based on the word segmentation keywords to obtain at least one word segmentation fragment, wherein the word segmentation keywords comprise words representing an address hierarchical structure; each word segmentation fragment is compared in sequence with the administrative division comparison table, and each word segmentation fragment that exists in the table is taken as a POI address fragment. Then the POI address fragments are used as nodes, and all nodes are assembled into a POI tree according to the hierarchy of the POI address fragments. For example, Fig. 1b shows a schematic structural diagram of a POI tree.
Once the POI tree has been constructed, each address fragment is matched in sequence against the nodes of the POI tree, and the target POI matched with the address to be coded is determined according to the matching result. Because the address fragments are matched against the nodes of the POI tree in sequence, mutual interference between different address fragments is avoided.
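Assembling POI address fragments into a tree can be sketched as below. The class `PoiNode`, the function `build_poi_tree`, the virtual root node and the sample records (including their names and coordinates) are all hypothetical; the patent only states that the fragments become nodes arranged by hierarchy.

```python
class PoiNode:
    """One address-hierarchy node of the POI tree (hypothetical structure)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = {}  # fragment text -> PoiNode
        self.poi = None     # POI information stored at the deepest node


def build_poi_tree(poi_records):
    """Insert each (fragments, info) record along a root-to-leaf path."""
    root = PoiNode("ROOT")  # virtual root above the level-1 nodes (assumption)
    for fragments, info in poi_records:
        node = root
        for fragment in fragments:
            if fragment not in node.children:
                node.children[fragment] = PoiNode(fragment, node)
            node = node.children[fragment]
        node.poi = info
    return root


# Made-up records; names and coordinates are placeholders, not real data.
records = [
    (["四川省", "成都市", "锦江区"], {"name": "POI-A", "lng": 104.0, "lat": 30.0}),
    (["四川省", "成都市", "武侯区"], {"name": "POI-B", "lng": 104.1, "lat": 30.1}),
]
tree = build_poi_tree(records)
print(sorted(tree.children["四川省"].children["成都市"].children))
```

Because shared prefixes such as the province and city are stored only once, a query fragment is compared only with the children of the node matched so far, which is what keeps fragments from different levels from interfering with one another.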
S104, coding the address to be coded according to the information of the target POI.
After the target POI is determined, an association is established between the target POI and the address to be coded, and the information of the target POI, such as its name and its longitude and latitude, is acquired, which completes the coding of the address to be coded.
In the embodiment of the invention, the address to be coded is segmented based on the word segmentation keywords, and the resulting word segmentation fragments are compared with the administrative division comparison table to obtain the address fragments, which improves the accuracy of address word segmentation. Moreover, each address fragment is matched in sequence against the nodes of the POI tree, and the target POI matched with the address to be coded is determined according to the matching result. Because the address fragments are ordered by hierarchy and matched against the POI tree in sequence, different address fragments do not interfere with one another; this avoids the mutual interference between Chinese address fragments that arises in text-based matching tools such as Lucene, and improves the accuracy of similarity matching between the address to be coded and the POI address.
Example two
Fig. 2 is a schematic flowchart of an address coding method according to a second embodiment of the present invention. This embodiment is optimized on the basis of the above embodiment and adds an operation of recombining word segmentation fragments to obtain address fragments when no address fragment is obtained. The address coding method specifically includes:
S201, acquiring an address to be coded, and segmenting the address to be coded based on word segmentation keywords to obtain at least one word segmentation fragment, wherein the word segmentation keywords comprise words representing an address hierarchical structure.
S202, comparing each word segmentation fragment with the administrative division comparison table in sequence, and if the word segmentation fragments do not exist in the administrative division comparison table, combining any two adjacent word segmentation fragments into a new word segmentation fragment.
After at least one word segmentation fragment is obtained in S201, if comparison shows that none of the word segmentation fragments exists in the administrative division comparison table, the segmentation result of S201 is inaccurate, and any two adjacent word segmentation fragments need to be combined into a new word segmentation fragment for further judgment. Further, if comparison shows that only one word segmentation fragment is absent from the administrative division comparison table, that fragment is combined with the adjacent preceding word segmentation fragment to form a new fragment.
S203, selecting combinations of different numbers of characters from the new word segmentation fragment, comparing the combinations in sequence with the administrative division comparison table, and taking each combination that exists in the administrative division comparison table as an address fragment.
For any new word segmentation fragment, different numbers of characters are selected from it in turn as candidate combinations, for example combinations of one, two or three characters; each candidate combination is compared in sequence with the administrative division comparison table, and each combination that exists in the table is taken as an address fragment.
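The recombination in S202-S203 can be illustrated with the sketch below. It assumes that the "combinations of characters" are contiguous substrings of the merged fragment, which the text does not state explicitly; the table contents, the sample fragments and the function names are hypothetical.

```python
# Hypothetical administrative division comparison table (names only).
ADMIN_TABLE = {"广东省", "广州市", "天河区"}


def merge_adjacent(fragments, index):
    """Combine the fragment at `index` with the one that follows it."""
    return fragments[index] + fragments[index + 1]


def candidate_combinations(text):
    """Yield every contiguous substring of `text`, scanning left to right."""
    for start in range(len(text)):
        for end in range(start + 1, len(text) + 1):
            yield text[start:end]


def recover_address_fragments(text, table=ADMIN_TABLE):
    """Return the candidate combinations that exist in the comparison table."""
    return [c for c in candidate_combinations(text) if c in table]


# A keyword such as "州" inside "广州市" can cause a wrong split like this one.
bad_split = ["广东省", "广州", "市天河区"]
merged = merge_adjacent(bad_split, 1)
print(merged)                             # 广州市天河区
print(recover_address_fragments(merged))  # ['广州市', '天河区']
```

Enumerating all substrings is quadratic in the fragment length, which is acceptable here because merged fragments are only a few characters long.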
S204, matching each address fragment in sequence against the nodes of the pre-constructed POI tree, and determining a target POI matched with the address to be coded according to the matching result.
Wherein the POI tree is composed of nodes representing an address hierarchy.
S205, coding the address to be coded according to the information of the target POI.
In the embodiment of the invention, when the obtained word segmentation fragments are inaccurate, new word segmentation fragments are obtained by recombination and are compared with the administrative division comparison table, so that the accuracy of determining the address fragments is further improved, and the accuracy of address coding is further ensured.
Example three
Fig. 3 is a schematic flowchart of an address coding method according to a third embodiment of the present invention. This embodiment is optimized on the basis of the foregoing embodiments. The address coding method specifically includes:
S301, acquiring an address to be coded, and segmenting the address to be coded based on word segmentation keywords to obtain at least one word segmentation fragment, wherein the word segmentation keywords comprise words representing an address hierarchical structure.
S302, comparing each word segmentation fragment with an administrative division comparison table in sequence, and taking each word segmentation fragment that exists in the administrative division comparison table as an address fragment.
In the embodiment of the present invention, the address fragments are sorted according to the address hierarchical structure. For example, for an address to be coded such as "No. 156 Dongda Street, Pitong Subdistrict, Pi County, Chengdu, Sichuan Province", the address fragments obtained by the above operations, sorted according to the address hierarchical structure, are in sequence: Sichuan, Chengdu, Pi, Pitong, Dongda, 156.
S303, matching the first-level address fragment against the POI tree, marking the matched node of the POI tree as the parent node, and taking the address fragment after the first-level address fragment as the current address fragment.
S304, searching all child nodes of the parent node for a target child node matching the current address fragment.
Optionally, the target child node may be obtained by exact matching, that is, as the child node whose name is identical to the current address fragment. Alternatively, fuzzy matching may be used, taking as the target child node the child node that shares the longest character string with the current address fragment.
S305, if the target child node exists, marking the parent node as a traversed node, and marking the target child node as the parent node.
S306, taking each address fragment arranged after the current address fragment as the current address fragment in turn and repeating S304 to S305 until the last address fragment is matched with a POI node, and taking the last matched POI node as the target POI.
For example, the first address fragment "Sichuan" is matched against the POI tree, the matched node "Sichuan" is marked as the parent node, and the next address fragment "Chengdu" is taken as the current address fragment. All child nodes of the parent node "Sichuan" are searched for a target child node "Chengdu" matching the current address fragment. If it exists, the parent node "Sichuan" is marked as a traversed node, the target child node "Chengdu" is marked as the new parent node, the address fragment "Pi" is taken as the current address fragment, and the search continues among the child nodes of the new parent node "Chengdu". The subsequent address fragments are processed in the same way until the last address fragment is matched with a POI node, and the last matched POI node is taken as the target POI.
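The level-by-level search of S303-S306 can be sketched as follows. The `Node` class, the function names, the minimum overlap of two characters for fuzzy matching and the sample tree are assumptions for illustration; the fuzzy rule here uses the longest common substring computed with the standard-library `difflib.SequenceMatcher`, which is one possible reading of "the longest character string".

```python
from difflib import SequenceMatcher


class Node:
    """Minimal POI tree node for this sketch (hypothetical structure)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.children = []
        if parent is not None:
            parent.children.append(self)


def longest_common_length(a, b):
    """Length of the longest substring shared by a and b."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return matcher.find_longest_match(0, len(a), 0, len(b)).size


def find_child(parent, fragment, min_overlap=2):
    """Exact match first, then the child sharing the longest substring."""
    for child in parent.children:
        if child.name == fragment:
            return child
    best, best_len = None, 0
    for child in parent.children:
        length = longest_common_length(child.name, fragment)
        if length > best_len:
            best, best_len = child, length
    # min_overlap is an assumed threshold to avoid one-character matches.
    return best if best_len >= min_overlap else None


def match_in_sequence(root, fragments):
    """Walk down the tree one fragment at a time; None if a level fails."""
    parent = root
    for fragment in fragments:
        child = find_child(parent, fragment)
        if child is None:
            return None
        parent = child
    return parent


# Hypothetical tree with a virtual root.
root = Node("ROOT")
sichuan = Node("四川省", root)
chengdu = Node("成都市", sichuan)
jinjiang = Node("锦江区", chengdu)
road = Node("春熙路", jinjiang)

target = match_in_sequence(root, ["四川省", "成都", "锦江区", "春熙路"])
print(target.name)  # 春熙路
```

In this sketch a fragment such as "重庆" is compared only with the children of the node already matched, so it cannot be confused with a province-level node elsewhere in the tree.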
It should be noted that if the judgment in S304 finds no target child node matching the current address fragment among the child nodes of the parent node, the embodiment of the present invention provides a matching method of downward exploration and upward backtracking.
In downward exploration, if no target child node matching the current address fragment exists among the child nodes of the parent node, it is presumed that the address fragments obtained from the address to be coded are missing a level, and it is determined whether each child node has descendant nodes. If so, the descendant nodes are searched for a target descendant node matching the current address fragment. If a matching target descendant node exists, the upper-level nodes of the target descendant node are marked as traversed nodes and the target descendant node is taken as the parent node; each address fragment arranged after the current address fragment is then taken in turn as the current address fragment and the matching search continues.
If no matching target descendant node exists, it is presumed that the obtained address fragments contain a redundant fragment, and the upward-backtracking search is executed. Specifically, the search backs up to the traversed node one level above the parent node and takes it as the new parent node, and it is judged whether a child node matching the current address fragment exists among the child nodes of the new parent node; if not, the backtracking judgment is repeated until a POI node matching the current address fragment is found. Matching then continues with the address fragment arranged after the current address fragment.
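Downward exploration and upward backtracking can be sketched together as below. This is an illustrative reading under stated assumptions rather than the patent's algorithm: parent pointers stand in for the "traversed node" marks, matching is exact only, a fragment that cannot be placed anywhere is skipped, and all names and the sample tree are hypothetical.

```python
from collections import deque


class Node:
    """POI tree node with a parent pointer (hypothetical structure)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


def child_named(parent, fragment):
    """Exact match among the direct children of `parent`."""
    for child in parent.children:
        if child.name == fragment:
            return child
    return None


def search_descendants(parent, fragment):
    """Downward exploration: breadth-first search below the children."""
    queue = deque(g for c in parent.children for g in c.children)
    while queue:
        node = queue.popleft()
        if node.name == fragment:
            return node
        queue.extend(node.children)
    return None


def match_with_fallback(root, fragments):
    """Match fragments in order, exploring down and then backtracking up."""
    parent, matched = root, None
    for fragment in fragments:
        node = child_named(parent, fragment)
        if node is None:
            # A level may be missing from the address: look deeper.
            node = search_descendants(parent, fragment)
        ancestor = parent.parent
        while node is None and ancestor is not None:
            # A fragment may be redundant: retry from earlier levels.
            node = child_named(ancestor, fragment)
            ancestor = ancestor.parent
        if node is None:
            continue  # assumption: skip a fragment that matches nowhere
        parent = matched = node
    return matched


# Hypothetical tree with a virtual root.
root = Node("ROOT")
sichuan = Node("四川省", root)
chengdu = Node("成都市", sichuan)
jinjiang = Node("锦江区", chengdu)
wuhou = Node("武侯区", chengdu)
chunxi = Node("春熙路", jinjiang)
renmin = Node("人民南路", wuhou)

# The city level is missing, so downward exploration finds the district.
print(match_with_fallback(root, ["四川省", "锦江区", "春熙路"]).name)
```

A query such as `["四川省", "成都市", "锦江区", "武侯区", "人民南路"]` exercises the backtracking branch: "武侯区" is not below "锦江区", so the search returns to "成都市" and continues from there.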
S307, coding the address to be coded according to the information of the target POI.
In the embodiment of the invention, the address fragments are matched and searched on the POI tree to determine the target POI most similar to the address to be coded, which avoids the mutual interference between Chinese address fragments that arises in text-based matching tools such as Lucene. Moreover, for the case in which an address fragment cannot be matched, a search matching algorithm of downward exploration and upward backtracking is provided, which improves the recognition rate and the accuracy of matching.
Example four
Fig. 4 is a schematic structural diagram of an address encoding apparatus according to a fourth embodiment of the present invention, which is disposed on an electronic device. As shown in fig. 4, the apparatus includes:
the word segmentation module 401 is configured to acquire an address to be coded and segment the address to be coded based on word segmentation keywords to obtain at least one word segmentation fragment, wherein the word segmentation keywords comprise words representing an address hierarchical structure;
the first comparison module 402 is configured to compare each word segmentation fragment with an administrative division comparison table in sequence and take each word segmentation fragment that exists in the administrative division comparison table as an address fragment;
a searching module 403, configured to match, in sequence, each address fragment with a node of a preset POI tree, and determine, according to a matching result, a target POI matched with the address to be encoded; wherein the POI tree is composed of nodes representing an address hierarchy;
and the encoding module 404 is configured to encode the address to be encoded according to the information of the target POI.
In the embodiment of the invention, the address to be coded is segmented based on the word segmentation keywords, and the resulting word segmentation fragments are compared with the administrative division comparison table to obtain the address fragments, which improves the accuracy of address word segmentation. Moreover, the address fragments are matched in sequence against the nodes of the POI tree, and the target POI matched with the address to be coded is determined according to the matching result. This improves the accuracy of similarity matching between the address to be coded and the POI address, and avoids the mutual interference between Chinese address fragments that arises in text-based matching tools such as Lucene.
Optionally, the first comparison module is specifically configured to:
compare each word segmentation fragment with the administrative division comparison table in sequence, and, if any target word segmentation fragment exists in the administrative division comparison table, judge whether there is an administrative region with the same name as the target word segmentation fragment;
if such a same-name administrative region exists, determine the address fragment according to the upper-level administrative region to which the same-name administrative region belongs;
and if no such same-name administrative region exists, use the target word segmentation fragment as an address fragment.
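One way to picture the same-name check is the sketch below. It assumes a hypothetical table `DIVISION_PARENTS` that maps each division name to every upper-level region containing a division of that name, and it resolves an ambiguous name by looking for one of those parents among the other fragments. Both the table layout and the tie-breaking rule are assumptions, since the embodiment only states that the upper-level region is used.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical table: division name -> upper-level regions that contain
# a division with this name. None marks a top-level division.
DIVISION_PARENTS: Dict[str, List[Optional[str]]] = {
    "北京市": [None],
    "长春市": ["吉林省"],
    "海淀区": ["北京市"],
    "朝阳区": ["北京市", "长春市"],  # the same name exists under two cities
}


def resolve(fragments: List[str]) -> List[Tuple[Optional[str], str]]:
    """Return (upper-level region, fragment) pairs for fragments found in the table."""
    resolved = []
    for fragment in fragments:
        parents = DIVISION_PARENTS.get(fragment)
        if parents is None:
            continue  # not an administrative division name
        if len(parents) == 1:
            resolved.append((parents[0], fragment))  # unique name, use as is
            continue
        # Same-name case: pick the parent that also occurs in the address.
        parent = next((p for p in parents if p in fragments), None)
        if parent is not None:
            resolved.append((parent, fragment))
        # Otherwise the name stays ambiguous and this sketch drops it.
    return resolved
```

Here `resolve(["长春市", "朝阳区"])` ties 朝阳区 to 长春市 rather than 北京市, because 长春市 appears in the same address.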
Optionally, the apparatus further comprises:
the combination module is used for combining any two adjacent word segmentation fragments into a new word segmentation fragment if no address fragment is obtained;
and the second comparison module is used for selecting combinations of different numbers of characters from the new word segmentation fragment, comparing them with the administrative division comparison table in sequence, and taking the combinations that exist in the administrative division comparison table as address fragments.
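The fallback performed by the combination module and the second comparison module might look like the following sketch. It reads "combinations of different numbers of characters" as contiguous substrings of every length, tried longest first; that reading, the minimum length of two characters, and the names used are assumptions.

```python
from typing import List

# Hypothetical administrative division comparison table.
DIVISION_TABLE = {"深圳市", "南山区"}


def address_fragments(fragments: List[str], min_len: int = 2) -> List[str]:
    """Direct lookup first; if nothing matches, merge neighbours and retry."""
    direct = [f for f in fragments if f in DIVISION_TABLE]
    if direct:
        return direct

    found: List[str] = []
    for left, right in zip(fragments, fragments[1:]):
        merged = left + right  # new word segmentation fragment
        # Try every contiguous character combination, longest first.
        for size in range(len(merged), min_len - 1, -1):
            for start in range(len(merged) - size + 1):
                candidate = merged[start:start + size]
                if candidate in DIVISION_TABLE and candidate not in found:
                    found.append(candidate)
    return found
```

For the badly cut input `["深圳", "市", "南山", "区"]`, no single fragment is in the table, but merging neighbours recovers 深圳市 and 南山区.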
Optionally, the apparatus further comprises:
the replacing module is used for replacing an address fragment with the new name of the administrative division if it is detected that the administrative division corresponding to that address fragment has been changed or adjusted;
and the completion module is used for filling in the missing fragments if it is determined, according to the hierarchical structure information of the administrative divisions, that fragments are missing from the address fragments.
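A minimal sketch of the replacing and completion modules follows, assuming two hypothetical lookup tables: `RENAMED` maps an old division name to its current name, and `PARENT_OF` holds the hierarchical structure information. It also assumes the address fragments arrive ordered from the highest level down.

```python
from typing import Dict, List

# Hypothetical tables; real data would come from the administrative
# division comparison table and its change history.
RENAMED: Dict[str, str] = {"崇文区": "东城区"}
PARENT_OF: Dict[str, str] = {
    "东城区": "北京市",
    "南山区": "深圳市",
    "深圳市": "广东省",
}


def normalize(fragments: List[str]) -> List[str]:
    """Replace outdated names, then insert any missing upper-level fragments."""
    renamed = [RENAMED.get(f, f) for f in fragments]
    completed: List[str] = []
    for fragment in renamed:
        # Walk upward until an ancestor that is already present is reached.
        missing: List[str] = []
        parent = PARENT_OF.get(fragment)
        while parent is not None and parent not in renamed and parent not in completed:
            missing.append(parent)
            parent = PARENT_OF.get(parent)
        completed.extend(reversed(missing))  # highest missing level first
        completed.append(fragment)
    return completed
```

With these tables, `normalize(["广东省", "南山区"])` inserts the missing city level, and `normalize(["崇文区"])` both renames the district and adds its city.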
Optionally, the apparatus further includes a POI tree construction module, which comprises:
the POI word segmentation unit is used for acquiring a POI address in a POI file, and carrying out word segmentation on the POI address based on word segmentation keywords to obtain at least one word segmentation fragment, wherein the word segmentation keywords comprise words representing an address hierarchical structure;
the comparison unit is used for sequentially comparing each word segmentation fragment with the administrative division comparison table and taking the word segmentation fragments that exist in the administrative division comparison table as POI address fragments;
and the assembling unit is used for taking the POI address fragments as nodes and assembling all the nodes into a POI tree according to the hierarchy of the POI address fragments.
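The assembling step can be sketched as a prefix-tree insertion. The sketch assumes each POI record has already been segmented and filtered into an ordered list of POI address fragments; the `Node` class, the `poi_id` field, and the record layout are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


class Node:
    """One POI address fragment; children are keyed by fragment text."""

    def __init__(self, name: str):
        self.name = name
        self.children: Dict[str, "Node"] = {}
        self.poi_id: Optional[str] = None  # set on the node where a POI ends


def build_tree(records: List[Tuple[str, List[str]]]) -> Node:
    """Assemble all POI address fragments into one tree under a virtual root."""
    root = Node("")
    for poi_id, fragments in records:
        node = root
        for fragment in fragments:  # fragments ordered from highest level down
            if fragment not in node.children:
                node.children[fragment] = Node(fragment)
            node = node.children[fragment]
        node.poi_id = poi_id
    return root
```

POIs that share upper-level fragments share the corresponding nodes, so two POIs in 深圳市 hang under a single 深圳市 node.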
Optionally, the search module is specifically configured to:
S1, matching the first-level address fragment with the POI tree, marking the matched target node of the POI tree as the parent node, and taking the address fragment following the first-level address fragment as the current address fragment; wherein the address fragments are ordered according to the address hierarchy;
S2, searching all child nodes of the parent node for a target child node matching the current address fragment;
S3, if the target child node exists, setting the parent node as a traversed node, and marking the target child node as the parent node;
and S4, taking each address fragment after the current address fragment in turn as the current address fragment and repeating S2 to S3 until the last address fragment has been matched with a POI node, and taking the last matched POI node as the target POI.
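Steps S1 to S4 amount to a level-by-level descent, sketched below. The POI tree is represented as nested dictionaries purely for brevity, and the function name `find_target` and its return shape are assumptions. The sketch covers only the case where every fragment matches a direct child; it returns `None` otherwise.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical POI tree as nested dicts: fragment -> child fragments.
POI_TREE: Dict[str, dict] = {
    "广东省": {"深圳市": {"南山区": {"科技园": {}}}},
}


def find_target(
    tree: Dict[str, dict], fragments: List[str]
) -> Optional[Tuple[str, List[str]]]:
    """Return (target POI node name, traversed node names), or None on a miss."""
    # S1: match the first-level fragment and mark it as the parent node.
    if not fragments or fragments[0] not in tree:
        return None
    parent = fragments[0]
    children = tree[parent]
    traversed: List[str] = []

    # S4: take each remaining fragment in turn as the current fragment.
    for current in fragments[1:]:
        # S2: look for the current fragment among the parent's children.
        if current not in children:
            return None
        # S3: the old parent becomes traversed; the child becomes the parent.
        traversed.append(parent)
        parent, children = current, children[current]
    return parent, traversed
```

For `["广东省", "深圳市", "南山区"]` the descent ends on 南山区 with 广东省 and 深圳市 marked as traversed, while `["广东省", "南山区"]` misses at S2 because 南山区 is not a direct child of 广东省.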
Optionally, the search module is further configured to:
if no target child node matching the current address fragment exists among the child nodes of the parent node, determining whether each child node has descendant nodes;
if so, searching the descendant nodes for a target descendant node matching the current address fragment;
if a matching target descendant node exists, marking the upper-level nodes of the target descendant node as traversed nodes, and taking the target descendant node as the parent node;
and if no matching target descendant node exists, returning to the traversed node one level above the parent node, taking that traversed node as the new parent node, and judging whether any child node of the new parent node matches the current address fragment; if not, continuing to return and judge in this way until a POI node matching the current address fragment is found.
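The descendant search and the return-to-previous-level behaviour can be sketched as follows. This is a simplified reading, not the embodiment's exact procedure: at each level it searches the whole subtree breadth-first (children before deeper descendants), and when backing up it searches the full subtree of each ancestor rather than only its direct children. It also returns `None` when the root is reached without a match, a stopping rule the text does not spell out. All names are hypothetical.

```python
from collections import deque
from typing import List, Optional


class Node:
    def __init__(self, name: str, parent: Optional["Node"] = None):
        self.name = name
        self.parent = parent
        self.children: List["Node"] = []
        if parent is not None:
            parent.children.append(self)


def search_below(node: Node, name: str) -> Optional[Node]:
    """Breadth-first search of every node strictly below `node`."""
    queue = deque(node.children)
    while queue:
        candidate = queue.popleft()
        if candidate.name == name:
            return candidate
        queue.extend(candidate.children)
    return None


def advance(parent: Node, fragment: str) -> Optional[Node]:
    """Search below the parent; on a miss, back up one level and retry."""
    node: Optional[Node] = parent
    while node is not None:
        hit = search_below(node, fragment)
        if hit is not None:
            return hit
        node = node.parent  # return to the previously traversed level
    return None


def match(root: Node, fragments: List[str]) -> Optional[Node]:
    """Match fragments in order; the last matched node is the target POI."""
    parent: Optional[Node] = root
    for fragment in fragments:
        parent = advance(parent, fragment)
        if parent is None:
            return None
    return parent


# Small hypothetical tree under a virtual root.
root = Node("")
guangdong = Node("广东省", root)
shenzhen = Node("深圳市", guangdong)
guangzhou = Node("广州市", guangdong)
nanshan = Node("南山区", shenzhen)
tianhe = Node("天河区", guangzhou)
```

With this tree, `match(root, ["广东省", "南山区"])` reaches 南山区 through the descendant search even though the city level is missing from the address. `match(root, ["广东省", "深圳市", "天河区"])` exercises the backtracking path: nothing below 深圳市 matches, so the search returns to 广东省 and finds 天河区 under 广州市.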
The address coding device provided by the embodiment of the invention can execute the address coding method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.
EXAMPLE five
Fig. 5 is a schematic structural diagram of an electronic device according to a fifth embodiment of the present invention. FIG. 5 illustrates a block diagram of an exemplary electronic device 12 suitable for use in implementing embodiments of the present invention. The electronic device 12 shown in fig. 5 is only an example and should not bring any limitation to the function and the scope of use of the embodiment of the present invention.
As shown in FIG. 5, electronic device 12 is embodied in the form of a general purpose computing electronic device. The components of electronic device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 that couples various system components including the system memory 28 and the processing unit 16.
Bus 18 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus.
Electronic device 12 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by electronic device 12 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM)30 and/or cache memory 32. The electronic device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 5, and commonly referred to as a "hard drive"). Although not shown in FIG. 5, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to bus 18 by one or more data media interfaces. Memory 28 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.
A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in memory 28, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each of which examples or some combination thereof may comprise an implementation of a network environment. Program modules 42 generally carry out the functions and/or methodologies of the described embodiments of the invention.
Electronic device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), with one or more devices that enable a user to interact with electronic device 12, and/or with any devices (e.g., network card, modem, etc.) that enable electronic device 12 to communicate with one or more other computing devices. Such communication may be through an input/output (I/O) interface 22. Also, the electronic device 12 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet) via the network adapter 20. As shown, the network adapter 20 communicates with other modules of the electronic device 12 via the bus 18. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with electronic device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
The processing unit 16 executes various functional applications and data processing by executing programs stored in the system memory 28, for example, implementing an address encoding method provided by an embodiment of the present invention, the method including:
acquiring an address to be coded, and segmenting words of the address to be coded based on segmentation keywords to obtain at least one segmentation fragment, wherein the segmentation keywords comprise words representing an address hierarchical structure;
comparing each word segmentation fragment with an administrative division comparison table in sequence, and taking the word segmentation fragments that exist in the administrative division comparison table as address fragments;
matching each address fragment in sequence against the nodes of a pre-constructed POI tree, and determining a target POI matched with the address to be coded according to the matching result; wherein the POI tree is composed of nodes representing an address hierarchy;
and coding the address to be coded according to the information of the target POI.
EXAMPLE six
An embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements an address encoding method provided in an embodiment of the present invention, where the method includes:
acquiring an address to be coded, and segmenting words of the address to be coded based on segmentation keywords to obtain at least one segmentation fragment, wherein the segmentation keywords comprise words representing an address hierarchical structure;
comparing each word segmentation fragment with an administrative division comparison table in sequence, and taking the word segmentation fragments that exist in the administrative division comparison table as address fragments;
matching each address fragment in sequence against the nodes of a pre-constructed POI tree, and determining a target POI matched with the address to be coded according to the matching result; wherein the POI tree is composed of nodes representing an address hierarchy;
and coding the address to be coded according to the information of the target POI.
Computer storage media for embodiments of the invention may employ any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (10)

1. An address encoding method, characterized in that the method comprises:
acquiring an address to be coded, and segmenting words of the address to be coded based on segmentation keywords to obtain at least one segmentation fragment, wherein the segmentation keywords comprise words representing an address hierarchical structure;
comparing each word segmentation fragment with an administrative division comparison table in sequence, and taking the word segmentation fragments that exist in the administrative division comparison table as address fragments;
matching each address fragment in sequence against the nodes of a pre-constructed POI tree, and determining a target POI matched with the address to be coded according to the matching result; wherein the POI tree is composed of nodes representing an address hierarchy;
and coding the address to be coded according to the information of the target POI.
2. The method according to claim 1, wherein comparing each word segmentation fragment with an administrative division comparison table in sequence, and taking the word segmentation fragments that exist in the administrative division comparison table as address fragments, comprises:
comparing each word segmentation fragment with the administrative division comparison table in sequence, and, if any target word segmentation fragment exists in the administrative division comparison table, judging whether there is an administrative region with the same name as the target word segmentation fragment;
if such a same-name administrative region exists, determining the address fragment according to the upper-level administrative region to which the same-name administrative region belongs;
and if no such same-name administrative region exists, using the target word segmentation fragment as an address fragment.
3. The method of claim 1, further comprising:
if no address fragment is obtained, combining any two adjacent word segmentation fragments into a new word segmentation fragment;
and selecting combinations of different numbers of characters from the new word segmentation fragment, comparing the combinations with the administrative division comparison table in sequence, and taking the combinations that exist in the administrative division comparison table as address fragments.
4. The method of claim 1, wherein after determining the address fragments, the method further comprises:
if it is detected that the administrative division corresponding to an address fragment has been changed or adjusted, replacing the address fragment with the new name of the administrative division;
and if it is determined, according to the hierarchical structure information of the administrative divisions, that fragments are missing from the address fragments, filling in the missing fragments.
5. The method of claim 1, wherein the operation of pre-building a POI tree comprises:
obtaining a POI address in a POI file, and carrying out word segmentation on the POI address based on word segmentation keywords to obtain at least one word segmentation fragment, wherein the word segmentation keywords comprise words representing an address hierarchical structure;
comparing each word segmentation fragment with an administrative division comparison table in sequence, and taking the word segmentation fragments that exist in the administrative division comparison table as POI address fragments;
and taking the POI address fragments as nodes, and assembling all the nodes into a POI tree according to the hierarchy of the POI address fragments.
6. The method according to claim 1, wherein the step of matching each address fragment in sequence against the nodes of a pre-constructed POI tree and determining a target POI most similar to the address to be encoded according to the matching result comprises:
S1, matching the first-level address fragment with the POI tree, marking the matched target node of the POI tree as the parent node, and taking the address fragment following the first-level address fragment as the current address fragment; wherein the address fragments are ordered according to the address hierarchy;
S2, searching all child nodes of the parent node for a target child node matching the current address fragment;
S3, if the target child node exists, setting the parent node as a traversed node, and marking the target child node as the parent node;
and S4, taking each address fragment after the current address fragment in turn as the current address fragment and repeating S2 to S3 until the last address fragment has been matched with a POI node, and taking the last matched POI node as the target POI.
7. The method of claim 6, further comprising:
if no target child node matching the current address fragment exists among the child nodes of the parent node, determining whether each child node has descendant nodes;
if so, searching the descendant nodes for a target descendant node matching the current address fragment;
if a matching target descendant node exists, marking the upper-level nodes of the target descendant node as traversed nodes, and taking the target descendant node as the parent node;
and if no matching target descendant node exists, returning to the traversed node one level above the parent node, taking that traversed node as the new parent node, and judging whether any child node of the new parent node matches the current address fragment; if not, continuing to return and judge in this way until a POI node matching the current address fragment is found.
8. An address encoding apparatus, characterized in that the apparatus comprises:
the word segmentation module is used for acquiring an address to be coded, and segmenting the address to be coded based on word segmentation keywords to obtain at least one word segmentation fragment, wherein the word segmentation keywords comprise words representing the hierarchical structure of the address;
the comparison module is used for sequentially comparing each word segmentation fragment with an administrative division comparison table and taking the word segmentation fragments that exist in the administrative division comparison table as address fragments;
the searching module is used for matching each address fragment in sequence against the nodes of a pre-constructed POI tree and determining a target POI matched with the address to be coded according to the matching result; wherein the POI tree is composed of nodes representing an address hierarchy;
and the coding module is used for coding the address to be coded according to the information of the target POI.
9. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the address encoding method of any one of claims 1-7.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the address coding method according to any one of claims 1 to 7.
CN201911194130.2A 2019-11-28 2019-11-28 Address coding method and device, electronic equipment and storage medium Active CN110990520B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911194130.2A CN110990520B (en) 2019-11-28 2019-11-28 Address coding method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110990520A true CN110990520A (en) 2020-04-10
CN110990520B CN110990520B (en) 2023-10-20

Family

ID=70087927

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911194130.2A Active CN110990520B (en) 2019-11-28 2019-11-28 Address coding method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110990520B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103914544A (en) * 2014-04-03 2014-07-09 浙江大学 Method for quickly matching Chinese addresses in multi-level manner on basis of address feature words
CN106649464A (en) * 2016-09-26 2017-05-10 深圳市数字城市工程研究中心 Method of building Chinese address tree and device
CN109033086A (en) * 2018-08-03 2018-12-18 银联数据服务有限公司 A kind of address resolution, matched method and device
CN109344213A (en) * 2018-08-28 2019-02-15 浙江工业大学 A kind of Chinese Geocoding based on dictionary tree

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
111: "111" *
李新放等: ""K叉树地址的模糊匹配研究与实现"" *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112069276A (en) * 2020-08-31 2020-12-11 平安科技(深圳)有限公司 Address coding method and device, computer equipment and computer readable storage medium
WO2021189977A1 (en) * 2020-08-31 2021-09-30 平安科技(深圳)有限公司 Address coding method and apparatus, and computer device and computer-readable storage medium
CN112069276B (en) * 2020-08-31 2024-03-08 平安科技(深圳)有限公司 Address coding method, address coding device, computer equipment and computer readable storage medium
CN112364635A (en) * 2020-11-30 2021-02-12 中国银行股份有限公司 Enterprise name duplication checking method and device
CN112364635B (en) * 2020-11-30 2023-11-21 中国银行股份有限公司 Enterprise name duplicate checking method and device
CN112818665A (en) * 2021-01-29 2021-05-18 上海寻梦信息技术有限公司 Method and device for structuring address information, electronic equipment and storage medium
CN113076389A (en) * 2021-03-16 2021-07-06 百度在线网络技术(北京)有限公司 Article region identification method and device, electronic equipment and readable storage medium
CN113935293A (en) * 2021-12-16 2022-01-14 湖南四方天箭信息科技有限公司 Address splitting and complementing method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN110990520B (en) 2023-10-20


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20220920

Address after: 25 Financial Street, Xicheng District, Beijing 100033

Applicant after: CHINA CONSTRUCTION BANK Corp.

Address before: 25 Financial Street, Xicheng District, Beijing 100033

Applicant before: CHINA CONSTRUCTION BANK Corp.

Applicant before: Jianxin Financial Science and Technology Co.,Ltd.

GR01 Patent grant