CN110990520B - Address coding method and device, electronic equipment and storage medium - Google Patents

Address coding method and device, electronic equipment and storage medium

Info

Publication number
CN110990520B
Authority
CN
China
Prior art keywords
address
poi
segmentation
node
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911194130.2A
Other languages
Chinese (zh)
Other versions
CN110990520A (en)
Inventor
张海攀
汤益嘉
刘强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Construction Bank Corp
Original Assignee
China Construction Bank Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Construction Bank Corp
Priority to CN201911194130.2A
Publication of CN110990520A
Application granted
Publication of CN110990520B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G06F16/322 Trees
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29 Geographical information databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis

Abstract

The embodiments of the invention disclose an address coding method, an address coding device, electronic equipment and a storage medium. The method comprises the following steps: obtaining an address to be encoded, and segmenting the address to be encoded based on segmentation keywords to obtain at least one word segment; comparing each word segment with an administrative division comparison table in sequence, and taking the word segments that exist in the administrative division comparison table as address fragments; sequentially matching each address fragment with the nodes of a pre-constructed POI tree, and determining a target POI matched with the address to be encoded according to the matching result; and encoding the address to be encoded according to the information of the target POI. In the embodiments of the invention, address fragments are determined by combining segmentation keywords with the administrative division comparison table, which improves the accuracy of determining address fragments. The target POI is determined by matching the address fragments against the POI tree, which solves the problem of mutual recognition interference between Chinese address fragments in text-based matching algorithms such as Lucene.

Description

Address coding method and device, electronic equipment and storage medium
Technical Field
The embodiment of the invention relates to the technical field of Internet, in particular to an address coding method, an address coding device, electronic equipment and a storage medium.
Background
Address coding is a coding method for spatial positioning. It supports converting descriptive address information into longitude and latitude, and is widely used in map applications. The method should associate a descriptive Chinese address with a point of interest (POI) as closely as possible, so that the longitude and latitude of the point of interest are taken as the conversion result.
Existing methods mainly fall into two types: (1) Segmenting the Chinese address and the POI address into words according to a reference lexicon, then computing the similarity between the word segments of the Chinese address and those of the POI address, and taking the longitude and latitude information of the most similar POI. (2) Computing the similarity between the Chinese address and the POI address with a text similarity matching tool such as Lucene, and taking the longitude and latitude information of the most similar POI.
However, both methods have drawbacks. The first depends heavily on the richness of the lexicon, but address information such as street names and residential community names is adjusted frequently and is very large in volume, so correct segmentation of Chinese addresses becomes almost impossible. The second relies on the text similarity between the Chinese address and the POI address, which is prone to deviation: for example, the two characters "Chongqing" in "Chongqing South Road, Shanghai" may, with some probability, be mistaken for Chongqing City because of their textual similarity. In addition, Chinese addresses are often written without the province, city, or county, which is another cause of deviation.
Disclosure of Invention
The embodiment of the invention provides an address coding method, an address coding device, electronic equipment and a storage medium, so as to achieve the purposes of improving the accuracy of address word segmentation and improving the accuracy of similarity matching of Chinese addresses and POI addresses.
In a first aspect, an embodiment of the present invention provides an address encoding method, where the method includes:
obtaining an address to be encoded, and segmenting the address to be encoded based on segmentation keywords to obtain at least one word segment, wherein the segmentation keywords comprise words representing an address hierarchy;
comparing each word segment with an administrative division comparison table in sequence, and taking the word segments that exist in the administrative division comparison table as address fragments;
sequentially matching each address fragment with the nodes of a pre-constructed POI tree, and determining a target POI matched with the address to be encoded according to the matching result; wherein the POI tree is composed of nodes representing an address hierarchy;
and encoding the address to be encoded according to the information of the target POI.
In a second aspect, an embodiment of the present invention further provides an address encoding apparatus, where the apparatus includes:
the word segmentation module is used for obtaining an address to be encoded, and segmenting the address to be encoded based on segmentation keywords to obtain at least one word segment, wherein the segmentation keywords comprise words representing an address hierarchy;
the comparison module is used for comparing each word segment with an administrative division comparison table in sequence and taking the word segments that exist in the administrative division comparison table as address fragments;
the searching module is used for sequentially matching each address fragment with the nodes of a pre-constructed POI tree, and determining a target POI matched with the address to be encoded according to the matching result; wherein the POI tree is composed of nodes representing an address hierarchy;
and the encoding module is used for encoding the address to be encoded according to the information of the target POI.
In a third aspect, an embodiment of the present invention further provides an electronic device, including:
one or more processors;
storage means for storing one or more programs,
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement an address encoding method as described in any of the embodiments of the present invention.
In a fourth aspect, embodiments of the present invention further provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements an address encoding method according to any of the embodiments of the present invention.
According to the embodiments of the invention, the address to be encoded is segmented based on segmentation keywords, and the resulting word segments are compared with the administrative division comparison table to obtain address fragments, which improves the accuracy of address segmentation. Each address fragment is then matched in sequence against the nodes of the POI tree, and the target POI matched with the address to be encoded is determined according to the matching result. This improves the accuracy of similarity matching between the address to be encoded and the POI address, and solves the problem of mutual recognition interference between Chinese address fragments in text-based matching algorithms such as Lucene.
Drawings
FIG. 1a is a flowchart of an address encoding method according to a first embodiment of the present invention;
FIG. 1b is a schematic structural diagram of a POI tree according to the first embodiment of the present invention;
FIG. 2 is a flowchart of an address encoding method in a second embodiment of the present invention;
FIG. 3 is a flowchart of an address encoding method in a third embodiment of the present invention;
FIG. 4 is a schematic diagram of an address coding device in a fourth embodiment of the present invention;
FIG. 5 is a schematic structural diagram of an electronic device in a fifth embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting thereof. It should be further noted that, for convenience of description, only some, but not all of the structures related to the present invention are shown in the drawings.
Example 1
FIG. 1a is a flowchart of an address encoding method according to the first embodiment of the present invention. The method is applicable to encoding a Chinese address entered by a user and may be performed by an address encoding device, which may be implemented in software and/or hardware and may be integrated in an electronic device such as a vehicle-mounted device or a server.
As shown in fig. 1a, the address coding method specifically includes:
S101, obtaining an address to be encoded, and segmenting the address to be encoded based on segmentation keywords to obtain at least one word segment, wherein the segmentation keywords comprise words representing an address hierarchy.
The address to be encoded may optionally be a Chinese query address entered by a user. Chinese addresses commonly have the following characteristics: (1) Administrative division keywords such as province, city, and county are omitted, for example: "Sichuan Chengdu Shuangliu Northwest Street". (2) Hierarchy levels are omitted, for example: "Dujiangyan, Minjiang, Sichuan Province". (3) The administrative structure has changed, so the address as written differs from the correct current address, for example: "Huang XUN Dong road, Guangzhou, Guangdong". (4) There are many structural levels, for example an address that runs from the province and city through Shuangliu County, Huayang, a street and house number 138, down to building 4-1, unit 2502. (5) Hierarchical interference, for example: "No. 15 Chongqing South Road, Shanghai".
Therefore, in the embodiment of the invention, a 9-level address structure is defined in advance according to the hierarchical characteristics of Chinese addresses. Specifically:
Level 1: provinces, municipalities directly under the central government, and autonomous regions;
Level 2: province-administered cities, prefecture-level cities, directly administered districts, etc.;
Level 3: county-level cities, districts and counties of prefecture-level cities, banners, etc.;
Level 4: towns, townships, and subdistricts (streets);
Level 5: villages and similar settlement units;
Level 6: roads, streets, lanes, bridges, and similar thoroughfares;
Level 7: road sections (subdivisions of a major road);
Level 8: house numbers (a strong qualifier);
Level 9: named residential communities, buildings, or companies.
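The 9-level structure can be represented as a small enumeration plus a keyword lookup. The following is a minimal sketch, not the patent's implementation: the names `AddressLevel`, `KEYWORD_LEVEL`, and `level_of`, and the specific keyword-to-level assignments, are assumptions made for illustration. A suffix alone is ambiguous in practice (for example, "市" may denote a level 1, 2, or 3 unit), which is one reason the method also consults the administrative division comparison table.

```python
from enum import IntEnum


class AddressLevel(IntEnum):
    """Hypothetical names for the nine levels described in the text."""
    PROVINCE = 1
    CITY = 2
    COUNTY = 3
    TOWN = 4
    VILLAGE = 5
    ROAD = 6
    SECTION = 7
    NUMBER = 8
    PLACE = 9


# Illustrative suffix -> level table (an assumption, not from the patent).
KEYWORD_LEVEL = {
    "省": AddressLevel.PROVINCE, "自治区": AddressLevel.PROVINCE,
    "市": AddressLevel.CITY,
    "区": AddressLevel.COUNTY, "县": AddressLevel.COUNTY, "旗": AddressLevel.COUNTY,
    "镇": AddressLevel.TOWN, "乡": AddressLevel.TOWN, "街道": AddressLevel.TOWN,
    "村": AddressLevel.VILLAGE,
    "路": AddressLevel.ROAD, "街": AddressLevel.ROAD, "巷": AddressLevel.ROAD,
    "段": AddressLevel.SECTION,
    "号": AddressLevel.NUMBER,
}

# Try longer suffixes first so "自治区" is checked before "区".
_SUFFIXES = sorted(KEYWORD_LEVEL, key=len, reverse=True)


def level_of(segment: str) -> AddressLevel:
    """Guess the level of a word segment from its trailing keyword."""
    for suffix in _SUFFIXES:
        if segment.endswith(suffix):
            return KEYWORD_LEVEL[suffix]
    # No level keyword: treat as a level 9 name (community, building, company).
    return AddressLevel.PLACE
```

Because `IntEnum` members compare as integers, fragments can be sorted by level directly, which matches the hierarchical ordering used later when matching against the POI tree.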
On the basis of the 9-level address structure, words representing the address hierarchy are used as segmentation keywords, for example words such as "province, city, district, county, prefecture, league". After the address to be encoded is obtained, it is segmented based on the segmentation keywords to obtain at least one word segment. Optionally, greedy regular-expression matching with the segmentation keywords is performed on the address to be encoded to obtain the word segments. For example, an address consisting of Sichuan Province, Chengdu City, Jinjiang District, and a street is divided into 4 word segments: the province segment, the city segment, the district segment, and the street segment.
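Keyword-based segmentation can be sketched with a regular expression that ends each segment at a level keyword. This is only an illustration under stated assumptions: the keyword list `LEVEL_KEYWORDS` and the function `segment_address` are hypothetical, and the sketch uses a non-greedy quantifier so that each segment stops at the first keyword it meets, which may differ from the exact regular matching the patent has in mind.

```python
import re

# Hypothetical keyword list; the patent only names examples such as
# province, city, district, and county.
LEVEL_KEYWORDS = ["省", "市", "区", "县", "镇", "乡", "街道", "路", "街", "号"]


def segment_address(address: str, keywords=LEVEL_KEYWORDS) -> list:
    """Split an address into word segments that each end with a level keyword."""
    # Longer keywords first so that "街道" is preferred over "街".
    ordered = sorted(keywords, key=len, reverse=True)
    alternation = "|".join(re.escape(k) for k in ordered)
    # Shortest run of at least one character followed by a keyword.
    pattern = re.compile(rf".+?(?:{alternation})")
    segments = pattern.findall(address)
    # Keep trailing text without a keyword (e.g. a building name) as a last segment.
    consumed = sum(len(s) for s in segments)
    if consumed < len(address):
        segments.append(address[consumed:])
    return segments
```

For instance, `segment_address("上海市重庆南路15号")` yields `["上海市", "重庆南路", "15号"]`, so "重庆" stays inside the road segment instead of being treated as a city on its own. Names that contain a keyword character internally can still be split incorrectly, which is the case the re-segmentation step of the second embodiment addresses.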
S102, comparing each word segment with an administrative division comparison table in sequence, and taking the word segments that exist in the administrative division comparison table as address fragments.
Optionally, administrative division recognition is performed on each word segment: each word segment is compared with the administrative division comparison table in sequence, and if a target word segment exists in the table, this indicates that the target word segment was segmented correctly. For example, if a lookup shows that a word segment exists in the administrative division comparison table, that word segment is an administrative division address fragment.
Further, because regions with identical names exist in China, if a target word segment exists in the administrative division comparison table, it is determined whether another administrative region with the same name as the target word segment exists. If so, the address fragment is determined according to the upper-level administrative region to which each same-name region belongs; if not, the target word segment is taken directly as an address fragment.
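The table lookup and same-name disambiguation described above might look as follows. This is a sketch under assumptions: the table layout (region name mapped to its possible parent regions), the sample entries, and the function `to_address_fragments` are all illustrative, and the choice to skip a same-name region whose parent cannot be determined is an assumption rather than something the patent specifies.

```python
from typing import Optional

# Hypothetical comparison table: region name -> possible parent regions
# (None marks a top-level region). Real data would come from official tables.
ADMIN_TABLE: dict = {
    "北京市": [None],
    "吉林省": [None],
    "长春市": ["吉林省"],
    "朝阳区": ["北京市", "长春市"],  # same name under two different parents
}


def to_address_fragments(segments, table=ADMIN_TABLE):
    """Keep segments found in the table; resolve same-name regions by parent."""
    fragments = []  # (name, parent) pairs in input order
    for seg in segments:
        parents = table.get(seg)
        if parents is None:
            continue  # not an administrative division name
        if len(parents) == 1:
            fragments.append((seg, parents[0]))
            continue
        # Same-name regions: pick the parent that was already recognized.
        accepted = {name for name, _ in fragments}
        parent: Optional[str] = next((p for p in parents if p in accepted), None)
        if parent is not None:
            fragments.append((seg, parent))
        # Assumption: an unresolved same-name region is skipped here.
    return fragments
```

With this table, the segments `["吉林省", "长春市", "朝阳区"]` resolve "朝阳区" to the district under "长春市", while `["北京市", "朝阳区"]` resolves it to the one under "北京市".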
Further, after the address fragments are determined, if it is detected that the administrative division corresponding to an address fragment has been changed or adjusted, the address fragment is replaced by the name of the administrative division after the change. For example, if the address fragments are "Guangdong", "Guangzhou", "shift-on", and "Xuan east", and the district "shift-on" has since been renamed Huangpu, the final address fragments are "Guangdong", "Guangzhou", "Huangpu", and "Xuan east".
Furthermore, because administrative division keywords such as province, city, and county are often omitted from the address to be encoded, if it is determined from the hierarchical structure information of the administrative divisions that a level is missing before a certain address fragment, the missing fragment is filled in, so as to ensure the accuracy of the subsequent address query.
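Renaming and gap filling can both be driven by small lookup tables. The sketch below is illustrative only: `RENAMED`, `PARENT`, and `normalize_fragments` are hypothetical names, and the old district name used in the rename entry is an assumption chosen to mirror the Huangpu example in the text.

```python
# Hypothetical rename table (old name -> current name); the old name is an
# assumption used only to mirror the Huangpu example.
RENAMED = {"萝岗区": "黄埔区"}

# Hypothetical parent table describing the administrative hierarchy.
PARENT = {"黄埔区": "广州市", "广州市": "广东省", "广东省": None}


def normalize_fragments(fragments):
    """Apply renames, then insert any missing upper-level fragments."""
    result = []
    for frag in (RENAMED.get(f, f) for f in fragments):
        # Walk up the hierarchy and collect ancestors not yet present.
        missing = []
        parent = PARENT.get(frag)
        while parent is not None and parent not in result and parent not in missing:
            missing.append(parent)
            parent = PARENT.get(parent)
        result.extend(reversed(missing))  # highest level first
        if frag not in result:
            result.append(frag)
    return result
```

Fragments that are not in the hierarchy table, such as road names, pass through unchanged, so the function can be applied to the full fragment list.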
It should be noted that, after the address fragments are determined according to the administrative division comparison table, hierarchical levels such as towns and villages may additionally be identified based on keywords such as "town" and "village" to obtain the corresponding address fragments.
S103, sequentially matching each address fragment with the nodes of a pre-constructed POI tree, and determining a target POI matched with the address to be encoded according to the matching result.
The POI tree is composed of nodes representing an address hierarchy. Optionally, the operation of pre-constructing the POI tree includes the following. First, the POI file is segmented in a process similar to S101-S102: the POI addresses in the POI file are obtained and segmented based on segmentation keywords to obtain at least one word segment, wherein the segmentation keywords comprise words representing an address hierarchy; each word segment is then compared with the administrative division comparison table in sequence, and the word segments that exist in the table are taken as POI address fragments. Next, the POI address fragments are taken as nodes, and the nodes are assembled into a POI tree according to the hierarchical structure of the POI address fragments. For example, FIG. 1b shows a schematic structural diagram of a POI tree.
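Assembling POI address fragments into a tree is essentially a trie insertion keyed by fragment. The following is a minimal sketch assuming the POI addresses have already been converted into ordered fragment lists; the class `PoiNode`, the function `build_poi_tree`, and the record layout are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PoiNode:
    """One node of the POI tree; `poi` holds the POI info for a complete address."""
    name: str
    children: dict = field(default_factory=dict)
    poi: Optional[dict] = None


def build_poi_tree(poi_records):
    """Build a tree from (fragments, info) pairs, one level per fragment."""
    root = PoiNode("ROOT")
    for fragments, info in poi_records:
        node = root
        for frag in fragments:
            # Reuse the existing child for a shared prefix, else create it.
            if frag not in node.children:
                node.children[frag] = PoiNode(frag)
            node = node.children[frag]
        node.poi = info  # attach name / longitude / latitude at the last node
    return root
```

POIs that share upper levels (the same province, city, and district) share the same path in the tree, so a lookup only ever compares a fragment against the children of one parent node.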
Once the POI tree has been constructed, each address fragment is matched in sequence against the nodes of the POI tree, and the target POI matched with the address to be encoded is determined according to the matching result. It should be noted that matching the address fragments against the POI tree nodes in sequence avoids the problem of different address fragments interfering with each other.
S104, encoding the address to be encoded according to the information of the target POI.
After the target POI is determined, an association between the target POI and the address to be encoded is established and the information of the target POI is obtained, for example its name and its longitude and latitude, which completes the encoding of the address to be encoded.
In the embodiment of the invention, the address to be encoded is segmented based on segmentation keywords, and the resulting word segments are compared with the administrative division comparison table to obtain address fragments, which improves the accuracy of address segmentation. Each address fragment is then matched in sequence against the nodes of the POI tree, and the target POI matched with the address to be encoded is determined according to the matching result. Because the address fragments are in hierarchical order and are matched against the POI tree one after another, different address fragments do not interfere with each other. This solves the problem of mutual recognition interference between Chinese address fragments in text-based matching algorithms such as Lucene, and improves the accuracy of similarity matching between the address to be encoded and the POI address.
Example 2
FIG. 2 is a flowchart of an address coding method according to the second embodiment of the present invention. This embodiment is optimized on the basis of the above embodiment and adds an operation of obtaining address fragments by re-segmentation when no address fragment is obtained. The address coding method specifically includes:
S201, obtaining an address to be encoded, and segmenting the address to be encoded based on segmentation keywords to obtain at least one word segment, wherein the segmentation keywords comprise words representing an address hierarchy.
S202, comparing each word segment with the administrative division comparison table in sequence, and if the word segments do not exist in the administrative division comparison table, combining any two adjacent word segments into a new word segment.
After at least one word segment is obtained in S201, if comparison with the administrative division comparison table shows that none of the word segments exists in the table, the segmentation result of S201 is inaccurate, and any two adjacent word segments need to be combined into a new word segment for re-evaluation. Further, if the comparison shows that only one word segment is not in the administrative division comparison table, that word segment is combined with the adjacent preceding word segment into a new word segment.
S203, selecting combinations of different numbers of characters from the new word segment, comparing the combinations with the administrative division comparison table in sequence, and taking the combinations that exist in the table as address fragments.
For any new word segment, different numbers of characters are selected from it in order to form candidate combinations, for example combinations consisting of one, two, or three characters. The candidate combinations are compared with the administrative division comparison table in sequence, and the combinations that exist in the table are taken as address fragments.
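The re-segmentation step can be sketched as merging adjacent word segments and scanning the merged string for short runs of characters that appear in the comparison table. The function `resegment` and its parameters are hypothetical, and reading "combinations" as contiguous substrings of one to three characters is an interpretation of the text, not a detail the patent confirms.

```python
def resegment(segments, table, max_len=3):
    """Merge each adjacent pair of segments and look for table entries inside.

    `table` is any container of administrative division names.
    """
    found = []
    for left, right in zip(segments, segments[1:]):
        merged = left + right
        # Candidate combinations: contiguous runs of 1..max_len characters.
        for size in range(1, min(max_len, len(merged)) + 1):
            for start in range(len(merged) - size + 1):
                candidate = merged[start:start + size]
                if candidate in table and candidate not in found:
                    found.append(candidate)
    return found
```

As an illustration, a district name such as "新市区" contains the keyword character "市", so keyword splitting can cut it into "新市" and "区…". Merging the two neighbours and scanning three-character runs recovers "新市区" from the merged string.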
S204, on a pre-constructed POI tree, sequentially matching each address fragment with a node of the POI tree, and determining a target POI matched with the address to be coded according to a matching result.
Wherein the POI tree is composed of nodes representing an address hierarchy.
S205, encoding the address to be encoded according to the information of the target POI.
In the embodiment of the invention, when the obtained word segments are inaccurate, new word segments are formed by recombination and compared with the administrative division comparison table, which further improves the accuracy of determining address fragments and thereby the accuracy of address coding.
Example 3
FIG. 3 is a flowchart of an address coding method according to the third embodiment of the present invention. This embodiment is optimized on the basis of the above embodiments. The address coding method specifically includes:
S301, obtaining an address to be encoded, and segmenting the address to be encoded based on segmentation keywords to obtain at least one word segment, wherein the segmentation keywords comprise words representing an address hierarchy.
S302, comparing each word segment with the administrative division comparison table in sequence, and taking the word segments that exist in the administrative division comparison table as address fragments.
In the embodiment of the invention, the address fragments are ordered according to the address hierarchy. For example, if the address to be encoded is " city, county, barrel street east street 156 number", the address fragments obtained by the above operations, ordered by address hierarchy, are: Sichuan, Chengdu, , , Dongda, 156.
S303, matching the first address fragment with the POI tree, marking the matched target node of the POI tree as the parent node, and taking the address fragment immediately after the first one as the current address fragment.
S304, searching all child nodes of the parent node for a target child node that matches the current address fragment.
Optionally, exact matching is used: a child node whose name is identical to the current address fragment is taken as the target child node. Alternatively, fuzzy matching may be used: the child node that shares the longest common substring with the current address fragment is taken as the target child node.
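Exact matching with a longest-common-substring fallback can be sketched as below. The helper names and the `min_overlap` threshold are assumptions added for illustration; the patent only states that the child with the longest identical character string is chosen.

```python
def longest_common_substring_len(a: str, b: str) -> int:
    """Length of the longest contiguous run of characters shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        cur = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                cur[j] = prev[j - 1] + 1  # extend the run ending at j-1
                best = max(best, cur[j])
        prev = cur
    return best


def find_target_child(fragment, child_names, min_overlap=2):
    """Exact match first; otherwise the child with the longest shared substring.

    `min_overlap` is a hypothetical guard so that a single shared keyword
    character (such as a trailing "区") does not count as a match.
    """
    if fragment in child_names:
        return fragment
    best_name, best_len = None, 0
    for name in child_names:
        overlap = longest_common_substring_len(fragment, name)
        if overlap >= min_overlap and overlap > best_len:
            best_name, best_len = name, overlap
    return best_name
```

Because the candidates are only the children of the current parent node, the fuzzy comparison is confined to one level of one branch rather than the whole POI set.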
S305, if a target child node exists, marking the parent node as a traversed node, and marking the target child node as the new parent node.
S306, taking each address fragment after the current address fragment in turn as the current address fragment and executing the operations of S304 to S305 until the last address fragment has been matched with a POI node, and taking the finally matched POI node as the target POI.
For example, the first address fragment (Sichuan) is first matched against the POI tree and the matched node is taken as the parent node. The next address fragment is taken as the current address fragment, and the child nodes of the parent node are searched for a target child node matching it. If one exists, the parent node is marked as a traversed node, the target child node is marked as the new parent node, the following address fragment becomes the current address fragment, and the search continues among the child nodes of the new parent node. The subsequent address fragments are handled in the same way until the last address fragment has been matched with a POI node, and the finally matched POI node is taken as the target POI.
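The level-by-level traversal of S303 to S306 can be sketched with a nested dictionary standing in for the POI tree. The tree contents and the function `match_sequential` are illustrative assumptions; only exact matching is shown here.

```python
# Hypothetical POI tree as nested dicts: node name -> children.
POI_TREE = {
    "上海市": {"黄浦区": {"重庆南路": {"15号": {}}}},
    "重庆市": {"渝中区": {}},
}


def match_sequential(tree, fragments):
    """Return the matched path of node names, or None if some level fails."""
    children = tree  # candidates for the first fragment: top-level nodes
    path = []        # traversed nodes, highest level first
    for fragment in fragments:
        if fragment not in children:
            return None
        path.append(fragment)
        children = children[fragment]  # the matched node becomes the parent
    return path
```

Since "重庆南路" is only ever compared with the children of "黄浦区", it can never be confused with the top-level node "重庆市", which is the interference problem the method sets out to avoid.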
It should be noted that, if after step S304 no child node of the parent node matches the current address fragment, the embodiment of the invention provides a matching method that combines downward exploration with upward backtracking.
In the downward exploration method, if no child node of the parent node matches the current address fragment, it is assumed that the address fragments obtained from the address to be encoded are missing a fragment, and it is determined whether the child nodes have descendant nodes. If descendant nodes exist, they are searched for a target descendant node matching the current address fragment. If a matching target descendant node exists, the upper-level node of the target descendant node is marked as a traversed node and the target descendant node is taken as the parent node; each address fragment after the current one is then taken in turn as the current address fragment and the matching search continues.
If no matching target descendant node exists, it is assumed that the address to be encoded contains more fragments than the corresponding POI address, and the upward backtracking method is executed. Specifically, the search goes back to the traversed node one level above the parent node and takes it as the new parent node, then determines whether any child node of the new parent node matches the current address fragment. If not, the backtracking step is repeated until a POI node matching the current address fragment is found. The address fragment following the current one is then processed.
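Downward exploration and upward backtracking can be added to the same nested-dictionary traversal. This is a sketch under assumptions: the helper names are hypothetical, the descendant search is a plain depth-first search, and a fragment that matches nowhere is simply skipped, which the patent does not specify.

```python
def node_at(tree, path):
    """Return the children dict of the node reached by following `path`."""
    node = tree
    for name in path:
        node = node[name]
    return node


def find_descendant(children, name):
    """Depth-first search below the direct children; returns the path to `name`."""
    for child_name, grandchildren in children.items():
        if name in grandchildren:
            return [child_name, name]
        deeper = find_descendant(grandchildren, name)
        if deeper is not None:
            return [child_name] + deeper
    return None


def match_with_fallback(tree, fragments):
    path = []  # traversed nodes, highest level first
    for frag in fragments:
        children = node_at(tree, path)
        if frag in children:              # normal child match
            path.append(frag)
            continue
        down = find_descendant(children, frag)
        if down is not None:              # downward exploration: a level was omitted
            path.extend(down)
            continue
        back = path[:-1]                  # upward backtracking: extra fragment earlier
        while True:
            if frag in node_at(tree, back):
                path = back + [frag]
                break
            if not back:
                break                     # assumption: unmatched fragment is skipped
            back.pop()
    return path
```

With a tree containing 四川省 → 成都市 → 锦江区 → {春熙路, 东大街}, the fragments `["四川省", "锦江区", "东大街"]` are completed through downward exploration to include "成都市", while `["四川省", "成都市", "锦江区", "春熙路", "东大街"]` backtracks from "春熙路" to "锦江区" before matching "东大街".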
S307, encoding the address to be encoded according to the information of the target POI.
In the embodiment of the invention, the address fragments are matched against the POI tree to determine the target POI most similar to the address to be encoded, which solves the problem of mutual recognition interference between Chinese address fragments in text-based matching algorithms such as Lucene. In addition, when an address fragment cannot be matched in the tree, a search and matching algorithm combining downward exploration and upward backtracking is provided, which improves the recognition rate and the accuracy of matching.
Example 4
FIG. 4 is a schematic structural diagram of an address coding device in the fourth embodiment of the present invention, where the device is configured on an electronic apparatus. As shown in FIG. 4, the apparatus includes:
the word segmentation module 401, configured to obtain an address to be encoded, and segment the address to be encoded based on segmentation keywords to obtain at least one word segment, wherein the segmentation keywords comprise words representing an address hierarchy;
a first comparison module 402, configured to compare each word segment with an administrative division comparison table in sequence, and take the word segments that exist in the administrative division comparison table as address fragments;
a search module 403, configured to sequentially match each address fragment with the nodes of a pre-constructed POI tree, and determine a target POI that matches the address to be encoded according to the matching result; wherein the POI tree is composed of nodes representing an address hierarchy;
and the encoding module 404, configured to encode the address to be encoded according to the information of the target POI.
In the embodiment of the invention, the address to be encoded is segmented based on segmentation keywords, and the resulting word segments are compared with the administrative division comparison table to obtain address fragments, which improves the accuracy of address segmentation. Each address fragment is then matched in sequence against the nodes of the POI tree, and the target POI matched with the address to be encoded is determined according to the matching result. This improves the accuracy of similarity matching between the address to be encoded and the POI address, and solves the problem of mutual recognition interference between Chinese address fragments in text-based matching algorithms such as Lucene.
Optionally, the first comparison module is specifically configured to:
compare each word segment with the administrative division comparison table in sequence, and if a target word segment exists in the administrative division comparison table, determine whether another administrative region with the same name as the target word segment exists;
if so, determine the address fragment according to the upper-level administrative region to which each same-name administrative region belongs;
and if not, take the target word segment as an address fragment.
Optionally, the apparatus further includes:
the combination module, used for combining any two adjacent word segments into a new word segment if no address fragment is obtained;
and the second comparison module, used for selecting combinations of different numbers of characters from the new word segment, comparing the combinations with the administrative division comparison table in sequence, and taking the combinations that exist in the administrative division comparison table as address fragments.
Optionally, the apparatus further includes:
the replacing module is used for replacing an address fragment with the changed administrative division name if a change or adjustment of the administrative division corresponding to that address fragment is detected;
and the filling module is used for filling in the missing fragment if, according to the hierarchical structure information of the administrative divisions, a fragment is determined to be missing from the address fragments.
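A minimal sketch of the replacing and filling steps is given below. The lookup tables `RENAMED` and `SUPERIOR`, their sample entries, and the function name `normalize_fragments` are hypothetical, and the sketch only fills levels missing above the first fragment, which is a simplification of the behaviour described.

```python
# Hypothetical tables: old name -> new name, and region -> superior region.
RENAMED = {"崇文区": "东城区"}
SUPERIOR = {"东城区": "北京市", "北京市": None}


def normalize_fragments(fragments):
    """Apply administrative renames, then prepend missing superior levels."""
    renamed = [RENAMED.get(name, name) for name in fragments]
    if not renamed:
        return renamed
    # Walk up the hierarchy from the first fragment and collect every
    # superior region that the address does not already contain.
    missing = []
    superior = SUPERIOR.get(renamed[0])
    while superior is not None and superior not in renamed:
        missing.insert(0, superior)
        superior = SUPERIOR.get(superior)
    return missing + renamed
```

Here `["崇文区", "某某街道"]` becomes `["北京市", "东城区", "某某街道"]`: the district is replaced by its current name and the missing city level is filled in.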
Optionally, the apparatus further includes a POI tree construction module, which comprises:
the POI word segmentation unit, used for obtaining the POI addresses in a POI file and segmenting each POI address based on word-segmentation keywords to obtain at least one word-segmentation fragment, wherein the word-segmentation keywords comprise words representing an address hierarchical structure;
the comparison unit, used for comparing each word-segmentation fragment with the administrative division comparison table in sequence and taking the word-segmentation fragments that exist in the administrative division comparison table as POI address fragments;
and the assembling unit, used for taking the POI address fragments as nodes and assembling the nodes into a POI tree according to the hierarchical structure of the POI address fragments.
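The assembling step can be sketched as inserting each POI's ordered address fragments into a tree, one level per fragment. The class `PoiNode`, the function `build_poi_tree`, and the record layout (fragments plus an information dictionary) are assumptions chosen for illustration; segmentation and table comparison are taken as already done.

```python
class PoiNode:
    """One node of the POI tree; each node represents one address level."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = {}    # child name -> PoiNode
        self.poi_info = None  # set only on nodes where a POI address ends


def build_poi_tree(poi_records):
    """Build a tree from (ordered address fragments, POI info) pairs."""
    root = PoiNode("ROOT")
    for fragments, info in poi_records:
        node = root
        for name in fragments:
            # Reuse an existing child so shared prefixes share nodes.
            if name not in node.children:
                node.children[name] = PoiNode(name, parent=node)
            node = node.children[name]
        node.poi_info = info
    return root
```

Because shared prefixes reuse the same nodes, two POIs in the same district hang under one district node rather than duplicating the city and district levels.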
Optionally, the search module is specifically configured to:
S1, matching the first-ranked address fragment against the POI tree, marking the matched target node of the POI tree as the parent node, and taking the address fragment that immediately follows the first-ranked address fragment as the current address fragment; wherein the address fragments are ordered according to the address hierarchy;
S2, searching all child nodes of the parent node for a target child node that matches the current address fragment;
S3, if a target child node exists, setting the parent node as a traversed node and marking the target child node as the parent node;
S4, taking each address fragment after the current address fragment as the current address fragment in turn and performing operations S2 to S3 until the last address fragment has been matched to a POI node, and taking the finally matched POI node as the target POI.
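Steps S1 to S4 can be sketched as a top-down walk over the tree. The `Node` type, the breadth-first lookup used for S1, and the choice to return `None` when a child is missing are assumptions; the sketch covers only the case in which every fragment matches a direct child.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Node:
    name: str
    children: Dict[str, "Node"] = field(default_factory=dict)


def find_node(root, name):
    """Breadth-first search for the first node with the given name (S1)."""
    queue = [root]
    while queue:
        node = queue.pop(0)
        if node.name == name:
            return node
        queue.extend(node.children.values())
    return None


def match_address(root, fragments):
    """Return the node matched by the last fragment, or None on a miss."""
    if not fragments:
        return None
    parent = find_node(root, fragments[0])      # S1: match the first fragment
    if parent is None:
        return None
    traversed = []                              # kept only to mirror S3
    for current in fragments[1:]:               # S4: advance fragment by fragment
        child = parent.children.get(current)    # S2: look among direct children
        if child is None:
            return None                         # miss: not handled in this sketch
        traversed.append(parent)                # S3: old parent becomes traversed
        parent = child                          # S3: matched child becomes parent
    return parent
```

Since S1 locates the first fragment anywhere in the tree, an address that starts at the district level can still be matched.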
Optionally, the search module is further configured to:
if no target child node matching the current address fragment exists among the child nodes of the parent node, determining whether each child node has descendant nodes;
if descendant nodes exist, searching the descendant nodes for a target descendant node that matches the current address fragment;
if a matching target descendant node exists, marking the superior node of the target descendant node as a traversed node and taking the target descendant node as the parent node;
if no matching target descendant node exists, returning to the previously traversed node of the parent node, taking that node as the new parent node, and judging whether any child node of the new parent node matches the current address fragment; if not, continuing to return and judge in this way until a POI node matching the current address fragment is found.
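The two fallbacks (searching descendants, then returning to previously traversed nodes) can be sketched as a single matching step. The names `Node`, `find_descendant`, and `next_parent` are hypothetical, and returning `None` once the stack of traversed nodes is empty is an assumption added so that the loop terminates.

```python
from collections import deque


class Node:
    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = {}

    def add(self, name):
        child = Node(name)
        child.parent = self
        self.children[name] = child
        return child


def find_descendant(node, name):
    """Breadth-first search below node's children (grandchildren and deeper)."""
    queue = deque()
    for child in node.children.values():
        queue.extend(child.children.values())
    while queue:
        candidate = queue.popleft()
        if candidate.name == name:
            return candidate
        queue.extend(candidate.children.values())
    return None


def next_parent(parent, traversed, current):
    """One matching step with both fallbacks; returns the new parent or None."""
    child = parent.children.get(current)
    if child is not None:                      # normal case: direct child
        traversed.append(parent)
        return child
    descendant = find_descendant(parent, current)
    if descendant is not None:                 # fallback 1: a deeper descendant
        traversed.append(descendant.parent)    # its superior node is traversed
        return descendant
    while traversed:                           # fallback 2: go back up
        parent = traversed.pop()
        child = parent.children.get(current)
        if child is not None:
            traversed.append(parent)
            return child
    return None                                # assumption: give up at the top
```

The descendant search tolerates an address that skips a level, and the return step handles a fragment that belongs under an earlier node rather than under the current one.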
The address coding device provided by the embodiment of the invention can execute the address coding method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.
Example five
Fig. 5 is a schematic structural diagram of an electronic device according to a fifth embodiment of the present invention. Fig. 5 illustrates a block diagram of an exemplary electronic device 12 suitable for use in implementing embodiments of the present invention. The electronic device 12 shown in fig. 5 is merely an example and should not be construed as limiting the functionality and scope of use of embodiments of the present invention.
As shown in fig. 5, the electronic device 12 is in the form of a general purpose computing electronic device. Components of the electronic device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, a bus 18 that connects the various system components, including the system memory 28 and the processing units 16.
Bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA (EISA) bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
Electronic device 12 typically includes a variety of computer system readable media. Such media can be any available media that is accessible by electronic device 12 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 30 and/or cache memory 32. The electronic device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from or write to non-removable, nonvolatile magnetic media (not shown in FIG. 5, commonly referred to as a "hard disk drive"). Although not shown in fig. 5, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (e.g., a "floppy disk"), and an optical disk drive for reading from or writing to a removable non-volatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In such cases, each drive may be coupled to bus 18 through one or more data medium interfaces. Memory 28 may include at least one program product having a set (e.g., at least one) of program modules configured to carry out the functions of embodiments of the invention.
A program/utility 40 having a set (at least one) of program modules 42 may be stored in, for example, memory 28, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment. Program modules 42 generally perform the functions and/or methods of the embodiments described herein.
The electronic device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), one or more devices that enable a user to interact with the electronic device 12, and/or any devices (e.g., network card, modem, etc.) that enable the electronic device 12 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 22. Also, the electronic device 12 may communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN) and/or a public network, such as the Internet, through a network adapter 20. As shown, the network adapter 20 communicates with other modules of the electronic device 12 over the bus 18. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with electronic device 12, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
The processing unit 16 executes various functional applications and data processing by running programs stored in the system memory 28, for example, implementing an address encoding method provided by an embodiment of the present invention, the method including:
obtaining an address to be encoded, and segmenting the address to be encoded based on word-segmentation keywords to obtain at least one word-segmentation fragment, wherein the word-segmentation keywords comprise words representing an address hierarchical structure;
comparing each word-segmentation fragment with the administrative division comparison table in sequence, and taking the word-segmentation fragments that exist in the administrative division comparison table as address fragments;
matching each address fragment in turn against the nodes of a pre-constructed POI tree, and determining a target POI matching the address to be encoded according to the matching result; wherein the POI tree is composed of nodes representing an address hierarchy;
and encoding the address to be encoded according to the information of the target POI.
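The keyword-based segmentation in the first step of this method can be sketched with a regular expression. The keyword list `KEYWORDS` and the function `segment` are hypothetical, and the pattern is not given in the source: this sketch cuts the address at the nearest hierarchy keyword by using a lazy quantifier, which is one possible choice.

```python
import re

# Hypothetical hierarchy keywords; multi-character ones come first so that
# the alternation prefers them over their single-character prefixes.
KEYWORDS = ["自治区", "街道", "省", "市", "区", "县", "镇", "乡", "路", "街", "号"]
PATTERN = re.compile(r".+?(?:" + "|".join(map(re.escape, KEYWORDS)) + r")")


def segment(address):
    """Split an address into fragments that each end with a hierarchy keyword."""
    fragments = PATTERN.findall(address)
    # Matches are contiguous from position 0, so whatever is left over is
    # the trailing text that contains no keyword.
    rest = address[sum(len(fragment) for fragment in fragments):]
    if rest:
        fragments.append(rest)
    return fragments
```

Each resulting fragment would then be looked up in the administrative division comparison table, as in the second step of the method.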
Example six
The sixth embodiment of the present invention also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements an address encoding method as provided by the embodiments of the present invention, the method comprising:
obtaining an address to be encoded, and segmenting the address to be encoded based on word-segmentation keywords to obtain at least one word-segmentation fragment, wherein the word-segmentation keywords comprise words representing an address hierarchical structure;
comparing each word-segmentation fragment with the administrative division comparison table in sequence, and taking the word-segmentation fragments that exist in the administrative division comparison table as address fragments;
matching each address fragment in turn against the nodes of a pre-constructed POI tree, and determining a target POI matching the address to be encoded according to the matching result; wherein the POI tree is composed of nodes representing an address hierarchy;
and encoding the address to be encoded according to the information of the target POI.
The computer storage media of embodiments of the invention may take the form of any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, or C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
Note that the above is only a preferred embodiment of the present invention and the technical principle applied. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, while the invention has been described in connection with the above embodiments, the invention is not limited to the embodiments, but may be embodied in many other equivalent forms without departing from the spirit or scope of the invention, which is set forth in the following claims.

Claims (9)

1. An address encoding method, the method comprising:
acquiring an address to be encoded, and performing regular-expression greedy matching on the address to be encoded based on word-segmentation keywords to obtain at least one word-segmentation fragment; wherein the word-segmentation keywords comprise words representing an address hierarchical structure;
comparing each word-segmentation fragment with an administrative division comparison table in sequence and, if any target word-segmentation fragment exists in the administrative division comparison table, judging whether an administrative region with the same name as the target word-segmentation fragment exists;
if such a same-name administrative region exists, determining the address fragment according to the upper-level administrative region to which each same-name administrative region belongs;
if not, taking the target word-segmentation fragment as an address fragment;
matching each address fragment in turn against the nodes of a pre-constructed POI tree, and determining a target POI matching the address to be encoded according to the matching result; wherein the POI tree is composed of nodes representing an address hierarchy;
and establishing an association relation between the target POI and the address to be encoded, and encoding the address to be encoded according to the information of the target POI.
2. The method according to claim 1, wherein the method further comprises:
if no address fragment is obtained, combining any two adjacent word-segmentation fragments into a new word-segmentation fragment;
and selecting combinations of different numbers of words from the new word-segmentation fragment, comparing the combinations with the administrative division comparison table in sequence, and taking the combinations that exist in the administrative division comparison table as address fragments.
3. The method of claim 1, wherein after determining the address fragments, the method further comprises:
if a change or adjustment of the administrative division corresponding to an address fragment is detected, replacing that address fragment with the changed administrative division name;
if, according to the hierarchical structure information of the administrative divisions, a fragment is determined to be missing from the address fragments, filling in the missing fragment.
4. The method of claim 1, wherein the operation of pre-building the POI tree comprises:
acquiring the POI addresses in a POI file, and segmenting each POI address based on word-segmentation keywords to obtain at least one word-segmentation fragment, wherein the word-segmentation keywords comprise words representing an address hierarchical structure;
comparing each word-segmentation fragment with the administrative division comparison table in sequence, and taking the word-segmentation fragments that exist in the administrative division comparison table as POI address fragments;
and taking the POI address fragments as nodes, and assembling the nodes into a POI tree according to the hierarchical structure of the POI address fragments.
5. The method according to claim 1, wherein sequentially matching each of the address fragments with a node of the POI tree on a pre-constructed POI tree, determining a target POI most similar to the address to be encoded according to a matching result, comprises:
S1, matching the first-ranked address fragment against the POI tree, marking the matched target node of the POI tree as the parent node, and taking the address fragment that immediately follows the first-ranked address fragment as the current address fragment; wherein the address fragments are ordered according to the address hierarchy;
S2, searching all child nodes of the parent node for a target child node that matches the current address fragment;
S3, if a target child node exists, setting the parent node as a traversed node and marking the target child node as the parent node;
S4, taking each address fragment after the current address fragment as the current address fragment in turn and performing operations S2 to S3 until the last address fragment has been matched to a POI node, and taking the finally matched POI node as the target POI.
6. The method of claim 5, wherein the method further comprises:
if no target child node matching the current address fragment exists among the child nodes of the parent node, determining whether each child node has descendant nodes;
if descendant nodes exist, searching the descendant nodes for a target descendant node that matches the current address fragment;
if a matching target descendant node exists, marking the superior node of the target descendant node as a traversed node and taking the target descendant node as the parent node;
if no matching target descendant node exists, returning to the previously traversed node of the parent node, taking that node as the new parent node, and judging whether any child node of the new parent node matches the current address fragment; if not, continuing to return and judge in this way until a POI node matching the current address fragment is found.
7. An address encoding device, the device comprising:
the word segmentation module is used for obtaining an address to be encoded, and performing regular-expression greedy matching on the address to be encoded based on word-segmentation keywords to obtain at least one word-segmentation fragment; wherein the word-segmentation keywords comprise words representing an address hierarchical structure;
the comparison module is used for comparing each word-segmentation fragment with an administrative division comparison table in sequence and, if any target word-segmentation fragment exists in the administrative division comparison table, judging whether an administrative region with the same name as the target word-segmentation fragment exists; if such a same-name administrative region exists, determining the address fragment according to the upper-level administrative region to which each same-name administrative region belongs; if not, taking the target word-segmentation fragment as an address fragment;
the searching module is used for matching each address fragment in turn against the nodes of a pre-constructed POI tree, and determining a target POI matching the address to be encoded according to the matching result; wherein the POI tree is composed of nodes representing an address hierarchy;
and the encoding module is used for establishing an association relation between the target POI and the address to be encoded and encoding the address to be encoded according to the information of the target POI.
8. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the address encoding method of any of claims 1-6.
9. A computer readable storage medium having stored thereon a computer program, which when executed by a processor implements the address encoding method as claimed in any of claims 1-6.
CN201911194130.2A 2019-11-28 2019-11-28 Address coding method and device, electronic equipment and storage medium Active CN110990520B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911194130.2A CN110990520B (en) 2019-11-28 2019-11-28 Address coding method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911194130.2A CN110990520B (en) 2019-11-28 2019-11-28 Address coding method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110990520A CN110990520A (en) 2020-04-10
CN110990520B true CN110990520B (en) 2023-10-20

Family

ID=70087927

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911194130.2A Active CN110990520B (en) 2019-11-28 2019-11-28 Address coding method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110990520B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112069276B (en) * 2020-08-31 2024-03-08 平安科技(深圳)有限公司 Address coding method, address coding device, computer equipment and computer readable storage medium
CN112364635B (en) * 2020-11-30 2023-11-21 中国银行股份有限公司 Enterprise name duplicate checking method and device
CN112818665A (en) * 2021-01-29 2021-05-18 上海寻梦信息技术有限公司 Method and device for structuring address information, electronic equipment and storage medium
CN113076389A (en) * 2021-03-16 2021-07-06 百度在线网络技术(北京)有限公司 Article region identification method and device, electronic equipment and readable storage medium
CN113935293B (en) * 2021-12-16 2022-03-22 湖南四方天箭信息科技有限公司 Address splitting and complementing method and device, computer equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103914544A (en) * 2014-04-03 2014-07-09 浙江大学 Method for quickly matching Chinese addresses in multi-level manner on basis of address feature words
CN106649464A (en) * 2016-09-26 2017-05-10 深圳市数字城市工程研究中心 Method of building Chinese address tree and device
CN109033086A (en) * 2018-08-03 2018-12-18 银联数据服务有限公司 A kind of address resolution, matched method and device
CN109344213A (en) * 2018-08-28 2019-02-15 浙江工业大学 A kind of Chinese Geocoding based on dictionary tree

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
李新放等."K叉树地址的模糊匹配研究与实现".《测绘通报》.2018,(第undefined期),126-129. *

Also Published As

Publication number Publication date
CN110990520A (en) 2020-04-10

Similar Documents

Publication Publication Date Title
CN110990520B (en) Address coding method and device, electronic equipment and storage medium
CN107656913B (en) Map interest point address extraction method, map interest point address extraction device, server and storage medium
WO2020228706A1 (en) Fence address-based coordinate data processing method and apparatus, and computer device
CN108628811B (en) Address text matching method and device
CN111291277A (en) Address standardization method based on semantic recognition and high-level language search
CN112256817A (en) Geocoding method, system, terminal and storage medium
CN111159974A (en) Address information standardization method and device, storage medium and electronic equipment
CN111625732A (en) Address matching method and device
CN108733810A (en) A kind of address date matching process and device
CN115470307A (en) Address matching method and device
CN111126422B (en) Method, device, equipment and medium for establishing industry model and determining industry
CN109522335B (en) Information acquisition method and device and computer readable storage medium
CN112069824B (en) Region identification method, device and medium based on context probability and citation
CN115658837A (en) Address data processing method and device, electronic equipment and storage medium
CN113535883B (en) Commercial venue entity linking method, system, electronic equipment and storage medium
CN114513550B (en) Geographic position information processing method and device and electronic equipment
CN113221558B (en) Express address error correction method and device, storage medium and electronic equipment
CN114792091A (en) Chinese address element analysis method and equipment based on vocabulary enhancement and storage medium
CN111597277B (en) Site aggregation method, device, computer equipment and medium in electronic map
CN111401051B (en) Express information analysis method and system
CN116414808A (en) Method, device, computer equipment and storage medium for normalizing detailed address
CN110609874B (en) Address entity coreference resolution method based on density clustering algorithm
CN116431625A (en) Positioning analysis method and device for geographic entity and computer equipment
CN112579713B (en) Address recognition method, address recognition device, computing equipment and computer storage medium
CN113568951A (en) Data mining and processing method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20220920

Address after: 25 Financial Street, Xicheng District, Beijing 100033

Applicant after: CHINA CONSTRUCTION BANK Corp.

Address before: 25 Financial Street, Xicheng District, Beijing 100033

Applicant before: CHINA CONSTRUCTION BANK Corp.

Applicant before: Jianxin Financial Science and Technology Co.,Ltd.

GR01 Patent grant