CN110147418B - Method and system for judging whether address is standardized or not and address is standardized - Google Patents


Info

Publication number
CN110147418B
CN110147418B CN201910314344.2A CN201910314344A CN110147418B CN 110147418 B CN110147418 B CN 110147418B CN 201910314344 A CN201910314344 A CN 201910314344A CN 110147418 B CN110147418 B CN 110147418B
Authority
CN
China
Prior art keywords
address
level
acquisition
standard
hit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910314344.2A
Other languages
Chinese (zh)
Other versions
CN110147418A (en
Inventor
周成祖
洪亚杰
陈志飞
连志阳
王海滨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Meiya Pico Information Co Ltd
Original Assignee
Xiamen Meiya Pico Information Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen Meiya Pico Information Co Ltd
Priority to CN201910314344.2A
Publication of CN110147418A
Application granted
Publication of CN110147418B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29 Geographical information databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Remote Sensing (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The method comprises: cleaning and converting an acquisition address, including converting Chinese numerals in the acquisition address into Arabic numerals; splitting the acquisition address according to address levels; performing hit analysis between each level of the split acquisition address and the addresses in a standard address library; obtaining an address score for the acquisition address from the hit status of each level, preset level weight coefficients and influence coefficients between the levels; and judging whether the acquisition address has completed address standardization by comparing its address score with a preset score threshold. The invention can greatly improve the efficiency of address matching, addresses the problems that existing systems are outdated and that disordered acquisition addresses prevent expansion, makes it easier to upgrade the address system, and helps unify address services.

Description

Method and system for judging whether address is standardized or not and address is standardized
Technical Field
The invention relates to the technical field of communication addresses, in particular to a method and a system for judging whether an address is standardized or not and standardizing the address.
Background
At present, Geographic Information Systems (GIS) are applied more and more widely in various systems, and communication address technology touches many aspects of daily life: postal services, banking systems, the information management systems of public security departments and the like all need to store, identify and update address data. Non-standard communication addresses cause considerable inconvenience. For example, because user addresses are not standardized, the postal system must invest substantial manpower and material resources in identifying the correct, standard communication address; otherwise, mail is delivered wrongly or repeatedly. As the volume of postal data grows, this investment grows with it and becomes hard for the postal system to bear. Banking systems face the same problem: if user addresses are not standardized, then as the data volume grows and different databases remain incompatible, the banking system suffers from slow processing, low efficiency and disordered data, which easily leads to the loss of customers.
Databases of some old systems hold large quantities of non-spatial data (without longitude and latitude coordinates). When such systems are modified and upgraded, functions such as map marking and area analysis need to be added, but because the addresses were not collected in a standard form, the probability of matching them to standard addresses is low, and upgrading the system is very difficult. It is therefore necessary to judge whether an address is standardized and to perform address standardization.
Disclosure of Invention
The invention provides a method and a system for judging whether an address is standardized and for standardizing the address.
In one aspect, the present invention provides a method for determining whether an address is standardized, including the following steps:
S1: cleaning and converting the acquisition address, wherein Chinese numerals in the acquisition address are converted into Arabic numerals;
S2: splitting the acquisition address according to the address hierarchy;
S3: performing hit analysis between each level of the split acquisition address and the addresses in the standard address library;
S4: obtaining the address score of the acquisition address by using the hit status of each level, preset level weight coefficients and influence coefficients between the levels;
S5: comparing the address score of the acquisition address with a preset score threshold to judge whether the acquisition address has completed address standardization.
In an alternative embodiment, the weighting factor of each level in step S4 is different, and the weight increases with increasing level. The calculation of the address score is more reasonable and accurate through the setting of the weight coefficients of different levels.
In an alternative embodiment, the influence coefficient between levels expresses how much the hit status of another level against the standard library affects the weight coefficient of the current level; the farther the other level is from the current level, the smaller the influence coefficient. Setting the influence coefficient further improves the accuracy of the address score.
In an alternative embodiment, the weighting factor and the influence factor are both in the form of percentages. The finally obtained address score is also in a percentile form, and the judgment of the address score can be more intuitively carried out by virtue of the percentile form.
In an alternative embodiment, the address score of the collection address specifically includes the sum of the address scores of each level. The total address score of the acquisition address is obtained through the sum of the address scores of all the levels, and the address matching hit condition of the acquisition address can be objectively reflected.
In an alternative embodiment, the address score of a level is the product of the hit status of that level and the comprehensive coefficient of that level, where the comprehensive coefficient is the sum of the level's weight coefficient and its influence coefficients. The product of the hit status and the comprehensive coefficient reflects the address score of the level.
In an alternative embodiment, the address score is calculated by the formula
$$S = \sum_{i} x_i \left( s_i + \sum_{j \neq i} a_{ij} x_j \right)$$
where $s_i$ represents the weight coefficient corresponding to level $i$; $x_i$ indicates whether the address field of level $i$ hits and $x_j$ indicates whether the address field of level $j$ hits, each taking the value 0 for a miss and 1 for a hit; and $a_{ij}$ represents the influence coefficient of whether level $j$ hits on the weight coefficient $s_i$ of level $i$.
In an alternative embodiment, the hit analysis of step S3 specifically includes: matching the road name and the house number against the standard library to obtain a set Rn of hit standard addresses, and then performing hit analysis within Rn according to the room number of the acquisition address. This exact matching on road name and house number allows the standardization judgment of the acquisition address to be carried out efficiently.
In an alternative embodiment, the hit analysis of step S3 further includes: obtaining a set Pn of hit standard addresses by splitting out and matching the road name, and extracting the house number and/or the room number from the acquisition address and from Pn for hit analysis. When exact matching on road name and house number cannot be used, matching on the road name alone, followed by matching on the extracted numbers, ensures that the acquisition address can still be matched effectively and prevents omissions.
In an alternative embodiment, step S5 is to compare the calculated address score S with a preset score threshold LS, where if S > LS, it indicates that the normalization of the acquisition address is successful, and if S < LS, it indicates that the normalization of the acquisition address is failed. The standardization of the acquisition address is judged by means of a preset score threshold value, so that the hit condition of the acquisition address can be intuitively obtained.
According to another aspect of the present invention, a method for address standardization is provided, which includes the above method for judging whether the address is standardized, and the method further includes mapping the collected address judged to be successfully standardized with the address in the standard address base.
According to a third aspect of the invention, a computer-readable storage medium is proposed, on which one or more computer programs are stored, which when executed by a computer processor perform the above-mentioned method.
According to a fourth aspect of the present invention, a system for determining whether an address is standardized is provided, the system comprising:
the standard address library is configured to be used as a standard for comparing the acquisition addresses;
the address cleaning and converting unit is configured for cleaning and converting the acquisition address according to a standard address language;
the splitting unit is configured for splitting the acquisition address according to the address hierarchy;
the computing unit is configured to compute and obtain an address score of the acquisition address by using the hit condition of each hierarchy, the weight coefficient of each hierarchy and the influence coefficient among the hierarchies;
and the judging unit is configured for judging whether the acquisition address is standardized or not.
According to a fifth aspect of the present invention, an address standardization system is provided, including the above system for determining whether an address is standardized, and further including a mapping unit configured to establish a mapping relationship between a successfully standardized acquisition address and an address in a standard address base.
The invention splits the acquisition address according to the hierarchy of the standard address library, performs hit matching analysis on each hierarchy respectively, calculates the address score of each hierarchy according to the hit condition, the weight coefficient of the preset hierarchy and the influence coefficient between the hierarchies, finally obtains the address score of the acquisition address, and judges whether the acquisition address is standardized or not by comparing and analyzing the address score with the set score threshold. The matching efficiency is greatly improved, the problems that the existing system is too old, the acquisition address is disordered and cannot be expanded are solved, the system is convenient to modify and upgrade, and the unification of address services is facilitated.
Drawings
The accompanying drawings are included to provide a further understanding of the embodiments and are incorporated in and constitute a part of this specification. The drawings illustrate embodiments and together with the description serve to explain the principles of the invention. Other embodiments and many of the intended advantages of embodiments will be readily appreciated as they become better understood by reference to the following detailed description. Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is a flow diagram of a method of determining address normalization according to one embodiment of the invention;
FIG. 2 is a flow diagram of a method for address score calculation in accordance with a specific embodiment of the present invention;
FIG. 3 is a flow diagram of a method of address normalization for one embodiment of the invention;
FIG. 4 is a system diagram of determining address normalization according to one embodiment of the invention;
FIG. 5 is a system diagram of address normalization for one embodiment of the present invention;
FIG. 6 is a schematic block diagram of a computer system suitable for use in implementing a terminal device or server of an embodiment of the invention.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 illustrates a method for determining address normalization according to an embodiment of the present invention, which includes the following steps:
S101: cleaning and converting the acquisition address, wherein Chinese numerals in the acquisition address are converted into Arabic numerals. For example, an address whose number is written in Chinese numerals as "eighty-three" is converted so that the number reads "83". Converting the Chinese numerals of the acquisition address into Arabic numerals makes matching hits easier and improves judgment efficiency.
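The numeral conversion in this step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the names `chinese_to_int` and `convert_numerals`, the character tables and the sample address are all hypothetical, and only values below 10,000 are handled.

```python
import re

# Digit and unit tables for Chinese numerals below 10,000 (illustrative subset).
DIGITS = {"零": 0, "〇": 0, "一": 1, "二": 2, "两": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
UNITS = {"十": 10, "百": 100, "千": 1000}
NUMERAL_RUN = re.compile("[" + "".join(DIGITS) + "".join(UNITS) + "]+")


def chinese_to_int(numeral: str) -> int:
    """Convert one Chinese numeral such as 八十三 to an int."""
    if not any(ch in UNITS for ch in numeral):
        # Positional style without units, e.g. 一一〇一 -> 1101.
        return int("".join(str(DIGITS[ch]) for ch in numeral))
    total, pending = 0, 0
    for ch in numeral:
        if ch in DIGITS:
            pending = DIGITS[ch]
        else:
            # A bare unit, such as the leading 十 in 十五, counts as 1 x unit.
            total += (pending or 1) * UNITS[ch]
            pending = 0
    return total + pending


def convert_numerals(address: str) -> str:
    """Replace every run of Chinese numerals in the address with Arabic digits."""
    return NUMERAL_RUN.sub(lambda m: str(chinese_to_int(m.group())), address)


print(convert_numerals("中山路八十三号"))
```

A fuller version would need to protect place names that themselves contain numeral characters (for example 三明市), which this sketch would also rewrite.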
In a specific embodiment, converting the acquisition address further comprises converting local address-language habits into the expressions used in the standard address library; for example, local terms such as "unit" and other regional building designations are uniformly converted into the house-number form.
S102: splitting the acquisition address according to the address hierarchy. The standard address library is formed by splitting addresses into 8 levels: province, city, district or county, street or town, community or village, road name, house number and room number. The address at each level is represented by a corresponding code: province corresponds to code 1, city to code 2, district or county to code 3, street or town to code 4, community or village to code 5, road name to code 6, house number to code 7 and room number to code 8.
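The eight levels and their codes map naturally onto an enumeration. The following Python sketch is only one possible representation; the names `AddressLevel` and `empty_split` are hypothetical and do not come from the patent.

```python
from enum import IntEnum
from typing import Dict, Optional


class AddressLevel(IntEnum):
    """The 8 address levels, numbered with the codes given in the text."""
    PROVINCE = 1
    CITY = 2
    DISTRICT_OR_COUNTY = 3
    STREET_OR_TOWN = 4
    COMMUNITY_OR_VILLAGE = 5
    ROAD_NAME = 6
    HOUSE_NUMBER = 7
    ROOM_NUMBER = 8


def empty_split() -> Dict[AddressLevel, Optional[str]]:
    """Return a split-address record with every level still unfilled."""
    return {level: None for level in AddressLevel}


record = empty_split()
record[AddressLevel.ROAD_NAME] = "中山路"  # hypothetical sample value
```

Using integer codes keeps the distance between two levels available as a simple difference of codes, which is what the influence coefficient depends on.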
S103: performing hit analysis between each level of the split acquisition address and the addresses in the standard address library. Each level of the split acquisition address is compared with the corresponding address level in the standard address library to judge whether the address at that level hits.
S104: obtaining the address score of the acquisition address by using the hit status of each level, preset level weight coefficients and influence coefficients between the levels. The hit status is the result of matching each level of the split acquisition address against the corresponding level in the standard address library: if a level of the acquisition address is consistent with the corresponding level in the standard address library, the acquisition address hits at that level; otherwise, it misses.
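The per-level hit status can be represented as a vector of 0/1 flags. The Python sketch below assumes that split addresses are stored as dictionaries keyed by level code (1 to 8) and that a level hits only on exact string equality; the name `hit_vector` and the sample values are hypothetical.

```python
from typing import Dict, List, Optional

LEVEL_CODES = range(1, 9)  # codes 1 (province) to 8 (room number)


def hit_vector(acquisition: Dict[int, Optional[str]],
               standard: Dict[int, Optional[str]]) -> List[int]:
    """Return one flag per level: 1 if the level matches the standard address, else 0."""
    hits = []
    for code in LEVEL_CODES:
        value = acquisition.get(code)
        # A level that is missing from the acquisition address counts as a miss.
        hits.append(1 if value is not None and value == standard.get(code) else 0)
    return hits


acq = {6: "中山路", 7: "22号", 8: "1101室"}
std = {6: "中山路", 7: "22号", 8: "1102室"}
print(hit_vector(acq, std))
```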
In a specific embodiment, the preset hierarchy weight coefficient specifically includes a basic weight and a fluctuation weight of each hierarchy, the basic weight of each hierarchy is consistent, and the fluctuation weight increases with the increase of the hierarchy code, so that the larger the hierarchy code, the larger the preset weight coefficient corresponding to the hierarchy is. It is understood that the address information corresponding to the later hierarchy is more specific and precise, and the corresponding occupied weight ratio is larger.
As an example, the weight coefficient of each level may be set within 1% to 8%, increasing with the level code: 1% for the province level (code 1), 1.5% for the city level (code 2), 2.5% for the district or county level (code 3), 3% for the street or town level (code 4), 4% for the community or village level (code 5), 5% for the road name level (code 6), 6% for the house number level (code 7) and 7% for the room number level (code 8). It should be appreciated that the weight coefficients of the levels can be adjusted according to the needs of the actual application, so as to meet the needs of different application scenarios.
The influence coefficient between levels expresses how much the hit status of another level against the standard library affects the weight coefficient of the current level; the farther the other level is from the current level, the smaller the influence of its hit status on the current level. Introducing the weight coefficients and the influence coefficients into the address score takes into account both how much each level's hit status affects the result and how the hit statuses of different levels affect one another, so that the address score of the acquisition address is calculated comprehensively and the judgment is more reasonable and accurate.
In a particular embodiment, the address score is calculated by the formula
$$S = \sum_{i=1}^{8} x_i \left( s_i + \sum_{j \neq i} a_{ij} x_j \right)$$
where $s_i$ represents the weight coefficient corresponding to level $i$; $x_i$ indicates whether the address field of level $i$ hits and $x_j$ indicates whether the address field of level $j$ hits, each taking the value 0 for a miss and 1 for a hit; and $a_{ij}$ represents the influence coefficient of whether level $j$ hits on the weight coefficient $s_i$ of level $i$. Both $s_i$ and $a_{ij}$ are expressed as percentages, so the final address score $S$ is also a percentage, which makes the subsequent judgment of the degree of standardization more convenient.
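Read from the symbol definitions above, each level contributes its hit flag multiplied by the sum of its weight coefficient and the influence coefficients of the other levels that hit. The Python sketch below follows that reading; the function name `address_score` and its parameters are hypothetical, and the toy coefficients in the usage lines are not the patent's values.

```python
from typing import Callable, Dict


def address_score(hits: Dict[int, int],
                  weights: Dict[int, float],
                  influence: Callable[[int, int], float]) -> float:
    """Sum, over every level that hits, its weight plus the influence of other hit levels."""
    total = 0.0
    for i, x_i in hits.items():
        if not x_i:
            continue  # a missed level contributes nothing
        cross = sum(influence(i, j) * x_j for j, x_j in hits.items() if j != i)
        total += weights[i] + cross
    return total


# Toy check with three levels and a constant influence of 1.0.
toy = address_score({1: 1, 2: 1, 3: 0}, {1: 10.0, 2: 20.0, 3: 30.0}, lambda i, j: 1.0)
print(toy)
```

Passing the coefficients in as arguments keeps the scoring logic independent of any particular coefficient table, which the text says may be adjusted per application.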
As an example, the influence coefficient may be set to 1.75% between two adjacent levels, 1.5% between two levels separated by one level, 1.25% between two levels separated by two levels, and so on, down to 0.25% between two levels separated by six levels. For example, $a_{12}$, the influence of whether level 2 hits on the weight coefficient $s_1$ of level 1, is 1.75%, and $a_{18}$, the influence of whether level 8 hits on $s_1$, is 0.25%. It should be appreciated that the influence coefficients between the levels can be adjusted according to the needs of the actual application, so as to meet the requirements of different application scenarios.
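The example coefficients can be tabulated and sanity-checked in a few lines of Python. The linear rule used here, 2.0 minus 0.25 per level of separation, is an assumption that fills in the intermediate values the text does not list; it reproduces the stated 1.75% for adjacent levels and 0.25% for levels 1 and 8. The names `WEIGHTS` and `influence` are hypothetical.

```python
# Example weight coefficients (percent) for level codes 1 to 8, as listed in the text.
WEIGHTS = {1: 1.0, 2: 1.5, 3: 2.5, 4: 3.0, 5: 4.0, 6: 5.0, 7: 6.0, 8: 7.0}


def influence(i: int, j: int) -> float:
    """Influence (percent) of level j's hit on level i, assuming a linear fall-off."""
    if i == j:
        return 0.0  # a level does not influence itself
    return 2.0 - 0.25 * abs(i - j)


# Score when all 8 levels hit: all weights plus every ordered pair of influences.
full_score = sum(WEIGHTS.values()) + sum(
    influence(i, j) for i in WEIGHTS for j in WEIGHTS
)
print(full_score)
```

Under these assumed values a fully matched address scores exactly 100%, which suggests the example coefficients were chosen so that the score reads as a percentage of a perfect match.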
S105: comparing the address score of the acquisition address with a preset score threshold to judge whether the acquisition address has completed address standardization. The comparison with the preset score threshold gives a direct indication of the hit status of the acquisition address and completes the standardization judgment.
As an example, a preset score threshold may be set to 80%, when the address score calculated by the collecting address is greater than the preset score threshold, the collecting address completes the standardized matching, and otherwise, when the address score calculated by the collecting address is less than the preset score threshold, the standardized matching of the collecting address fails. It should be understood that the preset score threshold value can be set to be a reasonable value according to actual use requirements, so that judgment of various scenes is facilitated.
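The threshold comparison reduces to a single predicate. In the Python sketch below, the name `is_standardized` is hypothetical and the default of 80.0 simply mirrors the example threshold; the text does not say how a score exactly equal to the threshold is treated, so counting it as a failure is an assumption.

```python
def is_standardized(score: float, threshold: float = 80.0) -> bool:
    """Return True when the address score (percent) exceeds the preset threshold."""
    # The text defines score > threshold as success and score < threshold as
    # failure; treating score == threshold as failure is an assumption here.
    return score > threshold


print(is_standardized(79.0), is_standardized(85.5))
```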
FIG. 2 shows a flow diagram of a method of address score calculation according to an embodiment of the invention, the method comprising the steps of:
Step S201: forming a standard address library.
In this embodiment, this step specifically includes step S2011, splitting the standard addresses into 8 levels (province, city, district or county, street or town, community or village, road name, house number and room number) to form the standard address library, and step S2012, collecting language habits, such as local address terms like "unit" and other regional building designations, to provide a conversion basis for address expressions with those language habits in subsequently processed acquisition addresses.
Step S202: acquisition address processing. The acquisition address is split in the same way into the 8 levels of province, city, district or county, street or town, community or village, road name, house number and room number.
In this embodiment, two steps precede the processing of the acquisition address. Step S2021 performs address cleaning and conversion according to language habits, converting local address-language habits into the expressions of the standard address library; for example, local terms such as "unit" and other regional building designations are uniformly converted into the house-number form. Step S2022 converts the Chinese numerals in the acquisition address into Arabic numerals; for example, an address whose number is written in Chinese numerals as "eighty-three" is converted so that the number reads "83", which makes matching hits easier and improves judgment efficiency.
Step S203: mode construction. Mode construction 2031 specifically includes two modes: mode 1 is exact matching on road name + house number, and mode 2 is matching on road name.
Step S204: mode judgment. According to the splitting result of the acquisition address, a suitable mode is selected for calculating the address score.
In a specific embodiment, step S2051 obtains the n standard addresses Rn associated with the road name + house number and sets the road name field value $x_6 = 1$ and the house number field value $x_7 = 1$. The acquisition address is used as the computation source, the road name + house number of each standard address is used as a pattern (mode 1), and a multi-pattern matching algorithm matches the patterns of all standard addresses against the acquisition address; the result is the n standard addresses Rn associated with the road name + house number. For example, when the acquisition address is "** city ** district ** mansion ** road No. 22, room 1101" and the standard address is "** city ** district ** road No. 22, room 1101", "** road No. 22" is the pattern of mode 1, and the matching result is all addresses under ** road No. 22.
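The candidate selection of mode 1 can be illustrated with a small Python sketch. Note the substitution: the text calls for a multi-pattern matching algorithm, whereas this sketch uses a plain substring test per standard address, which gives the same result on small data but scales worse. The record layout, the name `candidates_mode1` and the sample addresses are hypothetical.

```python
from typing import Dict, List


def candidates_mode1(acquisition: str,
                     standard: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return standard addresses whose road name + house number occurs in the acquisition address."""
    return [rec for rec in standard if rec["road"] + rec["house"] in acquisition]


# Hypothetical standard address library, already split into levels.
standard_library = [
    {"road": "中山路", "house": "22号", "room": "1101室"},
    {"road": "中山路", "house": "22号", "room": "1102室"},
    {"road": "中山路", "house": "23号", "room": "101室"},
]

rn = candidates_mode1("某市某区某大厦中山路22号1101室", standard_library)
print(len(rn))
```

Because the pattern is the concatenation of road name and house number, an extra token before the road name (such as a building name) does not prevent the match, while a token between the two does.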
Step S2052 sorts Rn in descending order by house number and loops over it; step S2053 then matches the acquisition address using regular-expression matching.
Step S2054 judges whether a match is found. If so, step S2055 sets $x_8 = 1$ and the process proceeds to the final step S207, where the address score S is calculated according to the formula; if not, the process goes directly to step S207 to calculate the address score S according to the formula.
In another mode, when other interfering address terms lie between the road name and the house number so that mode 1 cannot be used for exact matching, the method enters mode 2: step S2061 obtains the n standard addresses Pn associated with the road name and sets the road name field value $x_6 = 1$. The road name of each standard address is used as a pattern, the acquisition address is used as the computation source, and a multi-pattern matching algorithm matches the patterns of all standard addresses against the acquisition address; the result is the n standard addresses Pn associated with the road name. For example, when the acquisition address is "** city ** district ** road ** mansion No. 22, room 1101" and the standard address is "** city ** district ** road No. 22, room 1101", "** road" is used as the pattern, and the matching result is all addresses under ** road. The road name is split out; if the split succeeds, the road name is filled into the address segment and $x_6 = 1$ is set.
Step S2062: house number extraction. The regular expression "\\d{1,}" is used to extract the digits preceding the first house-number marker appearing in the acquisition address, which serve as the house number of the acquisition address for matching.
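A minimal Python sketch of this extraction follows. It takes the simplest reading of the step, namely that the first run of digits in the address is the house number; the name `extract_house_number` and the sample address are hypothetical, and a real implementation might anchor the pattern on the house-number marker instead.

```python
import re
from typing import Optional

# Same pattern as in the text: one or more consecutive digits.
HOUSE_NUMBER = re.compile(r"\d{1,}")


def extract_house_number(address: str) -> Optional[str]:
    """Return the first run of digits in the address, or None if there is none."""
    match = HOUSE_NUMBER.search(address)
    return match.group() if match else None


print(extract_house_number("中山路某大厦22号1101室"))
```

This is why Chinese numerals are converted to Arabic numerals beforehand: the pattern only sees digits.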
Step S2063 judges whether a house number has been extracted. If so, step S2064 sets the house number field value $x_7 = 1$; if not, the process goes directly to step S207 to calculate the address score S according to the formula.
Once a house number has been extracted, step S2065 judges whether a room number has been extracted. If so, step S2066 sets the room number field value $x_8 = 1$; if not, the process goes directly to step S207, where the address score S is calculated according to the formula. The room number in the acquisition address is extracted using the regular expression "\\d{1,}room|\\d{1,}unit". The addresses in Pn are then looped over: if an address ends with "road name + N + number", the house number is filled into the address segment and $x_7 = 1$ is set; if an address ends with "road name + N + number + S + room", the room number is filled into the address segment and $x_8 = 1$ is set.
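The room-number extraction can be sketched in Python in the same way. The sketch assumes that "room" and "unit" in the translated regular expression stand for the Chinese suffixes 室 and 单元; that mapping, the name `extract_room_number` and the sample addresses are assumptions.

```python
import re
from typing import Optional

# Digits followed by a room suffix (室) or a unit suffix (单元); suffixes are assumed.
ROOM_NUMBER = re.compile(r"(\d{1,})室|(\d{1,})单元")


def extract_room_number(address: str) -> Optional[str]:
    """Return the digits of the first room or unit designation, or None."""
    match = ROOM_NUMBER.search(address)
    if match is None:
        return None
    # Exactly one of the two groups is populated, depending on which branch matched.
    return match.group(1) or match.group(2)


print(extract_room_number("中山路22号1101室"))
```

Because the suffix is part of the pattern, the house number "22" in the sample is skipped even though it appears first.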
In a specific embodiment, if the acquisition address has "-" or "", the acquisition address is split into two addresses, and each address repeats the above steps for matching. For example, "** district ** street No. 363-369, third floor" is split into "** district ** street No. 363, third floor" and "** district ** street No. 369, third floor", and the two addresses are matched separately.
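Splitting an address on the "-" separator can be sketched in Python as below. Only the "-" case is handled, since the second separator is not legible in the text; the name `split_range_address` and the sample address are hypothetical.

```python
import re
from typing import List

# Two digit runs joined by a hyphen, e.g. 363-369.
NUMBER_RANGE = re.compile(r"(\d+)-(\d+)")


def split_range_address(address: str) -> List[str]:
    """Split an address containing 'A-B' into one address with A and one with B."""
    match = NUMBER_RANGE.search(address)
    if match is None:
        return [address]  # nothing to split
    head, tail = address[:match.start()], address[match.end():]
    return [head + match.group(1) + tail, head + match.group(2) + tail]


print(split_range_address("某区某街363-369号三楼"))
```

Each resulting address would then go through the matching steps on its own.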
As an example, after the above pattern matching the acquisition address matches successfully down to the house number, but the room number is not matched. According to the formula
$$S = \sum_{i=1}^{8} x_i \left( s_i + \sum_{j \neq i} a_{ij} x_j \right)$$
all addresses from level 1 to level 7 hit, so $x_1$ to $x_7$ are all 1, and the level 8 address misses, so $x_8 = 0$. The formula then gives an address score S of 79%; with a preset score threshold of 80%, the standardization of the acquisition address is judged to have failed.
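The 79% figure can be checked by hand. The derivation below assumes that each hit level contributes its weight coefficient plus the influence coefficients of the other hit levels, uses the example weight coefficients for levels 1 to 7 (1%, 1.5%, 2.5%, 3%, 4%, 5%, 6%), and assumes that the influence coefficient falls by 0.25% per level of separation starting from 1.75% (the intermediate values are not listed in the text).

```latex
\begin{aligned}
\sum_{i=1}^{7} s_i &= 1 + 1.5 + 2.5 + 3 + 4 + 5 + 6 = 23 \\
\sum_{\substack{i,j \le 7 \\ i \ne j}} a_{ij}
  &= 2\,(6 \cdot 1.75 + 5 \cdot 1.5 + 4 \cdot 1.25 + 3 \cdot 1.0 + 2 \cdot 0.75 + 1 \cdot 0.5)
   = 2 \cdot 28 = 56 \\
S &= 23 + 56 = 79\ (\%)
\end{aligned}
```

Each unordered pair of hit levels is counted twice because both $a_{ij}$ and $a_{ji}$ contribute; among seven levels there are $7 - d$ pairs at separation $d$.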
With continued reference to FIG. 3, a flow diagram of one embodiment of a method of address normalization in accordance with the present application is shown. The method comprises a step S301 of judging whether the address is standardized, wherein the step S301 of judging whether the address is standardized is all the steps described above; step S302 is also included to establish a mapping relation between the collection address judged to be successfully standardized and the address in the standard address base. The collection address judged to be successful in standardization is associated with the address in the standard address base corresponding to the collection address by establishing a mapping relation, so that the address base is more perfect.
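The mapping step can be pictured as building a lookup table from acquisition addresses to standard addresses. The Python sketch below assumes each judged address arrives as a tuple of acquisition address, identifier of its best-matching standard address, and address score; the tuple layout, the name `build_mapping` and the identifiers are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def build_mapping(judged: Iterable[Tuple[str, str, float]],
                  threshold: float = 80.0) -> Dict[str, str]:
    """Map each successfully standardized acquisition address to its standard address id."""
    return {acq: std_id for acq, std_id, score in judged if score > threshold}


mapping = build_mapping([
    ("中山路22号1101室", "STD-0001", 100.0),  # judged standardized
    ("中山路22号", "STD-0001", 79.0),         # below the threshold, not mapped
])
print(mapping)
```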
Embodiments of the present invention also relate to a computer-readable storage medium having stored thereon one or more computer programs which, when executed by a computer processor, implement the above method. The computer program comprises program code for performing the method illustrated in the flow chart. It should be noted that the computer readable medium of the present application can be a computer readable signal medium or a computer readable medium or any combination of the two.
With further reference to fig. 4, as an implementation of the method described above for the embodiment shown in fig. 1, the present application provides an embodiment of a system for determining whether an address is standardized, where the embodiment of the system corresponds to the embodiment of the method shown in fig. 1, and the system is particularly applicable to various electronic devices.
As shown in fig. 4, the system for determining whether an address is standardized according to this embodiment includes a standard address library 401, an address cleaning and conversion unit 402, a splitting unit 403, a calculating unit 404 and a judging unit 405:
The standard address library 401 is configured to serve as the standard against which acquisition addresses are compared. Existing standard addresses are split into 8 levels (province, city, district or county, street or town, community or village, road name, house number and room number), and local address-language habits are collected, to form the standard address library and provide the data basis for subsequent matching analysis.
An address cleaning and converting unit 402 configured to perform cleaning and conversion on the acquisition address according to a standard address language; the accuracy of data calculation is ensured, corresponding problems caused by manual operation can be effectively avoided, and the investment of human resources is greatly reduced.
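The claims state that cleaning includes converting Chinese numerals in the acquisition address to Arabic numerals. The sketch below illustrates one way this could look. Several choices are assumptions rather than details from the source: only numerals directly followed by a number suffix (号, 室, 栋, 楼, 层) are converted, so that numerals inside place names are left alone; whitespace is stripped; and full-width digits are folded to ASCII with Unicode NFKC normalization. The function names are hypothetical.

```python
import re
import unicodedata

DIGITS = {"零": 0, "〇": 0, "一": 1, "二": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
UNITS = {"十": 10, "百": 100, "千": 1000}

# Assumption: convert a numeral run only when a number suffix follows it.
NUMERAL_BEFORE_SUFFIX = re.compile(
    "[" + "".join(DIGITS) + "".join(UNITS) + "]+(?=[号室栋楼层])"
)


def chinese_to_int(text: str) -> int:
    """Convert a Chinese numeral below ten thousand to an int."""
    if not any(ch in UNITS for ch in text):
        # No unit characters: read digit by digit, e.g. "一二三" is 123.
        return int("".join(str(DIGITS[ch]) for ch in text))
    total, current = 0, 0
    for ch in text:
        if ch in DIGITS:
            current = DIGITS[ch]
        else:
            # A bare unit such as the leading "十" in "十二" counts as 1 x unit.
            total += (current or 1) * UNITS[ch]
            current = 0
    return total + current


def clean_address(raw: str) -> str:
    """Normalize width, drop whitespace, convert numerals before a suffix."""
    text = unicodedata.normalize("NFKC", raw)
    text = re.sub(r"\s+", "", text)
    return NUMERAL_BEFORE_SUFFIX.sub(
        lambda m: str(chinese_to_int(m.group())), text
    )
```

A real cleaning step would likely need further rules (abbreviations, aliases, colloquial numerals); this sketch covers only the numeral conversion named in claim 1.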
The splitting unit 403 is configured to split the acquisition address according to an address hierarchy, specifically, 8 hierarchies of province, city, district or county, street or town, community or village, road name, house number, and room number.
A calculating unit 404 is configured to calculate an address score of the acquisition address from the hit condition of each level, the weight coefficient of each level, and the influence coefficients between levels. The hit condition is the result of matching each split level of the acquisition address against the corresponding level in the standard address library: if a level of the acquisition address is consistent with the corresponding level in the standard address library, the acquisition address hits at that level; otherwise it misses.
In a particular embodiment, the address score is calculated by the formula
[Formula image BDA0002032586050000091 is not reproduced in this text.]
where s_i represents the weight coefficient corresponding to level i; x_i indicates whether the address field of level i is hit and x_j indicates whether the address field of level j is hit, each taking the value 0 for a miss and 1 for a hit; and a_ij represents the influence coefficient that the hit or miss of level j exerts on the weight coefficient s_i of level i. Both s_i and a_ij are expressed in percentage form, so the final address score S is also in percentage form, which makes the subsequent judgment of the degree of standardization more convenient. The weight increases with the level number, so the higher the level number, the larger the preset weight coefficient for that level. The influence coefficient between levels expresses how much the hit condition of another level against the standard library affects the weight coefficient of the current level; the farther apart the other level is from the current level, the smaller the influence of its hit condition on the current level. Introducing the weight coefficients and the influence coefficients into the score calculation accounts more fully for the effect of each level on the result and for the interaction between levels, so that the comprehensively calculated address score of the acquisition address makes the judgment result more reasonable and accurate.
A judging unit 405 is configured to judge whether the acquisition address is standardized. Comparing the address score with the preset score threshold gives a direct assessment of the hit condition of the acquisition address and completes the standardization judgment.
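The per-level hit condition can be represented as a vector of 0 and 1 values. The sketch below assumes exact string equality as the consistency test and treats an empty level in the acquisition address as a miss; both are illustrative assumptions, and `hit_vector` is a hypothetical name.

```python
from typing import List, Sequence


def hit_vector(collected: Sequence[str], standard: Sequence[str]) -> List[int]:
    """Return 1 for each level where the two addresses agree, else 0.

    Both sequences hold the level values in the same hierarchy order.
    """
    if len(collected) != len(standard):
        raise ValueError("both addresses must be split into the same levels")
    # Assumption: a missing (empty) level in the collected address is a miss.
    return [1 if c and c == s else 0 for c, s in zip(collected, standard)]


# Made-up level values for illustration only.
standard = ["Province A", "City B", "District C", "Street D",
            "Community E", "Road F", "12", "301"]
collected = ["Province A", "City B", "District C", "",
             "Community E", "Road F", "12", "302"]
hits = hit_vector(collected, standard)
```

Here `hits` is 1 at every level except the empty street level and the mismatched room number.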
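Because the formula image is not reproduced in this text, the following sketch rests on an assumed reading of the claims: the score of a level is its hit value multiplied by a comprehensive coefficient, the comprehensive coefficient is the level's weight plus the influence contributed by the levels that hit, and the address score is the sum over levels. In symbols, the assumed form is S = sum over i of x_i * (s_i + sum over j of a_ij * x_j). The function name and the sample coefficients are hypothetical, and the granted patent's exact formula may differ.

```python
from typing import Sequence


def address_score(
    hits: Sequence[int],
    weights: Sequence[float],
    influence: Sequence[Sequence[float]],
) -> float:
    """Assumed form: S = sum_i x_i * (s_i + sum_j a_ij * x_j).

    hits[i] is x_i (0 or 1), weights[i] is s_i, influence[i][j] is a_ij.
    Set influence[i][i] to 0 if a level should not influence itself.
    """
    n = len(hits)
    if len(weights) != n or len(influence) != n:
        raise ValueError("hits, weights and influence must cover the same levels")
    if any(len(row) != n for row in influence):
        raise ValueError("influence must be a square matrix")
    total = 0.0
    for i in range(n):
        # Comprehensive coefficient of level i: weight plus influence of hit levels.
        composite = weights[i] + sum(influence[i][j] * hits[j] for j in range(n))
        total += hits[i] * composite
    return total


# Three-level toy example with made-up percentage values.
toy_weights = [10, 20, 30]
toy_influence = [
    [0, 0, 0],
    [0, 0, 0],
    [5, 0, 0],   # a hit on level 0 adds 5 to the coefficient of level 2
]
toy_score = address_score([1, 0, 1], toy_weights, toy_influence)  # 10 + 35 = 45.0
```

In this toy case level 1 misses and contributes nothing, while level 2 contributes its weight of 30 plus the influence of 5 from the hit on level 0.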
The elements of the system may be implemented in dedicated hardware, in general purpose programmable logic devices, or as a combination of hardware and software.
With further reference to fig. 5, as an implementation of the method described above for the embodiment shown in fig. 3, the present application provides an embodiment of a system for address normalization, which corresponds to the embodiment of the method shown in fig. 3, and which is particularly applicable to various electronic devices.
As shown in fig. 5, the system for address normalization of the present embodiment includes a system 501 for determining whether an address is normalized and a mapping unit 502.
A system 501 for determining whether an address is standardized, the system comprising all the elements of the system for determining whether an address is standardized in fig. 4.
The mapping unit 502 is configured to establish a mapping relationship between each successfully standardized acquisition address and an address in the standard address library. Associating each acquisition address judged to be successfully standardized with its corresponding address in the standard address library makes the address library system more complete and makes it easier to unify address services.
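The mapping step can be sketched as a dictionary from each acquisition address that passes the threshold to the identifier of its matched standard address. Claim 8 specifies success when S > LS and failure when S < LS; treating equality as not standardized is an assumption made here, as are the function name, the identifier strings, and the sample scores.

```python
from typing import Dict, Iterable, Tuple


def build_mapping(
    scored: Iterable[Tuple[str, str, float]],
    threshold: float,
) -> Dict[str, str]:
    """Map each successfully standardized acquisition address to a standard id.

    Each item of `scored` is (acquisition_address, standard_address_id, score).
    """
    mapping: Dict[str, str] = {}
    for acquisition_address, standard_id, score in scored:
        # Assumption: a score equal to the threshold does not count as success.
        if score > threshold:
            mapping[acquisition_address] = standard_id
    return mapping


# Made-up scored candidates for illustration only.
candidates = [
    ("Road F No. 12 Room 301", "STD-0001", 92.0),
    ("Road F 12", "STD-0001", 61.5),
    ("somewhere near Road F", "STD-0002", 18.0),
]
mapping = build_mapping(candidates, threshold=60.0)
```

Several acquisition addresses may map to the same standard address, which is how differently written inputs end up unified.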
The elements of the system may be implemented in dedicated hardware, in general purpose programmable logic devices, or as a combination of hardware and software.
Referring now to FIG. 6, shown is a block diagram of a computer system 600 suitable for use in implementing a terminal device or server of an embodiment of the present application. The terminal device or the server shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 6, the computer system 600 includes a Central Processing Unit (CPU)601 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM)602 or a program loaded from a storage section 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data necessary for the operation of the system 600 are also stored. The CPU 601, ROM 602, and RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including a Liquid Crystal Display (LCD), a speaker, and the like; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card, a modem, or the like. The communication section 609 performs communication processing via a network such as the internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 610 as necessary, so that a computer program read out therefrom is installed into the storage section 608 as necessary.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611. The computer program performs the above-described functions defined in the method of the present application when executed by a Central Processing Unit (CPU) 601. It should be noted that the computer readable medium described herein can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
According to the method and system of the present application for judging whether an address is standardized and for standardizing addresses, the acquisition address is cleaned and converted and then split according to the address levels; each split level is subjected to hit analysis against the addresses in the standard address library; the address score of the acquisition address is calculated by formula from the hit condition of each level, the preset level weight coefficients, and the influence coefficients between levels; and the score is compared with a preset score threshold to judge whether address standardization of the acquisition address is complete. In addition, each acquisition address judged to be successfully standardized is mapped to the standard address library, which realizes the whole address standardization process and greatly improves the efficiency of address matching. This overcomes the defects of existing systems, in which acquisition addresses are disordered, follow no standard, and have a low probability of matching a standard address; solves the problem that disordered acquisition addresses cannot be extended; and facilitates upgrading and reconstruction of the address system and unification of address services.
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (12)

1. A method for judging whether an address is standardized, which is characterized by comprising the following steps:
S1: cleaning and converting an acquisition address, wherein Chinese numbers in the acquisition address are converted into Arabic numbers;
S2: splitting the acquisition address according to an address hierarchy;
S3: performing hit analysis on each level of the split acquisition address against the addresses in a standard address library;
S4: obtaining an address score of the acquisition address by using a hit condition of each of the hierarchies, a preset hierarchy weight coefficient and an influence coefficient between the hierarchies, wherein the address score of a hierarchy is expressed as a product of the hit condition of the hierarchy and a comprehensive coefficient of the hierarchy, the comprehensive coefficient is expressed as a sum of the weight coefficient and the influence coefficient of the hierarchy, and a calculation formula of the address score is
[Formula image FDA0003504932800000011 is not reproduced in this text.]
wherein s_i represents the weight coefficient corresponding to level i, x_i indicates whether the address field of level i is hit, and x_j indicates whether the address field of level j is hit, with the values 0 representing a miss and 1 representing a hit, and a_ij represents the influence coefficient of whether level j is hit on the weight coefficient s_i of level i;
S5: comparing the address score of the acquisition address with a preset score threshold value to judge whether the acquisition address completes address standardization.
2. The method according to claim 1, wherein the weighting factor of each of the levels in the step S4 is different, and the weighting increases as the level increases.
3. The method as claimed in claim 1, wherein the influence coefficient between the levels is expressed as a degree of influence of hits of other levels and the standard library on the weight coefficient of the current level, and the larger the interval between the other levels and the current level is, the smaller the influence coefficient is.
4. The method of claim 3, wherein the weighting factor and the influence factor are in percentage form.
5. The method as claimed in claim 1, wherein the address score of the collection address specifically includes a sum of the address scores of each of the levels.
6. The method as claimed in claim 1, wherein the hit analysis of step S3 specifically includes: and matching the road name and the house number in the standard library to obtain a hit standard address set Rn, and performing hit analysis in the standard address set Rn according to the room number of the acquired address.
7. The method as claimed in claim 6, wherein the hit analysis of step S3 further includes splitting to obtain a standard address set Pn hit by matching with the road name, and extracting numbers before the house number and/or the house number in the collected address and the standard address set Pn for hit analysis.
8. The method as claimed in claim 1, wherein the step S5 is specifically performed by comparing the calculated address score S with a preset score threshold LS, where if S > LS, it indicates that the address is successfully normalized, and if S < LS, it indicates that the address is not successfully normalized.
9. A method of address normalization comprising the method of any one of claims 1 to 8, wherein the method further comprises mapping the collection addresses determined to be successfully normalized to addresses within the standard address base.
10. A computer-readable storage medium having one or more computer programs stored thereon, which when executed by a computer processor perform the method of any one of claims 1 to 9.
11. A system for determining whether an address is standardized, the system comprising:
the standard address library is configured to be used as a standard for comparing the acquisition addresses;
the address cleaning and converting unit is configured for cleaning and converting the acquisition address according to a standard address language;
the splitting unit is configured to split the acquisition address according to an address hierarchy;
a calculating unit configured to calculate an address score of the acquisition address by using a hit condition of each of the hierarchies, a weight coefficient of the hierarchy, and an influence coefficient between the hierarchies, where the address score of the hierarchy is expressed as a product of the hit condition of the hierarchy and a comprehensive coefficient of the hierarchy, the comprehensive coefficient is expressed as a sum of the weight coefficient and the influence coefficient of the hierarchy, and a calculation formula of the address score is
[Formula images FDA0003504932800000021 and FDA0003504932800000022 are not reproduced in this text.]
wherein s_i represents the weight coefficient corresponding to level i, x_i indicates whether the address field of level i is hit, and x_j indicates whether the address field of level j is hit, with the values 0 representing a miss and 1 representing a hit, and a_ij represents the influence coefficient of whether level j is hit on the weight coefficient s_i of level i;
and the judging unit is configured for judging whether the acquisition address is standardized or not.
12. An address standardization system comprising a system for judging whether an address is standardized as claimed in claim 11, characterized in that the system further comprises a mapping unit configured to map the acquisition address successfully standardized with an address in the standard address base.
CN201910314344.2A 2019-04-18 2019-04-18 Method and system for judging whether address is standardized or not and address is standardized Active CN110147418B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910314344.2A CN110147418B (en) 2019-04-18 2019-04-18 Method and system for judging whether address is standardized or not and address is standardized

Publications (2)

Publication Number Publication Date
CN110147418A CN110147418A (en) 2019-08-20
CN110147418B true CN110147418B (en) 2022-04-29

Family

ID=67588536

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910314344.2A Active CN110147418B (en) 2019-04-18 2019-04-18 Method and system for judging whether address is standardized or not and address is standardized

Country Status (1)

Country Link
CN (1) CN110147418B (en)

Also Published As

Publication number Publication date
CN110147418A (en) 2019-08-20


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant