CN109961259B - Address standardization processing method and equipment - Google Patents

Publication number
CN109961259B
Authority
CN
China
Prior art keywords
address
name
level
normalized
characters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910246155.6A
Other languages
Chinese (zh)
Other versions
CN109961259A (en
Inventor
张伟丰
任亚彬
刘佛生
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Zhongtongji Network Technology Co Ltd
Original Assignee
Shanghai Zhongtongji Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Zhongtongji Network Technology Co Ltd
Priority to CN201910246155.6A
Publication of CN109961259A
Application granted
Publication of CN109961259B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/08 Logistics, e.g. warehousing, loading or distribution; Inventory or stock management
    • G06Q10/083 Shipping

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Economics (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Tourism & Hospitality (AREA)
  • Health & Medical Sciences (AREA)
  • Quality & Reliability (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • Human Resources & Organizations (AREA)
  • General Business, Economics & Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Development Economics (AREA)
  • Operations Research (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The application relates to an address standardization processing method and device. The method comprises: acquiring an initial address; preprocessing the initial address to obtain a preprocessed address; extracting, from the preprocessed address, at least one address name recorded in a normalized address information base, wherein the normalized address information base is established in advance and records the standard names and level information of the national addresses at every level together with the aliases of those addresses, the levels comprise provincial-level, city-level and district/county-level addresses, and the level information indicates the superior-subordinate relationship between addresses of different levels; determining the lowest-level address name among the extracted address names according to their level information, and determining, according to the normalized address information base, the standard name of at least one level of address corresponding to that lowest-level address name; and using the standard name as the normalized address. With this method and device, the standard names of the national provincial-level, city-level and district/county-level addresses can be obtained from the initial address.

Description

Address standardization processing method and equipment
Technical Field
The present application relates to the field of address processing technologies, and in particular, to an address standardization processing method and device.
Background
In the logistics industry, the mailing address and the receiving address of a shipment are critically important: they identify where the goods come from and where they are to be delivered.
In the related art, after receiving goods mailed by a customer, an express company enters the mailing address and the receiving address of the goods into a logistics system by scanning a code; the logistics system determines from these addresses the quantity of goods sent from one place to another, and the express company then arranges vehicles according to that quantity. Because customers have different habits in understanding and describing addresses, the logistics system contains a large number of addresses that are not described in a standard, unambiguous way. Some addresses comprise a province, a city, a county/district and a detailed address, while others comprise only a province, a county/district and a detailed address; for example, "golden aster No. 3, Fugou County, Zhoukou City, Henan" and "golden aster No. 3, Fugou County, Henan" denote the same place, but the latter omits Zhoukou City. Some addresses use the full names of the province, city and county, while others use short names; for example, the full-name address is "Fugou County, Zhoukou City, Henan Province" and the short-name address is "Fugou, Zhoukou, Henan". Some addresses predate a redrawing of city and county administrative divisions; for example, Zongyang County, formerly under Anqing City, Anhui Province, has been reassigned to Tongling City, Anhui Province, but some customers are unaware of this and still write "Zongyang County, Anqing City, Anhui Province". As a result, the logistics system makes large errors when determining the quantity of goods sent from one place to another from the recorded mailing and receiving addresses, which leads to poor vehicle route planning and increases logistics and time costs.
Disclosure of Invention
To overcome, at least to some extent, the problems in the related art, the present application provides an address standardization processing method and apparatus.
According to a first aspect of embodiments of the present application, there is provided an address normalization processing method, including:
acquiring an initial address;
preprocessing the initial address to obtain a preprocessed address;
extracting, from the preprocessed address, at least one address name recorded in a normalized address information base, wherein the normalized address information base is established in advance and records the standard names and level information of the national addresses at every level and the aliases of those addresses, the levels comprising provincial-level addresses, city-level addresses and district/county-level addresses, and the level information indicating the superior-subordinate relationship between addresses of different levels;
determining the address name of the lowest level in the extracted address names according to the level information corresponding to the extracted address names, and determining the standard name of at least one level address corresponding to the address name of the lowest level according to the normalized address information base;
and taking the standard name of at least one level address corresponding to the address name of the lowest level as the address of the initial address after the standardization processing, and outputting the address after the standardization processing.
Optionally, the preprocessing the initial address includes:
and deleting punctuation marks and spaces in the initial address.
Optionally, the extracting, from the preprocessed addresses, at least one address name recorded in a normalized address information base includes:
extracting a first address name from the preprocessed address;
after the first address name is extracted, if no further address name recorded in the normalized address information base can be extracted from the preprocessed address, determining the first address name as the extracted address name; or,
after the first address name is extracted, if a second address name can be extracted from the preprocessed address and the second address name is a district/county-level address, determining the first address name and the second address name as the extracted address names; or,
after the first address name is extracted, if a second address name can be extracted from the preprocessed address, the second address name is not a district/county-level address, and a third address name can be extracted from the preprocessed address, determining the first address name, the second address name and the third address name as the extracted address names;
wherein the first address name, the second address name and the third address name are all address names recorded in the normalized address information base, the first address name is a superior address of the second address name, and the second address name is a superior address of the third address name.
Optionally, the extracting a first address name from the preprocessed address includes:
extracting a first group of characters according to a first extraction length from a first starting point of the preprocessed address, wherein the position of the first starting point and the first extraction length are both updatable values, the initial position of the first starting point is the position of a first character of the preprocessed address, the end position of the first starting point is the position of a last character of the preprocessed address, and the minimum value, the step value and the maximum value of the first extraction length are preset values;
and when the first group of characters is the address name recorded in the normalized address information base, determining the first group of characters as the first address name.
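The window-based extraction described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `extract_name`, the default values of `min_len`, `step` and `max_len`, and the rule of keeping the longest match found at a given starting point are hypothetical choices, since the application only says that these values are preset and updatable. The Chinese strings are assumed renderings of the translated names "Hebei", "Hebei Province", "Baoding" and "Baoding City".

```python
def extract_name(text, known_names, start=0, min_len=2, step=1, max_len=8):
    """Scan `text` from `start` and return (position, name) for the first hit.

    The starting point moves one character at a time toward the end of the
    text. At each starting point the extraction length grows from `min_len`
    to `max_len` in increments of `step`, and each group of characters is
    looked up in `known_names` (names recorded in the information base).
    Assumption: the longest match at a starting point is kept.
    """
    for pos in range(start, len(text)):
        best = None
        for length in range(min_len, max_len + 1, step):
            if pos + length > len(text):
                break  # the window would run past the last character
            candidate = text[pos:pos + length]
            if candidate in known_names:
                best = candidate
        if best is not None:
            return pos, best
    return None  # no recorded address name could be extracted


# Hypothetical excerpt of the names recorded in the information base.
names = {"河北", "河北省", "保定", "保定市"}
first = extract_name("河北省保定市竞秀区", names)
```

Starting the same function after the end of the first match gives a starting point for the second group of characters.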
Optionally, the extracting, from the preprocessed addresses, at least one address name recorded in a normalized address information base further includes:
extracting a second group of characters according to a second extraction length from a second starting point of the preprocessed address, wherein the position of the second starting point and the second extraction length are updatable values, the initial position of the second starting point is the position of the next character of the first address name, the end position of the second starting point is the position of the last character of the preprocessed address, and the minimum value, the step value and the maximum value of the second extraction length are preset values;
and when the second group of characters are address names recorded in a normalized address information base and the first address name is determined to be the upper address of the second group of characters according to the normalized address information base, determining the second group of characters as the second address name.
Optionally, the extracting, from the preprocessed addresses, at least one address name recorded in a normalized address information base further includes:
extracting a third group of characters according to a third extraction length from a third starting point of the preprocessed address, wherein the position of the third starting point and the third extraction length are updatable values, the initial position of the third starting point is the position of the character following the second address name, the end position of the third starting point is the position of the last character of the preprocessed address, and the minimum value, the step value and the maximum value of the third extraction length are preset values;
and when the third group of characters are address names recorded in a normalized address information base and the second address name is determined to be the upper address of the third group of characters according to the normalized address information base, determining the third group of characters as the third address name.
Optionally, the determining, according to the normalized address information base, a standard name of at least one level address corresponding to the address name of the lowest level includes:
when the lowest-level address name is a district/county-level address, obtaining, according to the normalized address information base, the standard name of the district/county-level address, the standard name of the city-level address and the standard name of the provincial-level address corresponding to the lowest-level address name; or,
when the lowest-level address name is a city-level address, obtaining, according to the normalized address information base, the standard name of the city-level address and the standard name of the provincial-level address corresponding to the lowest-level address name; or,
when the lowest-level address name is a provincial-level address, obtaining, according to the normalized address information base, the standard name of the provincial-level address corresponding to the lowest-level address name.
Optionally, the level information is numeric, wherein the level information of a lower-level address has more digits than the level information of its upper-level address, and the level information of the lower-level address contains the level information of its upper-level address.
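One way to read this numeric level information is as a prefix code. The short Python sketch below assumes that the upper-level code appears as a leading prefix of the lower-level code, which matches codes such as "1", "11" and "111" but is not stated in general by the application; the function name `is_superior` is hypothetical.

```python
def is_superior(upper_code: str, lower_code: str) -> bool:
    """Return True if `upper_code` denotes a superior of `lower_code`.

    Assumption: a lower-level code has more digits than its superior and
    begins with the superior's code.
    """
    return len(lower_code) > len(upper_code) and lower_code.startswith(upper_code)


checks = [
    is_superior("1", "11"),    # province over city
    is_superior("1", "111"),   # province over district/county
    is_superior("2", "111"),   # unrelated province
    is_superior("111", "11"),  # lower level is never a superior
]
```

With this encoding, comparing the number of digits of two codes is enough to tell which of two extracted names is at the lower level.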
Optionally, the obtaining the initial address includes:
identifying the express information of an express item to obtain a mailing address and a receiving address, wherein the express information is filled in by a user;
and respectively taking the mailing address and the receiving address as the initial addresses.
According to a second aspect of embodiments of the present application, there is provided an address normalization processing apparatus, including:
a processor, and a memory coupled to the processor;
the memory is used for storing a computer program;
the processor is configured to invoke and execute the computer program in the memory to perform the method as follows:
acquiring an initial address;
preprocessing the initial address to obtain a preprocessed address;
extracting, from the preprocessed address, at least one address name recorded in a normalized address information base, wherein the normalized address information base is established in advance and records the standard names and level information of the national addresses at every level and the aliases of those addresses, the levels comprising provincial-level addresses, city-level addresses and district/county-level addresses, and the level information indicating the superior-subordinate relationship between addresses of different levels;
determining the address name of the lowest level in the extracted address names according to the level information corresponding to the extracted address names, and determining the standard name of at least one level address corresponding to the address name of the lowest level according to the normalized address information base;
and taking the standard name of at least one level address corresponding to the address name of the lowest level as the address of the initial address after the standardization processing, and outputting the address after the standardization processing.
Optionally, the preprocessing the initial address includes:
and deleting punctuation marks and spaces in the initial address.
Optionally, the extracting, from the preprocessed addresses, at least one address name recorded in a normalized address information base includes:
extracting a first address name from the preprocessed address;
after the first address name is extracted, if no further address name recorded in the normalized address information base can be extracted from the preprocessed address, determining the first address name as the extracted address name; or,
after the first address name is extracted, if a second address name can be extracted from the preprocessed address and the second address name is a district/county-level address, determining the first address name and the second address name as the extracted address names; or,
after the first address name is extracted, if a second address name can be extracted from the preprocessed address, the second address name is not a district/county-level address, and a third address name can be extracted from the preprocessed address, determining the first address name, the second address name and the third address name as the extracted address names;
wherein the first address name, the second address name and the third address name are all address names recorded in the normalized address information base, the first address name is a superior address of the second address name, and the second address name is a superior address of the third address name.
Optionally, the extracting a first address name from the preprocessed address includes:
extracting a first group of characters according to a first extraction length from a first starting point of the preprocessed address, wherein the position of the first starting point and the first extraction length are both updatable values, the initial position of the first starting point is the position of a first character of the preprocessed address, the end position of the first starting point is the position of a last character of the preprocessed address, and the minimum value, the step value and the maximum value of the first extraction length are preset values;
and when the first group of characters is the address name recorded in the normalized address information base, determining the first group of characters as the first address name.
Optionally, the extracting, from the preprocessed addresses, at least one address name recorded in a normalized address information base further includes:
extracting a second group of characters according to a second extraction length from a second starting point of the preprocessed address, wherein the position of the second starting point and the second extraction length are updatable values, the initial position of the second starting point is the position of the next character of the first address name, the end position of the second starting point is the position of the last character of the preprocessed address, and the minimum value, the step value and the maximum value of the second extraction length are preset values;
and when the second group of characters are address names recorded in a normalized address information base and the first address name is determined to be the upper address of the second group of characters according to the normalized address information base, determining the second group of characters as the second address name.
Optionally, the extracting, from the preprocessed addresses, at least one address name recorded in a normalized address information base further includes:
extracting a third group of characters according to a third extraction length from a third starting point of the preprocessed address, wherein the position of the third starting point and the third extraction length are updatable values, the initial position of the third starting point is the position of the character following the second address name, the end position of the third starting point is the position of the last character of the preprocessed address, and the minimum value, the step value and the maximum value of the third extraction length are preset values;
and when the third group of characters are address names recorded in a normalized address information base and the second address name is determined to be the upper address of the third group of characters according to the normalized address information base, determining the third group of characters as the third address name.
Optionally, the determining, according to the normalized address information base, a standard name of at least one level address corresponding to the address name of the lowest level includes:
when the lowest-level address name is a district/county-level address, obtaining, according to the normalized address information base, the standard name of the district/county-level address, the standard name of the city-level address and the standard name of the provincial-level address corresponding to the lowest-level address name; or,
when the lowest-level address name is a city-level address, obtaining, according to the normalized address information base, the standard name of the city-level address and the standard name of the provincial-level address corresponding to the lowest-level address name; or,
when the lowest-level address name is a provincial-level address, obtaining, according to the normalized address information base, the standard name of the provincial-level address corresponding to the lowest-level address name.
Optionally, the level information is numeric, wherein the level information of a lower-level address has more digits than the level information of its upper-level address, and the level information of the lower-level address contains the level information of its upper-level address.
Optionally, the obtaining the initial address includes:
identifying the express information of an express item to obtain a mailing address and a receiving address, wherein the express information is filled in by a user;
and respectively taking the mailing address and the receiving address as the initial addresses.
The technical scheme provided by the embodiment of the application can have the following beneficial effects:
An initial address is acquired; the initial address is preprocessed to obtain a preprocessed address; and at least one address name recorded in a pre-established normalized address information base is extracted from the preprocessed address. Because the normalized address information base records the standard names and level information of the national addresses at every level and the aliases of those addresses, where the levels comprise provincial-level, city-level and district/county-level addresses and the level information indicates the superior-subordinate relationship between addresses of different levels, the lowest-level address name among the extracted address names can be determined from the corresponding level information. The standard name of at least one level of address corresponding to the lowest-level address name is then determined according to the normalized address information base, taken as the normalized address of the initial address, and output. In this way, regardless of whether the initial address uses aliases, miswrites a city-level address name, or assigns an address to the wrong parent region, the standard names of the national provincial-level, city-level and district/county-level addresses can be extracted from the initial address, and the quantity of goods sent from one region to another can be obtained accurately from a massive number of addresses, so that vehicle routing can be optimized and logistics and time costs greatly reduced.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
Fig. 1 is a flowchart illustrating an address normalization processing method according to an exemplary embodiment.
Fig. 2 is a schematic structural diagram illustrating an address normalization processing apparatus according to another exemplary embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of methods and apparatus consistent with certain aspects of the present application, as detailed in the appended claims.
Fig. 1 is a flowchart illustrating an address normalization processing method according to an exemplary embodiment.
As shown in fig. 1, the address normalization processing method provided in this embodiment includes the following steps:
step S11, acquiring an initial address;
step S12, preprocessing the initial address to obtain a preprocessed address;
In this step, since the obtained initial address may contain punctuation marks such as enumeration commas and commas, as well as spaces, the initial address needs to be preprocessed, that is, the punctuation marks and spaces are deleted, so as to obtain an address containing only Chinese characters.
Step S13, extracting, from the preprocessed address, at least one address name recorded in a normalized address information base, wherein the normalized address information base is established in advance and records the standard names and level information of the national addresses at every level and the aliases of those addresses, the levels comprising provincial-level addresses, city-level addresses and district/county-level addresses, and the level information indicating the superior-subordinate relationship between addresses of different levels;
For example, the base may record "Shanghai 1 - Shanghai City 11 - Qingpu District (Qingpu) 111", where "Shanghai" is a provincial-level address, "Shanghai City" is a city-level address, "Qingpu District" is a district/county-level address, and "Qingpu" is an alias of the district/county-level address "Qingpu District". "1", "11" and "111" are level information: "1" indicates that "Shanghai" is a provincial-level address, "11" indicates that "Shanghai City" is a city-level address, and "111" indicates that "Qingpu District (Qingpu)" is a district/county-level address; "1" is the upper level of "11", and "11" is the upper level of "111".
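As an illustration of how such records might be held in memory, the Python sketch below stores each address with its standard name, level information and aliases, and builds a lookup from any recorded name to its record. The class name `AddressRecord`, the function `build_index` and the field names are hypothetical; the application does not specify a storage format. The Chinese strings are assumed renderings of "Shanghai", "Shanghai City", "Qingpu District" and "Qingpu".

```python
from dataclasses import dataclass, field


@dataclass
class AddressRecord:
    standard_name: str   # standard name of the address
    level_code: str      # level information, e.g. "1", "11", "111"
    aliases: list = field(default_factory=list)  # alias names, if any


def build_index(records):
    """Map every recorded name (standard name or alias) to its record."""
    index = {}
    for record in records:
        for name in [record.standard_name, *record.aliases]:
            index[name] = record
    return index


# Hypothetical excerpt mirroring the example in the text.
records = [
    AddressRecord("上海", "1"),
    AddressRecord("上海市", "11"),
    AddressRecord("青浦区", "111", aliases=["青浦"]),
]
index = build_index(records)
```

Looking up an alias such as `index["青浦"]` then yields both the standard name and the level information in a single step.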
Step S14, determining the address name of the lowest level among the extracted address names according to the level information corresponding to the extracted address names, and determining the standard name of at least one level address corresponding to the address name of the lowest level according to the normalized address information base;
Since every address name extracted from the preprocessed address in step S13 is recorded in the normalized address information base, the level information corresponding to each extracted address name can be looked up in the normalized address information base.
For example, suppose the address names extracted from the preprocessed address are "Shanghai" and "Qingpu". According to the normalized address information base, the level information of "Shanghai" is "1" and that of "Qingpu" is "111". The level information "111" identifies a district/county-level address, so "Qingpu" is the lowest-level address name. According to the normalized address information base, the standard name of this district/county-level address is "Qingpu District"; from the level information "111", the level information of its upper-level address is "11", whose standard name is the city-level address "Shanghai City"; and from "11", the level information of the next upper-level address is "1", whose standard name is the provincial-level address "Shanghai". The standard names of the addresses at each level are thus obtained: Shanghai - Shanghai City - Qingpu District.
Step S15, using the standard name of at least one level address corresponding to the lowest level address name as the address after the standardization process of the initial address, and outputting the address after the standardization process.
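Steps S14 and S15 can be sketched in Python as follows, using the Shanghai example. The dictionaries `STANDARD_NAME` and `NAME_TO_CODE` and the function `normalize` are hypothetical names, and the sketch assumes that each upper-level code is a leading prefix of the lower-level code, as in "1", "11" and "111".

```python
# Hypothetical excerpt of the information base.
STANDARD_NAME = {"1": "上海", "11": "上海市", "111": "青浦区"}   # code -> standard name
NAME_TO_CODE = {"上海": "1", "上海市": "11", "青浦区": "111", "青浦": "111"}  # incl. alias


def normalize(extracted_names):
    """Return the standard names from the top level down to the lowest level."""
    if not extracted_names:
        return []
    codes = [NAME_TO_CODE[name] for name in extracted_names]
    # Step S14: the code with the most digits belongs to the lowest level.
    lowest = max(codes, key=len)
    # Every recorded prefix of that code is one of its upper-level addresses.
    chain = [lowest[:i] for i in range(1, len(lowest) + 1) if lowest[:i] in STANDARD_NAME]
    # Step S15: the standard names form the normalized address.
    return [STANDARD_NAME[code] for code in chain]


result = normalize(["上海", "青浦"])
```

Note that the city-level name is recovered even though it never appeared among the extracted names, which is the case of an address that omits the city.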
In this embodiment, an initial address is acquired; the initial address is preprocessed to obtain a preprocessed address; and at least one address name recorded in a pre-established normalized address information base is extracted from the preprocessed address. Because the normalized address information base records the standard names and level information of the national addresses at every level and the aliases of those addresses, where the levels comprise provincial-level, city-level and district/county-level addresses and the level information indicates the superior-subordinate relationship between addresses of different levels, the lowest-level address name among the extracted address names can be determined from the corresponding level information. The standard name of at least one level of address corresponding to the lowest-level address name is then determined according to the normalized address information base, taken as the normalized address of the initial address, and output. In this way, regardless of whether the initial address uses aliases, miswrites a city-level address name, or assigns an address to the wrong parent region, the standard names of the national provincial-level, city-level and district/county-level addresses can be extracted from the initial address, and the quantity of goods sent from one region to another can be obtained accurately from a massive number of addresses, so that vehicle routing can be optimized and logistics and time costs greatly reduced.
Further, the preprocessing the initial address includes:
and deleting punctuation marks and spaces in the initial address.
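The preprocessing step can be illustrated with a minimal Python sketch. The function name `preprocess` and the use of Unicode character categories to decide what counts as punctuation are assumptions made for illustration; the text only states that punctuation marks and spaces are deleted.

```python
import unicodedata


def preprocess(address: str) -> str:
    """Delete punctuation marks and whitespace from an address string.

    Assumption: "punctuation" means any character whose Unicode general
    category starts with "P", which covers both ASCII and full-width marks.
    """
    return "".join(
        ch for ch in address
        if not ch.isspace() and not unicodedata.category(ch).startswith("P")
    )
```

For instance, `preprocess("Hebei Province, Baoding City.")` yields `"HebeiProvinceBaodingCity"`.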
Further, the extracting, from the preprocessed addresses, at least one address name recorded in a normalized address information base includes:
extracting a first address name from the preprocessed address;
after extracting a first address name, if no further address name recorded in the normalized address information base can be extracted from the preprocessed address, determining the first address name as the extracted address name; or,
after extracting a first address name, if a second address name can be extracted from the preprocessed address and the second address name is a district/county-level address, determining the first address name and the second address name as the extracted address names; or,
after extracting a first address name, if a second address name can be extracted from the preprocessed address, the second address name is not a district/county-level address, and a third address name can be extracted from the preprocessed address, determining the first address name, the second address name, and the third address name as the extracted address names;
the first address name, the second address name and the third address name are all address names recorded in a normalized address information base, the first address name is a superior address of the second address name, and the second address name is a superior address of the third address name.
For example, the preprocessed address is "Hebei Province Baoding City Jingxiu District", and the record in the normalized address information base is "Hebei Province (Hebei) 2 - Baoding City (Baoding) 22 - Jingxiu District (Jingxiu) 222". Here "Hebei Province", "Baoding City", and "Jingxiu District" are the standard names of the province-level, city-level, and district/county-level addresses respectively; "Hebei", "Baoding", and "Jingxiu" are the corresponding alias names; and "2", "22", and "222" are the level information denoting the province-level, city-level, and district/county-level addresses respectively. When the first address name extracted is "Hebei", it is an alias of a province-level address recorded in the normalized address information base; however, the province-level address name "Hebei Province" recorded in the base can also be extracted from the preprocessed address, so "Hebei Province" is determined as the extracted address name. After "Hebei Province" is extracted, a second address name, "Baoding City", can be extracted from the preprocessed address. Since "Baoding City" is judged not to be a district/county-level address, extraction continues and yields a third address name, "Jingxiu District". Finally, "Hebei Province - Baoding City - Jingxiu District" is determined as the extracted address names.
As another example, the preprocessed address is "Hebei Province Jingxiu District". The first address name "Hebei Province" and the second address name "Jingxiu District" are extracted from it. Because "Jingxiu District" is a district/county-level address, no third address name needs to be extracted, and "Hebei Province - Jingxiu District" is directly determined as the extracted address names.
Further, the extracting the first address name from the preprocessed address includes:
extracting a first group of characters according to a first extraction length from a first starting point of the preprocessed address, wherein the position of the first starting point and the first extraction length are both updatable values, the initial position of the first starting point is the position of a first character of the preprocessed address, the end position of the first starting point is the position of a last character of the preprocessed address, and the minimum value, the step value and the maximum value of the first extraction length are preset values;
and when the first group of characters is the address name recorded in the normalized address information base, determining the first group of characters as the first address name.
Wherein the minimum value of the first extraction length is 1, the step value is 1, and the maximum value is 12.
For example, the preprocessed address is "Beijing Beijing City Chaoyang" (北京北京市朝阳), and the normalized address information base records "Beijing 3 - Beijing City 33 - Chaoyang District (Chaoyang) 333". The first extraction length starts at 1, so one character is extracted starting from the first character "Bei" (北); "Bei" is not an address name recorded in the normalized address information base. The first extraction length is therefore increased by 1 to 2, and two characters, "Beijing" (北京), are extracted starting from the same first character. "Beijing" is an address name recorded in the normalized address information base, so it is determined as the first address name.
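The growing-window extraction described above can be sketched in Python as follows. This is an illustrative sketch only: the dictionary `ADDRESS_BASE`, the function name `extract_first_name`, and the Chinese strings (assumed to be the original forms of the translated example) are hypothetical. The sketch also covers the fallback in which the starting point advances by one character once the maximum length is reached.

```python
# Minimum, step, and maximum extraction length as stated in the text.
MIN_LEN, STEP, MAX_LEN = 1, 1, 12

# Hypothetical miniature information base: name or alias -> level information.
ADDRESS_BASE = {"北京": "3", "北京市": "33", "朝阳区": "333", "朝阳": "333"}


def extract_first_name(address, base=ADDRESS_BASE):
    """Return (name, start, length) of the first recorded name, or None.

    For each starting point the window grows from MIN_LEN to MAX_LEN;
    if nothing matches, the starting point moves to the next character.
    """
    for start in range(len(address)):
        length = MIN_LEN
        while length <= MAX_LEN and start + length <= len(address):
            candidate = address[start:start + length]
            if candidate in base:
                return candidate, start, length
            length += STEP
    return None  # corresponds to outputting parsing-failure information
```

With the example address, `extract_first_name("北京北京市朝阳")` first tries "北", which is not recorded, and then returns "北京" at length 2.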
Further, the extracting, from the preprocessed addresses, at least one address name recorded in a normalized address information base further includes:
extracting a second group of characters according to a second extraction length from a second starting point of the preprocessed address, wherein the position of the second starting point and the second extraction length are updatable values, the initial position of the second starting point is the position of the next character of the first address name, the end position of the second starting point is the position of the last character of the preprocessed address, and the minimum value, the step value and the maximum value of the second extraction length are preset values;
and when the second group of characters are address names recorded in a normalized address information base and the first address name is determined to be the upper address of the second group of characters according to the normalized address information base, determining the second group of characters as the second address name.
Wherein the minimum value of the second extraction length is 1, the step value is 1, and the maximum value is 12.
For example, the preprocessed address is "Beijing Beijing City Chaoyang" (北京北京市朝阳), and the extracted first address name is "Beijing" (北京). The second starting point is the position of the character "Bei" (北) immediately after the first address name, and the second extraction length starts at 1. One character, "Bei", is extracted; it is not an address name recorded in the normalized address information base, so the second extraction length is increased by 1. Two characters, "Beijing" (北京), are then extracted starting from that "Bei". This is an address name recorded in the normalized address information base, but its level information is "3", the same as that of the first address name "Beijing", so the first address name is not the superior address of the second group of characters. The second extraction length is therefore increased by 1 again, and three characters, "Beijing City" (北京市), are extracted. This second group of characters is an address name recorded in the normalized address information base with level information "33", so the first address name "Beijing" is determined to be its superior address, and "Beijing City" is determined as the second address name.
Further, the extracting, from the preprocessed addresses, at least one address name recorded in a normalized address information base further includes:
extracting a third group of characters according to a third extraction length from a third starting point of the preprocessed address, wherein the position of the third starting point and the third extraction length are updatable values, the initial position of the third starting point is the position of a character next to the name of the second address, the end position of the third starting point is the position of the last character of the preprocessed address, and the minimum value, the stepping value and the maximum value of the third extraction length are preset values;
and when the third group of characters are address names recorded in a normalized address information base and the second address name is determined to be the upper address of the third group of characters according to the normalized address information base, determining the third group of characters as the third address name.
Wherein the minimum value of the third extraction length is 1, the step value is 1, and the maximum value is 12.
For example, the preprocessed address is "Beijing Beijing City Chaoyang" (北京北京市朝阳), from which the first address name "Beijing" and the second address name "Beijing City" have been extracted. Because the second address name "Beijing City" is not a district/county-level address name, a third address name may be extracted.
Specifically, starting from the character "Chao" (朝) immediately after the second address name "Beijing City", one character, "Chao", is extracted as the third group of characters; it is not an address name recorded in the normalized address information base. The third extraction length is then increased by 1, and two characters, "Chaoyang" (朝阳), are extracted starting from "Chao" as the third group of characters. "Chaoyang" is an address name recorded in the normalized address information base, and its level information is "333", so the second address name "Beijing City" is determined to be the superior address of the third group of characters, and "Chaoyang" is determined as the third address name.
It should be noted that, before the second and third address names are determined, the superior-subordinate relationships must be checked: the second group of characters is determined as the second address name only when the first address name is the superior address of the second group of characters, and the third group of characters is determined as the third address name only when the second address name is the superior address of the third group of characters. The purpose is to ensure the accuracy of the standard names finally output. For example, consider the two records "Heilongjiang Province (Heilongjiang) 4 - Mudanjiang City (Mudanjiang) 44 - Xi'an District (Xi'an) 444" and "Shaanxi Province (Shaanxi) 5 - Xi'an City (Xi'an) 55". The district/county-level address "Xi'an" in Heilongjiang Province and the city-level address "Xi'an" in Shaanxi Province share the same name. Suppose that, from the address "Heilongjiang Province Mudanjiang City Xi'an District", the first address name extracted is "Heilongjiang Province", the second is "Mudanjiang City", and the third group of characters is "Xi'an". If it were not checked whether "Mudanjiang City" is the superior address of "Xi'an", the extracted "Xi'an" would most likely be taken as the city-level address of Shaanxi Province. Therefore, only when the superior-subordinate relationships are checked can the accurate standard names of the three-level address corresponding to the initial address finally be obtained.
When the extracted second address name "Mudanjiang City" is not the superior address of the third group of characters "Xi'an", the third extraction length is updated according to the step value, that is, increased by 1, and extraction is repeated to obtain the third group of characters "Xi'an District". Whether the second address name "Mudanjiang City" is the superior address of "Xi'an District" is then checked again, so that the accurate address name is finally obtained.
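The role of the superior-subordinate check in the duplicate-name case can be sketched as follows. This is a hedged illustration rather than the patented implementation: `BASE`, `extract_child`, and the Chinese strings are hypothetical, and the sketch assumes that a shared alias is stored with every level code it may denote, so that only the code lying under the already-extracted superior address is accepted.

```python
MIN_LEN, STEP, MAX_LEN = 1, 1, 12

# Hypothetical base: each name or alias maps to every level code it may denote.
BASE = {
    "黑龙江省": ["4"], "黑龙江": ["4"],
    "牡丹江市": ["44"], "牡丹江": ["44"],
    "西安区": ["444"],
    "陕西省": ["5"], "陕西": ["5"],
    "西安市": ["55"],
    "西安": ["444", "55"],  # alias shared by a district and a city
}


def extract_child(address, start, parent_code, base=BASE):
    """Find the next recorded name at or after `start` that lies under
    `parent_code`; return (name, code, end_position) or None."""
    for pos in range(start, len(address)):
        length = MIN_LEN
        while length <= MAX_LEN and pos + length <= len(address):
            candidate = address[pos:pos + length]
            for code in base.get(candidate, []):
                # Accept only codes that extend the parent's code.
                if len(code) > len(parent_code) and code.startswith(parent_code):
                    return candidate, code, pos + length
            length += STEP
    return None
```

Under these assumptions, the same two characters resolve to level code "444" after Mudanjiang City (code "44") and to "55" after Shaanxi Province (code "5").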
Further, the determining a standard name of the at least one level address corresponding to the address name of the lowest level according to the normalized address information base includes:
when the address name of the lowest level is a district/county-level address, obtaining, according to the normalized address information base, the standard name of the district/county-level address, the standard name of the city-level address, and the standard name of the province-level address corresponding to the address name of the lowest level; or,
when the address name of the lowest level is a city-level address, obtaining, according to the normalized address information base, the standard name of the city-level address and the standard name of the province-level address corresponding to the address name of the lowest level; or,
and when the address name of the lowest level is a provincial address, obtaining a standard name of the provincial address corresponding to the address name of the lowest level according to the normalized address information base.
For example, the address names extracted from the preprocessed address are "Shanghai - Qingpu". According to the normalized address information base, the level information corresponding to "Shanghai" is "1" and that corresponding to "Qingpu" is "111". From the level information, the address corresponding to "111" is a district/county-level address, so the lowest-level address name, "Qingpu", is a district/county-level address name. The standard name of this district/county-level address is then determined from the normalized address information base as "Qingpu District"; from the level information "111", the standard name of its superior address "11", the city-level address "Shanghai City", is determined; and from "11", the standard name of its superior address "1", the province-level address "Shanghai", is determined. The standard names of the addresses at each level are thus obtained: Shanghai - Shanghai City - Qingpu District.
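The completion of upper-level standard names from the lowest-level code can be sketched as below. The table `STANDARD_NAMES` and the function name are hypothetical, and the sketch assumes, as in the examples, that each level appends exactly one character to the level code of its superior.

```python
# Hypothetical table: level information -> standard name (from the example).
STANDARD_NAMES = {
    "1": "Shanghai",
    "11": "Shanghai City",
    "111": "Qingpu District",
}


def complete_standard_names(lowest_code, names=STANDARD_NAMES):
    """Return the standard names from province level down to `lowest_code`.

    Each prefix of the code identifies one superior address.
    """
    return [names[lowest_code[:n]] for n in range(1, len(lowest_code) + 1)]
```

Here `complete_standard_names("111")` gives the three-level result of the example, while `complete_standard_names("11")` stops at the city level.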
Further, the level information is digital information, wherein the number of digits of the level information corresponding to the lower address is greater than the number of digits of the level information corresponding to the upper address, and the level information corresponding to the lower address includes the level information corresponding to the upper address.
For example, in "Shanghai 1 - Shanghai City 11 - Qingpu District (Qingpu) 111", "Shanghai" is a province-level address, "Shanghai City" is a city-level address, "Qingpu District" is a district/county-level address, and "Qingpu" is an alias of the district/county-level address "Qingpu District". "1", "11", and "111" are level information: "1" indicates that "Shanghai" is a province-level address, "11" indicates that "Shanghai City" is a city-level address, and "111" indicates that "Qingpu District (Qingpu)" is a district/county-level address; "1" is the superior level of "11", and "11" is the superior level of "111".
It is understood that the level information may also be alphabetical information, such as "A-AA-AAA", where "A" may represent a province-level address, "AA" a city-level address, and "AAA" a district/county-level address.
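Because the level information of a subordinate address contains that of its superior and is longer, the hierarchy checks reduce to string-prefix tests. The following Python sketch illustrates this; the helper names `level_of`, `is_superior`, and `lowest_level` are hypothetical, and the mapping from code length to level assumes one character per level as in the examples.

```python
# Assumption: code length 1, 2, 3 denotes province, city, district/county.
LEVEL_BY_LENGTH = {1: "province", 2: "city", 3: "district/county"}


def level_of(code: str) -> str:
    """Map level information to the administrative level it denotes."""
    return LEVEL_BY_LENGTH[len(code)]


def is_superior(upper: str, lower: str) -> bool:
    """True if `upper` is a (direct or indirect) superior of `lower`."""
    return len(lower) > len(upper) and lower.startswith(upper)


def lowest_level(codes):
    """Pick the level information of the lowest-level extracted name."""
    return max(codes, key=len)
```

The same functions work unchanged for alphabetical level information such as "A", "AA", and "AAA".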
Further, the obtaining the initial address includes:
identifying the express information of an express shipment to obtain a sender address and a recipient address, wherein the express information is filled in by a user;
and taking the sender address and the recipient address respectively as the initial addresses.
Further, the method provided by this embodiment further includes:
when the first group of characters are not the address names recorded in the normalized address information base, updating the first extraction length according to the step value of the first extraction length, re-extracting the first group of characters, and judging whether the first group of characters are recorded in the normalized address information base;
when the first group of characters are not the address names recorded in the normalized address information base when the first extraction length reaches the maximum value, updating the position of the first starting point to be the next character position of the current position, re-extracting the first group of characters from the minimum value of the first extraction length, and judging whether the first group of characters are recorded in the normalized address information base or not;
and outputting analysis failure information when the position of the first starting point is updated to the end position of the first starting point and the first group of characters are not address names recorded in the normalized address information base.
Further, the method provided by this embodiment further includes:
when the second group of characters are not the address names recorded in the normalized address information base, updating the second extraction length according to the step value of the second extraction length and re-extracting the second group of characters, judging whether the second group of characters are recorded in the normalized address information base, and judging whether the first address name is the superior address of the second group of characters when the second group of characters are recorded in the normalized address information base;
when the second extraction length reaches the maximum value, the second group of characters are not address names recorded in a normalized address information base, or when the first address name is not a superior address of the second group of characters, the position of the second starting point is updated to be the next character position of the current position, the second group of characters are extracted again from the minimum value of the second extraction length, whether the second group of characters are recorded in the normalized address information base or not is judged, and whether the first address name is the superior address of the second group of characters or not is judged when the second group of characters are recorded in the normalized address information base;
and when the position of the second starting point is updated to the end position of the second starting point, the second group of characters are not address names recorded in a normalized address information base, or the first address names are not superior addresses of the second group of characters, outputting analysis failure information.
Further, the method provided by this embodiment further includes:
when the third group of characters is not the address name recorded in the normalized address information base, updating the third extraction length according to the step value of the third extraction length, re-extracting the third group of characters, judging whether the third group of characters is recorded in the normalized address information base, and judging whether the second address name is the superior address of the third group of characters when the third group of characters is recorded in the normalized address information base;
when the third extraction length reaches the maximum value, the third group of characters are not address names recorded in a normalized address information base, or the second address names are not upper-level addresses of the third group of characters, the position of the third starting point is updated to be the next character position of the current position, the third group of characters are extracted again from the minimum value of the third extraction length, whether the third group of characters are recorded in the normalized address information base or not is judged, and whether the second address names are upper-level addresses of the third group of characters or not is judged when the third group of characters are recorded in the normalized address information base;
and when the position of the third starting point is updated to the end position of the third starting point, and the third group of characters are not address names recorded in a normalized address information base, or the second address names are not superior addresses of the third group of characters, outputting analysis failure information.
In this embodiment, with the above method, regardless of whether the initial address uses an alias, whether a city-level address name is written incorrectly, or whether the region to which an address belongs is written incorrectly, the standard province-level, city-level, and district/county-level names can be extracted from the initial address. The quantity of goods sent from one region to another can thus be obtained accurately from a massive number of addresses, so that vehicle routing can be optimized and logistics and time costs greatly reduced. Furthermore, the method relies on a single set of standard names, alias names, and superior-subordinate information for national province-level, city-level, and district/county-level addresses.
For a better understanding of the present application, the overall process of address normalization is described below:
1. Address preprocessing: remove special symbols such as commas and periods.
2. Extract the first two levels of names in the address:
2.1 Initialization: start from the i-th character of the address, where i >= 1 and i < address length; if i >= address length, go to step 5.2.
2.2 Intercept the address: take j characters starting from i, denoted word1, where j >= 2 and j <= 12. When j > 12, set i = i + 1 and return to step 2.1.
2.3 Validity check of province, city, and district: determine whether word1 is a standard name or alias of a province, city, or district; if not, set j = j + 1 and return to step 2.2.
2.4 Intercept the address again: take k characters starting from i + j, denoted word2, where k >= 2 and k <= 12. When k > 12, set j = j + 1 and return to step 2.2.
2.5 Validity check of province, city, and district: determine whether word2 is a standard name or alias of a province, city, or district; if not, set k = k + 1 and return to step 2.4.
2.6 Superior-subordinate check of province, city, and district: determine from the ids of word1 and word2 whether word1 is the superior of word2; if not, set k = k + 1 and return to step 2.4; if so, go to step 3.
3. Extract the third-level name in the address:
3.1 Check: determine whether word2 is at district/county level; if so, go to step 4.
3.2 Intercept the address: take m characters starting from i + j + k, denoted word3, where m >= 2 and m <= 12; if m > 12, go to step 5.2.
3.3 Validity check of province, city, and district: determine whether word3 is a standard name or alias of a province, city, or district; if not, set m = m + 1 and return to step 3.2.
3.4 Superior-subordinate check of province, city, and district: determine from the ids of word2 and word3 whether word2 is the superior of word3; if not, set m = m + 1 and return to step 3.2; if so, go to step 4.
4. Completion:
4.1 If a district was extracted from the address, complete the standard names of the province, city, and district according to the id, then go to step 5.1.
4.2 If a city was extracted from the address, complete the standard names of the province and city according to the id, then go to step 5.1.
4.3 If a province was extracted from the address, complete the standard name of the province according to the id, then go to step 5.1.
5. End:
5.1 Return parsing success.
5.2 Return parsing failure.
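The overall flow can be condensed into the following Python sketch. It is a simplified illustration under stated assumptions, not the exact procedure: the tables `NAME_TO_CODES` and `CODE_TO_STANDARD`, the function names, and the Chinese strings are hypothetical; a minimum extraction length of 1 is used as in the embodiment text; a single helper serves all three levels; and the sketch does not backtrack to reinterpret an earlier name. When no lower-level name is found, the names found so far are completed and returned.

```python
import unicodedata

MIN_LEN, STEP, MAX_LEN = 1, 1, 12

# Hypothetical miniature base: name or alias -> possible level codes.
NAME_TO_CODES = {
    "河北省": ["2"], "河北": ["2"],
    "保定市": ["22"], "保定": ["22"],
    "竞秀区": ["222"], "竞秀": ["222"],
}
# Level code -> standard name.
CODE_TO_STANDARD = {"2": "河北省", "22": "保定市", "222": "竞秀区"}


def preprocess(address):
    """Step 1: remove punctuation marks and whitespace."""
    return "".join(c for c in address
                   if not c.isspace()
                   and not unicodedata.category(c).startswith("P"))


def find_name(address, start, parent_code=None):
    """Scan from `start`; return (code, end) of the first recorded name
    lying under `parent_code` (any recorded name if it is None)."""
    for pos in range(start, len(address)):
        length = MIN_LEN
        while length <= MAX_LEN and pos + length <= len(address):
            for code in NAME_TO_CODES.get(address[pos:pos + length], []):
                if parent_code is None or (
                        len(code) > len(parent_code)
                        and code.startswith(parent_code)):
                    return code, pos + length
            length += STEP
    return None


def normalize(raw):
    """Return the list of standard names, or None if parsing fails."""
    address = preprocess(raw)
    found = find_name(address, 0)
    if found is None:
        return None  # step 5.2: parsing failure
    code, end = found
    # Steps 2 and 3: look for lower-level names until district/county level.
    while len(code) < 3:
        child = find_name(address, end, code)
        if child is None:
            break
        code, end = child
    # Step 4: complete the standard names of all levels from the code.
    return [CODE_TO_STANDARD[code[:n]] for n in range(1, len(code) + 1)]
```

In this sketch an address written with aliases and a missing city, such as "河北竞秀", is completed to the same three standard names as the fully written address.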
Fig. 2 is a schematic structural diagram of an address normalization processing apparatus according to another exemplary embodiment.
As shown in fig. 2, the address normalization processing apparatus provided in this embodiment includes:
a processor 21, and a memory 22 connected to the processor;
the memory is used for storing a computer program;
the processor is configured to invoke and execute the computer program in the memory to perform the method as follows:
acquiring an initial address;
preprocessing the initial address to obtain a preprocessed address;
extracting at least one address name recorded in a normalized address information base from the preprocessed address, wherein the normalized address information base is established in advance and records the standard name and level information of each level of national address and the alias names of each level of address, the levels of address comprising province-level addresses, city-level addresses, and district/county-level addresses, and the level information being used to indicate the superior-subordinate relationship between addresses of different levels;
determining the address name of the lowest level in the extracted address names according to the level information corresponding to the extracted address names, and determining the standard name of at least one level address corresponding to the address name of the lowest level according to the normalized address information base;
and taking the standard name of at least one level address corresponding to the address name of the lowest level as the address of the initial address after the standardization processing, and outputting the address after the standardization processing.
Further, the preprocessing the initial address includes:
and deleting punctuation marks and spaces in the initial address.
Further, the extracting, from the preprocessed addresses, at least one address name recorded in a normalized address information base includes:
extracting a first address name from the preprocessed address;
after extracting a first address name, if no further address name recorded in the normalized address information base can be extracted from the preprocessed address, determining the first address name as the extracted address name; or,
after extracting a first address name, if a second address name can be extracted from the preprocessed address and the second address name is a district/county-level address, determining the first address name and the second address name as the extracted address names; or,
after extracting a first address name, if a second address name can be extracted from the preprocessed address, the second address name is not a district/county-level address, and a third address name can be extracted from the preprocessed address, determining the first address name, the second address name, and the third address name as the extracted address names;
the first address name, the second address name and the third address name are all address names recorded in a normalized address information base, the first address name is a superior address of the second address name, and the second address name is a superior address of the third address name.
Further, the extracting the first address name from the preprocessed address includes:
extracting a first group of characters according to a first extraction length from a first starting point of the preprocessed address, wherein the position of the first starting point and the first extraction length are both updatable values, the initial position of the first starting point is the position of a first character of the preprocessed address, the end position of the first starting point is the position of a last character of the preprocessed address, and the minimum value, the step value and the maximum value of the first extraction length are preset values;
and when the first group of characters is the address name recorded in the normalized address information base, determining the first group of characters as the first address name.
Further, the extracting, from the preprocessed addresses, at least one address name recorded in a normalized address information base further includes:
extracting a second group of characters according to a second extraction length from a second starting point of the preprocessed address, wherein the position of the second starting point and the second extraction length are updatable values, the initial position of the second starting point is the position of the next character of the first address name, the end position of the second starting point is the position of the last character of the preprocessed address, and the minimum value, the step value and the maximum value of the second extraction length are preset values;
and when the second group of characters are address names recorded in a normalized address information base and the first address name is determined to be the upper address of the second group of characters according to the normalized address information base, determining the second group of characters as the second address name.
Further, the extracting, from the preprocessed addresses, at least one address name recorded in a normalized address information base further includes:
extracting a third group of characters according to a third extraction length from a third starting point of the preprocessed address, wherein the position of the third starting point and the third extraction length are updatable values, the initial position of the third starting point is the position of a character next to the name of the second address, the end position of the third starting point is the position of the last character of the preprocessed address, and the minimum value, the stepping value and the maximum value of the third extraction length are preset values;
and when the third group of characters are address names recorded in a normalized address information base and the second address name is determined to be the upper address of the third group of characters according to the normalized address information base, determining the third group of characters as the third address name.
Further, the determining a standard name of the at least one level address corresponding to the address name of the lowest level according to the normalized address information base includes:
when the address name of the lowest level is a district/county-level address, obtaining, according to the normalized address information base, the standard name of the district/county-level address, the standard name of the city-level address, and the standard name of the province-level address corresponding to the address name of the lowest level; or,
when the address name of the lowest level is a city-level address, obtaining, according to the normalized address information base, the standard name of the city-level address and the standard name of the province-level address corresponding to the address name of the lowest level; or,
and when the address name of the lowest level is a provincial address, obtaining a standard name of the provincial address corresponding to the address name of the lowest level according to the normalized address information base.
Further, the level information is numeric information, wherein the level information corresponding to a lower-level address has more digits than the level information corresponding to its upper-level address, and the level information corresponding to the lower-level address contains the level information corresponding to the upper-level address.
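One way to read this numeric scheme is sketched below. It assumes that "contains" means the upper-level code is a leading prefix of the lower-level code, and it uses 2-, 4- and 6-digit codes; both choices are assumptions made for illustration, and the function names are hypothetical.

```python
def is_superior(upper: str, lower: str) -> bool:
    """True when `upper` is a superior address of `lower`.

    Assumption: the lower-level code contains the upper-level code as a prefix.
    """
    return len(upper) < len(lower) and lower.startswith(upper)


def lowest_level(codes):
    """Pick the lowest-level code: by the scheme above, the one with the most digits."""
    return max(codes, key=len)


print(is_superior("33", "3301"))               # province over city
print(is_superior("3301", "330106"))           # city over district-county
print(is_superior("31", "330106"))             # unrelated province
print(lowest_level(["33", "330106", "3301"]))
```

With codes built this way, both the level comparison and the superior check reduce to string operations, so no separate parent table is needed.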
Further, the obtaining the initial address includes:
recognizing express information of an express shipment to obtain a sender address and a recipient address, wherein the express information is filled in by a user;
and taking the sender address and the recipient address each as an initial address.
It is understood that the same or similar parts in the above embodiments may be mutually referred to, and the same or similar parts in other embodiments may be referred to for the content which is not described in detail in some embodiments.
It should be noted that, in the description of the present application, the terms "first", "second", etc. are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. Further, in the description of the present application, "a plurality" means at least two unless otherwise specified.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and the scope of the preferred embodiments of the present application includes other implementations in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present application.
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps of the methods in the above embodiments may be carried out by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc.
In the description herein, reference to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present application, and that variations, modifications, substitutions and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the present application.

Claims (8)

1. An address standardization processing method, comprising:
acquiring an initial address;
preprocessing the initial address to obtain a preprocessed address;
extracting, from the preprocessed address, at least one address name recorded in a normalized address information base, wherein the normalized address information base is established in advance and records the standard names, the level information and the aliases of addresses at every level nationwide, the levels comprising provincial-level addresses, city-level addresses and district-county level addresses, and wherein the level information indicates the superior-subordinate relationship between addresses of different levels;
determining the address name of the lowest level in the extracted address names according to the level information corresponding to the extracted address names, and determining the standard name of at least one level address corresponding to the address name of the lowest level according to the normalized address information base;
taking the standard name of at least one level address corresponding to the lowest level address name as the address of the initial address after standardization processing, and outputting the address after standardization processing;
wherein, in the address after the preprocessing, extracting at least one address name recorded in a normalized address information base, includes:
extracting a first address name from the preprocessed address, including: extracting a first group of characters according to a first extraction length from a first starting point of the preprocessed address, wherein the position of the first starting point and the first extraction length are both updatable values, the initial position of the first starting point is the position of a first character of the preprocessed address, the end position of the first starting point is the position of a last character of the preprocessed address, and the minimum value, the step value and the maximum value of the first extraction length are preset values; when the first group of characters is an address name recorded in a normalized address information base, determining the first group of characters as a first address name;
extracting a second group of characters according to a second extraction length from a second starting point of the preprocessed address, wherein the position of the second starting point and the second extraction length are updatable values, the initial position of the second starting point is the position of the next character of the first address name, the end position of the second starting point is the position of the last character of the preprocessed address, and the minimum value, the step value and the maximum value of the second extraction length are preset values;
when the second group of characters are address names recorded in a normalized address information base and the first address name is determined to be an upper address of the second group of characters according to the normalized address information base, determining the second group of characters as a second address name;
when the second group of characters are not the address names recorded in the normalized address information base, updating the second extraction length according to the step value of the second extraction length and re-extracting the second group of characters, judging whether the second group of characters are recorded in the normalized address information base, and judging whether the first address name is the superior address of the second group of characters when the second group of characters are recorded in the normalized address information base;
when the second extraction length has reached its maximum value and the second group of characters is still not an address name recorded in the normalized address information base, or when the first address name is not the superior address of the second group of characters, updating the position of the second starting point to the character position following its current position, re-extracting the second group of characters starting from the minimum value of the second extraction length, judging whether the second group of characters is recorded in the normalized address information base, and, when it is recorded, judging whether the first address name is the superior address of the second group of characters.
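The window scan recited in claim 1 can be sketched as follows. This is an illustrative reading, not the claimed implementation: the `NAME_TO_CODE` table, its codes, the default lengths (2, 1, 4) and the prefix-based superior check are all assumptions, and a real information base may map one name to several entries.

```python
# Hypothetical lookup table: standard names and aliases -> level code.
NAME_TO_CODE = {
    "浙江省": "33", "浙江": "33",
    "杭州市": "3301", "杭州": "3301",
    "西湖区": "330106",
    "上海市": "31", "上海": "31",
}


def extract_name(text, start, parent_code=None, min_len=2, step=1, max_len=4,
                 name_to_code=NAME_TO_CODE):
    """Scan `text` from `start`; return (name, code, next_start) or None.

    For each starting point the window grows from `min_len` to `max_len` by `step`.
    A recorded name is accepted only if `parent_code` is its superior
    (assumed here to mean: the parent code is a prefix of the candidate's code).
    """
    for pos in range(start, len(text)):
        length = min_len
        while length <= max_len and pos + length <= len(text):
            candidate = text[pos:pos + length]
            code = name_to_code.get(candidate)
            if code is not None:
                if parent_code is None or (
                    len(parent_code) < len(code) and code.startswith(parent_code)
                ):
                    return candidate, code, pos + length
                break  # recorded but not under the parent: move the starting point
            length += step
    return None


address = "浙江省杭州市西湖区文三路"
first = extract_name(address, 0)
second = extract_name(address, first[2], parent_code=first[1])
print(first, second)
```

Because the window grows from the minimum length, this sketch matches the shortest recorded name first (here the alias "浙江" rather than "浙江省"); the standard name is then recovered from the code.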
2. The method of claim 1, wherein the pre-processing the initial address comprises:
and deleting punctuation marks and spaces in the initial address.
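A minimal sketch of this preprocessing step, assuming that "punctuation marks" corresponds to the Unicode punctuation categories (`P*`) and "spaces" to whitespace and separator characters (`Z*`); the function name `preprocess` is hypothetical.

```python
import unicodedata


def preprocess(address: str) -> str:
    """Drop whitespace and every character in a Unicode punctuation or separator category."""
    return "".join(
        ch for ch in address
        if not ch.isspace()
        and not unicodedata.category(ch).startswith(("P", "Z"))
    )


print(preprocess("浙江省, 杭州市。西湖区 文三路"))
```

Note that this reading also removes hyphens, so a unit number such as "3-201" would collapse to "3201"; a production system might want to treat such characters separately.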
3. The method according to claim 1, wherein the extracting at least one address name recorded in a normalized address information base from the preprocessed addresses comprises:
after a first address name is extracted, if no further address name recorded in the normalized address information base can be extracted from the preprocessed address, determining the first address name as the extracted address name; or
after a first address name is extracted, if a second address name can be extracted from the preprocessed address and the second address name is a district-county level address, determining the first address name and the second address name as the extracted address names; or
after a first address name is extracted, if a second address name can be extracted from the preprocessed address, the second address name is not a district-county level address, and a third address name can be extracted from the preprocessed address, determining the first address name, the second address name and the third address name as the extracted address names;
the first address name, the second address name and the third address name are all address names recorded in a normalized address information base, the first address name is a superior address of the second address name, and the second address name is a superior address of the third address name.
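The stopping rules of claim 3 can be sketched as a small driver loop. Everything here is hypothetical: `TABLE`, the 6-digit district-county codes, and `find_next`, which is a simplified stand-in for the window scan of claim 1 that simply finds the earliest recorded name under the given parent.

```python
TABLE = {"浙江省": "33", "杭州市": "3301", "西湖区": "330106"}  # illustrative entries
DISTRICT_DIGITS = 6  # assumed code length of a district-county level address


def find_next(text, start, parent):
    """Toy stand-in for the window scan: earliest recorded name at or after `start`."""
    best = None
    for name, code in TABLE.items():
        pos = text.find(name, start)
        if pos == -1:
            continue
        if parent is not None and not (len(parent) < len(code) and code.startswith(parent)):
            continue  # the previous name must be the superior address
        if best is None or pos < best[0]:
            best = (pos, name, code)
    if best is None:
        return None
    pos, name, code = best
    return name, code, pos + len(name)


def collect_names(text):
    """Extract up to three names, stopping early at a district-county level address."""
    names, start, parent = [], 0, None
    while len(names) < 3:
        hit = find_next(text, start, parent)
        if hit is None:
            break  # nothing more can be extracted
        name, code, start = hit
        names.append((name, code))
        if len(code) == DISTRICT_DIGITS:
            break  # lowest level reached
        parent = code
    return names


print(collect_names("浙江省杭州市西湖区文三路"))
print(collect_names("杭州市西湖区"))
```

The loop ends in one of the three ways the claim lists: only one name is found, a second name at district-county level ends the search, or a third name is extracted below a non-district second name.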
4. The method according to claim 1, wherein the extracting at least one address name recorded in a normalized address information base from the preprocessed addresses further comprises:
extracting a third group of characters of a third extraction length starting from a third starting point of the preprocessed address, wherein the position of the third starting point and the third extraction length are both updatable values, the initial position of the third starting point is the position of the character immediately following the second address name, the end position of the third starting point is the position of the last character of the preprocessed address, and the minimum value, the step value and the maximum value of the third extraction length are preset values;
and when the third group of characters is an address name recorded in the normalized address information base and the second address name is determined, according to the normalized address information base, to be the superior address of the third group of characters, determining the third group of characters as the third address name.
5. The method according to claim 1, wherein said determining a standard name of at least one level address corresponding to the lowest level address name according to the normalized address information base comprises:
when the address name of the lowest level is a district-county level address, obtaining a standard name of the district-county level address corresponding to the address name of the lowest level, a standard name of a city-level address corresponding to the address name of the lowest level and a standard name of a provincial-level address corresponding to the address name of the lowest level according to the normalized address information base; or
when the address name of the lowest level is a city-level address, obtaining a standard name of the city-level address corresponding to the address name of the lowest level and a standard name of a provincial-level address corresponding to the address name of the lowest level according to the normalized address information base; or
when the address name of the lowest level is a provincial-level address, obtaining a standard name of the provincial-level address corresponding to the address name of the lowest level according to the normalized address information base.
6. The method according to claim 1, wherein the level information is numeric information, wherein the level information corresponding to a lower-level address has more digits than the level information corresponding to its upper-level address, and the level information corresponding to the lower-level address contains the level information corresponding to the upper-level address.
7. The method of claim 1, wherein obtaining the initial address comprises:
recognizing express information of an express shipment to obtain a sender address and a recipient address, wherein the express information is filled in by a user;
and taking the sender address and the recipient address each as an initial address.
8. An address standardization processing apparatus, comprising:
a processor, and a memory coupled to the processor;
the memory is used for storing a computer program;
the processor is configured to invoke and execute the computer program in the memory to perform the method of any of claims 1-7.
CN201910246155.6A 2019-03-28 2019-03-28 Address standardization processing method and equipment Active CN109961259B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910246155.6A CN109961259B (en) 2019-03-28 2019-03-28 Address standardization processing method and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910246155.6A CN109961259B (en) 2019-03-28 2019-03-28 Address standardization processing method and equipment

Publications (2)

Publication Number Publication Date
CN109961259A CN109961259A (en) 2019-07-02
CN109961259B true CN109961259B (en) 2021-07-27

Family

ID=67025235

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910246155.6A Active CN109961259B (en) 2019-03-28 2019-03-28 Address standardization processing method and equipment

Country Status (1)

Country Link
CN (1) CN109961259B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110956043A (en) * 2019-12-17 2020-04-03 人和未来生物科技(长沙)有限公司 Domain professional vocabulary word embedding vector training method, system and medium based on alias standardization
CN111353309A (en) * 2019-12-25 2020-06-30 北京合力亿捷科技股份有限公司 Method and system for processing communication quality complaint address based on text analysis
CN111639493A (en) * 2020-05-22 2020-09-08 上海微盟企业发展有限公司 Address information standardization method, device, equipment and readable storage medium
CN112199458A (en) * 2020-09-23 2021-01-08 北京睿企信息科技有限公司 Address grading standard method based on big data
CN115238692A (en) * 2022-06-29 2022-10-25 青岛海尔科技有限公司 Method, system, device and storage medium for identifying place name

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014163977A1 (en) * 2013-03-13 2014-10-09 Google Inc. Systems, methods and computer-readable media for interpreting geographical search queries
CN104598887A (en) * 2015-01-29 2015-05-06 华东师范大学 Recognition method for written Chinese address of non-specification format
CN106959961A (en) * 2016-01-11 2017-07-18 阿里巴巴集团控股有限公司 A kind of Address Recognition method and device
CN108959244A (en) * 2018-06-07 2018-12-07 北京京东尚科信息技术有限公司 The method and apparatus of address participle

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140330865A1 (en) * 2011-11-30 2014-11-06 Nokia Corporation Method and apparatus for providing address geo-coding

Also Published As

Publication number Publication date
CN109961259A (en) 2019-07-02

Similar Documents

Publication Publication Date Title
CN109961259B (en) Address standardization processing method and equipment
CN102523129B (en) Universal avionics bus test analysis method and device
CN101509783B (en) Data checking method and device applying to navigation electronic map production
CN107656913A (en) Map point of interest address extraction method, apparatus, server and storage medium
US20040243304A1 (en) Map information retrieving
CN110990520B (en) Address coding method and device, electronic equipment and storage medium
CN106469372B (en) Address mapping method and device
KR101132150B1 (en) Address processing for formalizing addresses
CN114492438A (en) Address standardization method based on knowledge graph and natural language processing technology
EP2575054A1 (en) Method of generating search trees and navigation device
CN111414357A (en) Address data processing method, device, system and storage medium
EP2034272A1 (en) Computer-implemented method, system and computer program product for transmission of feedback information
CN111859956B (en) Address word segmentation method for financial industry
KR20140075840A (en) System and Method for Refining of Address Database for Improvement of Mail Automated Reordering Sorting Machine
US20110191357A1 (en) Map data, storage medium and navigation apparatus
CN106469429B (en) Method and device for determining options through species relationship and electronic equipment
CN115204167A (en) Method and equipment for determining administrative region based on address information
CN101887462A (en) Rapid classification and registration method capable of continuously optimizing geographical name database
CN107885651B (en) Automatic system regression testing method and device for mobile terminal positioning algorithm
CN114443657A (en) Spatial data layer field checking method and system applied to digital twin city
CN109829025A (en) Route bearing calibration and device, electronic equipment, storage medium
EP0302547A2 (en) Device for executing a search in a topological representation of a geographical interconnection network.
US7587519B2 (en) Method and device for modifying modular messages
CN112966293B (en) Database dirty page detection method and device, computing device and storage medium
JP2003223459A (en) Managing method for address information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant