WO2015027835A1

WO2015027835A1 - System and terminal for querying mailing address postal codes

Info

Publication number: WO2015027835A1
Application number: PCT/CN2014/084607
Authority: WO
Inventors: 王国印; 贾西贝
Original assignee: 深圳市华傲数据技术有限公司
Priority date: 2013-08-27
Filing date: 2014-08-18
Publication date: 2015-03-05
Also published as: CN103440312B; CN103440312A

Abstract

The present invention provides a system for querying mailing address postal codes, comprising a mailing address input subsystem and a postal code query subsystem. The address input subsystem prompts the user in real time as text is entered, and the user determines the mailing address to be queried from the prompted list of addresses. The postal code query subsystem standardizes the mailing address to be queried, retrieves the closest standardized mailing address, and returns the postal code corresponding to that standardized address. By prompting the user during input, the invention makes the query format more flexible. Based on named-entity recognition, it identifies the level of each address metadata item entered by the user, enabling progressive (level-by-level) address queries and simultaneous completion of mailing addresses, which makes query results more precise. In addition, the user can obtain query results as a two-dimensional code or locate the address by linking to a map. The present invention also provides a terminal for querying mailing address postal codes.

Description

System and terminal for querying postal codes by communication address

Technical field

The present invention relates to the field of zip code inquiry, and in particular to a system and terminal for querying a zip code by a communication address.

Background art

With the rapid development of e-commerce and the informatization of the logistics industry, people can save time and money by shopping and sending items without leaving home. The e-commerce and logistics industries depend on the communication address (also called the mailing address, or simply the address) and the postal code, both of which must be provided by the user. The main current practices of e-commerce websites and the logistics industry are as follows: the user manually enters the complete address and its postal code; or drop-down lists offer the relatively fixed address components (provinces, the prefecture-level cities under each province, and the districts and counties under each prefecture-level city) for the user to choose from, while the rest of the address and the postal code are entered manually; and the entered address and postal code are saved for later reuse, so that a previously entered address and postal code can be selected directly instead of being entered again.

The main problems with these methods are as follows. In many cases, the user does not know the postal code for the address they entered. Because of pinyin-based input methods and characteristics of Chinese itself (a character may have several pronunciations, many characters share the same pronunciation, and most pinyin input methods rely on statistical language models), together with uncommon characters in some addresses, the entered address often contains typos. Place names have aliases, i.e., the same place has several names; for example, "Guangdong Province" is also called "Guangdong" and "Yue", and these methods cannot recognize different descriptions of the same place. In some cases users cannot enter the full address and do not know how to proceed. Addresses change and collected address data are incomplete, yet the data on these websites are often not updated. Some other websites currently address the first problem by helping users obtain the postal code for an address. However, they usually implement the system with database technology and, for address parts below the district level, rely on fuzzy string queries (LIKE '%XXX%'), which are very inefficient over large data volumes. In addition, database-based queries greatly restrict the format and content of the user's input.

For example, the user first selects the name of the provincial-level administrative region (province, special administrative region, autonomous region, or municipality directly under the central government), then the prefecture-level administrative region (prefecture-level city, autonomous prefecture, region, or league), then the county-level administrative region (municipal district, county, banner, special zone, forest area, autonomous county, autonomous banner, etc.), and finally enters the township, village, road, and so on. This query input process is very mechanical.

In addition, the database query mode requires addresses to follow four levels: provincial, prefecture-level, district/county level, and then the specific address. However, not all addresses fit this pattern. For example, there is no prefecture level between a municipality directly under the central government and its districts, or between a province and the county-level cities it administers directly. Some special prefecture-level cities have no district or county level at all, such as Zhongshan City and Dongguan City in Guangdong Province, Sanya City and Sansha City in Hainan Province, and Jiayuguan City in Gansu Province. Existing systems work around this by substituting placeholder names such as "directly administered", "under municipal jurisdiction", or "under provincial jurisdiction", but their query results then generally also contain these non-real addresses. Therefore, there is a need for a system that assists the user with input prompts, provides complete reference addresses, and normalizes the address to be queried, so that postal codes can be queried accurately.

Summary of the invention

The present invention aims to solve at least one of the above drawbacks. To this end, it provides a system and terminal for querying a postal code by communication address that prompts the user during input, making the query format more flexible. Based on named-entity recognition, it identifies the level of each address metadata item entered by the user, enabling level-by-level address queries and completion of the communication address, which makes query results more accurate. In addition, the user can obtain the query result as a two-dimensional code or link to a map for positioning.

Accordingly, an embodiment of the present invention provides a system for querying a postal code by communication address, comprising a communication address input subsystem and a postal code query subsystem. The address input subsystem prompts the user in real time as text is entered, and the user determines the communication address to be queried from the prompt-list addresses. The postal code query subsystem normalizes the communication address to be queried, retrieves the closest standardized communication address, and returns the postal code corresponding to that standardized address.

Preferably, determining the communication address to be queried may also include the case where the user does not select any address from the prompt list and the address to be queried is determined solely from the text the user entered.

The real-time prompt automatically updates its content each time the user's input text grows. The prompt content is generated as follows: acquire the address text currently entered by the user and preprocess it to delete extra spaces; segment the address to obtain address metadata and label all address levels; obtain the final place-name entity labeling sequence through place-name entity recognition and generate a Query statement; and search the address index file to obtain the prompt-list addresses.

Preferably, the preprocessing further comprises converting full-width digits and letters into half-width characters, and the dictionary used during processing is stored in a Trie data structure based on a double array.

The prompt-list addresses obtained are sorted in descending order of closeness to the standard address.

Normalizing the communication address to be queried comprises the following steps: obtaining the communication address to be queried determined by the user and preprocessing it; segmenting the address to obtain address metadata and labeling all address levels; obtaining the final place-name entity labeling sequence through place-name entity recognition and generating a Query statement; parsing the Query statement and searching the index file to obtain the closest communication address; and completing the address to generate a standardized communication address, then returning the postal code corresponding to that standardized address.

Preferably, the corresponding zip code is determined according to the lowest address level value of the labeled address.

After the zip code corresponding to the standardized communication address is returned, the user may select the determined zip code query result to view its location on a map, or send the query result to a mobile terminal device by means of a two-dimensional code.

Preferably, the address segmentation uses a bigram (binary-model) segmentation method, and named-entity recognition identifies the most likely address level of each place-name metadata item in the place-name entity labeling result.

Another embodiment of the present invention provides a terminal for querying a zip code by communication address. The terminal includes a user input prompting unit and a zip code query unit. The user input prompting unit prompts the user in real time during input and receives the communication address to be queried that the user finally determines; the zip code query unit retrieves the standardized communication address closest to the communication address to be queried and obtains the postal code corresponding to that standardized address. By prompting the user during input, the invention makes the query format more flexible; named-entity recognition identifies the level of each address metadata item entered by the user, enabling level-by-level address queries while completing the communication address, which makes query results more accurate. In addition, the user can obtain the query result as a two-dimensional code or link to a map for positioning.

DRAWINGS

FIG. 1 is a schematic flowchart of a system for querying a postal code by communication address according to an embodiment of the present invention.

FIG. 2 is a detailed flowchart of the address input subsystem according to an embodiment of the present invention.

FIG. 3 is a detailed flowchart of the postal code query subsystem according to an embodiment of the present invention.

FIG. 4 is a schematic diagram of an example of address completion in the postal code query subsystem according to an embodiment of the present invention.

Detailed description

The present invention is described in further detail below with reference to the accompanying drawings and embodiments. The specific embodiments described here merely illustrate the invention and are not intended to limit it.

The system and terminal for querying a postal code by communication address provided by the invention prompt the user during input, making the query format more flexible. Named-entity recognition identifies the level of each address metadata item entered by the user, enabling level-by-level address queries while completing the communication address, which makes query results more accurate. In addition, the user can obtain the query result as a two-dimensional code or link to a map for positioning.

FIG. 1 is a schematic flowchart of a system for querying a postal code by communication address according to an embodiment of the present invention. The system includes a communication address input subsystem and a postal code query subsystem and performs the following steps:

Step S110: The address input subsystem prompts the user in real time as text is entered, and the user determines the communication address to be queried from the prompt-list addresses.

The detailed process of step S110 is shown in FIG. 2, as follows. Step S111: Obtain the address text entered by the user and preprocess it. Preprocessing mainly consists of converting full-width digits and letters into half-width characters and removing extra spaces.
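The preprocessing in step S111 can be sketched in Python as below. This is a minimal sketch under stated assumptions: only full-width digits and Latin letters (plus the full-width ideographic space) are folded to half-width, and "removing extra spaces" is taken to mean collapsing runs of whitespace and trimming both ends. The function names `to_half_width` and `preprocess` are hypothetical.

```python
import re


def to_half_width(ch: str) -> str:
    """Fold a full-width digit, Latin letter, or ideographic space to half-width."""
    code = ord(ch)
    # Full-width 0-9 (U+FF10-FF19), A-Z (U+FF21-FF3A) and a-z (U+FF41-FF5A)
    # sit at a fixed offset of 0xFEE0 above their ASCII counterparts.
    if 0xFF10 <= code <= 0xFF19 or 0xFF21 <= code <= 0xFF3A or 0xFF41 <= code <= 0xFF5A:
        return chr(code - 0xFEE0)
    if code == 0x3000:  # ideographic (full-width) space
        return " "
    return ch


def preprocess(text: str) -> str:
    """Convert full-width digits/letters to half-width, then remove extra spaces."""
    text = "".join(to_half_width(ch) for ch in text)
    # Assumption: collapse whitespace runs to one space and trim the ends.
    return re.sub(r"\s+", " ", text).strip()


print(preprocess("  深圳市　南山区  高新南一道２９号 "))  # 深圳市 南山区 高新南一道29号
```

Other full-width punctuation is deliberately left untouched here, since the text only mentions digits and letters.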

The input prompt automatically updates its content each time the user's input text grows, and the real-time prompt can also be skipped: the user can type the communication address to be queried directly into the address input prompting system. If the real-time prompt is used, the prompt-list addresses obtained are sorted in descending order of closeness to the standard address.

Step S112: Segment the address text.

Because the full-text index uses a bigram (binary) segmentation model, the longest Chinese word in the index has length 2, whereas Chinese place names are generally longer than two characters. Therefore, a PhraseQuery syntax is generated for each recognized address metadata item, which filters out the spurious terms formed by the last character of one address metadata item and the first character of the next in each pair of adjacent items. For example, for the user input "Shenzhen City, Guangdong Province" (广东省深圳市), the PhraseQuery syntax constructed after place-name recognition is "Guangdong Province" "Shenzhen City" (that is, "广东省" "深圳市"), with each place-name metadata item enclosed in double quotes. This filters out the erroneous results caused by the cross-boundary bigram "省深" and greatly improves accuracy.
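To illustrate why quoting each place name helps, here is a small Python sketch. It assumes a Lucene-style query parser that analyzes each quoted phrase separately with the same bigram tokenizer used for the index; the helper names `bigrams`, `boundary_bigrams`, and `phrase_query` are hypothetical. Tokenizing the concatenated input produces the cross-boundary bigram, while tokenizing each quoted name on its own never does.

```python
def bigrams(text):
    """Bigram ("binary model") tokenization: every pair of adjacent characters."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def boundary_bigrams(names):
    """Bigrams straddling two adjacent place names (last char + first char)."""
    return [a[-1] + b[0] for a, b in zip(names, names[1:])]


def phrase_query(names):
    """Quote each recognized place name separately (phrase-query syntax)."""
    return " ".join(f'"{name}"' for name in names)


names = ["广东省", "深圳市"]
print(bigrams("".join(names)))                      # ['广东', '东省', '省深', '深圳', '圳市']
print(boundary_bigrams(names))                      # ['省深']
print([t for n in names for t in bigrams(n)])       # ['广东', '东省', '深圳', '圳市']
print(phrase_query(names))                          # "广东省" "深圳市"
```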

Dictionary-based word segmentation is usually done by forward (left-to-right) matching or reverse (right-to-left) matching; in general, reverse matching has about half the error rate of forward matching. Crossing ambiguity is defined as three consecutive Chinese characters ABC where both AB and BC can be words; in Chinese, BC is generally the more probable word, which favors right-to-left matching. Address segmentation therefore applies the reverse maximum matching algorithm with the address metadata dictionary, scanning the user's input address text from right to left. To speed up lookups, the dictionary is stored in a Trie data structure based on a double array (Double Array).
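A minimal Python sketch of reverse maximum matching follows. For brevity it substitutes a plain nested-dictionary trie for the double-array trie described above (lookups behave the same; only the memory layout differs), and it builds the trie over reversed words so that a right-to-left scan can walk it directly. Falling back to a single character when nothing matches is an assumption the text does not state; the function names are hypothetical.

```python
END = None  # end-of-word marker; can never collide with a one-character key


def build_reverse_trie(words):
    """Trie over reversed words, so a right-to-left scan can walk it directly."""
    root = {}
    for word in words:
        node = root
        for ch in reversed(word):
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def reverse_max_match(text, trie):
    """Scan right to left, always taking the longest dictionary word ending here."""
    tokens = []
    end = len(text)
    while end > 0:
        node, best = trie, 1  # assumption: emit a single character if nothing matches
        i = end - 1
        while i >= 0 and text[i] in node:
            node = node[text[i]]
            if END in node:
                best = end - i  # longest match found so far
            i -= 1
        tokens.append(text[end - best:end])
        end -= best
    return tokens[::-1]


trie = build_reverse_trie(["广东省", "广东", "深圳市", "深圳", "南山区", "南山"])
print(reverse_max_match("广东省深圳市南山区", trie))  # ['广东省', '深圳市', '南山区']
```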

Step S113: Perform address labeling.

This step requires address metadata. Address metadata for China's administrative divisions can be obtained from Wikipedia and the National Bureau of Statistics, and further metadata can be extracted from complete communication addresses using address segmentation and recognition. The address metadata covers: provincial-level administrative district names (provinces, autonomous regions, municipalities directly under the central government, and special administrative regions); prefecture-level administrative district names (prefecture-level cities, autonomous prefectures, regions, and leagues); county-level administrative district names (municipal districts, county-level cities, counties, autonomous counties, banners, autonomous banners, special zones, and forest areas); township-level administrative district names (townships, towns, streets, sumu, and district offices); and other address data (road names, village names, community names, building names, and square names). The address metadata dictionary should contain the various aliases of each place name. Its format is defined as follows: the dictionary consists of multiple lines, each line being one term (Term); each Term contains a place name and the address level(s) (level) corresponding to it, where the place name is the key and the address level is the attribute (value) of that key. The two items are separated by a semicolon ";". Some place names correspond to multiple address levels (for example, the alias of one standard address may also be the alias of another standard address at a different level); in that case the levels are separated by a comma ",".
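Reading the dictionary format described above into a lookup table might look like the following Python sketch. It assumes one `name;level[,level...]` term per line and merges repeated names; `parse_dictionary`, `candidate_levels`, and the sample entries (which mirror the Hebei example later in this description, written in pinyin) are hypothetical. Returning level 0 for unknown names follows the rule that tokens missing from the dictionary are unrecognized addresses.

```python
def parse_dictionary(lines):
    """Parse 'place name;level[,level...]' lines into {name: [levels]}."""
    entries = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        name, levels = line.split(";", 1)
        merged = entries.setdefault(name.strip(), [])
        for value in levels.split(","):
            level = int(value)
            if level not in merged:  # the same name may appear on several lines
                merged.append(level)
    return entries


def candidate_levels(token, entries):
    """Levels a token may take; unknown tokens are unrecognized (level 0)."""
    return entries.get(token, [0])


# Hypothetical sample entries, in pinyin for readability.
DICTIONARY = """\
Hebei;1,2,4
Shijiazhuang;2,4
Pingshan;2,3,4
Guyue;4
"""
entries = parse_dictionary(DICTIONARY.splitlines())
print(candidate_levels("Pingshan", entries))       # [2, 3, 4]
print(candidate_levels("Hemin Village", entries))  # [0]
```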
When people write addresses, the following formats are common:
1. Provincial-level administrative district + prefecture-level administrative district + county-level administrative district + township-level administrative district + other (often used on the Internet), for example: Huxiaozhai Village, Chenqiao Village Committee, Guanji Town, Taihe County, Fuyang City, Anhui Province;
2. Provincial-level administrative district + county-level administrative district + township-level administrative district + other (when the county-level administrative district is a county-level city, county, autonomous county, banner, autonomous banner, special zone, or forest area, the prefecture-level administrative district can be omitted; this format is often used on ID cards), for example: Huxiaozhai Village, Chenqiao Village Committee, Guanji Town, Taihe County, Anhui Province;
3. Provincial-level administrative district + prefecture-level administrative district + township-level administrative district + other (mainly used where the prefecture-level administrative district has no county-level administrative districts, such as Zhongshan City and Dongguan City in Guangdong Province, Sanya City and Sansha City in Hainan Province, and Jiayuguan City in Gansu Province), for example: Jiuming Village, Zhangmutou Town, Dongguan City, Guangdong Province;
4. Provincial-level administrative district + prefecture-level administrative district + county-level administrative district + other, for example: Overseas Students Pioneer Building, No. 29 Gaoxin South Ring Road, Nanshan District, Shenzhen, Guangdong, China;
5. Provincial-level administrative district + county-level administrative district + other (mainly used for addresses under a municipality directly under the central government, or where there is no prefecture-level city, such as the provincially administered county-level units of Hainan Province other than Sanya City, Sansha City, and Haikou City), for example: No. 1500 Nanjing West Road, Pudong New Area, Shanghai.
Based on these five formats, and for ease of processing, addresses are divided into five levels, as shown in Table 1 below:

Address level | Administrative areas | Examples
Level 1 | Province, autonomous region, municipality directly under the central government, special administrative region | Guangdong Province, Inner Mongolia Autonomous Region, Shanghai, Hong Kong Special Administrative Region
Level 2 | Prefecture-level city, district of a municipality directly under the central government, region, autonomous prefecture, league, county-level city, county, autonomous county, banner, autonomous banner, special zone, forest area | Shenzhen City, Pudong New Area, Daxinganling, Enshi Tujia and Miao Autonomous Prefecture, Xilin Gol League, Tongcheng, Taihe County, Changbai Korean Autonomous County, Horqin Left Wing Banner, Oroqen Autonomous Banner, Liuzhi Special Zone, Shennongjia Forest Area
Level 3 | District under a prefecture-level city | Nanshan District
Level 4 | Township, ethnic township, town, street, sumu, road | Zhaoji Township, Xutang Yi Ethnic Township, Guanji Town, Yuehai Street, Darihan Ula Sumu, Shennan Avenue
Level 5 | Village, community, building, square, house number, unrecognized place name | Liutang Village, Haiyi Oriental Garden, Overseas Students Pioneer Building, Wanda Plaza, Tiangan

Table 1: Five-level hierarchical model of address levels.

For convenience of processing, the level values are set to 1, 2, 3, 4, and 0 in order of address level: "1" denotes a level-1 address, "2" a level-2 address, "3" a level-3 address, "4" a level-4 address, and "0" a level-5 address. The address level of each place name is obtained from its attribute in the address metadata dictionary. If a segmented address token does not exist in the dictionary, it is an unrecognized address and its level is marked 0.

Step S114: Perform place-name entity recognition.

Place names have aliases, and people follow a principle of economy when expressing information: they describe a place by its short name (alias), and they express addresses loosely (omitting higher-level place names in the address, most commonly the province). Users also enter an address at any level, or only a short address fragment, and still expect an approximate result or prompt. This requires strong address recognition capability, which is what this step provides. Place-name entity recognition identifies the most likely address level of each place name in the place-name entity labeling result. For example, the address sequence "Guangdong Shenzhen Baoan Xixiang" is short for "Xixiang Street, Bao'an District, Shenzhen City, Guangdong Province"; after segmentation and labeling it becomes "Guangdong (1) Shenzhen (2, 4) Baoan (3) Xixiang (2, 4)", whereas the correct labeling sequence is "Guangdong (1) Shenzhen (2) Baoan (3) Xixiang (4)". The system uses a dynamic programming algorithm with backtracking (the Viterbi algorithm) to find the most likely labeling sequence.
In this use of the Viterbi algorithm, both the observations and the states are address levels, so the model reduces to a first-order Markov process over address levels.

Place-name entity recognition consists of two parts: obtaining the optimal address level labeling sequence with the Viterbi algorithm, and correcting, according to contextual knowledge, optimal labeling sequences that violate known rules, so that the recognition result is more precise. The Viterbi model contains an initial state distribution $\pi = (\pi_0, \pi_1, \pi_2, \pi_3, \pi_4)$ (written Pi below), where $\pi_i$ is the initial probability of address level $i$. The values in Pi are set from experience or prior knowledge according to the following principle: the higher the administrative level of an address, the higher its initial probability; for example, the initial probability of the provincial level is greater than that of the prefecture level.

The following example illustrates the algorithm. With the probabilistic model of the Viterbi algorithm constructed from prior knowledge, Pi and the state-transition matrix A can take the following initial values, where entries are ordered by address level 0, 1, 2, 3, 4 and row i of A gives the probabilities of moving from level i to each level:

Pi = {0.05, 0.45, 0.25, 0.15, 0.10};

A = { {0.05, 0.45, 0.25, 0.15, 0.10},
      {0.05, 0.23, 0.45, 0.17, 0.10},
      {0.05, 0.18, 0.25, 0.30, 0.22},
      {0.05, 0.35, 0.05, 0.05, 0.50},
      {0.05, 0.30, 0.15, 0.05, 0.45} }.

If the input address is "Guangdong Shenzhen Baoan Xixiang", address segmentation and address labeling yield the following four candidate labeling sequences: "Guangdong (1) Shenzhen (2) Baoan (3) Xixiang (4)", "Guangdong (1) Shenzhen (2) Baoan (3) Xixiang (2)", "Guangdong (1) Shenzhen (4) Baoan (3) Xixiang (4)", and "Guangdong (1) Shenzhen (4) Baoan (3) Xixiang (2)". Using the Viterbi model, the weights of the four labeling sequences are:

1. Guangdong (1) Shenzhen (2) Baoan (3) Xixiang (4); P = 0.030375;

2. Guangdong (1) Shenzhen (2) Baoan (3) Xixiang (2); P = 0.0030375;

3. Guangdong (1) Shenzhen (4) Baoan (3) Xixiang (4); P = 0.001125;

4. Guangdong (1) Shenzhen (4) Baoan (3) Xixiang (2); P = 1.125E-4.

The most probable labeling sequence is the first. Indexing Pi and A by address level 0 to 4, its weight is Pi(1) × A(1→2) × A(2→3) × A(3→4) = 0.45 × 0.45 × 0.30 × 0.50 = 0.030375. The dynamic programming algorithm therefore outputs the first labeling sequence, "Guangdong (1) Shenzhen (2) Baoan (3) Xixiang (4)".
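The example can be reproduced with the Python sketch below, which runs the Viterbi recursion over the candidate levels of each place name using the Pi and A values given above. It assumes, as the weights in the example imply, that every candidate level is equally compatible with its token, so a path's score is Pi of its first level times the product of transition probabilities. Storing whole paths instead of separate backpointers is a simplification adequate for address-length sequences; the names `viterbi`, `PI`, and `A` are hypothetical.

```python
# Pi and A from the example, indexed by address level 0, 1, 2, 3, 4
# (level 0 = level-5 / unrecognized address).
PI = [0.05, 0.45, 0.25, 0.15, 0.10]
A = [
    [0.05, 0.45, 0.25, 0.15, 0.10],
    [0.05, 0.23, 0.45, 0.17, 0.10],
    [0.05, 0.18, 0.25, 0.30, 0.22],
    [0.05, 0.35, 0.05, 0.05, 0.50],
    [0.05, 0.30, 0.15, 0.05, 0.45],
]


def viterbi(candidates):
    """Most probable level sequence.

    candidates: one list of allowed levels per place name, e.g. [[1], [2, 4], [3], [2, 4]].
    Every allowed level is treated as equally compatible with its token, so a
    path's score is PI[first] times the product of A[prev][next] transitions.
    """
    # best[level] = (probability, path) of the best partial path ending in `level`
    best = {lv: (PI[lv], [lv]) for lv in candidates[0]}
    for allowed in candidates[1:]:
        step = {}
        for lv in allowed:
            prob, prev_path = max(
                ((p * A[prev][lv], path) for prev, (p, path) in best.items()),
                key=lambda item: item[0],
            )
            step[lv] = (prob, prev_path + [lv])  # keep the full path as the backpointer
        best = step
    prob, path = max(best.values(), key=lambda item: item[0])
    return path, prob


path, prob = viterbi([[1], [2, 4], [3], [2, 4]])  # Guangdong Shenzhen Baoan Xixiang
print(path, round(prob, 6))  # [1, 2, 3, 4] 0.030375
```

The same function returns [1, 2, 3, 4] for the Hebei example discussed below, with candidates [[1, 2, 4], [2, 4], [2, 3, 4], [4]].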

This model and algorithm alone cannot handle the case where the alias of a district of a prefecture-level city is identical to the alias of a county or county-level city. For example, "Taihe County" (under Fuyang City, Anhui Province) and "Taihe District" (under Jinzhou City, Liaoning Province) share the alias "Taihe" but belong to different address levels. When "Fuyang (City) Taihe" or "Jinzhou (City) Taihe" appears, the algorithm and probability model give the highest probability to labeling "Taihe" as a level-3 address in both cases. The preceding address name must therefore decide whether the level is 2 or 3, and such cases are handled as special cases by correcting the labeling sequence. An example follows:

The input address is "Hebei Shijiazhuang Pingshan Guyue", and the labeled address sequence is "Hebei (1, 2, 4) Shijiazhuang (2, 4) Pingshan (2, 3, 4) Guyue (4)". The candidate levels in this sequence are explained as follows: "Hebei" may be the alias of "Hebei Province", of "Hebei District" in Tianjin, or of a "Hebei Township"; "Shijiazhuang" may be the alias of "Shijiazhuang City" or of a "Shijiazhuang Town"; and "Pingshan" may be the alias of "Pingshan County", "Pingshan District", or "Pingshan Town".

The optimal labeling sequence is "Hebei (1) Shijiazhuang (2) Pingshan (3) Guyue (4)".

After correction according to context, the labeling sequence is "Hebei (1) Shijiazhuang (2) Pingshan (2) Guyue (4)", because here "Pingshan" is "Pingshan County".

Thus, when the alias of a district of a prefecture-level city is the same as that of a county or county-level city, the name is labeled as a level-3 address whenever its immediate predecessor is a prefecture-level city, unless it is corrected. To make context matching convenient, the rules are stored in reverse: each rule pairs the alias of the county or county-level city with the alias of the prefecture-level city it belongs to as its context, for example (Taihe|Fuyang). When this context is satisfied, the labeled level is corrected; otherwise no change is made.

There are also cases where a level-2 address and a level-4 address share a name, mainly when the alias of a county-level city or county matches the alias of a township. Because level-4 addresses can appear several times in one complete address, a level-2 address is sometimes labeled as level 4. In this case, the labeling sequence must also be corrected according to context. An example follows:

The input address is "Heilongjiang Heihe Wudalianchi Xinfa Township Hemin Village", and the optimal labeling sequence is "Heilongjiang (1) Heihe (2) Wudalianchi (4) Xinfa Township (4) Hemin Village (0)". Here "Wudalianchi" is labeled as a level-4 address, although it is actually a county-level city.

After correction according to context, the labeling sequence is "Heilongjiang (1) Heihe (2) Wudalianchi (2) Xinfa Township (4) Hemin Village (0)". As with districts and counties that share an alias, for townships and counties that share a name the system stores a rule in which the alias of the county or county-level city takes the alias of its prefecture-level city as context, for example (Wudalianchi|Heihe). When this context is satisfied, the labeled level is corrected; otherwise no change is made.

Therefore, for these special cases a mechanism is provided to correct the optimal labeling sequence according to context. It eliminates the ambiguity caused by aliases (one alias corresponding to several address levels) based on the address context, making the result more accurate.
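A minimal Python sketch of this context correction follows. It assumes each rule is stored as a pair (ambiguous alias, alias of its prefecture-level city), as in (Taihe|Fuyang), and that the context alias must immediately precede the ambiguous one. The rule table is illustrative only: its entries come from the examples in this section, and the corrected level 2 reflects that each alias names a county or county-level city. The names `CONTEXT_RULES` and `correct_by_context` are hypothetical.

```python
# Illustrative rules: (ambiguous alias, alias of its prefecture-level city) -> corrected level.
CONTEXT_RULES = {
    ("Taihe", "Fuyang"): 2,            # Taihe County, Fuyang City (not Taihe District, Jinzhou)
    ("Pingshan", "Shijiazhuang"): 2,   # Pingshan County
    ("Wudalianchi", "Heihe"): 2,       # Wudalianchi, a county-level city under Heihe
}


def correct_by_context(labeled, rules=CONTEXT_RULES):
    """labeled: list of (name, level) pairs output by the Viterbi step."""
    corrected = list(labeled)
    for i in range(1, len(corrected)):
        name, _ = corrected[i]
        previous_name, _ = corrected[i - 1]
        key = (name, previous_name)  # assumption: context is the immediately preceding alias
        if key in rules:             # context satisfied: overwrite the level
            corrected[i] = (name, rules[key])
        # otherwise the Viterbi label is left unchanged
    return corrected


print(correct_by_context([("Hebei", 1), ("Shijiazhuang", 2), ("Pingshan", 3), ("Guyue", 4)]))
# [('Hebei', 1), ('Shijiazhuang', 2), ('Pingshan', 2), ('Guyue', 4)]
```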

Step S120: The postal code query subsystem normalizes the communication address to be queried and retrieves the closest standardized communication address, and returns the zip code corresponding to the standardized communication address.

The postal code query subsystem requires an index file for querying postal codes by address. The index file consists of multiple documents, each containing the following fields: an address field, holding a complete standard address; a zip code field, holding the postal code associated with that complete standard address; and a lowest-level field (Level Field), holding the administrative level of the lowest-level address within the address. The values of the lowest-level field are as follows:

Provincial-level administrative district level (provinces, autonomous regions, municipalities directly under the central government, and special administrative regions): represented by province;

Prefecture-level administrative district level (prefecture-level cities, autonomous prefectures, regions, leagues, and districts of municipalities directly under the central government): represented by city;

County-level administrative district level (municipal districts, counties, banners, special zones, forest areas, autonomous counties, autonomous banners, etc.): represented by district;

Township-level administrative district level (townships, towns, streets, sumu, and district offices): represented by town;

Below the township level: represented by all.

For an address text, the value of its lowest-level field is calculated as follows. First, the address text is preprocessed: extra spaces are deleted and full-width characters are converted to half-width characters. Second, address segmentation and address labeling are performed.

Next, place-name entity recognition is performed to obtain the final place-name entity labeling sequence.

Finally, the lowest address level of the address text is calculated according to the following rules, where the address levels in the labeling sequence are ordered as:

1 > 2 > 3 > 4 > 0, i.e., level-1 address > level-2 address > level-3 address > level-4 address > level-5 address;

If the lowest address level in the labeling sequence is a level-5 address, return 0;

Otherwise, if the lowest address level in the labeling sequence is 4 and there is more than one level-4 address, return 0;

Otherwise, if the labeling sequence contains more than two level-2 addresses, or more than one level-3 address, or more than two level-2 and level-3 addresses combined, return 4;

Otherwise, if the lowest address level in the labeling sequence consists of exactly two consecutive level-2 addresses, return 3;

Otherwise, if the lowest address level in the labeling sequence is 4 and there is exactly one level-4 address, return 0 if that level-4 address is a road, and 4 otherwise;

In all other cases, return the lowest address level.

The lowest address level is then mapped to the value of the lowest-level field: 1 → province; 2 → city; 3 → district; 4 → town; 0 → all.
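The rules above can be sketched in Python as follows, applying them in the order listed with each rule acting as the "otherwise" branch of the previous one. The `is_road` flag is a hypothetical input: the text says only that a single level-4 address which is a road yields 0, without specifying how roads are detected. The names `Labeled`, `lowest_level`, and `level_field` are likewise hypothetical.

```python
from dataclasses import dataclass

# Mapping from the computed lowest level to the Level Field value.
LEVEL_FIELD = {1: "province", 2: "city", 3: "district", 4: "town", 0: "all"}


@dataclass
class Labeled:
    name: str
    level: int             # 1-4, or 0 for a level-5 / unrecognized address
    is_road: bool = False  # hypothetical flag: True if a level-4 item is a road


def rank(level):
    """Order 1 > 2 > 3 > 4 > 0: a larger rank means a lower address level."""
    return 5 if level == 0 else level


def lowest_level(seq):
    """Apply the rules in order; seq is a non-empty labeling sequence."""
    levels = [t.level for t in seq]
    lowest = max(levels, key=rank)
    n2, n3, n4 = levels.count(2), levels.count(3), levels.count(4)
    if lowest == 0:
        return 0
    if lowest == 4 and n4 > 1:
        return 0
    if n2 > 2 or n3 > 1 or n2 + n3 > 2:
        return 4
    if lowest == 2 and n2 == 2:
        first = levels.index(2)
        if first + 1 < len(levels) and levels[first + 1] == 2:  # the two are adjacent
            return 3
    if lowest == 4 and n4 == 1:
        item = next(t for t in seq if t.level == 4)
        return 0 if item.is_road else 4
    return lowest


def level_field(seq):
    return LEVEL_FIELD[lowest_level(seq)]


print(level_field([Labeled("Guangdong", 1), Labeled("Shenzhen", 2), Labeled("Nanshan", 3)]))  # district
```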

The detailed flow of step S120 is shown in FIG. 3 and is as follows:

Step S121: Acquire a communication address to be queried determined by the user and perform pre-processing.

Because the user may type the address text directly in the address input subsystem without using the input prompts provided by the system, the communication address confirmed by the user must be preprocessed. The preprocessing steps and content are the same as in the address input subsystem.

Step S122: Perform address segmentation to obtain address metadata, and mark all address levels.

Step S123: Obtain the final place-name entity labeling sequence through place-name entity recognition, and generate a Query statement.

Step S124: Parse the Query statement and search the index file, comparing the query against it to obtain the closest communication address.
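Steps S123 and S124 can be sketched with a toy in-memory index. The source does not name a search library or a query syntax, so this is only an illustration under assumptions: the field names, the Lucene-style `field:"text"` string, and the scoring rule (count how many labeled tokens appear in the matching field of a record) are all hypothetical stand-ins for a real search engine. The sample records are made up for illustration.

```python
# Hypothetical field names for the levels used in the labeling sequence.
FIELDS = {1: "province", 2: "city", 3: "district", 4: "town", 0: "detail"}


def build_query(tagged: list[tuple[str, int]]) -> str:
    """Turn (text, level) pairs into a fielded query string."""
    return " ".join(f'{FIELDS[level]}:"{text}"' for text, level in tagged)


def closest_address(tagged: list[tuple[str, int]],
                    index: list[dict[str, str]]) -> dict[str, str]:
    """Return the indexed record matching the most labeled tokens in their fields."""
    def score(record: dict[str, str]) -> int:
        return sum(1 for text, level in tagged if text in record.get(FIELDS[level], ""))
    return max(index, key=score)


tagged = [("广东省", 1), ("深圳市", 2), ("南山区", 3), ("粤海街道", 4)]
index = [
    {"province": "广东省", "city": "深圳市", "district": "福田区"},
    {"province": "广东省", "city": "深圳市", "district": "南山区", "town": "粤海街道"},
]
print(build_query(tagged))
print(closest_address(tagged, index)["district"])  # prints: 南山区
```

A production system would delegate both parsing and ranking to its search engine; the point here is only that each labeled token is matched against the field for its address level.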

Step S125: Perform address completion to generate a standardized communication address, and return the zip code corresponding to the standardized communication address.

The steps of the zip code query subsystem are similar to those of the address input subsystem. The only difference is that the zip code query subsystem must also complete the communication address. For the implementation of steps S121 to S124, refer to the corresponding implementation in the address input subsystem; only the address completion process is described here, as follows:

When the user submits a query request, the system returns the query results, with the address most similar to the user's input text ranked first. Because the reference data cannot be exhaustive, new buildings, roads, and residential communities appear every year, and some administrative divisions change, the part of the top-ranked address below the district/county level may differ from the address entered by the user. The system therefore uses address completion to modify the most similar result so that it better matches the user's requirements.

Address completion is a technique that refines query results based on the user's input so that the results better meet the user's needs. It is mainly applied to address levels whose data are difficult to collect exhaustively and which grow quickly, chiefly fourth-level and fifth-level addresses. Address completion assumes that the user entered the address levels in normal order, that is, no first-level or second-level address appears after a fourth-level or fifth-level address. The fourth-level address and everything after it in the user's input are identified and spliced onto the third-level address of the most similar search result. An example of address completion is shown in FIG. 4.
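The splicing rule can be sketched as follows, assuming both the user input and the best search result are already available as (text, level) pairs with 0 for the fifth level. The function name `complete_address` and the choice to return `None` when the level order is abnormal are assumptions; the addresses are made-up examples in which the user abbreviates the city and district and adds a house number absent from the reference data.

```python
from typing import Optional

Tagged = list[tuple[str, int]]  # (text, level); 0 = fifth level


def complete_address(user: Tagged, best: Tagged) -> Optional[str]:
    """Splice the user's fourth-level-and-below part onto the
    first- to third-level part of the most similar search result."""
    head = [text for text, level in best if level in (1, 2, 3)]
    start = next((i for i, (_, level) in enumerate(user) if level in (4, 0)), len(user))
    tail = user[start:]
    # Precondition from the text: no first- or second-level address may
    # follow a fourth- or fifth-level address in the user input.
    if any(level in (1, 2) for _, level in tail):
        return None
    return "".join(head + [text for text, _ in tail])


user = [("深圳", 2), ("南山", 3), ("粤海街道", 4), ("科苑路", 0), ("15号", 0)]
best = [("广东省", 1), ("深圳市", 2), ("南山区", 3), ("粤海街道", 4), ("科苑路", 0)]
print(complete_address(user, best))  # prints: 广东省深圳市南山区粤海街道科苑路15号
```

The upper levels come from the reference data (so abbreviations are normalized), while the lower levels come from the user (so new roads or house numbers survive).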

In step S125, the corresponding zip code is determined from the lowest address level value of the labeled address, and the zip code corresponding to the standardized communication address is returned. The user can then select a zip code query result to view its location on a map, or send the result to a mobile terminal device by means of a two-dimensional code.

Another embodiment of the present invention provides a terminal for querying a zip code by a communication address. The terminal includes a user input prompting unit and a zip code query unit. The user input prompting unit is configured to prompt the user during input and to receive the communication address to be queried as finally determined by the user; the zip code query unit is configured to retrieve the standardized communication address closest to the communication address to be queried and to return the zip code corresponding to that standardized communication address. By providing input prompts, the invention allows a freer query format. Named entity recognition identifies the level of each piece of address metadata entered by the user, enabling level-by-level address queries and completion of the communication address, which makes search results more accurate. In addition, the user can obtain the query result as a two-dimensional code or link to a map to locate the address.

The above is a further detailed description of the present invention in conjunction with specific preferred embodiments, and the specific implementation of the invention shall not be regarded as limited to these descriptions. Those skilled in the art to which the present invention pertains may make several simple derivations or substitutions without departing from the inventive concept.

Claims

1. A system for querying a postal code by a communication address, characterized in that the system comprises a communication address input subsystem and a postal code query subsystem;

the communication address input subsystem prompts the user in real time as text is entered, and the user determines the communication address to be queried from the addresses in the prompt list;

the postal code query subsystem standardizes the communication address to be queried, retrieves the closest standardized communication address, and returns the zip code corresponding to the standardized communication address.

2. The system according to claim 1, wherein determining the communication address to be queried may further include: the user selects no address from the prompt list and determines the communication address to be queried solely from the text the user has entered.

3. The system according to claim 1, wherein the real-time prompt comprises:

the prompt content changes automatically each time the user enters an additional character;

the prompt content is generated by the following steps:

Obtain the address text input by the current user and perform preprocessing to remove extra spaces;

Perform address segmentation to obtain address metadata and label all address levels;

Obtain the final place-name entity labeling sequence through place-name entity recognition and generate a Query statement; search the address index file to obtain the addresses for the prompt list.

4. The system according to claim 3, wherein the preprocessing further comprises:

converting full-width digits and letters to half-width characters; the dictionary used during preprocessing is stored in a double-array trie data structure.
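Dictionary lookup in a trie can be sketched as below. For readability this uses a plain node-based trie rather than the double-array trie named in the claim; a double-array trie stores the same transitions compactly in two integer arrays (commonly called base and check), but answers the same queries. The class and method names are hypothetical. The `prefixes` method enumerates every dictionary word starting at a given position, which is the lookup a segmenter needs.

```python
from typing import Iterable, Iterator


class TrieNode:
    __slots__ = ("children", "is_word")

    def __init__(self) -> None:
        self.children: dict[str, "TrieNode"] = {}
        self.is_word = False


class Trie:
    def __init__(self, words: Iterable[str] = ()) -> None:
        self.root = TrieNode()
        for word in words:
            self.insert(word)

    def insert(self, word: str) -> None:
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True  # mark the end of a dictionary entry

    def prefixes(self, text: str, start: int = 0) -> Iterator[int]:
        """Yield each end index j such that text[start:j] is a dictionary word."""
        node = self.root
        for j in range(start, len(text)):
            node = node.children.get(text[j])
            if node is None:  # no dictionary word continues with this character
                return
            if node.is_word:
                yield j + 1


trie = Trie(["深圳", "深圳市", "南山区"])
print(list(trie.prefixes("深圳市南山区")))     # prints: [2, 3]
print(list(trie.prefixes("深圳市南山区", 3)))  # prints: [6]
```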

5. The system according to claim 1, wherein the addresses obtained for the prompt list are arranged in descending order of closeness to the standard address.
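Ordering the prompt list can be sketched as follows. The source does not specify the similarity measure, so Python's standard `difflib.SequenceMatcher.ratio()` is swapped in here purely as an illustrative stand-in; `rank_prompts` and its `limit` parameter are hypothetical. In a real-time prompt, this ranking would be recomputed on every keystroke over the candidates returned by the index.

```python
from difflib import SequenceMatcher


def rank_prompts(user_text: str, candidates: list[str], limit: int = 10) -> list[str]:
    """Order candidate standard addresses by similarity to the input, highest first."""
    scored = [(SequenceMatcher(None, user_text, c).ratio(), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep input order
    return [c for _, c in scored[:limit]]


print(rank_prompts("深圳市南山区", ["北京市海淀区", "广东省深圳市南山区", "深圳市福田区"]))
# prints: ['广东省深圳市南山区', '深圳市福田区', '北京市海淀区']
```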

6. The system according to claim 1, wherein standardizing the communication address to be queried comprises the following steps: obtaining the communication address to be queried as determined by the user and preprocessing it;

obtaining the final place-name entity labeling sequence through place-name entity recognition and generating a Query statement; parsing the Query statement and searching the index file, comparing the query against it to obtain the closest communication address; and performing address completion to generate a standardized communication address and returning the zip code corresponding to the standardized communication address.

7. The system of claim 1, wherein the corresponding zip code is determined based on a lowest address level value of the tagged address.

8. The system according to claim 6, wherein returning the zip code corresponding to the standardized communication address may further include: selecting a determined zip code query result, whereupon the user may obtain its map location, or sending the postal code query result to a mobile device by means of a two-dimensional code.

9. The system according to claim 3 or 6, wherein the address segmentation adopts a bigram model segmentation method, and place-name entity recognition identifies the most likely address level of each piece of place-name metadata in the place-name labeling result.
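A bigram segmenter can be sketched as a Viterbi search over a word lattice. This is an assumption-laden illustration, not the patented method: candidates are dictionary words up to `max_len` characters plus any single character; P(word | previous word) uses add-one smoothing over toy counts; and `<s>` marks the sentence start. The function name, parameters, and counts are all hypothetical.

```python
import math


def segment(text: str, unigram: dict[str, int], bigram: dict[tuple[str, str], int],
            max_len: int = 8) -> list[str]:
    """Bigram (Viterbi) segmentation over dictionary words and single characters."""
    vocab = len(unigram) + 1  # +1 for the start symbol

    def log_prob(prev: str, word: str) -> float:
        # Add-one smoothed estimate of P(word | prev).
        return math.log((bigram.get((prev, word), 0) + 1) / (unigram.get(prev, 0) + vocab))

    # best[j][w] = (score, (i, prev)): best path covering text[:j] whose last word is w.
    best: list[dict] = [dict() for _ in range(len(text) + 1)]
    best[0]["<s>"] = (0.0, None)
    for i in range(len(text)):
        for j in range(i + 1, min(len(text), i + max_len) + 1):
            word = text[i:j]
            if j - i > 1 and word not in unigram:
                continue  # multi-character candidates must be dictionary words
            for prev, (score, _) in best[i].items():
                cand = score + log_prob(prev, word)
                if word not in best[j] or cand > best[j][word][0]:
                    best[j][word] = (cand, (i, prev))

    # Trace back from the highest-scoring final word.
    end = len(text)
    pos, word = end, max(best[end], key=lambda w: best[end][w][0])
    words = []
    while word != "<s>":
        words.append(word)
        _, (pos, word) = best[pos][word]
    return words[::-1]


unigram = {"深圳市": 10, "南山区": 8, "深圳": 5, "市南": 1, "山区": 3}
bigram = {("<s>", "深圳市"): 5, ("深圳市", "南山区"): 6}
print(segment("深圳市南山区", unigram, bigram))  # prints: ['深圳市', '南山区']
```

The bigram counts let the model prefer "深圳市 / 南山区" over the equally dictionary-valid "深圳 / 市南 / 山区", which a purely dictionary-driven split could not distinguish.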

10. A terminal for querying a zip code by a communication address, wherein the terminal comprises a user input prompting unit and a zip code query unit; the user input prompting unit is configured to prompt the user in real time during input and to receive the communication address to be queried as finally determined by the user; the zip code query unit is configured to retrieve the standardized communication address closest to the communication address to be queried and to receive the zip code corresponding to the standardized communication address.