CN102024024A - Method and device for constructing address database - Google Patents
- Publication number
- CN102024024A
- Authority
- CN
- China
- Prior art keywords
- address
- normal form
- original
- information
- participle
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a method for constructing an address database. The method comprises the following steps: acquiring original address data; segmenting and classifying the original address data with a word segmentation model to generate a normal form address; and classifying the normal form address into a normal form address database. The invention also discloses a device for constructing the address database. Because the address to be classified is segmented and classified by the word segmentation model according to address attributes and is then stored in the normal form address database, the address database is constructed with high efficiency and high accuracy.
Description
[technical field]
The present invention relates to a method and device for constructing an address database, and in particular to a method and device for constructing an address database based on a learning model.
[background technology]
Over the past decade or so, as Internet technology has developed, people have come to rely more and more on the rich, fast, and timely information that the Internet provides. How to find the desired information in this vast sea of information has become a pressing problem. Numerous Internet search engines and corresponding websites have emerged in response, prominent among them Baidu Search (www.baidu.com) from Baidu and Google Search (www.google.cn) from Google.
Among the many kinds of information that people search for, address information is an important class, and the need for it is especially evident when searching online electronic maps. Compared with a traditional paper map or a standalone electronic map, an online electronic map is updated promptly, is easy to query, is simple and intuitive to use, and provides rich information. Online electronic map providers that are currently widely recommended in China include Baidu Map (map.baidu.com) from Baidu and Google Maps (ditu.google.cn) from Google; of these, Baidu Map in particular fits the usage habits of Chinese users and has been widely adopted.
When a user enters an address to be queried into the address search box of an online electronic map, that address is looked up in a previously constructed address database.
However, existing techniques for constructing address databases have some defects. When an existing address database is built, the received address data is segmented using only a dictionary, a vocabulary, a suffix keyword list, and manually summarized rules, and is then classified into the address database; adapting to the received address data often depends on manual work. For example, when the received address is "Zhongguancun Street South No. 100", it is first segmented using the dictionary, the vocabulary, and the suffix keyword list. The suffix keyword list may contain "street", "road", "avenue", "number", and so on; whenever such a keyword is encountered, a cut is made after it. The suffix keyword list therefore segments this address into "Zhongguancun Street", "South", and "No. 100". After segmentation, attributes are added to the segmented address information by manual adaptation. The attribute labeling order here is road name, direction name, house number name: "Zhongguancun Street" receives the attribute road name, "South" the attribute direction name, and "No. 100" the attribute house number name. However, if the received address is "Zhongguancun Street No. 100 South", then after it has been segmented into "Zhongguancun Street", "No. 100", and "South" as above, a new attribute labeling order must be added for the segmented address information, namely road name, house number name, direction name. Attributes are then added accordingly: "Zhongguancun Street" receives the attribute road name, "No. 100" the attribute house number name, and "South" the attribute direction name.
Because new attribute labeling orders must be added continually, this method of constructing address data is complicated and inefficient. In addition, segmenting only by dictionary, vocabulary, and suffix keywords yields low segmentation accuracy.
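The prior-art approach described above can be illustrated with a minimal sketch. The keyword lists, the function names `suffix_segment` and `label`, and the Chinese strings (a reconstruction of the translated "Zhongguancun Street" examples) are assumptions made for illustration and do not come from the patent text.

```python
# Hypothetical keyword lists; the patent only names "street", "road", "number", etc.
SUFFIX_KEYWORDS = ("街", "路", "道", "号")   # street, road, avenue, number
DIRECTION_WORDS = ("东", "南", "西", "北")   # east, south, west, north (stand-in for the dictionary)


def suffix_segment(address):
    """Cut after every suffix keyword; a lone direction word forms its own segment."""
    segments, current = [], ""
    for ch in address:
        current += ch
        if current.endswith(SUFFIX_KEYWORDS) or current in DIRECTION_WORDS:
            segments.append(current)
            current = ""
    if current:  # keep any trailing text that matched no keyword
        segments.append(current)
    return segments


# One hand-written attribute order per address pattern: every new pattern
# needs a new entry, which is the maintenance burden the background criticizes.
ATTRIBUTE_ORDERS = [
    ("road", "direction", "house_number"),
    ("road", "house_number", "direction"),
]


def label(segments, order):
    """Attach attributes to segments purely by position."""
    return list(zip(order, segments))


print(label(suffix_segment("中关村大街南100号"), ATTRIBUTE_ORDERS[0]))
print(label(suffix_segment("中关村大街100号南"), ATTRIBUTE_ORDERS[1]))
```

The sketch also shows why accuracy suffers: a road name that begins with a direction character would be cut immediately after that character.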
Therefore, an improved method and device for constructing an address database are needed.
[summary of the invention]
An object of the present invention is to provide an improved method for constructing an address database, in which a normal form address database is built from a large amount of input original address data.
Another object of the present invention is to provide an improved device for constructing an address database, in which a normal form address database is built from a large amount of input original address data.
Correspondingly, the method for constructing an address database according to one embodiment of the present invention is as follows:
A method for constructing a normal form address database, comprising:
S1, obtaining original address data;
S2, classifying the original address data with a word segmentation model and producing a normal form address;
S3, classifying the normal form address into the normal form address database.
As a further improvement of the present invention, S2 comprises the following steps:
The word segmentation model segments the original address;
The normal form address is produced from the resulting segments.
As a further improvement of the present invention, S1 comprises:
Judging whether the original address data matches the format of a normal form address;
If it matches, directly outputting the original address data as the normal form address.
As a further improvement of the present invention, S1 comprises:
Judging whether the original address data matches the format of a normal form address;
If it does not match, proceeding to S2.
As a further improvement of the present invention, an address statistical analysis step follows S1: the address statistical analysis step performs statistical analysis on the original address data and produces the normal form address.
As a further improvement of the present invention, S1 comprises:
Judging whether the original address data matches the format of a normal form address;
If it does not match, proceeding to the address statistical analysis step.
As a further improvement of the present invention, the address statistical analysis step comprises:
Identifying first address information that precedes the unknown address information;
Identifying second address information that follows the unknown address information;
Counting, in an address data resource library, the address type information that lies between the first address information and the second address information, and calculating the probability with which each address type occurs;
Comparing the address type information with the highest probability against a preset threshold; if the probability is higher than the threshold, producing the normal form address by combining that address type information with the first address information and the second address information.
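The four sub-steps above can be sketched as follows. The text does not specify how the resource library is stored or how the first and second address information are matched, so the list-of-tagged-segments layout, exact-text matching, the function name `infer_unknown_type`, and the sample data are all assumptions.

```python
from collections import Counter


def infer_unknown_type(first, second, resource_library, threshold):
    """Guess the type of an unknown segment from what usually sits between
    `first` and `second` in already-tagged addresses.

    resource_library: list of addresses, each a list of (text, type) pairs.
    Returns (type, probability) if the best probability exceeds the threshold,
    otherwise None (the caller would then fall back to the segmentation model).
    """
    counts = Counter()
    for tagged in resource_library:
        for i in range(1, len(tagged) - 1):
            if tagged[i - 1][0] == first and tagged[i + 1][0] == second:
                counts[tagged[i][1]] += 1
    total = sum(counts.values())
    if total == 0:
        return None
    best_type, best_count = counts.most_common(1)[0]
    probability = best_count / total
    return (best_type, probability) if probability > threshold else None


# Hypothetical resource library with three tagged addresses.
library = [
    [("Haidian District", "district"), ("Shangdi 10th Street", "road"), ("No. 10", "house_number")],
    [("Haidian District", "district"), ("Shangdi 9th Street", "road"), ("No. 10", "house_number")],
    [("Haidian District", "district"), ("Shangdi Plaza", "landmark"), ("No. 10", "house_number")],
]
print(infer_unknown_type("Haidian District", "No. 10", library, threshold=0.6))
```

With this sample data, "road" occurs in two of the three matching contexts, so it passes a threshold of 0.6 but not one of 0.7.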
As a further improvement of the present invention, the address statistical analysis step comprises:
If the probability is lower than the threshold, proceeding to step S2.
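Taken together, the format check, the statistical analysis step, and the fallback to S2 form a three-way routing decision. The sketch below shows one possible control flow; the function name `build_normal_form` and the three callables passed in are hypothetical placeholders, not components named by the patent.

```python
def build_normal_form(raw, matches_format, statistical_analysis, segmentation_model):
    """Route one piece of original address data.

    matches_format(raw)       -> bool, True if raw is already in normal form
    statistical_analysis(raw) -> normal form address, or None if below threshold
    segmentation_model(raw)   -> normal form address (step S2)
    Returns (route_taken, normal_form_address).
    """
    if matches_format(raw):
        return ("format_match", raw)          # output directly as normal form
    candidate = statistical_analysis(raw)
    if candidate is not None:
        return ("statistical_analysis", candidate)
    return ("segmentation_model", segmentation_model(raw))  # fall back to S2


# Toy stand-ins that only demonstrate the three routes.
print(build_normal_form("already normal", lambda r: True, lambda r: None, lambda r: "seg"))
print(build_normal_form("raw", lambda r: False, lambda r: "stat", lambda r: "seg"))
print(build_normal_form("raw", lambda r: False, lambda r: None, lambda r: "seg"))
```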
As a further improvement of the present invention, the following steps are performed before S2:
Address data acquisition: obtaining original address data;
Corpus generation: segmenting a number of the original address data into a corpus according to a formulated normal form standard;
Corpus learning: building the word segmentation model from the corpus by machine learning.
As a further improvement of the present invention, the machine learning method is a conditional random field (CRF).
As a further improvement of the present invention, the machine learning method is a support vector machine (SVM).
As a further improvement of the present invention, the machine learning method is a hidden Markov model (HMM).
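The corpus generation step can be pictured as turning hand-segmented addresses into labeled sequences that a sequence learner could consume. The text does not specify a label scheme, so the character-level B-/I- tags, the label names, and the function name `to_training_sequence` below are assumptions; this is a sketch of data preparation only, not of the training itself.

```python
def to_training_sequence(segmented):
    """Convert one segmented address into parallel character and tag lists.

    segmented: list of (segment_text, attribute_label) pairs, as produced by
    annotating an address according to the normal form standard.
    The first character of each segment is tagged B-<label>, the rest I-<label>.
    """
    chars, tags = [], []
    for text, attribute in segmented:
        for i, ch in enumerate(text):
            chars.append(ch)
            tags.append(("B-" if i == 0 else "I-") + attribute)
    return chars, tags


# Hypothetical annotated example (reconstruction of "Zhongguancun Street South No. 100").
annotated = [("中关村大街", "ROAD"), ("南", "DIR"), ("100号", "NUM")]
chars, tags = to_training_sequence(annotated)
for ch, tag in zip(chars, tags):
    print(ch, tag)
```

Character and tag sequences of this kind are a common input format for CRF and HMM sequence labelers, which is why this scheme was chosen for the illustration.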
As a further improvement of the present invention, S3 specifically comprises the following steps:
Address base establishment step: establishing a normal form address base with a tree structure;
Address input step: receiving the normal form address;
Address classification step: analyzing the normal form address and classifying it into the normal form address base according to the tree structure.
As a further improvement of the present invention, the normal form address base has several branches, and the end of each branch has at least one leaf node.
As a further improvement of the present invention, the address classification step further comprises classifying the normal form address into at least one leaf node of the standard normal form address base.
As a further improvement of the present invention, the tree structure of the normal form address base comprises an administrative region layer and a sub-address layer, organized by the logical hierarchy of addresses.
As a further improvement of the present invention, the administrative region layer comprises four levels: the first level is province / autonomous region / municipality directly under the Central Government; the second level is city / autonomous prefecture; the third level is district / county; the fourth level is township / town / street.
As a further improvement of the present invention, the sub-address layer comprises at least one of a road class address, a regional class address, and a landmark class address.
As a further improvement of the present invention, the road class address defines a specific address that begins with a road.
As a further improvement of the present invention, the regional class address defines a specific address that begins with a residential sub-district.
As a further improvement of the present invention, the landmark class address defines a specific location point.
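A tree with an administrative region layer and a sub-address layer could be represented with nested dictionaries, as in the sketch below. The node layout, the names `new_node` and `insert_address`, and the placeholder coordinate are assumptions for illustration; the patent describes the tree only at the level of layers, branches, and leaf nodes.

```python
# The three sub-address classes named in the text (road, regional, landmark).
SUBADDRESS_TYPES = ("road", "area", "landmark")


def new_node():
    """A region node: child regions plus one bucket per sub-address class."""
    return {"children": {}, "subaddresses": {t: {} for t in SUBADDRESS_TYPES}}


def insert_address(root, admin_path, sub_type, sub_address, coordinate):
    """Walk (or create) the administrative path, then store the leaf entry.

    admin_path: region names from the highest level down, at most four levels
    (province / city / district / township); levels may be omitted.
    """
    if len(admin_path) > 4:
        raise ValueError("the administrative region layer has at most four levels")
    if sub_type not in SUBADDRESS_TYPES:
        raise ValueError("unknown sub-address class: " + sub_type)
    node = root
    for name in admin_path:
        node = node["children"].setdefault(name, new_node())
    node["subaddresses"][sub_type][sub_address] = coordinate
    return node


root = new_node()
haidian = ("Beijing", "Haidian District")
insert_address(root, haidian, "road", "No. 10 Shangdi 10th Street", (0.0, 0.0))      # placeholder coordinate
insert_address(root, haidian, "landmark", "Baidu Building", (0.0, 0.0))
print(root["children"]["Beijing"]["children"]["Haidian District"]["subaddresses"])
```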
The method for constructing an address database according to another embodiment of the present invention comprises:
S1, obtaining original address data;
S2, classifying the original address data with a word segmentation model and producing a candidate normal form address;
S3, classifying the candidate normal form address into the normal form address database.
As a further improvement of the present invention, S2 comprises the following steps:
The word segmentation model segments the original address;
The candidate normal form address is produced from the resulting segments.
As a further improvement of the present invention, S3 comprises the following steps:
Processing the candidate normal form address into a normal form address;
Classifying the normal form address into the normal form address database.
As a further improvement of the present invention, S1 comprises:
Judging whether the original address data matches the format of a candidate normal form address;
If it matches, directly outputting the original address data as the candidate normal form address.
As a further improvement of the present invention, S1 comprises:
Judging whether the original address data matches the format of a candidate normal form address;
If it does not match, proceeding to S2.
As a further improvement of the present invention, an address statistical analysis step follows S1: the address statistical analysis step performs statistical analysis on the original address data and produces the normal form address.
As a further improvement of the present invention, S1 comprises:
Judging whether the original address data matches the format of a candidate normal form address;
If it does not match, proceeding to the address statistical analysis step.
As a further improvement of the present invention, the address statistical analysis step comprises:
Identifying first address information that precedes the unknown address information;
Identifying second address information that follows the unknown address information;
Counting, in an address data resource library, the address type information that lies between the first address information and the second address information, and calculating the probability with which each address type occurs;
Comparing the address type information with the highest probability against a preset threshold; if the probability is higher than the threshold, producing the candidate normal form address by combining that address type information with the first address information and the second address information.
As a further improvement of the present invention, the address statistical analysis step comprises:
If the probability is lower than the threshold, proceeding to step S2.
As a further improvement of the present invention, the following steps are performed before S2:
Address data acquisition: obtaining original address data;
Corpus generation: segmenting a number of the original address data into a corpus according to a formulated normal form standard;
Corpus learning: building the word segmentation model from the corpus by machine learning.
As a further improvement of the present invention, the machine learning method is a conditional random field (CRF).
As a further improvement of the present invention, the machine learning method is a support vector machine (SVM).
As a further improvement of the present invention, the machine learning method is a hidden Markov model (HMM).
As a further improvement of the present invention, the following steps are performed before S3:
Address base establishment step: establishing a normal form address base with a tree structure;
Address input step: receiving the normal form address;
Address classification step: analyzing the normal form address and classifying it into the normal form address base according to the tree structure.
As a further improvement of the present invention, the normal form address base has several branches, and the end of each branch has at least one leaf node.
As a further improvement of the present invention, the address classification step further comprises classifying the normal form address into at least one leaf node of the standard normal form address base.
As a further improvement of the present invention, the tree structure of the normal form address base comprises an administrative region layer and a sub-address layer, organized by the logical hierarchy of addresses.
As a further improvement of the present invention, the administrative region layer comprises four levels: the first level is province / autonomous region / municipality directly under the Central Government; the second level is city / autonomous prefecture; the third level is district / county; the fourth level is township / town / street.
As a further improvement of the present invention, the sub-address layer comprises at least one of a road class address, a regional class address, and a landmark class address.
As a further improvement of the present invention, the road class address defines a specific address that begins with a road.
As a further improvement of the present invention, the regional class address defines a specific address that begins with a residential sub-district.
As a further improvement of the present invention, the landmark class address defines a specific location point.
Correspondingly, the address database construction device according to one embodiment of the present invention comprises:
A raw data acquisition module, used to obtain original address data;
A word segmentation model module, used to classify the original address data and produce a normal form address;
A normal form address generation module, used to classify the normal form address into the normal form address database.
As a further improvement of the present invention, the original address information in the raw data acquisition module comprises text information and coordinate information.
As a further improvement of the present invention, the address database construction device further comprises an address statistical analysis module, used to perform statistical analysis on the original address data and produce the normal form address.
As a further improvement of the present invention, the address database construction device further comprises:
A corpus generation module, used to segment a number of the original address data into a corpus according to a formulated normal form standard;
A corpus learning module, used to build the word segmentation model from the corpus by machine learning.
As a further improvement of the present invention, the machine learning method is a conditional random field (CRF).
As a further improvement of the present invention, the machine learning method is a support vector machine (SVM).
As a further improvement of the present invention, the machine learning method is a hidden Markov model (HMM).
As a further improvement of the present invention, the normal form address generation module further comprises:
An address base establishment unit, used to establish a normal form address base with a tree structure;
An address input unit, used to receive the candidate normal form address;
An address classification unit, used to analyze the candidate normal form address and classify it into the normal form address base according to the tree structure.
As a further improvement of the present invention, the normal form address base has several branches, and the end of each branch has at least one leaf node.
As a further improvement of the present invention, the tree structure of the normal form address base comprises an administrative region layer and a sub-address layer, organized by the logical hierarchy of addresses.
As a further improvement of the present invention, the administrative region layer comprises four levels: the first level is province / autonomous region / municipality directly under the Central Government; the second level is city / autonomous prefecture; the third level is district / county; the fourth level is township / town / street.
As a further improvement of the present invention, the sub-address layer comprises at least one of a road class address, a regional class address, and a landmark class address.
The address database construction device according to another embodiment of the present invention comprises:
A raw data acquisition module, used to obtain original address data;
A word segmentation model module, in which a word segmentation model classifies the original address data and produces a candidate normal form address;
A normal form address generation module, used to classify the candidate normal form address into the normal form address database.
As a further improvement of the present invention, the original address information in the raw data acquisition module comprises text information and coordinate information.
As a further improvement of the present invention, the address database construction device further comprises an address statistical analysis module, used to perform statistical analysis on the original address data and produce the candidate normal form address.
As a further improvement of the present invention, the address database construction device further comprises:
A corpus generation module, used to segment a number of the original address data into a corpus according to a formulated normal form standard;
A corpus learning module, used to build the word segmentation model from the corpus by machine learning.
As a further improvement of the present invention, the machine learning method is a conditional random field (CRF).
As a further improvement of the present invention, the machine learning method is a support vector machine (SVM).
As a further improvement of the present invention, the machine learning method is a hidden Markov model (HMM).
As a further improvement of the present invention, the normal form address generation module further comprises:
An address base establishment unit, used to establish a normal form address base with a tree structure;
An address input unit, used to receive the normal form address;
An address classification unit, used to analyze the normal form address and classify it into the normal form address base according to the tree structure.
As a further improvement of the present invention, the normal form address base has several branches, and the end of each branch has at least one leaf node.
As a further improvement of the present invention, the tree structure of the normal form address base comprises an administrative region layer and a sub-address layer, organized by the logical hierarchy of addresses.
As a further improvement of the present invention, the administrative region layer comprises four levels: the first level is province / autonomous region / municipality directly under the Central Government; the second level is city / autonomous prefecture; the third level is district / county; the fourth level is township / town / street.
As a further improvement of the present invention, the sub-address layer comprises at least one of a road class address, a regional class address, and a landmark class address.
The beneficial effects of the invention are as follows: the word segmentation model segments and classifies the address to be classified according to its address attributes, and the result is stored in the normal form address database, so that the address database of the present invention is constructed with higher efficiency and higher accuracy.
[description of drawings]
Fig. 1 is a flowchart of the address database construction method of one embodiment of the present invention.
Fig. 2 is a flowchart of the address database construction method of another embodiment of the present invention.
Fig. 3 is a structural diagram of the address database construction device of one embodiment of the present invention.
Fig. 4 is a flowchart of the address database construction method of one embodiment of the present invention.
Fig. 5 is a flowchart of the address database construction method of another embodiment of the present invention.
Fig. 6 is a structural diagram of the address database construction device of another embodiment of the present invention.
Fig. 7 is a structural diagram of the normal form address generation module of the present invention.
Fig. 8 is a flowchart of the normal form address generation method of the present invention.
Fig. 9 is an architecture diagram of the normal form address base of the address base establishment unit of the present invention.
Fig. 10 is a flowchart of building the word segmentation model of the present invention.
Fig. 11 is a module structure diagram for building the word segmentation model of the present invention.
[embodiment]
To give a clearer understanding of the technical features, objectives, and effects of the invention, specific embodiments of the invention are now described with reference to the drawings, in which identical labels denote identical steps or parts. Herein, "schematic" means "serving as an example, instance, or illustration"; any illustration or embodiment described herein as "schematic" should not be interpreted as a more preferred or more advantageous technical solution.
Referring first to Fig. 1, the address database construction method of one embodiment of the present invention comprises the following steps:
S1, obtaining original address data. The original address data comprises the text information and the coordinate information of an address. The text information is any specific address that can represent at least one of a road class address, a regional class address, and a landmark class address; the coordinate information is the specific coordinate point of the original address data. For example, if the original address data is "Baidu Building, No. 10 Shangdi 10th Street, Haidian District, Beijing + (x, y)", then "Baidu Building, No. 10 Shangdi 10th Street, Haidian District, Beijing" is the text information of the original address data and (x, y) is its coordinate information.
S2, the word segmentation model segments the original address data and produces a normal form address. How the word segmentation model is built, and which segmentation rules it learns, is disclosed later in this specification.
S3, classifying the normal form address into the normal form address database. It is worth noting that the same original address data may have several storage addresses when it is stored in the normal form address database. For example, the original address data "Baidu Building, No. 10 Shangdi 10th Street, Haidian District, Beijing + (x, y)" yields, after segmentation, "Haidian District, Beijing", "No. 10 Shangdi 10th Street", and "Baidu Building". When it is stored in the database, there may then be two storage addresses: the first is "No. 10 Shangdi 10th Street, Haidian District, Beijing"; the second is "Baidu Building, Haidian District, Beijing". Storage is classified according to the rules administrative region + road class address and administrative region + landmark class address. In this example, the administrative region is Haidian District, Beijing; the road class address is No. 10 Shangdi 10th Street; and the landmark class address is Baidu Building. The storage scheme is disclosed in detail later in this specification.
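The rule that one original address can yield several storage addresses (administrative region + road class address, administrative region + landmark class address) can be sketched as below. The dictionary layout of the segmented address, the function name `storage_addresses`, and the comma-joined output format are assumptions made for illustration.

```python
# The three sub-address classes named in the text (road, regional, landmark).
SUBADDRESS_TYPES = ("road", "area", "landmark")


def storage_addresses(segments):
    """Build one storage address per sub-address class present in the segments.

    segments: dict with key "admin" (administrative region text) and any of
    the keys in SUBADDRESS_TYPES.
    Returns a list of (sub-address class, storage address text) pairs.
    """
    admin = segments["admin"]
    return [
        (kind, segments[kind] + ", " + admin)
        for kind in SUBADDRESS_TYPES
        if kind in segments
    ]


# The Baidu Building example after segmentation.
segmented = {
    "admin": "Haidian District, Beijing",
    "road": "No. 10 Shangdi 10th Street",
    "landmark": "Baidu Building",
}
for kind, address in storage_addresses(segmented):
    print(kind, "->", address)
```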
Referring next to Fig. 2, the address database construction method of another embodiment of the invention comprises the following steps:
S1', obtaining original address data. The original address data comprises the text information and the coordinate information of an address. The text information is any specific address that can represent at least one of a road class address, a regional class address, and a landmark class address; the coordinate information is the specific coordinate point of the original address data. For example, if the original address data is "Baidu Building, No. 10 Shangdi 10th Street, Haidian District, Beijing + (x, y)", then "Baidu Building, No. 10 Shangdi 10th Street, Haidian District, Beijing" is the text information of the original address data and (x, y) is its coordinate information.
S2', the word segmentation model segments the original address data and produces a candidate normal form address. The candidate normal form address is processed in the subsequent step S3' and is then classified and stored in the normal form address database. How the word segmentation model is built, and which segmentation rules it learns, is disclosed later in this specification.
S3', processing the candidate normal form address and classifying it into the normal form address database. The processing consists of comparing the candidate normal form address against the tree diagram of the normal form address database and adjusting its format until it fully conforms to a branch or leaf node of that tree diagram. It is worth noting that the same original address data may have several storage addresses when it is stored in the normal form address database. For example, the original address data "Baidu Building, No. 10 Shangdi 10th Street, Haidian District, Beijing + (x, y)" yields, after segmentation, "Haidian District, Beijing", "No. 10 Shangdi 10th Street", and "Baidu Building". When it is stored in the database, there may then be two storage addresses: the first is "No. 10 Shangdi 10th Street, Haidian District, Beijing"; the second is "Baidu Building, Haidian District, Beijing". Storage is classified according to the rules administrative region + road class address and administrative region + landmark class address. In this example, the administrative region is Haidian District, Beijing; the road class address is No. 10 Shangdi 10th Street; and the landmark class address is Baidu Building. The storage scheme is disclosed in detail later in this specification.
Correspondingly, please refer to Fig. 3, be the address database construction device of one embodiment of the present invention, it comprises raw data acquisition module 1, participle model module 2, and normal form address generation module 4.
Wherein, raw data acquisition module 1 is used to obtain the original address data that comprise a large amount of address informations.Wherein, these original address data comprise the text message and the coordinate information of address, described text message refers to any specific address of one of them at least that can represent road class address, regional class address, terrestrial reference class address, and described coordinate information refers to the concrete coordinate points of these original address data.For example: the original address data are " No. 10 Baidu's mansions in ten streets, ShangDi, Haidian District, BeiJing City+(x; y) ", and wherein, " No. 10 Baidu's mansions in ten streets, ShangDi, Haidian District, BeiJing City " are the text message of these original address data, (x y) is the coordinate information of these original address data.
Normal form address generation module 4 is used to classify the normal form address into the normal form address database. It should be pointed out that in another embodiment of the present invention, what this module receives is a candidate normal form address; the module must process the candidate normal form address and then store it in the normal form address database. Here, "process" means comparing the candidate normal form address against the tree diagram of the normal form address database and adjusting its format so that it fully matches a branch or leaf node of that tree. "Normal form address" refers to address information that conforms to the format of the normal form database and is obtained through raw data acquisition module 1, segmentation model module 2, and normal form address generation module 4. This address information is classified under the address types of the corresponding sub-address layer according to the format requirements shown in Fig. 9, which is described in detail later in the text. Note that a single piece of original address data may be stored under several storage addresses in the address database. For example, if the original address data is "Baidu Building, No. 10 Shangdi 10th Street, Haidian District, Beijing + (x, y)", word segmentation yields "Haidian District, Beijing", "No. 10 Shangdi 10th Street", and "Baidu Building". When this record is stored in the database, it may have two storage addresses: first, "No. 10 Shangdi 10th Street, Haidian District, Beijing"; second, "Baidu Building, Haidian District, Beijing". These follow the classification and storage rules "administrative region + road class address" and "administrative region + landmark class address". In this example, the administrative region is Haidian District, Beijing; the road class address is No. 10 Shangdi 10th Street; and the landmark class address is Baidu Building.
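The two storage rules in the example above can be illustrated with a minimal Python sketch. The class `SegmentedAddress`, the function `storage_addresses`, and the placeholder coordinate are assumptions made for illustration; the specification does not prescribe a data layout.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Coord = Tuple[float, float]

@dataclass(frozen=True)
class SegmentedAddress:
    """Hypothetical container for the labelled segments of one original record."""
    region: str               # administrative region segment
    road: Optional[str]       # road class segment, if present
    landmark: Optional[str]   # landmark class segment, if present
    coord: Coord              # (x, y) of the original address data

def storage_addresses(seg: SegmentedAddress) -> List[Tuple[str, str, Coord]]:
    """Return one (rule, text, coord) entry per storage rule that applies."""
    entries: List[Tuple[str, str, Coord]] = []
    if seg.road is not None:
        # Rule: administrative region + road class address
        entries.append(("region+road", f"{seg.road}, {seg.region}", seg.coord))
    if seg.landmark is not None:
        # Rule: administrative region + landmark class address
        entries.append(("region+landmark", f"{seg.landmark}, {seg.region}", seg.coord))
    return entries

record = SegmentedAddress(
    region="Haidian District, Beijing",
    road="No. 10 Shangdi 10th Street",
    landmark="Baidu Building",
    coord=(0.0, 0.0),  # placeholder standing in for the (x, y) of the text
)
for entry in storage_addresses(record):
    print(entry)
```

Both entries carry the same coordinate, which reflects that the two storage addresses describe one physical location.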
Referring to Fig. 4, in one embodiment of the present invention, the construction method of the address database can be further expanded from the above steps into the following detailed workflow:
Step S10: obtain the original address data. The original address data comprises the text information and the coordinate information of an address. The text information is any specific address that can express at least one of a road class address, a region class address, and a landmark class address; the coordinate information is the specific coordinate point of the original address data. For example, for the original address data "Baidu Building, No. 10 Shangdi 10th Street, Haidian District, Beijing + (x, y)", "Baidu Building, No. 10 Shangdi 10th Street, Haidian District, Beijing" is the text information and (x, y) is the coordinate information.
Step S11: for a specific piece of address information, determine whether it meets the requirements of a normal form address. If it does, proceed directly to step S16; if it does not, proceed to step S12.
Step S12: the address statistical analysis step. It performs statistical analysis on the large amount of address information using an existing address data resource library and, based on the frequency with which a given piece of address information occurs among all the address information, produces a normal form address. This step is needed because the original address information is not necessarily a complete normal form address that can be used directly in step S16. Very commonly, original address information obtained through various channels (for example, data collected from the internet) is incomplete and does not meet the format requirements of the normal form address in step S16, so it needs further processing by statistical analysis. The statistical analysis method is: identify the first address information that precedes the unknown address information; identify the second address information that follows the unknown address information; count, in the address data resource library, the address type information that occurs between the first address information and the second address information, and calculate the probability with which each address type occurs; compare the address type with the highest probability against a preset threshold. For example, suppose the original address information is "No. 13, Xisi Xishi Hutong, Zhongguancun Street, Haidian District, Beijing". The address is first recognized from front to back: "Haidian District, Beijing" and "Zhongguancun Street" are recognized through the address data resource library as an administrative region address and a road class address, but "Xisi Xishi Hutong" cannot be recognized. Recognition is then performed in reverse, from back to front, and "No. 13" is recognized as a house number address. The address data resource library is then used to count which address types are inserted between a road class address and a house number class address. If the statistics show that the alley (hutong) class address has the highest probability, that probability is compared with the preset threshold, and the flow proceeds to step S13.
Step S13: if the probability is higher than the preset threshold, the address information is used as a normal form address and the flow proceeds directly to step S16; if the probability is lower than the preset threshold, the address information cannot be used as a normal form address and the flow proceeds to step S14.
Step S14: the segmentation model step. It analyzes the address information that still could not be resolved by step S13 and produces a normal form address based on a predefined segmentation model. In one embodiment of the present invention, the segmentation model is produced by learning a corpus with a conditional random field (CRF) method. The model performs word segmentation to produce the normal form address and can simultaneously output the segments of the normal form address together with their attribute labels.
Step S16: the normal form address generation step. It classifies the normal form address and files it into the corresponding normal form address database. "Normal form address" refers to address information that conforms to the format of the normal form database and is obtained through step S11, step S13, or step S14. This address information is classified under the address types of the corresponding sub-address layer according to the format requirements shown in Fig. 9, which is described in detail later in the text.
Note that in another embodiment of the present invention, if the address information does not meet the requirements in step S11, the flow may proceed directly to step S14; the specific determination and processing are the same as in the steps above and are not repeated here.
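The branching of steps S11 to S16 can be summarized in a short Python sketch. The function `build_normal_form` and the three callables it takes are hypothetical stand-ins for the format check, the statistical analysis, and the segmentation model; the toy implementations below exist only to make the control flow runnable and say nothing about how the real steps work.

```python
from typing import Callable, Optional, Tuple

def build_normal_form(
    address: str,
    is_normal_form: Callable[[str], bool],
    statistical_analysis: Callable[[str], Optional[str]],
    segment_with_model: Callable[[str], str],
    skip_statistics: bool = False,
) -> Tuple[str, str]:
    """Return (normal_form_address, step_that_produced_it)."""
    if is_normal_form(address):                  # S11: already in normal form
        return address, "S11"
    if not skip_statistics:                      # default embodiment: S12 then S13
        result = statistical_analysis(address)   # None means "below threshold"
        if result is not None:
            return result, "S13"
    return segment_with_model(address), "S14"    # S14: fall back to the model

# Toy stand-ins, for demonstration only.
def toy_is_normal_form(a: str) -> bool:
    return a.startswith("NF:")

def toy_statistical_analysis(a: str) -> Optional[str]:
    return "NF:" + a if "alley" in a else None

def toy_segment_with_model(a: str) -> str:
    return "NF:" + a.upper()

print(build_normal_form("road alley 13", toy_is_normal_form,
                        toy_statistical_analysis, toy_segment_with_model))
```

Setting `skip_statistics=True` corresponds to the alternative embodiment in which a failed S11 check leads straight to S14. Whatever the function returns would then be handed to step S16 for classification.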
Referring to Fig. 5, in another embodiment of the present invention, the construction method of the address database can also be expanded from the above steps into the following detailed workflow:
Step S10': obtain the original address data. The original address data comprises the text information and the coordinate information of an address. The text information is any specific address that can express at least one of a road class address, a region class address, and a landmark class address; the coordinate information is the specific coordinate point of the original address data. For example, for the original address data "Baidu Building, No. 10 Shangdi 10th Street, Haidian District, Beijing + (x, y)", "Baidu Building, No. 10 Shangdi 10th Street, Haidian District, Beijing" is the text information and (x, y) is the coordinate information.
Step S11': for a specific piece of address information, determine whether it meets the requirements of a candidate normal form address. If it does, proceed directly to step S15'; if it does not, proceed to step S12'.
Step S12': the address statistical analysis step. It performs statistical analysis on the large amount of address information using an existing address data resource library and, based on the frequency with which a given piece of address information occurs among all the address information, produces a candidate normal form address. This step is needed because the original address information is not necessarily a complete candidate normal form address that can be used directly in step S15'. Very commonly, original address information obtained through various channels (for example, data collected from the internet) is incomplete and does not meet the format requirements of the candidate normal form address in step S15', so it needs further processing by statistical analysis. The statistical analysis method is: identify the first address information that precedes the unknown address information; identify the second address information that follows the unknown address information; count, in the address data resource library, the address type information that occurs between the first address information and the second address information, and calculate the probability with which each address type occurs; compare the address type with the highest probability against a preset threshold. For example, suppose the original address information is "No. 13, Xisi Xishi Hutong, Zhongguancun Street, Haidian District, Beijing". The address is first recognized from front to back: "Haidian District, Beijing" and "Zhongguancun Street" are recognized through the address data resource library as an administrative region address and a road class address, but "Xisi Xishi Hutong" cannot be recognized. Recognition is then performed in reverse, from back to front, and "No. 13" is recognized as a house number address. The address data resource library is then used to count which address types are inserted between a road class address and a house number class address. If the statistics show that the alley (hutong) class address has the highest probability, that probability is compared with the preset threshold, and the flow proceeds to step S13'.
Step S13': if the probability is higher than the preset threshold, the address information is used as a candidate normal form address and the flow proceeds directly to step S15'; if the probability is lower than the preset threshold, the address information cannot be used as a candidate normal form address and the flow proceeds to step S14'.
Step S14': the segmentation model step. It analyzes the address information that still could not be resolved by step S13' and produces a candidate normal form address based on a predefined segmentation model. In one embodiment of the present invention, the segmentation model is produced by learning a corpus with a conditional random field (CRF) method. The model performs word segmentation to produce the candidate normal form address and can simultaneously output the segments of the candidate normal form address together with their attribute labels.
Step S15': collect the candidate normal form address information produced by step S11', step S13', and step S14'. Note that a single piece of original address data may produce several candidate normal form addresses, each consisting of text information and coordinate information. For example, the complete original address data "213-406, Block B, Hailong Building, No. 3 Zhongguancun Street, Haidian District, Beijing (x, y)" may yield two candidate normal form addresses after processing: first, a road class candidate normal form address with the text information "No. 3 Zhongguancun Street, Haidian District, Beijing" and the coordinate information (x, y); second, a landmark class candidate normal form address with the text information "Hailong Building, Haidian District, Beijing" and the coordinate information (x, y). The coordinate (x, y) is unchanged, which indicates that the road class candidate and the landmark class candidate are in essence the same specific address.
Step S16': the normal form address generation step. It classifies the candidate normal form address and files it into the corresponding normal form address database. "Candidate normal form address" refers to address information that conforms to the format of the normal form database and is obtained through step S11', step S13', or step S14'. This address information is classified under the address types of the corresponding sub-address layer according to the format requirements shown in Fig. 9, which is described in detail later in the text.
Note that in another embodiment of the present invention, if the address information does not meet the requirements in step S11', the flow may proceed directly to step S14'; the specific determination and processing are the same as in the steps above and are not repeated here.
Correspondingly, referring to Fig. 6, the address database construction device of the present invention can be expanded to comprise: raw data acquisition module 10, address statistical analysis module 11, segmentation model module 12, and normal form address generation module 13.
Raw data acquisition module 10 is used to obtain original address data containing a large amount of address information. The original address data comprises the text information and the coordinate information of an address. The text information is any specific address that can express at least one of a road class address, a region class address, and a landmark class address; the coordinate information is the specific coordinate point of the original address data. For example, for the original address data "Baidu Building, No. 10 Shangdi 10th Street, Haidian District, Beijing + (x, y)", "Baidu Building, No. 10 Shangdi 10th Street, Haidian District, Beijing" is the text information and (x, y) is the coordinate information.
Address statistical analysis module 11 comprises a statistical analysis unit and an address data resource library unit (not shown). It performs statistical analysis on the large amount of address information using an existing address data resource library and, based on the frequency with which a given piece of address information occurs among all the address information, produces a normal form address or a candidate normal form address. This module is needed because the original address information is not necessarily a complete candidate normal form address or normal form address that can be used directly. Very commonly, original address information obtained through various channels (for example, data collected from the internet) is incomplete and does not meet the format requirements of a candidate normal form address or a normal form address, so it needs further processing by the statistical analysis module: identify the first address information that precedes the unknown address information; identify the second address information that follows the unknown address information; count, in the address data resource library, the address type information that occurs between the first address information and the second address information, and calculate the probability with which each address type occurs; compare the address type with the highest probability against a preset threshold. For example, suppose the original address information is "No. 13, Xisi Xishi Hutong, Zhongguancun Street, Haidian District, Beijing". The address is first recognized from front to back: "Haidian District, Beijing" and "Zhongguancun Street" are recognized through the address data resource library as an administrative region address and a road class address, but "Xisi Xishi Hutong" cannot be recognized. Recognition is then performed in reverse, from back to front, and "No. 13" is recognized as a house number address. The address data resource library is then used to count which address types are inserted between a road class address and a house number class address. If the statistics show that the alley (hutong) class address has the highest probability, that probability is compared with the preset threshold to determine whether this address information is used as a candidate normal form address or a normal form address.
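The statistical analysis described above (count which address types occur between the recognized neighbours, then compare the most likely type with a threshold) can be sketched in Python as follows. The toy `RESOURCE_LIBRARY`, the type names, the function `infer_middle_type`, and the threshold values are assumptions for illustration. The text only says "higher than" and "lower than" the threshold, so treating a probability equal to the threshold as a failure is also an assumption.

```python
from collections import Counter
from typing import List, Optional, Tuple

# Hypothetical resource library: each known address reduced to its type sequence.
RESOURCE_LIBRARY: List[List[str]] = [
    ["region", "road", "alley", "house_number"],
    ["region", "road", "alley", "house_number"],
    ["region", "road", "alley", "house_number"],
    ["region", "road", "community", "house_number"],
    ["region", "road", "house_number"],
]

def infer_middle_type(
    before: str, after: str, library: List[List[str]], threshold: float
) -> Tuple[Optional[str], float]:
    """Guess the type of an unknown segment that sits between two known types.

    Returns (type, probability); type is None when the best probability
    does not exceed the threshold.
    """
    counts: Counter = Counter()
    for types in library:
        for i in range(len(types) - 2):
            # Count every type found directly between `before` and `after`.
            if types[i] == before and types[i + 2] == after:
                counts[types[i + 1]] += 1
    total = sum(counts.values())
    if total == 0:
        return None, 0.0
    best_type, best_count = counts.most_common(1)[0]
    probability = best_count / total
    return (best_type if probability > threshold else None), probability

# "Xisi Xishi Hutong" is unknown; its neighbours are a road and a house number.
print(infer_middle_type("road", "house_number", RESOURCE_LIBRARY, threshold=0.6))
```

In this toy library, three of the four entries that match the "road, ?, house number" pattern have an alley in the middle, so the unknown segment is accepted as an alley class address at a threshold of 0.6 and rejected at 0.8, in which case the address would be passed on to the segmentation model.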
Normal form address generation module 13 is used to form the word segmentation results into candidate normal form addresses or normal form addresses and store them in the address database. "Candidate normal form address" or "normal form address" refers to address information that conforms to the format of the normal form database and is obtained through raw data acquisition module 10, address statistical analysis module 11, segmentation model module 12, and normal form address generation module 13. This address information is classified under the address types of the corresponding sub-address layer according to the format requirements shown in Fig. 9, which is described in detail later in the text.
Referring to Fig. 7, the normal form address generation module of the present invention comprises address base establishing unit 100, address receiving unit 101, and address classification unit 102.
Address base establishing unit 100 is used to set up a standard normal form address base with a tree structure. This tree-shaped standard normal form address base has several branches, and the end of each branch has at least one leaf node. The specific structure of the standard normal form address base is described in detail in later paragraphs with reference to Fig. 9.
Address receiving unit 101 is used to receive normal form addresses or candidate normal form addresses. Once address base establishing unit 100 has set up the classification criteria for standard normal form addresses, any candidate normal form address or normal form address that is received by address receiving unit 101 and input into the standard normal form address base can in theory be stored at a corresponding position; determining that storage position is done by address classification unit 102.
Correspondingly, referring to Fig. 8, the normal form address generation method corresponding to the normal form address generation module can be broken down into: address base establishing step S100, address input step S101, and address classification step S102.
Address base establishing step S100 sets up a standard normal form address base with a tree structure. This tree-shaped standard normal form address base has several branches, and the end of each branch has at least one leaf node. The specific structure of the standard normal form address base is described in detail in later paragraphs with reference to Fig. 9 and is not repeated here.
Address input step S101 receives normal form addresses or candidate normal form addresses. Once address base establishing unit 100 has set up the classification criteria for standard normal form addresses, any candidate normal form address or normal form address that is received by address receiving unit 101 and input into the standard normal form address base can in theory be stored at a corresponding position; determining that storage position is done by address classification unit 102.
Address classification step S102 analyzes the normal form address or candidate normal form address and classifies it into a branch of the standard normal form address base.
Referring to Fig. 9, to describe more clearly the specific structure of the standard normal form address base set up by address base establishing unit 100, the following takes as an example a standard normal form address base built for an electronic map of the administrative regions of the People's Republic of China. In general, China's administrative divisions comprise four levels: the first level is province/autonomous region/municipality directly under the Central Government; the second level is city/autonomous prefecture; the third level is district/county; the fourth level is township/town/subdistrict. These four levels are relatively fixed, and their number and names can easily be compiled from the place names of each locality. Therefore, in the standard normal form address base, these four levels are merged and collectively referred to as the first layer of the tree structure, namely administrative region layer 90. In Fig. 9, the first level is correspondingly labelled province/autonomous region/municipality directly under the Central Government 91; the second level, city/autonomous prefecture 92; the third level, district/county 93; the fourth level, township/town/subdistrict 94. The specific address names below the fourth level are numerous and varied. Nevertheless, they can be reduced to three address types: road class address 81, region class address 82, and landmark class address 83. These three types are collectively referred to as the second layer of the tree structure of the standard normal form address base, namely sub-address layer 80. Of course, sub-address layer 80 may also include only one or two of these three address types. Road class address 81 defines specific addresses headed by a road, for example: Road a No. b, Road a Lane b. Region class address 82 defines specific addresses headed by a residential community, for example: Community a Building b, Community a Phase b. Landmark class address 83 defines a specific location point, for example: Building a, Park b. Note that the above level division is based on only one embodiment of the present invention, namely the division of address levels within the administrative regions of the People's Republic of China. The level division for other countries or regions may differ from the above, as long as it is based on logical address levels, where "logical address levels" can be understood as narrowing step by step from a larger address range to a smaller one.
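The two-layer tree described for Fig. 9 can be sketched as a small Python data structure: four administrative levels on top, then one bucket per sub-address type, with the stored addresses as leaves. The names `Node` and `insert`, the English type labels, and the administrative path used in the example (including the fourth-level name) are illustrative assumptions rather than part of the specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float]

ADMIN_LEVELS = ("province", "city", "district", "township")  # levels 91 to 94
SUB_ADDRESS_TYPES = ("road", "region", "landmark")           # types 81 to 83

@dataclass
class Node:
    name: str
    children: Dict[str, "Node"] = field(default_factory=dict)
    leaves: List[Tuple[str, Coord]] = field(default_factory=list)

    def child(self, name: str) -> "Node":
        """Return the named child, creating it on first use."""
        if name not in self.children:
            self.children[name] = Node(name)
        return self.children[name]

def insert(root: Node, admin_path: Sequence[str], kind: str,
           text: str, coord: Coord) -> Node:
    """File one normal form address under admin_path and its sub-address type."""
    if len(admin_path) != len(ADMIN_LEVELS):
        raise ValueError("admin_path must name all four administrative levels")
    if kind not in SUB_ADDRESS_TYPES:
        raise ValueError(f"unknown sub-address type: {kind}")
    node = root
    for name in admin_path:          # walk the administrative region layer (90)
        node = node.child(name)
    bucket = node.child(kind)        # branch of the sub-address layer (80)
    bucket.leaves.append((text, coord))
    return bucket

root = Node("root")
path = ("Beijing", "Beijing", "Haidian District", "Shangdi Subdistrict")  # assumed path
road_bucket = insert(root, path, "road", "No. 10 Shangdi 10th Street", (0.0, 0.0))
landmark_bucket = insert(root, path, "landmark", "Baidu Building", (0.0, 0.0))
print(sorted(root.children["Beijing"].children["Beijing"]
             .children["Haidian District"].children["Shangdi Subdistrict"].children))
```

Keeping the administrative levels as a fixed-length path mirrors the observation in the text that these levels are relatively stable, while the open-ended specific addresses live only in the leaves.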
Referring to Fig. 10, the segmentation model of the present invention is obtained by the following method:
S1000, obtain the original address data;
S1001, segment a number of original address data into a corpus according to the formulated normal form standard, where the "normal form standard" is as described for Fig. 9 above.
S1002, build the segmentation model from the corpus by machine learning. The machine learning may use a conditional random field (CRF) method to learn the corpus and produce the "predefined segmentation model". Word segmentation is performed with this model, which can simultaneously output the segments of the normal form address or candidate normal form address together with their attribute labels. For the working principle of CRF, please refer to the introduction in Baidu Baike (http://baike.baidu.com/view/2510459.htm); it is not repeated here. It should be noted that in other embodiments of the present invention, the address learning model may also be built with a support vector machine (SVM) or a hidden Markov model (HMM). The principles of these methods are already applied in the industry and are not repeated here.
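Step S1001 can be made concrete with a sketch of how one manually segmented address might be turned into character-level training rows of the kind sequence labellers such as CRFs commonly consume. The B/I tagging scheme, the label names, the function `to_bi_tags`, and the Chinese example string (an assumed back-translation of the Baidu Building example) are all assumptions for illustration; the specification does not state the corpus format.

```python
from typing import List, Tuple

def to_bi_tags(segments: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Expand (segment, label) pairs into per-character (char, tag) rows.

    The first character of each segment gets "B-<label>" and the remaining
    characters get "I-<label>", so segment boundaries and attribute labels
    are both recoverable from the tag sequence.
    """
    rows: List[Tuple[str, str]] = []
    for text, label in segments:
        for i, ch in enumerate(text):
            prefix = "B" if i == 0 else "I"
            rows.append((ch, f"{prefix}-{label}"))
    return rows

# Assumed Chinese form of the example address, segmented by hand.
example = [
    ("北京市海淀区", "REGION"),
    ("上地十街10号", "ROAD"),
    ("百度大厦", "LANDMARK"),
]
for ch, tag in to_bi_tags(example):
    print(ch, tag)
```

A model trained on such rows predicts one tag per character, which gives both the segmentation and the attribute label of each segment in a single pass.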
Correspondingly, referring to Fig. 11, constructing the segmentation model of the present invention involves the following modules:
Address data acquisition module 1000: used to obtain the original address data;
Corpus generation module 1001: used to segment a number of original address data into a corpus according to the formulated normal form standard, where the "normal form standard" is as described for Fig. 9 above.
Corpus learning module 1002: builds the segmentation model from the corpus by machine learning. The machine learning may use a conditional random field (CRF) method to learn the corpus and produce the "predefined segmentation model". Word segmentation is performed with this model, which can simultaneously output the segments of the normal form address or candidate normal form address together with their attribute labels. For the working principle of CRF, please refer to the introduction in Baidu Baike (http://baike.baidu.com/view/2510459.htm); it is not repeated here. It should be noted that in other embodiments of the present invention, the address learning model may also be built with a support vector machine (SVM) or a hidden Markov model (HMM). The principles of these methods are already applied in the industry and are not repeated here.
From the above description it can be seen that the segmentation model is used to segment the addresses to be classified according to their address attributes and to store them in the standard normal form address database, so that the address database of the present invention is constructed more efficiently and with higher accuracy.
It should be understood that although this specification is described in terms of embodiments, not every embodiment contains only one independent technical solution. The specification is written this way only for clarity. Those skilled in the art should treat the specification as a whole; the technical solutions in the embodiments may be combined as appropriate to form other embodiments that those skilled in the art can understand.
The detailed descriptions listed above are only specific illustrations of feasible embodiments of the present invention and are not intended to limit its scope of protection. Any equivalent embodiment or modification that does not depart from the technical spirit of the present invention shall fall within the scope of protection of the present invention.
Claims (67)
1. A method for constructing a normal form address database, characterized in that the method comprises:
S1, obtaining original address data;
S2, classifying the original address data with a segmentation model and producing a normal form address;
S3, classifying the normal form address into the normal form address database.
2. The method of claim 1, characterized in that S2 comprises the following steps:
the segmentation model performs word segmentation on the original address;
the normal form address is produced from the resulting segments.
3. The method of claim 1, characterized in that S1 comprises:
determining whether the original address data matches the format of a normal form address;
if it matches, directly outputting the original address data as the normal form address.
4. The method of claim 1, characterized in that S1 comprises:
determining whether the original address data matches the format of a normal form address;
if it does not match, proceeding to S2.
5. The method of claim 1, characterized in that an address statistical analysis step is further included after S1: the address statistical analysis step performs statistical analysis on the original address data and produces the normal form address.
6. The method of claim 5, characterized in that S1 comprises:
determining whether the original address data matches the format of a normal form address;
if it does not match, proceeding to the address statistical analysis step.
7. The method of claim 5, characterized in that the address statistical analysis step comprises:
identifying the first address information that precedes the unknown address information;
identifying the second address information that follows the unknown address information;
counting, in the address data resource library, the address type information that occurs between the first address information and the second address information, and calculating the probability with which the address type information occurs;
comparing the address type information with the highest probability against a preset threshold and, if it is higher than the threshold, combining the address type information with the first address information and the second address information to produce the normal form address.
8. The method of claim 7, characterized in that the address statistical analysis step comprises:
if it is lower than the threshold, proceeding to step S2.
9. the method for claim 1 is characterized in that, and is before described S2, further comprising the steps of:
Address date obtains: obtain the original address data;
Generate language material: some described original address data are become language material according to the normal form standard participle of formulating;
Study language material:, make up described participle model by the machine learning mode based on described language material.
10. method as claimed in claim 9 is characterized in that, described machine learning mode is the condition random field type.
11. method as claimed in claim 9 is characterized in that, described machine learning mode is the support vector machine mode.
12. method as claimed in claim 9 is characterized in that, described machine learning mode is a hidden Markov model.
13. the method for claim 1 is characterized in that, described S3 specifically may further comprise the steps:
Address base is set up step: the normal form address base of setting up a tree structure;
Address input step: receive described normal form address;
Address sort step: analyze described normal form address, and described normal form address is sorted out to described normal form address base according to described tree structure.
14. method as claimed in claim 13 is characterized in that, described normal form address base has some branches, and the end of each branch has at least one leaf node.
15. method as claimed in claim 14 is characterized in that, described address sort step also comprises described normal form address sort in the described standard normal form address base at least one leaf node.
16. method as claimed in claim 13 is characterized in that, the tree structure of described normal form address base comprises based on the administrative region layer of address logic level and subaddressing layer.
17. method as claimed in claim 16 is characterized in that, described administrative region layer comprises four levels: first level is province/autonomous region/municipality directly under the Central Government; Second level is city/autonomous prefecture; The 3rd level is district/county; The 4th level is township/town/street.
18. method as claimed in claim 16 is characterized in that, described subaddressing layer comprises one of them of road class address, regional class address and terrestrial reference class address at least.
19. method as claimed in claim 18 is characterized in that, described road class address is used to define the specific address with headed by the road.
20. method as claimed in claim 18 is characterized in that, described regional class address is used to define the specific address with headed by the sub-district.
21. method as claimed in claim 18 is characterized in that, described terrestrial reference class address is used to define a concrete location point.
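The tree-structured normal form address base of claims 13 to 21 can be sketched in Python as follows. This is only one possible reading: the `Node` class, the `classify` function, the level names, and the dictionary layout of an incoming address are hypothetical and are not taken from the patent.

```python
ADMIN_LEVELS = ("province", "city", "district", "township")  # four administrative levels
SUBADDRESS_CLASSES = ("road", "region", "landmark")           # sub-address layer


class Node:
    """One node of the address tree; leaf nodes hold sub-addresses."""

    def __init__(self, name, level):
        self.name = name
        self.level = level
        self.children = {}

    def child(self, name, level):
        # Reuse an existing branch or create a new one.
        if name not in self.children:
            self.children[name] = Node(name, level)
        return self.children[name]


def classify(root, address):
    """Place a normal form address into the tree and return its leaf node.

    `address` is a dict with optional administrative-level keys and a
    mandatory "sub" entry of the form (sub_address_class, text).
    """
    node = root
    for level in ADMIN_LEVELS:
        if level in address:  # levels may be absent, e.g. for a municipality
            node = node.child(address[level], level)
    sub_class, text = address["sub"]
    if sub_class not in SUBADDRESS_CLASSES:
        raise ValueError("unknown sub-address class: " + sub_class)
    return node.child(text, sub_class)


root = Node("root", "root")
classify(root, {"province": "Beijing", "district": "Haidian",
                "sub": ("road", "Zhongguancun Street 27")})
classify(root, {"province": "Beijing", "district": "Haidian",
                "sub": ("landmark", "Tsinghua University")})
```

Both example addresses share the Beijing and Haidian branches, so only the two leaf nodes are new; walking the administrative levels first is what keeps addresses with a common prefix under a common branch.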
22. A method for constructing a normal form address database, characterized in that the method comprises:
S1, obtaining original address data;
S2, classifying the original address data with a participle model and producing a candidate normal form address;
S3, classifying the candidate normal form address into the normal form address database.
23. The method as claimed in claim 22, characterized in that S2 comprises the following steps:
performing word segmentation on the original address with the participle model;
producing the candidate normal form address from the resulting segments.
24. The method as claimed in claim 22 or 23, characterized in that S3 comprises the following steps:
processing the candidate normal form address into a normal form address;
classifying the normal form address into the normal form address database.
25. The method as claimed in claim 22, characterized in that S1 comprises:
judging whether the original address data matches the format of a candidate normal form address;
if it matches, directly outputting the original address data as the candidate normal form address.
26. The method as claimed in claim 22, characterized in that S1 comprises:
judging whether the original address data matches the format of a candidate normal form address;
if it does not match, proceeding to S2.
27. The method as claimed in claim 22, characterized in that the method further comprises, after S1, an address statistical analysis step, in which statistical analysis is performed on the original address data to produce the normal form address.
28. The method as claimed in claim 27, characterized in that S1 comprises:
judging whether the original address data matches the format of a candidate normal form address;
if it does not match, proceeding to the address statistical analysis step.
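The branching described in claims 25 to 28 (direct output on a format match, otherwise statistical analysis, with the participle model as the last resort when the statistical step yields nothing) might be wired together as in the Python sketch below. The candidate format, the regular expression, and the function `build_candidate` are purely illustrative assumptions; the patent does not define a concrete candidate format, and the two later stages are passed in as plain callables.

```python
import re

# Hypothetical candidate format: "field:text" items joined by "|",
# for example "city:Beijing|district:Haidian".
CANDIDATE_FORMAT = re.compile(r"^(?:\w+:[^|:]+)(?:\|\w+:[^|:]+)*$")


def build_candidate(raw, statistical_analysis, participle_model):
    """Return (candidate, route), where route names the stage that produced it.

    statistical_analysis(raw) returns a candidate or None (None models the
    below-threshold case); participle_model(raw) always returns a candidate.
    """
    if CANDIDATE_FORMAT.match(raw):
        return raw, "format-match"            # output the raw data directly
    candidate = statistical_analysis(raw)
    if candidate is not None:
        return candidate, "statistical-analysis"
    return participle_model(raw), "participle-model"  # step S2
```

Ordering the stages from cheapest to most expensive means the learned model only runs on addresses that neither already conform nor can be resolved from neighbouring context.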
29. The method as claimed in claim 27, characterized in that the address statistical analysis step comprises:
identifying first address information that precedes the unknown address information;
identifying second address information that follows the unknown address information;
counting, in the address data resource base, the address style information that appears between the first address information and the second address information, and calculating the probability with which each address style information occurs;
comparing the address style information having the highest probability with a preset threshold and, if the probability is higher than the threshold, producing the candidate normal form address from that address style information in combination with the first address information and the second address information.
30. The method as claimed in claim 29, characterized in that the address statistical analysis step further comprises:
if the probability is lower than the threshold, proceeding to step S2.
31. The method as claimed in claim 22, characterized in that, before S2, the method further comprises the following steps:
address data acquisition: obtaining the original address data;
corpus generation: segmenting a number of the original address data items into a corpus according to a formulated normal form standard;
corpus learning: building the participle model from the corpus by means of machine learning.
32. The method as claimed in claim 31, characterized in that the machine learning method is a conditional random field.
33. The method as claimed in claim 31, characterized in that the machine learning method is a support vector machine.
34. The method as claimed in claim 31, characterized in that the machine learning method is a hidden Markov model.
35. The method as claimed in claim 22, characterized in that, before S3, the method further comprises the following steps:
address base establishing step: establishing a normal form address base having a tree structure;
address input step: receiving the normal form address;
address classification step: analyzing the normal form address, and classifying the normal form address into the normal form address base according to the tree structure.
36. The method as claimed in claim 35, characterized in that the normal form address base has a number of branches, and the end of each branch has at least one leaf node.
37. The method as claimed in claim 36, characterized in that the address classification step further comprises classifying the normal form address into at least one leaf node of the normal form address base.
38. The method as claimed in claim 35, characterized in that the tree structure of the normal form address base comprises an administrative region layer and a sub-address layer, both based on the logical hierarchy of addresses.
39. The method as claimed in claim 38, characterized in that the administrative region layer comprises four levels: the first level is province/autonomous region/municipality directly under the Central Government; the second level is city/autonomous prefecture; the third level is district/county; the fourth level is township/town/street.
40. The method as claimed in claim 38, characterized in that the sub-address layer comprises at least one of a road class address, a regional class address and a landmark class address.
41. The method as claimed in claim 40, characterized in that the road class address is used to define a specific address that begins with a road.
42. The method as claimed in claim 40, characterized in that the regional class address is used to define a specific address that begins with a sub-district.
43. The method as claimed in claim 40, characterized in that the landmark class address is used to define a specific location point.
44. An address database construction device, characterized in that the device comprises:
a raw data acquisition module, used to obtain original address data;
a participle model module, used to classify the original address data and produce a normal form address;
a normal form address generation module, used to classify the normal form address into a normal form address database.
45. The device as claimed in claim 44, characterized in that the original address information in the raw data acquisition module comprises text information and coordinate information.
46. The device as claimed in claim 44, characterized in that the address database construction device further comprises an address statistical analysis module, used to perform statistical analysis on the original address data and produce the normal form address.
47. The device as claimed in claim 44, characterized in that the address database construction device further comprises:
a corpus generation module, used to segment a number of the original address data items into a corpus according to a formulated normal form standard;
a corpus learning module, used to build the participle model from the corpus by means of machine learning.
48. The device as claimed in claim 47, characterized in that the machine learning method is a conditional random field.
49. The device as claimed in claim 47, characterized in that the machine learning method is a support vector machine.
50. The device as claimed in claim 47, characterized in that the machine learning method is a hidden Markov model.
51. The device as claimed in claim 44, characterized in that the normal form address generation module comprises:
an address base establishing unit, used to establish a normal form address base having a tree structure;
an address input unit, used to receive the normal form address;
an address classification unit, used to analyze the normal form address and classify the normal form address into the normal form address base according to the tree structure.
52. The device as claimed in claim 51, characterized in that the normal form address base has a number of branches, and the end of each branch has at least one leaf node.
53. The device as claimed in claim 51, characterized in that the tree structure of the normal form address base comprises an administrative region layer and a sub-address layer, both based on the logical hierarchy of addresses.
54. The device as claimed in claim 53, characterized in that the administrative region layer comprises four levels: the first level is province/autonomous region/municipality directly under the Central Government; the second level is city/autonomous prefecture; the third level is district/county; the fourth level is township/town/street.
55. The device as claimed in claim 53, characterized in that the sub-address layer comprises at least one of a road class address, a regional class address and a landmark class address.
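The module arrangement of the device claims can be pictured with the short Python sketch below, which chains a raw data acquisition module, a participle model module and a normal form address generation module. All names (`RawAddress`, `AddressDatabaseBuilder`, `acquire`, `segment`, `store`) and the (longitude, latitude) ordering of the coordinates are assumptions for illustration; the claims only require that the raw record carry text information and coordinate information.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class RawAddress:
    """Original address record: text plus coordinate information."""
    text: str
    coordinates: Tuple[float, float]  # assumed order: (longitude, latitude)


@dataclass
class AddressDatabaseBuilder:
    """Wires the three modules of the device together."""
    acquire: Callable[[], List[RawAddress]]   # raw data acquisition module
    segment: Callable[[RawAddress], dict]     # participle model module
    store: Callable[[dict], None]             # normal form address generation module

    def run(self) -> int:
        """Process every acquired record and return how many were stored."""
        count = 0
        for raw in self.acquire():
            self.store(self.segment(raw))
            count += 1
        return count


database = []
builder = AddressDatabaseBuilder(
    acquire=lambda: [RawAddress("Haidian District Zhichun Road", (116.33, 39.98))],
    segment=lambda raw: {"text": raw.text, "xy": raw.coordinates},
    store=database.append,
)
builder.run()
```

Passing the modules in as callables keeps the sketch independent of any particular segmentation model or storage layout.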
56. An address database construction device, characterized in that the device comprises:
a raw data acquisition module, used to obtain original address data;
a participle model module, in which a participle model classifies the original address data and produces a candidate normal form address;
a normal form address generation module, used to classify the candidate normal form address into a normal form address database.
57. The device as claimed in claim 56, characterized in that the original address information in the raw data acquisition module comprises text information and coordinate information.
58. The device as claimed in claim 56, characterized in that the address database construction device further comprises an address statistical analysis module, used to perform statistical analysis on the original address data and produce the candidate normal form address.
59. The device as claimed in claim 56, characterized in that the address database construction device further comprises:
a corpus generation module, used to segment a number of the original address data items into a corpus according to a formulated normal form standard;
a corpus learning module, used to build the participle model from the corpus by means of machine learning.
60. The device as claimed in claim 59, characterized in that the machine learning method is a conditional random field.
61. The device as claimed in claim 59, characterized in that the machine learning method is a support vector machine.
62. The device as claimed in claim 59, characterized in that the machine learning method is a hidden Markov model.
63. The device as claimed in claim 56, characterized in that the normal form address generation module comprises:
an address base establishing unit, used to establish a normal form address base having a tree structure;
an address input unit, used to receive the candidate normal form address;
an address classification unit, used to analyze the candidate normal form address and classify the candidate normal form address into the normal form address base according to the tree structure.
64. The device as claimed in claim 63, characterized in that the normal form address base has a number of branches, and the end of each branch has at least one leaf node.
65. The device as claimed in claim 63, characterized in that the tree structure of the normal form address base comprises an administrative region layer and a sub-address layer, both based on the logical hierarchy of addresses.
66. The device as claimed in claim 65, characterized in that the administrative region layer comprises four levels: the first level is province/autonomous region/municipality directly under the Central Government; the second level is city/autonomous prefecture; the third level is district/county; the fourth level is township/town/street.
67. The device as claimed in claim 65, characterized in that the sub-address layer comprises at least one of a road class address, a regional class address and a landmark class address.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 201010540110 CN102024024B (en) | 2010-11-10 | 2010-11-10 | Method and device for constructing address database |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102024024A (en) | 2011-04-20 |
CN102024024B (en) | 2013-07-10 |
Family ID: 43865322
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 201010540110 Active CN102024024B (en) | 2010-11-10 | 2010-11-10 | Method and device for constructing address database |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102024024B (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101458702A (en) * | 2007-12-13 | 2009-06-17 | 韩国电子通信研究院 | Apparatus for building address database and method thereof |
CN101393544A (en) * | 2008-10-07 | 2009-03-25 | 南京师范大学 | Chinese address semantic parsing method facing address encode |
CN101719128A (en) * | 2009-12-31 | 2010-06-02 | 浙江工业大学 | Fuzzy matching-based Chinese geo-code determination method |
Also Published As
Publication number | Publication date |
---|---|
CN102024024B (en) | 2013-07-10 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant |