CN106709065A - Standardization processing method and standardized processing device for address information - Google Patents

Info

Publication number
CN106709065A
Authority
CN
China
Prior art keywords
address
matching
matched
match
normal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710038482.3A
Other languages
Chinese (zh)
Other versions
CN106709065B (en)
Inventor
许鑫
孙志杰
王莉
巩冬梅
张凌宇
刘晓伟
傅军
朱天博
汤佩霖
秦风圆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
North China Electric Power Research Institute Co Ltd
Electric Power Research Institute of State Grid Jibei Electric Power Co Ltd
Original Assignee
State Grid Corp of China SGCC
North China Electric Power Research Institute Co Ltd
Electric Power Research Institute of State Grid Jibei Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC, North China Electric Power Research Institute Co Ltd, Electric Power Research Institute of State Grid Jibei Electric Power Co Ltd filed Critical State Grid Corp of China SGCC
Priority to CN201710038482.3A priority Critical patent/CN106709065B/en
Publication of CN106709065A publication Critical patent/CN106709065A/en
Application granted granted Critical
Publication of CN106709065B publication Critical patent/CN106709065B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29Geographical information databases
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06Energy or water supply

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Water Supply & Treatment (AREA)
  • Computational Linguistics (AREA)
  • Public Health (AREA)
  • Remote Sensing (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a standardization processing method and device for address information. The method includes: determining the address range of the current match from the address structure data table of a standard address database according to a matching rule tree and the administrative division matched successfully in the previous match; taking the maximum length corresponding to the unmatched address field in the address to be standardized as the maximum word length for each match, and matching within the address range of the current match according to the association relations among administrative divisions; if the match fails, matching in an ambiguous address matching table, based on the address range of the current match, according to the association relations among ambiguous addresses, incomplete administrative division titles and standard addresses; if that match also fails, changing the matching rule tree, re-determining the address range of the current match according to the changed matching rule tree, and matching within that range; and obtaining the standard address information corresponding to each address field according to the matching results.

Description

Address information standardization processing method and device
Technical field
The present invention relates to the technical field of address information processing, and in particular to an address information standardization processing method and device.
Background art
Power grid construction, periodic maintenance of grid equipment and fault repairs all cause power outages, which affect the normal production of electricity-consuming enterprises and the daily electricity use of residents. If outages are too frequent, they give rise to complaints. According to complaint analysis data provided by the Customer Service Center of the Electric Power Research Institute of State Grid Jibei Electric Power Co., Ltd., complaints about frequent outages account for about 30 percent of all complaints across the five major complaint categories, and in some periods as much as 40 percent. Frequent outages have become a main cause and a hot spot of complaints. Frequent outages in the distribution network have therefore become a stumbling block for power supply enterprises seeking to improve service levels and customer satisfaction.
At present, the matching of address information in 95598 fault work orders is still done by manual analysis. When a user is dissatisfied with the number of outages and complains, a service agent queries the system for the fault-repair outages and outage notices in the area over the preceding two months, counts the outages, and decides whether the complaint is a frequent-outage complaint. Querying outage counts manually is not only inefficient and poorly standardized, it also demands a high level of skill from staff, which makes experience hard to pass on and replicate.
Summary of the invention
To solve the problems of the prior art, the present invention proposes an address information standardization processing method and device. The technical solution accurately identifies addresses down to village-level electricity-use units, and the standardized addresses are then used for early warning of frequent-outage complaints. The method standardizes non-standard and fuzzy addresses so that outages can be counted accurately for electricity-use units such as villages, residential communities and schools, and the counts can be applied to early warning of frequent-outage complaints.
To achieve the above object, the present invention provides an address information standardization processing method, including:
determining the address range of the current match from the address structure data table of a standard address database according to a matching rule tree and the administrative division matched successfully in the previous match;
taking the maximum length corresponding to the unmatched address field in the address to be standardized as the maximum word length for each match, and matching within the address range of the current match according to the association relations among administrative divisions; if the match fails, matching in an ambiguous address matching table, based on the address range of the current match, according to the association relations among ambiguous addresses, incomplete administrative division titles and standard addresses; if that match also fails, changing the matching rule tree, re-determining the address range of the current match from the address structure data table of the standard address database according to the changed matching rule tree and the administrative division matched successfully in the previous match, taking the maximum length corresponding to the unmatched address field in the address to be standardized as the maximum word length for each match, and matching within the re-determined address range according to the association relations among administrative divisions;
obtaining the standard address information of the corresponding address field according to the result of each match.
Preferably, the step of obtaining the standard address information of the corresponding address field according to the result of each match includes:
if every match succeeds, combining the standard address information corresponding to the successfully matched address fields to obtain the standard address corresponding to the address to be standardized.
Preferably, the step of obtaining the standard address information of the corresponding address field according to the result of each match includes:
if no match succeeds even after the matching rule tree has been changed, analyzing the unmatched address field in the address to be standardized against the standard address database and the ambiguous address matching table to obtain the standard address information of the corresponding address field, and revising the standard address database and the ambiguous address matching table.
Preferably, the address formats of the matching rule tree are: province, city, district/county, village/residential community; province, city, district/county, township/town, village/residential community; and province, city, township/town, village/residential community.
Preferably, the data in the standard address database is obtained on the basis of the administrative divisions in a knowledge base and a business region comparison table.
To achieve the above object, the present invention also provides an address information standardization processing device, including:
a matching address range determining unit, configured to determine the address range of the current match from the address structure data table of a standard address database according to a matching rule tree and the administrative division matched successfully in the previous match;
a matching unit, configured to take the maximum length corresponding to the unmatched address field in the address to be standardized as the maximum word length for each match, and match within the address range of the current match according to the association relations among administrative divisions; if the match fails, match in an ambiguous address matching table, based on the address range of the current match, according to the association relations among ambiguous addresses, incomplete administrative division titles and standard addresses; if that match also fails, change the matching rule tree, re-determine the address range of the current match from the address structure data table of the standard address database according to the changed matching rule tree and the administrative division matched successfully in the previous match, take the maximum length corresponding to the unmatched address field in the address to be standardized as the maximum word length for each match, and match within the re-determined address range according to the association relations among administrative divisions;
a standardization unit, configured to obtain the standard address information of the corresponding address field according to the result of each match.
Preferably, the standardization unit includes:
a first standardization unit, configured to, if every match succeeds, combine the standard address information corresponding to the successfully matched address fields to obtain the standard address corresponding to the address to be standardized.
Preferably, the standardization unit includes:
a second standardization unit, configured to, if no match succeeds even after the matching rule tree has been changed, analyze the unmatched address field in the address to be standardized against the standard address database and the ambiguous address matching table to obtain the standard address information of the corresponding address field, and revise the standard address database and the ambiguous address matching table.
Preferably, the address formats of the matching rule tree are: province, city, district/county, village/residential community; province, city, district/county, township/town, village/residential community; and province, city, township/town, village/residential community.
Preferably, the data in the standard address database is obtained on the basis of the administrative divisions in a knowledge base and a business region comparison table.
The above technical solution has the following beneficial effects:
1. Effective word segmentation of invalid addresses is achieved
The address matching process determines the address range of the current match from the address structure data table of the standard address database according to the matching rule tree and the administrative division matched successfully in the previous match, which realizes a multi-level vocabulary design. This avoids the excessive number of candidate words that a single vocabulary causes during matching, and the association relations between the vocabulary levels keep the range of standard words to be matched as small as possible. For fuzzy addresses, the design uses the association relations between the data in the address structure data table of the standard address database and the data in the ambiguous address matching table to locate the standard address corresponding to an ambiguous address quickly, which effectively solves the matching of fuzzy addresses. Rules guide the address matching process, which reduces the number of matching attempts and improves matching efficiency.
2. It benefits early-warning work
Standardized address processing will help power departments analyze work order data along the address dimension, for example outage data analysis for a given area, user preference analysis for a given area, and business volume statistics for a given area. Extending the technique will also help with the analysis and address processing of all kinds of outage information and complaint work order information.
The difficulty in managing and analyzing frequent-outage complaints lies in addresses that are not filled in to a standard. The technical solution solves the non-standard address problem encountered in frequent-outage complaint management and data analysis, and creates the conditions for complaint early warning and for moving the service checkpoint forward.
Brief description of the drawings
To illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are obviously only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of an address information standardization processing method proposed by an embodiment of the present invention;
Fig. 2 is a schematic diagram of the address hierarchy in this embodiment;
Fig. 3 is a schematic diagram of the address structure data table of the standard address database in this embodiment;
Fig. 4 is the overall algorithm flowchart of the technical solution;
Fig. 5 is a functional block diagram of an address information standardization processing device proposed by an embodiment of the present invention;
Fig. 6 is a functional block diagram of the standardization unit in the address information standardization processing device proposed by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are obviously only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
The working principle of the technical solution is as follows. The solution builds a Chinese word segmentation matching method based on the address structure data table of a standard address database, which segments, matches and aggregates addresses so that outages can ultimately be counted accurately. During data processing, the word length is limited by the field lengths of the address structure data table of the standard address database that has been built, and the address range for the next match is narrowed by the matching rule tree and the administrative division matched last. The address to be standardized is matched in the corresponding address structure data table of the standard address database; if that fails, it is matched in the corresponding ambiguous address matching table; if that also fails, the address is output to a pending store to await manual correction. Staff analyze the cause, then improve the standard database or add data to the ambiguity table. Once every match has succeeded, the operation ends and the matched address is returned. Finally, the address information is put into a uniform format to form an early-warning data source, which is pushed to the early-warning database.
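The word-length limit described above can be illustrated with a forward maximum matching step. The following is a minimal sketch, not the patent's implementation: the function name `forward_max_match` and the sample vocabulary are assumptions made for illustration.

```python
def forward_max_match(remaining, vocabulary):
    """Return the longest vocabulary entry that prefixes `remaining`, or None."""
    words = set(vocabulary)
    # Word-length limit: never try a window longer than the longest entry
    # in the table currently being matched.
    max_len = max((len(w) for w in words), default=0)
    for length in range(min(max_len, len(remaining)), 0, -1):
        candidate = remaining[:length]
        if candidate in words:
            return candidate
    return None


# Hypothetical city-level vocabulary (sample data, not from the real database).
cities = ["承德市", "承德", "唐山市"]
print(forward_max_match("承德市双滦区", cities))  # 承德市
```

Starting from the longest window and shrinking it means the longest valid name wins, so "承德市" is preferred over the shorter entry "承德".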
Based on the above working principle, an embodiment of the present invention proposes an address information standardization processing method, as shown in Fig. 1. The method includes:
Step 101): determine the address range of the current match from the address structure data table of the standard address database according to the matching rule tree and the administrative division matched successfully in the previous match;
In this embodiment, the standard address database mainly provides the standard word lengths and matching values for segmentation matching. It is therefore necessary to analyze the structure of the addresses currently to be standardized, make the division into administrative regions explicit, and then build the corresponding standard data tables level by level.
Fig. 2 is a schematic diagram of the address hierarchy in this embodiment. Analysis shows that the address information currently to be standardized is structured as province, city, district/county, township/town/subdistrict office, village/residential community. The address hierarchy diagram is built for this data structure. Province to city, city to district/county, district/county to township/town/subdistrict office, and township/town/subdistrict office to village/residential community are all one-to-many relations. The address structure data table of the standard address database shown in Fig. 3 is built from this analysis. The data of the standard address database is based on the administrative regions in the knowledge base and the business region comparison table, and is continually improved and supplemented according to matching results as matching proceeds.
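The one-to-many hierarchy described above can be sketched as a parent-to-children lookup. This is only an illustration under stated assumptions: the row layout `(level, parent, name)`, the function names and the sample rows are hypothetical and do not reproduce the table in Fig. 3.

```python
from collections import defaultdict

# Hypothetical sample rows: (level number, parent name, name).
# Level numbers: 1 province, 2 city, 3 district/county,
# 5 township/town/subdistrict office, 6 village/residential community.
ROWS = [
    (1, None, "河北省"),
    (2, "河北省", "承德市"),
    (2, "河北省", "唐山市"),
    (3, "承德市", "双滦区"),
    (5, "双滦区", "双塔山镇"),
]


def build_children(rows):
    """Index names by (level, parent) so each parent maps to many children."""
    children = defaultdict(list)
    for level, parent, name in rows:
        children[(level, parent)].append(name)
    return children


def candidates(children, level, parent):
    """Names at `level` under `parent`, plus the longest name length."""
    names = children.get((level, parent), [])
    max_len = max((len(n) for n in names), default=0)
    return names, max_len


children = build_children(ROWS)
print(candidates(children, 2, "河北省"))  # (['承德市', '唐山市'], 3)
```

Restricting the candidates to the children of the division matched last keeps each lookup small, and the second return value is the word-length limit for that lookup.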
When a customer reports a fault for repair, the way 95598 agents write addresses varies. To improve matching efficiency and make it easier to match against current addresses, the address information in the fault repair work orders of the two years 2015 and 2016 was sorted through, and all address writing formats were collected, as shown in Table 1 below.
Table 1
For ease of representation, the province, city, district/county, township/town/subdistrict office and village/residential community tables in the standard address database are numbered as shown in Table 2 below.
Table 2
Table name | Province table | City table | District/county | Power supply unit | Township/town/subdistrict office | Village/residential community table
Number | 1 | 2 | 3 | 4 | 5 | 6
The address matching rules for repair work orders are defined with these numbers, giving the three matching rule trees shown in Table 3 below: 12356, 1236 and 1256. Taking matching rule tree one in Table 3 as an example, when an address is matched, the data in the province table is matched first; after the province table matches successfully, the city table is matched, and so on in sequence. When matching is complete, the operation ends and the standardized address is returned. If, however, matching fails under matching rule tree one when the district/county (number 3) is being matched, the matching process continues directly under matching rule tree two until matching is complete. If several branches are encountered during execution, they are by default executed in the order matching rule tree one, matching rule tree two, matching rule tree three.
Table 3
Matching rule tree | Matching rule tree one | Matching rule tree two | Matching rule tree three
Matching table numbers | 12356 | 1236 | 1256
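The numbering in Tables 2 and 3 can be represented directly as data. The sketch below is illustrative only: the names `LEVELS`, `RULE_TREES`, `parse_tree` and `describe` are assumptions, while the numbers and their order come from the tables above.

```python
# Table 2: table numbers and the address level each one stands for.
LEVELS = {
    1: "province",
    2: "city",
    3: "district/county",
    4: "power supply unit",
    5: "township/town/subdistrict office",
    6: "village/residential community",
}


def parse_tree(code):
    """Turn a rule-tree code such as "12356" into a tuple of table numbers."""
    return tuple(int(ch) for ch in code)


# Table 3: the three matching rule trees, in their default order of use.
RULE_TREES = [parse_tree(code) for code in ("12356", "1236", "1256")]


def describe(tree):
    """Spell out the sequence of levels that a rule tree visits."""
    return " > ".join(LEVELS[number] for number in tree)


for tree in RULE_TREES:
    print(describe(tree))
```

Keeping the trees in a list preserves the default order in which they are tried.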
Step 102): take the maximum length corresponding to the unmatched address field in the address to be standardized as the maximum word length for each match, and match within the address range of the current match according to the association relations among administrative divisions; if the match fails, match in the ambiguous address matching table, based on the address range of the current match, according to the association relations among ambiguous addresses, incomplete administrative division titles and standard addresses; if that match also fails, change the matching rule tree, re-determine the address range of the current match from the address structure data table of the standard address database according to the changed matching rule tree and the administrative division matched successfully in the previous match, take the maximum length corresponding to the unmatched address field in the address to be standardized as the maximum word length for each match, and match within the re-determined address range according to the association relations among administrative divisions;
In this step, because 95598 customer service agents enter address information directly from what the user dictates in routine work, some of the resulting address data is vaguely expressed or incomplete. Analysis of the address information dictated by users and entered by agents shows that fuzzy addresses fall into two classes: those that can be matched and those that cannot. For fuzzy addresses that can be matched, the match success rate can be raised by adding matching rules. Matchable fuzzy addresses are mainly of two kinds: ambiguous addresses and incomplete administrative division titles. For these two kinds of address, the matching algorithm adopts the following methods:
First, build an ambiguous address matching table. The data table is built from the association relations among ambiguous addresses, incomplete administrative division titles and standard addresses. When the administrative division corresponding to an address cannot be matched in the address structure data table of the standard address database, the associated data in the ambiguous address matching table can be matched, and the result is derived from the outcome. Second, improve the content of the ambiguous address matching table manually. If a match using the ambiguous address matching table does not succeed during matching, staff sort through the case and gradually improve the table.
Although the ambiguous address matching table introduces some redundancy, it solves the problem of matching fuzzy addresses and thereby raises the match success rate.
For example, compared with the standard address "Hebei Province, Chengde City, Shuangluan District, Shuangtashan Town, Hundred Prosperous Home", the address "Hebei Province, Chengde City, Shuangluan District, Shuangtashan, Hundred Prosperous Home" lacks the administrative division title "Town" and is a case of an incomplete administrative division title. Based on analysis of the address, when the "town" level administrative division is matched, the associated "town" information in the ambiguous address matching table is matched, so the match succeeds and does not fail because of the administrative division title.
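The fallback to the ambiguous address matching table can be sketched as a lookup keyed by the parent division and the variant spelling. This is a sketch under stated assumptions: the table layout, the function name `match_ambiguous` and the sample community name "示例小区" are hypothetical, and only the missing "town" title mirrors the example above.

```python
# Hypothetical ambiguous address matching table:
# (parent division, variant as written) -> standard name.
AMBIGUOUS = {
    ("双滦区", "双塔山"): "双塔山镇",  # administrative title "镇" (town) omitted
}


def match_ambiguous(remaining, parent, table):
    """Return (standard name, characters consumed) for the longest variant
    under `parent` that prefixes `remaining`, or None when nothing matches."""
    best = None
    for (table_parent, variant), standard in table.items():
        if table_parent == parent and remaining.startswith(variant):
            if best is None or len(variant) > best[1]:
                best = (standard, len(variant))
    return best


print(match_ambiguous("双塔山示例小区", "双滦区", AMBIGUOUS))  # ('双塔山镇', 3)
```

The function returns the number of characters consumed from the input rather than the length of the standard name, because the variant written by the agent is shorter than the standard name it maps to.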
Step 103): obtain the standard address information of the corresponding address field according to the result of each match. The step of obtaining the standard address information of the corresponding address field according to the result of each match includes:
if every match succeeds, combining the standard address information corresponding to the successfully matched address fields to obtain the standard address corresponding to the address to be standardized;
if no match succeeds even after the matching rule tree has been changed, analyzing the unmatched address field in the address to be standardized against the standard address database and the ambiguous address matching table to obtain the standard address information of the corresponding address field, and revising the standard address database and the ambiguous address matching table.
Fig. 4 is the overall algorithm flowchart of the technical solution. Let the complete address string to be matched be S(0) and the standardized address after matching be SS. i and Tmaxlen(i) are defined by the number of matches, where i = 0, 1, 2, 3, 4. S(i) is the remaining string after the corresponding administrative division has been cut off at each match, Tmaxlen(i) is the length of the longest field in the corresponding table of the standard address database at each match, and the forward matching word length is defined as MaxLen. The matching algorithm proceeds as follows:
Step one:It is loaded into address date S (0) and saves table data, makes MaxLen=Tmaxlen (0) and matched.If The match is successful then enters step 2, and step 9 is entered if it fails to match.
Step 2:The address date that interception S (0) is matched, obtains remaining character string S (1), is looked into as restrictive condition using saving The address structure tables of data of corresponding normal address database is ask, all city's titles in current province are filtered out, and to filter out City's information data in the most long word length as MaxLen long, i.e. MaxLen=Tmaxlen (1), tied using corresponding address Structure tables of data is matched, if the match is successful for S (1), into step 3, step 9 is entered if it fails to match.
Step 3:The interception address dates that match of S1, obtain remaining character string S (2), using the city that is truncated to as limitation Area/county under the conditional filtering city as normal address database address structure tables of data, and the area/county's information to filter out The most long word length as MaxLen long in data, i.e. MaxLen=Tmaxlen (2), use corresponding address structure data Table is matched, and step 4 is entered if the match is successful if S (2), and step 6 is entered if it fails to match.
Step 4: Truncate the address data matched in S(2) to obtain the remaining string S(3). Using the extracted district/county as the restrictive condition, filter the address structure data table in the standard address database for the township/town/neighborhood-committee entries under that district/county, and take the length of the longest entry in the filtered data as MaxLen, i.e. MaxLen = Tmaxlen(3). Match S(3) against the corresponding address structure data table. If the match succeeds, go to Step 5; if it fails, go to Step 7.
Step 5: Truncate the township/town/neighborhood-committee information matched in S(3) to obtain the remaining string S(4). Using the extracted township/town/neighborhood-committee information as the restrictive condition, filter the address structure data table in the standard address database for the village/community entries under that district/county, and take the length of the longest entry among the filtered villages/communities as MaxLen, i.e. MaxLen = Tmaxlen(4). Perform forward maximum matching against the corresponding address structure data table. If S(4) matches, extract the village/community information in S(3), generate and output the string SS, and end the algorithm. If the match fails, go to Step 8.
Step 6: Query the ambiguity address matching table with the same restrictive condition. If the match succeeds, correct the district/county information in S(2) and go to Step 4. If it fails, look up the matching rule trees in Table 3, switch from matching rule tree one to matching rule tree two, and proceed according to rule tree two: using the already-matched city information as the restrictive condition, query the township/town/neighborhood-committee entries it contains, take the length of the longest such entry as MaxLen, and continue from the "match the township/town/neighborhood-committee table" process of Step 4.
Step 7: Query the ambiguity address matching table with the same restrictive condition. If the match succeeds, correct the township/town/neighborhood-committee information in S(3) and go to Step 5. If it fails, look up the matching rule trees in Table 3, switch from matching rule tree two to matching rule tree three, and proceed according to rule tree three: using the already-matched district/county information as the restrictive condition, query the village/community entries it contains, take the length of the longest such entry as MaxLen, and continue from the "match the village/community table" process of Step 5.
Step 8: Query the ambiguity address matching table with the same restrictive condition. If the match succeeds, correct the village/community information in S(4), generate and output the string SS, and end the algorithm. If the match fails, go to Step 9.
Step 9: Manually analyze the cause of the matching failure. Based on the standard address database and the ambiguity address matching table, analyze the unmatched address fields of the address to be standardized, obtain the standard address information of the corresponding fields, and revise the standard address database and the ambiguity address matching table. Manually process string S(0), output SS, and end the procedure.
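The per-level logic of Steps 4 through 8 (forward maximum matching bounded by MaxLen, followed by an ambiguity-table lookup when the standard table has no hit) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names `forward_max_match` and `match_level`, the in-memory sets and dictionaries standing in for database tables, and the Chinese sample strings (a back-translation of the machine-translated place names) are all hypothetical.

```python
from typing import Dict, Optional, Set, Tuple


def forward_max_match(s: str, names: Set[str]) -> Optional[str]:
    """Return the longest entry of `names` that is a prefix of `s`, or None."""
    if not names:
        return None
    # MaxLen: length of the longest candidate, capped by the input length.
    max_len = min(len(s), max(len(n) for n in names))
    # Shrink the window one character at a time until a candidate matches.
    for length in range(max_len, 0, -1):
        if s[:length] in names:
            return s[:length]
    return None


def match_level(
    s: str, names: Set[str], aliases: Dict[str, str]
) -> Optional[Tuple[str, str]]:
    """Match one administrative level at the start of `s`.

    `names` is the filtered standard-name set for the current scope and
    `aliases` maps ambiguous or incomplete names to standard names
    (a stand-in for the ambiguity address matching table).
    Returns (standard_name, remaining_string), or None if both lookups fail.
    """
    hit = forward_max_match(s, names)
    if hit is not None:
        return hit, s[len(hit):]
    alias = forward_max_match(s, set(aliases))
    if alias is not None:
        # Replace the alias with its standard form and consume it from `s`.
        return aliases[alias], s[len(alias):]
    return None
```

A `None` result from `match_level` corresponds to the point at which Steps 6 and 7 switch to another matching rule tree, or at which Step 8 hands the address over to manual handling in Step 9.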
It should be noted that although the operations of the method of the invention are described in the drawings in a particular order, this does not require or imply that the operations must be performed in that order, or that all of the illustrated operations must be performed to achieve the desired result. Additionally or alternatively, some steps may be omitted, multiple steps may be combined into one step, and/or one step may be split into multiple steps.
Fig. 5 is a functional block diagram of an address information standardization processing device according to an embodiment of the present invention. The device includes:
A matching address range determining unit 501, configured to determine the address range of the current match from the address structure data table of the standard address database, according to the matching rule tree and the administrative division matched successfully in the previous round;
A matching unit 502, configured to take the maximum length corresponding to the unmatched address fields of the address to be standardized as the maximum word length for each match, and to match within the address range of the current match according to the association relationships among the administrative divisions. If the match is unsuccessful, the unit matches, within the address range of the current match, against the ambiguity address matching table according to the association relationships between ambiguous addresses or incomplete administrative-division names and standard addresses. If that match is also unsuccessful, the unit switches the matching rule tree, re-determines the address range of the current match from the address structure data table of the standard address database according to the switched matching rule tree and the administrative division matched successfully in the previous round, takes the maximum length corresponding to the unmatched address fields of the address to be standardized as the maximum word length for each match, and matches within the re-determined address range according to the association relationships among the administrative divisions;
A standardization unit 503, configured to obtain the standard address information of the corresponding address field according to the result of each match.
Fig. 6 is a functional block diagram of the standardization unit in the address information standardization processing device according to an embodiment of the present invention. The standardization unit 503 includes:
A first standardization unit 5031, configured to, if every match succeeds, combine the standard address information corresponding to the successfully matched address fields to obtain the standard address corresponding to the address to be standardized.
A second standardization unit 5032, configured to, if no match succeeds after the matching rule tree has been switched, analyze the unmatched address fields of the address to be standardized based on the standard address database and the ambiguity address matching table, obtain the standard address information of the corresponding address fields, and revise the standard address database and the ambiguity address matching table.
In addition, although several units of the device are mentioned in the detailed description above, this division is not mandatory. In fact, according to embodiments of the present invention, the features and functions of two or more of the units described above may be embodied in a single unit. Likewise, the features and functions of one unit described above may be further divided among multiple units.
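The role of the matching address range determining unit 501 can be illustrated with a short sketch: given a matching rule tree and the last successfully matched division, it selects the candidate names for the next level. The code below is a hedged illustration only. The function name `determine_scope`, the row-per-address table layout, the numbering of the rule trees (inferred from Steps 6 and 7), and every table entry other than the names taken from the worked example are assumptions.

```python
from typing import Dict, List, Set, Tuple

# Assumed rule-tree numbering: tree 1 is the full hierarchy, tree 2 skips the
# district/county level, tree 3 skips the township/town level.
RULE_TREES: Dict[int, List[str]] = {
    1: ["province", "city", "county", "town", "village"],
    2: ["province", "city", "town", "village"],
    3: ["province", "city", "county", "village"],
}

# Toy address structure table: one row per full standard address.
# Entries named "Example ..." or "Other ..." are placeholders.
ROWS: List[Dict[str, str]] = [
    {"province": "Hebei Province", "city": "Chengde City",
     "county": "Shuangqiao District", "town": "Fengyingzi Town",
     "village": "Tiandoujia City Community"},
    {"province": "Hebei Province", "city": "Chengde City",
     "county": "Shuangqiao District", "town": "Example Town",
     "village": "Example Village"},
    {"province": "Hebei Province", "city": "Chengde City",
     "county": "Example County", "town": "Other Town",
     "village": "Other Village"},
]


def determine_scope(
    rows: List[Dict[str, str]],
    rule_tree: List[str],
    last_level: str,
    last_name: str,
) -> Tuple[str, Set[str]]:
    """Return the next level in `rule_tree` and its candidate names.

    Candidates are restricted to rows whose `last_level` column equals the
    division matched in the previous round.
    """
    next_level = rule_tree[rule_tree.index(last_level) + 1]
    names = {row[next_level] for row in rows if row[last_level] == last_name}
    return next_level, names
```

Under these assumptions, `determine_scope(ROWS, RULE_TREES[1], "county", "Shuangqiao District")` yields the towns of that district, while the same call with `RULE_TREES[3]` skips the town level and yields its villages and communities directly. Storing the full ancestry in each row is one way to let a rule tree skip a level without a second table.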
Taking "Hebei Province Chengde City Shuangqiao District Fengyingzi Tiandoujia City Community 4A901" as the input address S0, the matching steps under the above technical solution are as follows:
Step 1: Take the length of the longest field in the address structure data table for "province" in the standard address database as MaxLen, i.e. MaxLen = Tmaxlen(0). For example, if the longest province name is Heilongjiang Province (four characters in Chinese), then MaxLen = Tmaxlen(0) = 4. In this embodiment, "Hebei Province" is matched successfully.
Step 2: Truncate the province name matched in S0 to obtain the remaining string S1 (Chengde City Shuangqiao District Fengyingzi Tiandoujia City Community 4A901). Using the province as the restrictive condition, query the address structure data table for "city", filter out all city names in the current province, and take the length of the longest entry in the filtered city data as MaxLen, i.e. MaxLen = Tmaxlen(1). "Chengde City" is matched.
Step 3: Truncate the city information matched in S1 (Chengde City Shuangqiao District Fengyingzi Tiandoujia City Community 4A901) to obtain the remaining string S2 (Shuangqiao District Fengyingzi Tiandoujia City Community 4A901). Using the extracted city as the restrictive condition, query the corresponding address structure data table for the districts/counties under that city, and take the length of the longest entry in the filtered district/county data as MaxLen, i.e. MaxLen = Tmaxlen(2). "Shuangqiao District" is matched successfully.
Step 4: Truncate the district/county information matched in S2 (Shuangqiao District Fengyingzi Tiandoujia City Community 4A901) to obtain the remaining string S3 (Fengyingzi Tiandoujia City Community 4A901). Using the extracted district/county as the restrictive condition, query the corresponding address structure data table for the township/town/neighborhood-committee entries under that district/county, and take the length of the longest entry in the filtered data as MaxLen, i.e. MaxLen = Tmaxlen(3). Here, "Fengyingzi" carries no administrative-division designation (its full name is "Fengyingzi Town"), so no data can be found in the standard address database. The procedure therefore goes to Step 6 to look up the ambiguity address matching table.
Step 5: Truncate the township/town/neighborhood-committee information matched in S3 (Fengyingzi Tiandoujia City Community 4A901) to obtain the remaining string S4 (Tiandoujia City Community 4A901). Using the extracted township/town/neighborhood-committee information as the restrictive condition, query the corresponding address structure data table for the villages/communities under that district/county, and take the length of the longest entry among the filtered villages/communities as MaxLen, i.e. MaxLen = Tmaxlen(4). "Tiandoujia City Community" is matched successfully. The string SS (Hebei Province Chengde City Shuangqiao District Fengyingzi Town Tiandoujia City Community) is generated and output, and the algorithm ends.
Step 6: Query the ambiguity table with the same restrictive condition. "Fengyingzi" is matched successfully, so the township/town information in S3 is corrected to "Fengyingzi Town", and the procedure then goes to Step 5.
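The walk-through above can be reproduced with a small script. The sketch below is illustrative only and rests on several assumptions: the Chinese strings are a back-translation of the machine-translated place names, the tables are tiny in-memory stand-ins for the standard address database and the ambiguity table (the Heilongjiang row is a placeholder), and the names `LEVELS`, `ROWS`, `ALIASES`, `longest_prefix`, and `standardize` are hypothetical. Rule-tree switching is omitted, so a failure at any level simply raises an error in place of the manual handling step.

```python
from typing import Dict, List, Optional, Set, Tuple

LEVELS = ["province", "city", "county", "town", "village"]

# Toy standard address table, one row per full standard address.
ROWS: List[Dict[str, str]] = [
    {"province": "河北省", "city": "承德市", "county": "双桥区",
     "town": "冯营子镇", "village": "天都佳城小区"},
    {"province": "黑龙江省", "city": "哈尔滨市", "county": "南岗区",
     "town": "示例镇", "village": "示例小区"},
]

# Toy ambiguity table: per level, incomplete name -> standard name.
ALIASES: Dict[str, Dict[str, str]] = {"town": {"冯营子": "冯营子镇"}}


def longest_prefix(s: str, names: Set[str]) -> Optional[str]:
    """Forward maximum matching: longest entry of `names` that prefixes `s`."""
    max_len = max((len(n) for n in names), default=0)  # MaxLen for this level
    for length in range(min(max_len, len(s)), 0, -1):
        if s[:length] in names:
            return s[:length]
    return None


def standardize(address: str) -> Tuple[str, str]:
    """Return (standardized address, unmatched remainder)."""
    rest, rows, parts = address, ROWS, []
    for level in LEVELS:
        # Scope: names at this level under the divisions matched so far.
        names = {row[level] for row in rows}
        hit = longest_prefix(rest, names)
        standard = hit
        if hit is None:
            # Fall back to the ambiguity table, restricted to the same scope.
            scoped = {a: n for a, n in ALIASES.get(level, {}).items() if n in names}
            hit = longest_prefix(rest, set(scoped))
            if hit is None:
                raise LookupError(f"no {level} match for {rest!r}")
            standard = scoped[hit]
        parts.append(standard)
        rest = rest[len(hit):]  # truncate the matched text
        rows = [row for row in rows if row[level] == standard]
    return "".join(parts), rest


if __name__ == "__main__":
    print(standardize("河北省承德市双桥区冯营子天都佳城小区4A901"))
```

With these toy tables the town level fails against the standard names, succeeds against the ambiguity table, and the incomplete name is replaced by its full form before the village level is matched, mirroring Steps 4, 6, and 5 of the example.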
As the above embodiment shows, this solution not only performs effective word segmentation on address information but also helps the power department use other information technologies to visualize and locate addresses. For example, using the Baidu Maps interface, the coordinates of an address can be obtained from the standardized data, and the outage data for that area can then be marked on the map according to those coordinates, producing a visual presentation. This not only improves on the outdated tabular presentation used in statistical analysis but also broadens the technical approach for future analysis and presentation of address-related data.
The specific embodiments described above further explain the purpose, technical solutions, and beneficial effects of the present invention in detail. It should be understood that the foregoing describes only specific embodiments of the invention and is not intended to limit its scope of protection. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (10)

1. An address information standardization processing method, characterized by comprising:
determining the address range of the current match from the address structure data table of a standard address database according to a matching rule tree and the administrative division matched successfully in the previous round;
taking the maximum length corresponding to the unmatched address fields of the address to be standardized as the maximum word length for each match, and matching within the address range of the current match according to the association relationships among the administrative divisions; if the match is unsuccessful, matching, within the address range of the current match, against an ambiguity address matching table according to the association relationships between ambiguous addresses or incomplete administrative-division names and standard addresses; if that match is also unsuccessful, switching the matching rule tree, re-determining the address range of the current match from the address structure data table of the standard address database according to the switched matching rule tree and the administrative division matched successfully in the previous round, taking the maximum length corresponding to the unmatched address fields of the address to be standardized as the maximum word length for each match, and matching within the re-determined address range of the current match according to the association relationships among the administrative divisions;
obtaining the standard address information of the corresponding address field according to the result of each match.
2. The method of claim 1, characterized in that the step of obtaining the standard address information of the corresponding address field according to the result of each match comprises:
if every match succeeds, combining the standard address information corresponding to the successfully matched address fields to obtain the standard address corresponding to the address to be standardized.
3. The method of claim 1, characterized in that the step of obtaining the standard address information of the corresponding address field according to the result of each match comprises:
if no match succeeds after the matching rule tree has been switched, analyzing the unmatched address fields of the address to be standardized based on the standard address database and the ambiguity address matching table, obtaining the standard address information of the corresponding address fields, and revising the standard address database and the ambiguity address matching table.
4. The method of claim 1, 2 or 3, characterized in that the address formats of the matching rule trees are: province, city, district/county, village/community; province, city, district/county, township/town, village/community; province, city, township/town, village/community.
5. The method of claim 1, 2 or 3, characterized in that the data in the standard address database is obtained on the basis of the administrative divisions and the business-region comparison table in a knowledge base.
6. An address information standardization processing device, characterized by comprising:
a matching address range determining unit, configured to determine the address range of the current match from the address structure data table of a standard address database according to a matching rule tree and the administrative division matched successfully in the previous round;
a matching unit, configured to take the maximum length corresponding to the unmatched address fields of the address to be standardized as the maximum word length for each match, and to match within the address range of the current match according to the association relationships among the administrative divisions; if the match is unsuccessful, to match, within the address range of the current match, against an ambiguity address matching table according to the association relationships between ambiguous addresses or incomplete administrative-division names and standard addresses; if that match is also unsuccessful, to switch the matching rule tree, re-determine the address range of the current match from the address structure data table of the standard address database according to the switched matching rule tree and the administrative division matched successfully in the previous round, take the maximum length corresponding to the unmatched address fields of the address to be standardized as the maximum word length for each match, and match within the re-determined address range of the current match according to the association relationships among the administrative divisions;
a standardization unit, configured to obtain the standard address information of the corresponding address field according to the result of each match.
7. The device of claim 6, characterized in that the standardization unit comprises:
a first standardization unit, configured to, if every match succeeds, combine the standard address information corresponding to the successfully matched address fields to obtain the standard address corresponding to the address to be standardized.
8. The device of claim 6, characterized in that the standardization unit comprises:
a second standardization unit, configured to, if no match succeeds after the matching rule tree has been switched, analyze the unmatched address fields of the address to be standardized based on the standard address database and the ambiguity address matching table, obtain the standard address information of the corresponding address fields, and revise the standard address database and the ambiguity address matching table.
9. The device of claim 6, 7 or 8, characterized in that the address formats of the matching rule trees are: province, city, district/county, village/community; province, city, district/county, township/town, village/community; province, city, township/town, village/community.
10. The device of claim 6, 7 or 8, characterized in that the data in the standard address database is obtained on the basis of the administrative divisions and the business-region comparison table in a knowledge base.
CN201710038482.3A 2017-01-19 2017-01-19 Address information standardization processing method and device Active CN106709065B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710038482.3A CN106709065B (en) 2017-01-19 2017-01-19 Address information standardization processing method and device

Publications (2)

Publication Number Publication Date
CN106709065A true CN106709065A (en) 2017-05-24
CN106709065B CN106709065B (en) 2020-08-04

Family

ID=58908793

Country Status (1)

Country Link
CN (1) CN106709065B (en)

Also Published As

Publication number Publication date
CN106709065B (en) 2020-08-04

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant