CN101844135A - Method for sorting postal letters according to addresses driven by address information base - Google Patents

Method for sorting postal letters according to addresses driven by address information base

Info

Publication number
CN101844135A
CN101844135A CN201010170949A CN201010170949A CN101844135A CN 101844135 A CN101844135 A CN 101844135A CN 201010170949 A CN201010170949 A CN 201010170949A CN 201010170949 A CN201010170949 A CN 201010170949A CN 101844135 A CN101844135 A CN 101844135A
Authority
CN
China
Prior art keywords
address
letter
sorting
information
road
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201010170949A
Other languages
Chinese (zh)
Other versions
CN101844135B (en)
Inventor
吕岳
范生淼
吕淑静
屠晓
姚心宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Post Technology Co.,Ltd.
Original Assignee
SHANGHAI POST SCIENCE INST
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SHANGHAI POST SCIENCE INST
Priority to CN2010101709498A
Publication of CN101844135A
Application granted
Publication of CN101844135B
Legal status: Active
Anticipated expiration

Landscapes

  • Character Discrimination (AREA)

Abstract

The invention discloses an address-information-base-driven method for sorting postal letters by address. Each delivery address in the address information base has at least one textual representation containing the address information needed for transit sorting, local sorting, and delivery by road segment. The textual address obtained from a letter by image acquisition and character recognition is matched against every delivery address in the address information base (traversal matching), and the sorting information of the letter is derived from the matching degree, so that transit sorting, local sorting, and road-segment delivery after local sorting can be realized. The invention is applied in the recognition module of a letter sorting machine. Provided that the address base is reasonably accurate and complete and the recognition rate is basically assured, the method can effectively analyze and correct the letter address recognition result to obtain accurate sorting information, so that letters can be sorted relying entirely on the address recognition result.

Description

Address-information-base-driven method for sorting postal letters by address
Technical field
The invention belongs to the field of postal technology, and in particular relates to an address-information-base-driven method for sorting postal letters by address.
Background
With economic and social development, letter volume keeps growing and traditional manual sorting can no longer meet actual needs, so the use of automatic letter sorting machines has become a trend. An automatic letter sorting machine captures an image of the envelope, recognizes it, and then sorts the letter according to the recognition result. Existing machines sort letters mainly by the postcode recognition result; some also use address recognition information to supplement and correct the postcode.
If the sorting target is the delivery office, the 6-digit postcode alone is sufficient, but sorting still fails when the envelope carries no postcode. If the sorting target is the road segment, that is, letters within the same postcode area must be sorted further down to road segments, the postcode alone cannot meet the requirement.
Summary of the invention
The purpose of the invention is to provide an address-base-driven method that automatically recognizes letter addresses and sorts letters by address, in order to solve the problems of poor efficiency and accuracy in current letter sorting technology.
The technical solution of the invention is an address-information-base-driven method for sorting postal letters by address. Each delivery address in the address information base has at least one textual representation containing the address information used for transit sorting, local sorting, and road-segment delivery. The textual address of the postal letter, obtained by image acquisition and character recognition, is matched against every delivery address in the address information base; the sorting information of the letter is obtained according to the matching degree, realizing transit sorting, local sorting, and road-segment delivery after local sorting.
Further, the character recognition step that obtains the textual address of the postal letter comprises:
analyzing the letter image to locate the recipient address region;
segmenting the Chinese characters of the address region with a segmentation algorithm, first into lines of text and then each line into individual characters;
recognizing each individual character with a Chinese character recognition algorithm to obtain the textual address of the postal letter.
Further, when the textual address of the postal letter obtained by image acquisition and character recognition is matched against the delivery addresses in the address information base and the sorting information is obtained according to the matching degree, the sorting information is verified against the postcode information of the postal letter.
The address-information-base driving of the invention works as follows. First, an address base containing standard address entries is built, and a recognition result containing address information is obtained from each letter image by pattern recognition. Each query address entry in the address base is matched against the recognition result to find the entry with the highest matching degree. The matching degree information is then analyzed: if it meets the requirements, the sorting information corresponding to that entry is output; otherwise no sorting information is output.
The invention is applied in the recognition module of a letter sorting machine. Provided that the address base is reasonably accurate and complete and the recognition rate is basically assured, the method can effectively analyze and correct the recognition result and obtain an accurate result, so that letters can be sorted relying entirely on the letter address recognition result.
Description of drawings
Fig. 1 shows the structure of the transit address table in an embodiment of the invention.
Fig. 2 is a flow chart of address-base-driven address recognition in an embodiment of the invention.
Fig. 3 is a block diagram of address-base driving in an embodiment of the invention.
Fig. 4 is a flow chart of transit address result judgment in an embodiment of the invention.
Fig. 5 is a flow chart of local address recognition in an embodiment of the invention.
Fig. 6 is a flow block diagram of verifying the matching results of multiple roads and units against postcode and district information in an embodiment of the invention.
Fig. 7 is a flow chart of mutual verification between the delivery information matched for roads and for units in an embodiment of the invention.
Detailed description of the embodiments
Specific embodiments of the invention are described in detail below with reference to the accompanying drawings.
Depending on the destination of the letter, sorting takes two forms: local sorting and transit sorting. Local sorting handles letters whose destination is the area where the sorting machine is located; it must be accurate down to the delivery branch office or the delivery road segment. Transit sorting handles letters destined for any other area; depending on the sort plan, it is accurate to the provincial, prefecture, or county/district level. Both sorting modes require a corresponding address base to drive the recognition of the Chinese addresses on letters.
For the transit address storage format, the transit address table, whose structure is shown in Fig. 1, is built from the three levels of national administrative divisions: 31 provincial-level divisions (e.g. Zhejiang Province), the prefecture-level divisions under each province (e.g. Hangzhou City), and the county/district-level divisions under each prefecture-level division (e.g. Yuhang District). Each division is labeled with the range of postcodes within it. For example, the first four digits of postcodes in Yuhang District are 3111 and the last two are undetermined, so the range is written 3111xx, where x stands for any digit from 0 to 9. Note that the postcode labeled on a prefecture-level city is that of its urban center. This yields a nationwide gazetteer accurate to county/district-level administrative divisions, together with the corresponding postcodes. In this invention, each address entry in the table is called a query address.
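The postcode range notation described above (for example 3111xx for Yuhang District) can be checked with a few lines of code. The following is a minimal sketch, not the patent's implementation; the function name `postcode_matches` is hypothetical, and it assumes ranges are stored as strings in which a lowercase `x` stands for any digit.

```python
def postcode_matches(pattern: str, postcode: str) -> bool:
    """Return True if `postcode` falls within a range pattern such as '3111xx'.

    Assumption: 'x' in the pattern stands for any digit 0-9, as in the text;
    every other pattern character must equal the postcode digit at that position.
    """
    if len(pattern) != len(postcode) or not postcode.isdigit():
        return False
    return all(p in ("x", d) for p, d in zip(pattern, postcode))


# Usage with the Yuhang District range given in the text.
print(postcode_matches("3111xx", "311100"))  # inside the range
print(postcode_matches("3111xx", "310012"))  # outside the range
```

Storing the range as a pattern keeps the table compact: one entry per administrative division rather than one per postcode.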
For the local address storage format, local addresses are divided into two kinds: road addresses and organizations (units).
Road address information is represented as a road name combined with the relevant house numbers. Organization information is more varied: it can be any address form other than a road name, such as a residential community, company, building, office, school, or minor place name. Redundant information must first be removed from these addresses. For example, for local unit information in Hangzhou, Zhejiang, common words such as "Zhejiang", "Zhejiang Province", "Hangzhou", "Hangzhou City", "company", and "Co., Ltd." are treated as redundant. Removing them does not affect what the address expresses, so they are removed on the principle that the address expression is unaffected. The expression after removal is called the abbreviation of the address. For example:
"Zhejiang Provincial People's Government" is abbreviated to "Provincial People's Government"
"Zhejiang University" remains "Zhejiang University"
"Alibaba Hangzhou Co., Ltd." is abbreviated to "Alibaba"
The advantage of abbreviations is twofold. Local unit information is long, so complete matching is very difficult; and phrases such as "Zhejiang Province" occur very frequently in addresses, which can seriously hamper distinguishing two similar addresses. Using abbreviations largely removes this interference and improves the matching degree.
The address entry left after removing redundancy is the abbreviated address. Next, a keyword is extracted from it as a retrieval index, in preparation for subsequent matching. A keyword is defined here as the word formed by three consecutive characters of an address whose frequency of occurrence in other addresses is the lowest.
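The keyword definition above (three consecutive characters with the lowest frequency in other addresses) can be sketched as follows. This is an illustrative sketch under stated assumptions: the function name `extract_keyword` is hypothetical, frequency is counted as the number of other addresses containing the substring, ties go to the leftmost candidate, and short Latin strings stand in for Chinese abbreviated addresses.

```python
def extract_keyword(address: str, all_addresses: list[str], n: int = 3) -> str:
    """Pick the n-character substring of `address` found in the fewest other addresses."""
    others = [a for a in all_addresses if a != address]
    candidates = [address[i:i + n] for i in range(len(address) - n + 1)]
    if not candidates:
        # Assumption: an address shorter than n characters is used whole.
        return address
    # min() returns the first minimal element, so ties go to the leftmost candidate.
    return min(candidates, key=lambda gram: sum(gram in other for other in others))


# Toy abbreviated addresses; each letter stands in for one Chinese character.
addresses = ["ABCDE", "ABCXY", "QBCDZ"]
print(extract_keyword("ABCDE", addresses))  # CDE
print(extract_keyword("ABCXY", addresses))  # BCX
```

A rare three-character keyword lets the matcher discard most entries cheaply before running a full alignment against the few entries whose keyword appears in the recognition result.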
Road address information is represented as follows:
Sequence number | Road name | Parity flag | Start number | Stop number | Postcode | District | Delivery branch office | Road segment number | Retrieval
1 | Chaowang Road | Odd | 7 | 53 | 210014 | Xiacheng District
2 | Desheng Road | Even | 328 | 10000 | 310015 | Gongshu District
"Road name": the name of the road, e.g. Zhongshan North Road, Century Avenue, Changqing Street, Zumiao Lane, Taohua Alley. Note that no punctuation mark may appear in the name.
"Parity flag": indicates whether the delivery numbers of this road section are odd only, even only, or all consecutive numbers.
"Start number": the first delivery number of this road section.
"Stop number": the last delivery number of this road section. If the end of the road section is unknown, it is set to "9999" (for odd or all numbers) or "9998" (for even numbers).
"Postcode": the postcode of this road section.
"District": the district where this road section is located. Here "district" means a municipal district, county-level city, county, and so on.
"Road segment number": the number of the delivery road segment to which this road section belongs.
Organization (unit) information is represented as follows:
Each unit record is stored as one row with 9 attributes: "Unit name", "Postcode", "District", "Actual address", "Abbreviation 1", "Abbreviation 2", "Abbreviation 3", "Road segment number", and "Remarks". The delivery information of unit addresses is shown in the following table:
  | A | B | C | D | E | F | G | H | I
1 | Unit name | Postcode | District | Actual address | Abbreviation 1 | Abbreviation 2 | Abbreviation 3 | Road segment number | Remarks
2 | Hangzhou Xihu District People's Court | 310012 | Xihu District | No. 9, Wen'er West Road | Xihu District People's Court | Xihu District Court
"Unit name": the name of the unit, e.g. Zhejiang University, Hengli Building, Hangzhou Xihu District People's Court, Zhongrong City Garden. The name must be written in full; for example, "Zhejiang Provincial Higher People's Court" must not be written as "Provincial High Court", although "Provincial High Court" may be entered in the "Abbreviation 1" column. Note that no punctuation mark may appear in the name.
"Postcode": the postcode of the place where the unit is located.
"District": the district where the unit is located. Here "district" means a municipal district, county-level city, county, and so on.
"Actual address": the actual address of the unit that this record represents. As shown in Figure 10.
"Abbreviation 1": an abbreviation of the unit that this record represents; empty if there is none. As shown in Figure 10.
"Abbreviation 2": an abbreviation of the unit that this record represents; empty if there is none.
"Abbreviation 3": an abbreviation of the unit that this record represents; empty if there is none.
"Road segment number": the number of the delivery road segment to which the actual address of the unit belongs.
"Remarks": remark information.
Special cases:
[A] Residential communities named in forms such as "XX Li", "XX Fang", "XX Yuan", "XX Village", "XX Garden", "XX Community" and the like are in most cases stored in the "Unit" worksheet. However, if a residential community is split between different delivery offices, it is stored in the "Road" worksheet. For example, if Nos. 1-20 of Baima Garden belong to the first delivery office and Nos. 21-40 belong to the second delivery office, Baima Garden is stored in the "Road" worksheet rather than the "Unit" worksheet, in the following form:
"Road" worksheet in the .xls file of the first delivery office
  | A | B | C | D | E | F | G | H
1 | Road name | Parity (odd/even/all) | Start number | Stop number | Postcode | District | Road segment number | Remarks
2 | Baima Garden | All | 1 | 20 | 111111 | Baiyun District
"Road" worksheet in the .xls file of the second delivery office
  | A | B | C | D | E | F | G | H
1 | Road name | Parity (odd/even/all) | Start number | Stop number | Postcode | District | Road segment number | Remarks
2 | Baima Garden | All | 21 | 40 | 111111 | Baiyun District
[B] Because no punctuation mark may appear in "Unit name", a name written with a separating punctuation mark, such as "East Lake Fragrant Pavilion Waterfront", must be entered without the mark. For a name of the form "Shahu Village (formerly Shahu Residential Area)", the parentheses may be removed and the note "formerly Shahu Residential Area" placed in the corresponding "Remarks" column.
[C] Postal private letter boxes should be stored in the "Road" worksheet in the form below. Other mailboxes must not be stored in the "Road" worksheet.
Storage form for postal private letter boxes
  | A | B | C | D | E | F | G | H
1 | Road name | Parity (odd/even/all) | Start number | Stop number | Postcode | District | Road segment number | Remarks
2 | XX City postal private letter box | Odd | 1521 | 1521 | 111111 | Baiyun District
[D] Military units with an Arabic-numeral designation, for example "Unit 73022", should be stored in the "Road" worksheet in the form below.
Military units with an Arabic-numeral designation
  | A | B | C | D | E | F | G | H
1 | Road name | Parity (odd/even/all) | Start number | Stop number | Postcode | District | Road segment number | Remarks
2 | Military unit | Even | 73022 | 73022 | 111111 | Baiyun District
[E] Other military units, such as "8th Squadron of the People's Armed Police", are stored in the "Unit" worksheet, as shown in the table below.
Military units without an Arabic-numeral designation
  | A | B | C | D | E | F | G | H | I
1 | Unit name | Postcode | District | Actual address | Abbreviation 1 | Abbreviation 2 | Abbreviation 3 | Road segment number | Remarks
2 | 8th Squadron of the People's Armed Police | 111111 | Baiyun District | No. 222, Baiyun Road
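The "Road" worksheet rows above (parity flag plus start and stop numbers) imply a simple range lookup from a road name and house number to a row. The sketch below illustrates one way to do this; it is not the patent's implementation. The class `RoadEntry`, the functions `covers` and `lookup`, and the `office` field are hypothetical names, and the "Example Road" row is invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RoadEntry:
    road_name: str
    parity: str      # "odd", "even", or "all"
    start: int
    stop: int
    postcode: str
    district: str
    office: str      # hypothetical: which delivery office's worksheet holds the row


def covers(entry: RoadEntry, number: int) -> bool:
    """True if the house number lies in the row's range and has the right parity."""
    if not entry.start <= number <= entry.stop:
        return False
    if entry.parity == "odd":
        return number % 2 == 1
    if entry.parity == "even":
        return number % 2 == 0
    return True


def lookup(entries: list[RoadEntry], road_name: str, number: int) -> Optional[RoadEntry]:
    """Return the first row that matches the road name and covers the number."""
    for entry in entries:
        if entry.road_name == road_name and covers(entry, number):
            return entry
    return None


ROADS = [
    RoadEntry("Baima Garden", "all", 1, 20, "111111", "Baiyun District", "first"),
    RoadEntry("Baima Garden", "all", 21, 40, "111111", "Baiyun District", "second"),
    RoadEntry("Example Road", "odd", 7, 53, "111111", "Baiyun District", "first"),
]

print(lookup(ROADS, "Baima Garden", 30).office)  # second
print(lookup(ROADS, "Example Road", 8))          # None: 8 is even
```

Storing a split residential community as two "road" rows, as in special case [A], lets the same range lookup route Nos. 1-20 and Nos. 21-40 to different delivery offices without a separate mechanism.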
For a letter image, obtaining the final sorting information requires processing the image with methods such as image analysis, Chinese character recognition, and database query. Fig. 2 shows the basic steps of image information processing. First, the letter image is analyzed to locate the recipient address region. The Chinese characters of the address region are then segmented by line into several lines of text. Next, two Chinese character segmentation algorithms, referred to as the first and the second, each divide every line into individual characters. The characters produced by the first algorithm are recognized by two Chinese character recognition algorithms, L and W, respectively; the characters produced by the second algorithm are recognized by recognition algorithm H. Finally, the address-base driving algorithm combines the recognition results of the L, W, and H algorithms with the address base information to produce the sorting information the sorting machine needs. Here the first and second segmentation algorithms may each be any Chinese character segmentation algorithm, and L, W, and H may each be any Chinese character recognition algorithm.
The core of the technical solution proposed by the invention is address-base driving. Its basic idea is as follows. First, an address base containing standard address entries is built, and a recognition result containing address information is obtained from each letter image by pattern recognition. Each query address entry in the address base is matched against the recognition result to find the entry with the highest matching degree. Information such as the matching degree is then analyzed: if it meets the requirements, the sorting information corresponding to that entry is output; otherwise no information is output. The basic flow of address-base driving is shown in Fig. 3.
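The basic idea of address-base driving (match every entry, keep the best, accept it only above a threshold) can be sketched as a small loop. This is a simplified illustration, not the patent's implementation: the names `drive` and `toy_match`, the threshold value, and the sorting labels are hypothetical, and the toy matching function stands in for the alignment-based matching described later in the text.

```python
from typing import Callable, Optional


def drive(recognized: str,
          address_base: dict[str, str],
          match: Callable[[str, str], float],
          threshold: float) -> Optional[str]:
    """Return the sorting information of the best-matching entry, or None.

    address_base maps a query address to its sorting information.
    """
    best_entry, best_score = None, float("-inf")
    for entry in address_base:
        score = match(entry, recognized)
        if score > best_score:
            best_entry, best_score = entry, score
    if best_entry is not None and best_score >= threshold:
        return address_base[best_entry]
    return None  # matching degree too low: no sorting information is output


def toy_match(entry: str, text: str) -> float:
    """Toy matching degree: fraction of the entry's characters present in the text."""
    return sum(c in text for c in entry) / len(entry)


base = {"ABCD": "bin-1", "WXYZ": "bin-2"}
print(drive("12ABED34", base, toy_match, threshold=0.7))  # bin-1
print(drive("12345678", base, toy_match, threshold=0.7))  # None
```

Because the loop starts from the address base rather than from the recognized text, a few misrecognized characters lower the score of the correct entry but rarely make a wrong entry score higher.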
The input to address-base driving is three Chinese character recognition results (from the H, L, and W algorithms). As Fig. 2 shows, the L and W algorithms use the same character segmentation algorithm while the H algorithm uses another, so the L and W result strings have the same length, whereas the H result string may differ in length from the other two. The three recognition results are therefore first aligned to produce a candidate character set D, in which each position holds 1 to 3 candidate characters (the recognition results of H, L, and W). If transit sorting is required, the transit address table entries are matched against D and a judgment yields the sorting information. If local sorting is required, the local address table is matched and a judgment yields the local sorting information. If mixed local and transit sorting is required, transit address recognition is performed first, and if the result shows a local letter, local address recognition follows.
The issues involved in result recognition are described below in turn.
1. Alignment of recognition results and construction of the candidate character set
To facilitate address-base-driven matching and make full use of the recognition results of the three algorithms (H, L, W), the three results are first combined into an optimized candidate character set D. Each position of D has 1 to 3 candidate characters (the recognition results of the H, L, and W algorithms), ordered by priority. Let the recognition result strings of the H, L, and W algorithms be Hr, Lr, and Wr, with lengths Hl, Ll, and Wl respectively. Then Ll equals Wl, while Hl does not necessarily equal them. So that no useful information is discarded, the string length Dl after alignment is the maximum of Hl, Ll, and Wl, that is,
Dl = max(Hl, Ll)
The Needleman-Wunsch algorithm is used here to align the recognition results. Because Lr and Wr are already aligned with each other, Hr only needs to be aligned with Lr or Wr. That is, during matching, a character in Hr is considered matched as long as it equals either of the two characters at the same position in Lr and Wr. The Needleman-Wunsch algorithm was modified for this purpose; the improved Needleman-Wunsch algorithm is introduced below:
Initial conditions:
M(i,0) = M(0,j) = 0      (0≤i≤Ll, 0≤j≤Hl)
Tx(i,0) = Tx(0,j) = 0    (0≤i≤Ll, 0≤j≤Hl)
Ty(i,0) = Ty(0,j) = 0    (0≤i≤Ll, 0≤j≤Hl)
Recurrence:
[Recurrence equations for M, Tx, Ty and the scoring function σ: equation images GSA00000114562200071, GSA00000114562200072, and GSA00000114562200074 in the original, not reproduced here.]
Here M, Tx, and Ty are (Ll+1)×(Hl+1) matrices. M is the matching score matrix; Tx and Ty are traceback matrices that record from which adjacent cell each cell of M was obtained, Tx recording the position in the x direction and Ty the position in the y direction. σ is the scoring function: when Hr(j) equals either Lr(i) or Wr(i), the match score is Mat; otherwise the mismatch score is Mis. The penalty for inserting a gap is W. The value of each cell of M depends on the values of its left, upper-left, and upper neighbors. Here Mat is set to 2, Mis to -1, and W to -2. Tracing back from cell (Ll, Hl) to (0, 0) along the traceback matrices gives the aligned strings Hd, Wd, and Ld, which form the candidate character set D, of length Dl = max(Hl, Ll).
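The alignment step can be illustrated with a short sketch. Because the recurrence equations are not reproduced in this text, the code below uses the textbook Needleman-Wunsch recurrence with the stated values Mat = 2, Mis = -1, W = -2, zero-initialized borders, and the stated matching rule (a character of Hr matches if it equals the character of Lr or of Wr at that position). It is an assumption-laden sketch, not the patent's exact algorithm: a single traceback table `T` replaces the separate Tx and Ty matrices, ties prefer the diagonal, and the names `align` and `candidate_set` and the H, L, W candidate order are hypothetical.

```python
MAT, MIS, GAP = 2, -1, -2  # Mat, Mis, W as given in the text


def align(hr: str, lr: str, wr: str, gap_char: str = "-"):
    """Align Hr against the already-aligned pair (Lr, Wr); return (Hd, Ld, Wd)."""
    assert len(lr) == len(wr)
    n, m = len(lr), len(hr)
    M = [[0] * (m + 1) for _ in range(n + 1)]      # borders stay 0
    T = [[None] * (m + 1) for _ in range(n + 1)]   # predecessor cell of each cell
    for i in range(1, n + 1):
        T[i][0] = (i - 1, 0)
    for j in range(1, m + 1):
        T[0][j] = (0, j - 1)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sigma = MAT if hr[j - 1] in (lr[i - 1], wr[i - 1]) else MIS
            options = [
                (M[i - 1][j - 1] + sigma, (i - 1, j - 1)),  # upper-left: pair characters
                (M[i - 1][j] + GAP, (i - 1, j)),            # upper: gap in Hr
                (M[i][j - 1] + GAP, (i, j - 1)),            # left: gap in Lr and Wr
            ]
            M[i][j], T[i][j] = max(options, key=lambda o: o[0])
    hd, ld, wd = [], [], []
    i, j = n, m
    while (i, j) != (0, 0):                        # trace back from (Ll, Hl) to (0, 0)
        pi, pj = T[i][j]
        ld.append(lr[i - 1] if pi < i else gap_char)
        wd.append(wr[i - 1] if pi < i else gap_char)
        hd.append(hr[j - 1] if pj < j else gap_char)
        i, j = pi, pj
    return "".join(reversed(hd)), "".join(reversed(ld)), "".join(reversed(wd))


def candidate_set(hd: str, ld: str, wd: str, gap_char: str = "-"):
    """Build D: per position, the distinct non-gap candidates in H, L, W order."""
    d = []
    for chars in zip(hd, ld, wd):
        seen = []
        for c in chars:
            if c != gap_char and c not in seen:
                seen.append(c)
        d.append(tuple(seen))
    return d


hd, ld, wd = align("ABD", "ABCD", "AXCD")
print(hd, ld, wd)                 # AB-D ABCD AXCD
print(candidate_set(hd, ld, wd))  # [('A',), ('B', 'X'), ('C',), ('D',)]
```

In the example, the H recognizer dropped one character, and the alignment inserts a gap so that the remaining characters line up with the L and W results position by position.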
2. Transit address recognition
Transit address recognition consists of two parts: matching against the transit address base and judging the transit matching result.
2.1. Matching against the transit address base
As Fig. 1 shows, the query addresses in the transit address table are of three types: provincial, prefecture-level, and county/district-level. Each query address can be parsed into two parts, here called the place name and the level name. For example, in "Beijing City", "Beijing" is the place name and "City" is the level name. The place name carries most of the information of an address, whereas the level name is shared by many addresses; in the transit table the level names are mainly "City", "Province", "Autonomous Region", "County", "District", and so on. For a recognized candidate character set D, in general every query address must be matched and its matching degree computed. The Smith-Waterman algorithm is used here to compute the matching score. The query sequence input to the algorithm is an address from the transit table; because the database sequence input to the algorithm is the candidate character set D, which has up to three candidate characters per position, the Smith-Waterman algorithm is modified accordingly.
The improved Smith-Waterman algorithm is introduced first. Let an address in the transit table be the string Q, of length Ql. The formulas of the improved Smith-Waterman algorithm are:
Initial conditions:
M(i,0) = E(i,0) = F(i,0) = 0      (0≤i≤Ql)
M(0,j) = E(0,j) = F(0,j) = 0      (0≤j≤Dl)
Recurrence:
E(i,j) = max{E(i,j-1)-r, M(i,j-1)-q-r, 0}      (5)
F(i,j) = max{F(i-1,j)-r, M(i-1,j)-q-r, 0}      (6)
M(i,j) = max{0, M(i-1,j-1)+σ(Q(i),D(j)), E(i,j), F(i,j)}      (7)
[Definition of the scoring function σ: equation image GSA00000114562200091 in the original, not reproduced here.]
Here M, E, and F are (Ql+1)×(Dl+1) matrices, σ is the scoring function, q is the gap-open penalty, r is the gap-extension penalty, Mat is the match score, and Mis is the mismatch score.
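Recurrences (5) to (7) translate directly into code. The sketch below is illustrative only: the definition of σ is not reproduced in this text, so it assumes that Q(i) matches D(j) when it equals any candidate character at position j, and the values chosen for Mat, Mis, q, and r are assumptions (the text gives none for this algorithm). The function name `smith_waterman` is hypothetical, and for brevity it returns only the best score and the end position of the best-matching segment in D, without a traceback to recover its start.

```python
def smith_waterman(q_addr: str, d: list, mat: int = 2, mis: int = -1,
                   gap_open: int = 2, gap_ext: int = 1):
    """Local alignment of a query address against the candidate character set D.

    d is a list of tuples; d[j] holds the 1 to 3 candidate characters at position j.
    Returns (best score, end position in d, exclusive).
    """
    ql, dl = len(q_addr), len(d)
    M = [[0] * (dl + 1) for _ in range(ql + 1)]
    E = [[0] * (dl + 1) for _ in range(ql + 1)]
    F = [[0] * (dl + 1) for _ in range(ql + 1)]
    best, best_end = 0, 0
    for i in range(1, ql + 1):
        for j in range(1, dl + 1):
            # Equation (5): gap running along D.
            E[i][j] = max(E[i][j - 1] - gap_ext, M[i][j - 1] - gap_open - gap_ext, 0)
            # Equation (6): gap running along Q.
            F[i][j] = max(F[i - 1][j] - gap_ext, M[i - 1][j] - gap_open - gap_ext, 0)
            # Assumed sigma: match if Q(i) equals any candidate at position j.
            sigma = mat if q_addr[i - 1] in d[j - 1] else mis
            # Equation (7).
            M[i][j] = max(0, M[i - 1][j - 1] + sigma, E[i][j], F[i][j])
            if M[i][j] > best:
                best, best_end = M[i][j], j
    return best, best_end


# Candidate set with one noisy position ("B" or "X") and unrelated characters around it.
D = [("Z",), ("A",), ("B", "X"), ("C",), ("D",), ("Y",)]
print(smith_waterman("ABCD", D))  # (8, 5)
print(smith_waterman("QQ", D))    # (0, 0)
```

Checking the query character against all candidates at a position is what lets a query address match even when only one of the three recognizers read a character correctly.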
For each query address in the transit address table, the Smith-Waterman computation yields a substring R of the candidate character set D that has the maximum matching degree with that query address, together with the position of R in D. Because provinces, prefectures, and counties/districts are related hierarchically, the matching flow for the transit address table, shown in Fig. 4, exploits this hierarchy to reduce the number of matches against the address table.
After matching against the transit address table, the addresses whose matching degree exceeds the set threshold form the set DA, which contains all query address entries of every level (province, prefecture, county/district) that satisfy the threshold. Query addresses in DA that are hierarchically related in the transit address table are combined into a single query address entry. For example, if DA contains the three entries "Zhejiang Province", "Hangzhou City", and "Taizhou City", they are combined into the two entries "Zhejiang Province, Hangzhou City" and "Zhejiang Province, Taizhou City". The set obtained by combining DA according to the hierarchy is called DB.
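The combination of DA into DB can be sketched with a parent lookup table. This is a minimal illustration under stated assumptions: the `PARENT` table and the function name `combine` are hypothetical, and the sketch only chains an address to its parent when the parent itself is in DA (the text does not say how a county whose prefecture is missing from DA should be handled).

```python
# Hypothetical hierarchy: child query address -> parent query address.
PARENT = {
    "Hangzhou City": "Zhejiang Province",
    "Taizhou City": "Zhejiang Province",
    "Yuhang District": "Hangzhou City",
}


def combine(da: set[str]) -> list[list[str]]:
    """Combine matched query addresses (DA) into address strings (DB)."""
    def chain(addr: str) -> list[str]:
        # Walk upward for as long as the parent was also matched.
        path = [addr]
        while PARENT.get(path[0]) in da:
            path.insert(0, PARENT[path[0]])
        return path

    # Addresses that serve as the parent of another matched address are not leaves.
    parents_in_use = {PARENT[a] for a in da if PARENT.get(a) in da}
    leaves = [a for a in da if a not in parents_in_use]
    return sorted(chain(a) for a in leaves)


print(combine({"Zhejiang Province", "Hangzhou City", "Taizhou City"}))
# [['Zhejiang Province', 'Hangzhou City'], ['Zhejiang Province', 'Taizhou City']]
```

Each resulting list is one address string of 1 to 3 query addresses, ordered from the highest administrative level downward.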
2.2. Judgment of the transit matching result
Each entry in the set DB is called an address string. An address string may consist of 1 to 3 query addresses; for example, "Beijing City", "Shanghai City, Pudong New Area", and "Zhejiang Province, Hangzhou City, Yuhang District" are address strings composed of 1, 2, and 3 query addresses respectively. DB is a set of one or more address strings; to choose the correct address string from it, an evaluation model of the matching result must be built for the judgment. Each query address carries the following information: matching degree, matching position, and postcode. If the recognition result includes a postcode, the recognized postcode information can also be extracted.
The model first needs a scoring scheme for the matching degree. The steps are as follows:
[A] Split the query address into two parts, place name + level name, of lengths a1 and a2.
[B] Count the numbers b1 and b2 of characters of the place name and the level name, respectively, that are matched in the matched string R (of length Rl).
[C] Set the weights for a complete match of the place name and of the level name to c1 and c2, where c1 = 4 and c2 = 1.
[D] Compute the matching score
S1 = (c1*b1/a1 + c2*b2/a2)/(c1+c2)      (9)
[E] Reward a complete match of the place name.
From the formula, S2 = 1.0 means that the query address is completely matched.
[F] Set the weights for a complete match and an incomplete match of the query address to m1 and m2 respectively, where m1 = 100 and m2 = 20.
[Equation for S3: equation image GSA00000114562200102 in the original, not reproduced here.]
Complete and incomplete matches are weighted differently because, when a query address is completely matched, the recognition information is considered to cause no ambiguity. S3 is the score of each query address, with a maximum of 100. From the formula, if the place name of a query address is matched in the candidate character set D, then S3 ≥ 16. Because the place name generally reflects the address information, the threshold MT1 = 16 is chosen, at or above which the query address is considered basically trustworthy. When the level name is completely matched (b2/a2 = 1) and the place name matching degree is b1/a1 = 0.5, for example when the content of the candidate character set D is "Zhejiang Province, Hangchuan City", the matching score of the query address "Hangzhou City" is S3 = 12. In that case the query address is considered to carry partial address information, and other information, such as the postcode, its association with higher-level and lower-level addresses, and the exclusiveness of the place name, may confirm that "Hangzhou City" is correct. The threshold MT2 = 12 is therefore chosen, at or above which the query address is considered to have usable address information.
[G] The query addresses whose S3 is at or above the threshold MT2 form the set DA, and the members of DA are linked according to the subordination relations between query addresses to obtain the set DB of address strings. Next, the model scores each address string in DB. Let the score of an address string be S4, and let the scores of the at most three levels of query address it contains (provincial, prefecture level, county/district level) be ss1, ss2 and ss3 respectively (0 when a level is absent). The following criteria apply:
(1) When any query address score in the address string is greater than or equal to MT1,
S4 = ss1 + ss2 + ss3    (12)
(2) When all query address scores in the address string are less than MT1 (each being greater than or equal to MT2), the value depends on whether the matched positions of the query addresses in D follow the way Chinese addresses are written, i.e. in the order provincial, prefecture level, county/district level:
S4 = ss1 + ss2 + ss3, if the matched positions follow the written order    (13)
S4 = max(ss1, ss2, ss3), if the matched positions do not follow the written order    (14)
According to the above criteria we obtain the score of each address string. For the example "Hang Chuanshi of Zhejiang Province" above, the provincial score is 100 and the prefecture-level score is 12, so rule (1) applies and S4 = 112.
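The scoring rules (12) to (14) can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name, the argument layout, and the use of start offsets in D to test the written order are assumptions made for the example.

```python
MT1 = 16  # score at which a query address is considered basically trustworthy
MT2 = 12  # score at which a query address is considered to carry usable information


def address_string_score(scores, positions):
    """Score one address string (S4).

    scores    -- (ss1, ss2, ss3): S3 of the provincial, prefecture-level and
                 county/district-level query addresses, 0 where a level is absent.
    positions -- matched start offsets in D for the same three levels,
                 None where a level is absent (assumed representation).
    """
    present = [s for s in scores if s > 0]
    if not present:
        return 0
    # Rule (1): at least one level is trustworthy, so all levels are summed.
    if any(s >= MT1 for s in present):
        return sum(scores)
    # Rule (2): every level is below MT1, so the written order decides.
    offsets = [p for p in positions if p is not None]
    in_written_order = offsets == sorted(offsets)
    return sum(scores) if in_written_order else max(scores)
```

With the example from the text, `address_string_score((100, 12, 0), (0, 3, None))` gives 112, because the provincial score already exceeds MT1.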
[H] If there is no postal code recognition information, S4 is the final score of the address string. If postal code recognition information exists, it is added to the scoring of the address string. In that case, the recognized postal code is compared with the postal code of each level of query address in the address string, to find the lowest-level query address that can be matched successfully. For example, postal code "310001" can match down to the prefecture level of "Hangzhou City, Zhejiang Province", whereas "320001" can match only the provincial level. For an address string, if the postal code of a query address at some level matches the recognized postal code, the score receives an additive bonus. The base bonus for a match is MW; depending on the level at which the postal code matches and on the matching score S3 of the query address, five different weights are applied to MW. Because a prefecture-level or county/district-level match is a 4-digit postal code match whereas a provincial match is a 2-digit postal code match, prefecture-level and county/district-level matches carry higher weights than provincial matches. A query address with S3 ≥ MT1 that is also verified by the postal code should likewise carry a higher weight. The specific rules are as follows:
When the match is at the provincial level:
Figure GSA00000114562200111
Figure GSA00000114562200112
When the match is at the prefecture level:
Figure GSA00000114562200113
Figure GSA00000114562200114
When the match is at the county/district level:
Figure GSA00000114562200121
(20)
The value of MW is set according to the accuracy of postal code recognition. Here we set MW to 40, on the view that a postal code which also matches an address string in the set DB is relatively trustworthy.
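Finding the lowest level at which the recognized postal code matches can be sketched as a prefix comparison. The weight formulas themselves are not reproduced here, so the sketch covers only the level lookup. The function name, the data layout, and the prefix values for Hangzhou are hypothetical illustration data, and the early exit assumes that a lower level cannot match once a higher level has failed.

```python
def lowest_postcode_level(recognized, address_string):
    """Return the lowest level whose postcode prefix matches, or None.

    address_string -- list of (level_name, prefixes) ordered from the
                      provincial level downwards; provincial prefixes have
                      2 digits, lower levels have 4 digits.
    """
    matched = None
    for level_name, prefixes in address_string:
        if any(recognized.startswith(p) for p in prefixes):
            matched = level_name
        else:
            break  # assumption: lower levels cannot match after a failure
    return matched


# Hypothetical prefix data for "Hangzhou City, Zhejiang Province".
HANGZHOU = [
    ("provincial", {"31", "32"}),
    ("prefecture", {"3100"}),
]
```

Under these assumed prefixes, "310001" resolves to the prefecture level and "320001" only to the provincial level, which mirrors the example given in step [H].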
The above completes the matching-result evaluation model: after evaluation, every address string in the set DB has a corresponding score. Next, it must be decided which address string in DB accurately expresses the recipient address of the letter. The simplest decision approach is chosen here: the address strings are sorted by their evaluation score, by the scores of the query addresses they contain, by their matched position in character string D, and so on, and the one or two top-ranked address strings are analyzed to obtain the final result. The detailed flow is shown in Fig. 4.
Note: MT3 is the threshold applied to the evaluation score in the final decision. MT3 takes one of two values: when the postal code recognition result is not incorporated into the evaluation model, MT3 = MT1 + 1; when the postal code recognition result is incorporated into the evaluation model, MT3 = MW + MT1 + 1.
The decision process and results of the above evaluation model are illustrated with several examples:
Example 1: "Shanghai City, Fuzhou Road". "Shanghai City" scores 100, which is greater than the score of 17 for "Fuzhou", so the result is "Shanghai City".
Example 2: "442000 Xiamen Utilities Electric Co.". "Xiamen" scores 16; because a postal code exists and does not match, the letter is rejected.
Example 3: "Hang Chuanshi of Zhejiang Province". "Hangzhou City, Zhejiang Province" scores 112, so the result is "Hangzhou City, Zhejiang Province".
Example 4: "full mountain area, Shanghai City". "mountain area, Shanghai City" scores 112 and "Baoshan District, Shanghai City" scores 112; the two district-level candidates tie, so only the level they share is output and the result is "Shanghai City".
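The threshold MT3 and the outcomes of the four examples can be reproduced with a small decision sketch. The actual flow is defined in Fig. 4, which is not available in the text, so the logic below (reject when the best score is under MT3, fall back to the shared leading levels on a tie) is an assumption chosen to agree with the examples. All names are hypothetical.

```python
MT1 = 16
MW = 40


def final_threshold(postcode_used):
    """MT3: MT1 + 1 without the postal code, MW + MT1 + 1 with it."""
    return MW + MT1 + 1 if postcode_used else MT1 + 1


def decide(candidates, postcode_used):
    """candidates: list of (levels_tuple, score). Returns a levels tuple or None."""
    if not candidates:
        return None
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    best_levels, best_score = ranked[0]
    if best_score < final_threshold(postcode_used):
        return None  # assumed rejection rule
    if len(ranked) > 1 and ranked[1][1] == best_score:
        # Tie: keep only the leading levels that both strings share.
        common = []
        for a, b in zip(best_levels, ranked[1][0]):
            if a != b:
                break
            common.append(a)
        return tuple(common) or None
    return best_levels
```

For instance, two candidates `("Shanghai City", "District A")` and `("Shanghai City", "Baoshan District")` that both score 112 reduce to `("Shanghai City",)`, as in Example 4.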
3. Local port address recognition
Local port address recognition uses the local port address table to match the recognition result, character string D, and obtains the delivery sub-office or delivery road section information that corresponds to the matched address in the local port table. The local port address table is stored as two tables, a road address table and an organization table, because road addresses and organization names are the two forms in which a recipient address is expressed. Local port address recognition likewise consists of two parts, matching and decision.
4. The basic flow of local port address recognition is shown in Fig. 5. It comprises matching against the road address table and matching against the organization table. Matching against each table is in turn divided into fuzzy matching and exact matching. The multiple plausible matching results from the two tables are then judged together, using the consistency of the delivery information and the postal code recognition information, to obtain the sorting information. Each step is described below.
4.1. Fuzzy matching
Matching is done in two steps, fuzzy matching followed by exact matching. The main reason is that exact matching is very time-consuming and the road address table and organization table hold a very large number of entries. To improve speed, a fast fuzzy matching algorithm was designed, and it supplies exact matching with a comparatively small candidate set. Before fuzzy matching, key terms must first be extracted from the road address table and the organization table. A key term is a character string of length 3 extracted from a road name or organization name, and the extraction principle is that all key terms extracted from a table should be as dissimilar to one another as possible. Fuzzy matching uses the key terms to match the Chinese recognition result with a fast direct-search comparison algorithm, and selects the entries whose matching degree exceeds a certain threshold as the candidate set for exact matching. Fuzzy matching is applied to the road address table and the organization table separately, giving two candidate sets called the road fuzzy match set and the organization fuzzy match set.
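The candidate selection step can be sketched as below. The text does not specify the direct-search comparison or the threshold, so the position-aligned window comparison and the value 0.6 are assumptions, and the table entries are illustrative stand-ins.

```python
def key_term_degree(key_term, text):
    """Best fraction of key-term characters found at aligned positions
    in any window of `text` with the same length as the key term."""
    n = len(key_term)
    best = 0
    for start in range(max(1, len(text) - n + 1)):
        window = text[start:start + n]
        hits = sum(1 for a, b in zip(key_term, window) if a == b)
        best = max(best, hits)
    return best / n


def fuzzy_candidates(entries, text, threshold=0.6):
    """entries: iterable of (name, key_term). Returns the names whose key
    term matches the recognized text with a degree above `threshold`."""
    return [name for name, term in entries
            if key_term_degree(term, text) > threshold]


# Illustrative entries; "酉" in the text stands for a misrecognized "西".
ENTRIES = [("中山西路", "中山西"), ("人民路", "人民路"), ("南京东路", "南京东")]
RECOGNIZED = "上海市中山酉路100号"
```

Here `fuzzy_candidates(ENTRIES, RECOGNIZED)` keeps only "中山西路", since two of its three key-term characters survive the recognition error.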
4.2. Exact matching
Because fuzzy matching uses key terms of length 3 in place of the actual road names or organization names, it only selects two fuzzy candidate sets and does not represent the matching degree of the actual road name or organization name. Exact matching matches each entry in the fuzzy candidate sets against character string D once more, using the improved Smith-Waterman algorithm introduced above. The local port address matching degree (Sl) is calculated with the following formula:
Sl = Match / max(Lin, Rl)    (21)
where Match is the number of matched characters, Lin is the string length of the road name or organization name, and Rl is the length of the matched string R output by the Smith-Waterman algorithm. Because road names and organization names are diverse and similar to one another, only entries that match fully (Sl = 1.0) are kept as results after exact matching. After both tables have been matched, there are therefore zero or more road-name results and zero or more organization-name results. Multiple road-name results arise because character string D can itself contain several road names. For example, "Renmin Road / Zhongshan Road intersection" contains both "Renmin Road" and "Zhongshan Road", and "Zhongshan West Road" contains both "Zhongshan West Road" and "Shanxi Road". The same situations occur for multiple organization names. Ambiguity in the matching results also arises when the organization table holds several organizations with the same name at different places in the same city, or when several roads share the same name.
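Formula (21) can be illustrated with a plain Smith-Waterman local alignment. The improved variant used by the method is described elsewhere in the document and is not reproduced here, so the scoring parameters and the traceback below are the textbook version, used only as a stand-in. The function names are hypothetical.

```python
def local_match(name, text, match=2, mismatch=-1, gap=-1):
    """Standard Smith-Waterman local alignment of `name` against `text`.

    Returns (Match, R): the number of identical aligned characters and the
    substring of `text` covered by the best local alignment.
    """
    rows, cols = len(name) + 1, len(text) + 1
    h = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if name[i - 1] == text[j - 1] else mismatch
            h[i][j] = max(0, h[i - 1][j - 1] + s,
                          h[i - 1][j] + gap, h[i][j - 1] + gap)
            if h[i][j] > best:
                best, best_pos = h[i][j], (i, j)
    # Trace back from the best cell until the score drops to zero.
    i, j = best_pos
    end, matched = j, 0
    while i > 0 and j > 0 and h[i][j] > 0:
        s = match if name[i - 1] == text[j - 1] else mismatch
        if h[i][j] == h[i - 1][j - 1] + s:
            matched += name[i - 1] == text[j - 1]
            i, j = i - 1, j - 1
        elif h[i][j] == h[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return matched, text[j:end]


def matching_degree(name, text):
    """Sl = Match / max(Lin, Rl), following formula (21)."""
    if not name:
        return 0.0
    matched, r = local_match(name, text)
    return matched / max(len(name), len(r))
```

Against the text "上海市中山西路100号", both "中山西路" and the shorter "山西路" reach Sl = 1.0, which is the kind of multiple road-name result discussed above, while "中山东路" reaches only 0.75 and is discarded.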
For a matched road result, different house numbers on the same road may belong to different delivery sub-offices or road sections, so the house number must be extracted. The house number is taken to be the digit string that immediately follows the road name. If a number is extracted, the result is road + number; otherwise the result is the road alone. A road + number result is looked up in the road address table and may yield uniquely determined delivery information, or several different pieces of delivery information (when several roads share the name). A road-only result is looked up in the road address table and may yield unique delivery information (the road belongs to only one delivery sub-office or road section), several pieces of delivery information (several roads share the name), or undetermined delivery information (the road belongs to several delivery sub-offices or road sections). Lookups for a road therefore fall into three kinds of result, called here the determined road matching result, the repeated road matching result, and the undetermined road matching result. Organization lookups have only two cases: the determined organization matching result and the undetermined organization matching result.
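The house-number extraction and the three kinds of road lookup result can be sketched as follows. The table layout (one row per road, district and number range), the field names, and the use of the district to tell same-named roads apart are assumptions made for the example.

```python
import re


def extract_house_number(text, road_end):
    """Digits that immediately follow the road name ending at `road_end`."""
    m = re.match(r"\d+", text[road_end:])
    return int(m.group()) if m else None


def classify_road_lookup(rows, road, number=None):
    """rows: list of dicts with keys road, district, low, high, section.

    Returns (kind, sections) where kind is "determined", "repeated",
    "undetermined" or "none".
    """
    hits = [r for r in rows if r["road"] == road
            and (number is None or r["low"] <= number <= r["high"])]
    if not hits:
        return "none", set()
    sections = {r["section"] for r in hits}
    districts = {r["district"] for r in hits}
    if len(districts) > 1:
        return "repeated", sections      # several roads share the name
    if len(sections) == 1:
        return "determined", sections
    return "undetermined", sections      # one road, several sections


# Hypothetical road address table.
ROWS = [
    {"road": "Zhongshan Road", "district": "A", "low": 1, "high": 99, "section": "S1"},
    {"road": "Zhongshan Road", "district": "A", "low": 100, "high": 299, "section": "S2"},
    {"road": "Renmin Road", "district": "A", "low": 1, "high": 999, "section": "S3"},
    {"road": "Jiefang Road", "district": "A", "low": 1, "high": 999, "section": "S4"},
    {"road": "Jiefang Road", "district": "B", "low": 1, "high": 999, "section": "S5"},
]
```

With this table, "Zhongshan Road" alone is undetermined, but "Zhongshan Road" with number 120 is determined, which is why the number is extracted whenever it is present.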
4.3. Decision on the exact matching results
Because exact matching can produce results of the several kinds analyzed above, a combined decision is needed, using information such as the postal code and the district to which the matched position belongs, to obtain the correct sorting information.
Fig. 6 shows the case in which the road table or the organization table yields multiple results after exact matching. The results are cross-checked by postal code matching, by matching of the district they belong to, and by comparing the delivery sub-office or road section of each matching result, so that inaccurate or redundant information is eliminated and a unique delivery sub-office or road section is obtained. After the checks of Fig. 6, one or several delivery sub-offices or road sections have been obtained from each of the road address table and the organization table. If only one of the road-name match and the organization-name match has a delivery sub-office or road section result, that sorting information is output when the sub-office or road section is unique; otherwise no information is output. If both the road-name match and the organization-name match have results, the final sorting information is obtained by mutual verification, as shown in Fig. 7. The two are compared by delivery sub-office or road section. If they have exactly one identical delivery sub-office or road section in common, it is output as the sorting information. Otherwise, if the delivery information matched from the road address itself is unique, that information is used as the sorting information. In all other cases the information is considered uncertain, and the delivery sub-office or road section cannot be determined.
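The mutual verification between road-name and organization-name results can be sketched as a small set comparison. The function name and the representation of the results as sets of sub-office or road section codes are assumptions; the branch order follows the description of Fig. 7 in the text.

```python
def final_sorting_info(road_sections, org_sections):
    """road_sections / org_sections: delivery sub-office or road section
    codes that survived the checks of Fig. 6. Returns the chosen code,
    or None when it cannot be determined."""
    road, org = set(road_sections), set(org_sections)
    if not road and not org:
        return None
    if not road or not org:
        # Only one table produced results: accept it only if it is unique.
        only = road or org
        return next(iter(only)) if len(only) == 1 else None
    common = road & org
    if len(common) == 1:
        return next(iter(common))   # both tables agree on exactly one code
    if len(road) == 1:
        return next(iter(road))     # fall back to the unique road result
    return None
```

For example, `final_sorting_info({"S1", "S2"}, {"S2", "S3"})` returns "S2", while `final_sorting_info({"S1", "S2"}, {"S3"})` returns None because neither the intersection nor the road result is unique.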
The above describes the address-base-driven method for automatic letter recognition and sorting, which is applied in the recognition module of a letter sorting machine. Practice shows that, when the address base is relatively accurate and complete and the recognition rate is basically assured, the method can effectively analyze and correct recognition results and obtain accurate results. The key to applying the method successfully is the accuracy of the address base, especially the completeness of the road address table and the selective entry of the organization table in the local port address base. Better results are also obtained when the letter image contains both a postal code and a full address.

Claims (3)

1. A method for sorting postal letters by address driven by an address information base, characterized in that each delivery address in the address information base has at least one textual representation containing the address information used for transit sorting, local port sorting, and road section delivery; the textual address representation of the postal letter, obtained by image acquisition and character recognition, is matched by traversal against the delivery addresses in the address information base; and the sorting information of the postal letter is obtained according to the matching degree, so as to realize transit sorting of the postal letter, local port sorting, and road section delivery after local port sorting.
2. The method for sorting postal letters by address driven by an address information base as claimed in claim 1, characterized in that the step of obtaining the textual address representation of the postal letter by character recognition comprises:
analyzing the letter image to obtain the region of the recipient address;
segmenting the Chinese characters of the address region with a segmentation algorithm, in which multiple lines of text are obtained first and each line of text is then divided into individual characters;
recognizing each individual character with a Chinese character recognition algorithm to obtain the textual address representation of the postal letter.
3. The method for sorting postal letters by address driven by an address information base as claimed in claim 2, characterized in that, when the textual address representation of the postal letter obtained by image acquisition and character recognition is matched by traversal against the delivery addresses in the address information base and the sorting information of the postal letter is obtained according to the matching degree, the sorting information is verified in combination with the postal code information of the postal letter.
CN2010101709498A 2010-05-11 2010-05-11 Method for sorting postal letters according to addresses driven by address information base Active CN101844135B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2010101709498A CN101844135B (en) 2010-05-11 2010-05-11 Method for sorting postal letters according to addresses driven by address information base

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2010101709498A CN101844135B (en) 2010-05-11 2010-05-11 Method for sorting postal letters according to addresses driven by address information base

Publications (2)

Publication Number Publication Date
CN101844135A true CN101844135A (en) 2010-09-29
CN101844135B CN101844135B (en) 2013-05-08

Family

ID=42769059

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2010101709498A Active CN101844135B (en) 2010-05-11 2010-05-11 Method for sorting postal letters according to addresses driven by address information base

Country Status (1)

Country Link
CN (1) CN101844135B (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102314645A (en) * 2011-09-26 2012-01-11 深圳市络道科技有限公司 Address matching method and system
CN102750351A (en) * 2012-06-11 2012-10-24 迪尔码国际营销服务(北京)有限公司 Matching method of address information based on rules
CN103390163A (en) * 2012-05-10 2013-11-13 中邮科技有限责任公司 Letter address automatic-collection method
CN103909066A (en) * 2014-04-03 2014-07-09 上海邮政科学研究院 Vouchered postal material sorting method and system capable of verifying image information and network information
CN104166679A (en) * 2014-07-08 2014-11-26 北京迪威特科技有限公司 Address matching method for sorting
CN104281576A (en) * 2013-07-02 2015-01-14 威盛电子股份有限公司 Display method for landmark data
CN104899202A (en) * 2014-03-04 2015-09-09 华为技术有限公司 Information processing method and system
CN108376365A (en) * 2018-03-22 2018-08-07 中国银行股份有限公司 A kind of Bank Number determines method and device
CN111709680A (en) * 2020-05-29 2020-09-25 无锡医迈德科技有限公司 Method and system for acquiring warehouse-in and warehouse-out information based on invoice
CN111921872A (en) * 2020-07-14 2020-11-13 北京京东振世信息技术有限公司 Order sorting method and device, electronic equipment and readable storage medium
CN112297033A (en) * 2020-11-03 2021-02-02 泰州市华仕达机械制造有限公司 Automatic change robotic arm actuating system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1273542A (en) * 1997-11-04 2000-11-15 西门子公司 Method and system for recognising routing information on letters and parcels
JP2001126031A (en) * 1999-10-29 2001-05-11 Toshiba Corp Method and device for address recognition
JP2001232303A (en) * 2000-02-24 2001-08-28 Hitachi Ltd Address recognizing device and postal item sorting machine using the same
WO2007048564A1 (en) * 2005-10-24 2007-05-03 Siemens Aktiengesellschaft Method and apparatus for fingerprinting reject recovery and error reduction using interactive principles
US20090324105A1 (en) * 2008-06-27 2009-12-31 Kabushiki Kaisha Toshiba Video coding system, sorting system, coding method and sorting method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
《中国博士学位论文全文数据库》 20070228 娄正良 中文邮政地址识别研究 第62-98页 1-3 , *
《清华大学学报》 20060731 蒋焰 基于地址结构匹配的手写中文地址的切分与识别 1236页第1栏15行-第1237页23行 1-3 第46卷, 第7期 *
《计算机工程与应用》 20031110 黄磊等 信函自动分拣软件系统 第21-24、50页 1-3 , 第19期 *

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102314645A (en) * 2011-09-26 2012-01-11 深圳市络道科技有限公司 Address matching method and system
CN103390163A (en) * 2012-05-10 2013-11-13 中邮科技有限责任公司 Letter address automatic-collection method
CN103390163B (en) * 2012-05-10 2016-12-14 中邮科技有限责任公司 A kind of Post address automatic acquiring method
CN102750351A (en) * 2012-06-11 2012-10-24 迪尔码国际营销服务(北京)有限公司 Matching method of address information based on rules
CN104281576A (en) * 2013-07-02 2015-01-14 威盛电子股份有限公司 Display method for landmark data
CN104899202B (en) * 2014-03-04 2019-03-19 华为技术有限公司 A kind of information processing method and system
CN104899202A (en) * 2014-03-04 2015-09-09 华为技术有限公司 Information processing method and system
CN103909066A (en) * 2014-04-03 2014-07-09 上海邮政科学研究院 Vouchered postal material sorting method and system capable of verifying image information and network information
CN104166679A (en) * 2014-07-08 2014-11-26 北京迪威特科技有限公司 Address matching method for sorting
CN104166679B (en) * 2014-07-08 2018-10-09 北京迪威特科技有限公司 A kind of address matching method for sorting
CN108376365A (en) * 2018-03-22 2018-08-07 中国银行股份有限公司 A kind of Bank Number determines method and device
CN108376365B (en) * 2018-03-22 2021-06-18 中国银行股份有限公司 Bank number determining method and device
CN111709680A (en) * 2020-05-29 2020-09-25 无锡医迈德科技有限公司 Method and system for acquiring warehouse-in and warehouse-out information based on invoice
CN111709680B (en) * 2020-05-29 2023-07-07 无锡医迈德科技有限公司 Invoice-based method and system for acquiring warehouse-in and warehouse-out information
CN111921872A (en) * 2020-07-14 2020-11-13 北京京东振世信息技术有限公司 Order sorting method and device, electronic equipment and readable storage medium
CN112297033A (en) * 2020-11-03 2021-02-02 泰州市华仕达机械制造有限公司 Automatic change robotic arm actuating system
CN112297033B (en) * 2020-11-03 2021-07-23 来安县珙武机械制造有限公司 Automatic change robotic arm actuating system

Also Published As

Publication number Publication date
CN101844135B (en) 2013-05-08

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20191008

Address after: 200062 Putuo District, Zhongshan North Road, No. 3185,

Patentee after: China Post Science and Technology Co., Ltd.

Address before: 200062 Putuo District, Zhongshan North Road, No. 3185,

Patentee before: Shanghai Post Science Inst.

CP03 Change of name, title or address
CP03 Change of name, title or address

Address after: No. 3185, Zhongshan North Road, Putuo District, Shanghai 200333

Patentee after: China Post Technology Co.,Ltd.

Address before: 200062 No. 3185, Putuo District, Shanghai, Zhongshan North Road

Patentee before: CHINA POST SCIENCE AND TECHNOLOGY Co.,Ltd.