CN101844135B - Method for sorting postal letters according to addresses driven by address information base - Google Patents

Method for sorting postal letters according to addresses driven by address information base

Publication number
CN101844135B
Authority
CN
China
Prior art keywords
address
sorting
information
character
road
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN2010101709498A
Other languages
Chinese (zh)
Other versions
CN101844135A (en
Inventor
吕岳
范生淼
吕淑静
屠晓
姚心宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Post Technology Co.,Ltd.
Original Assignee
SHANGHAI POST SCIENCE INST
Application filed by SHANGHAI POST SCIENCE INST
Priority to CN2010101709498A
Publication of CN101844135A
Application granted
Publication of CN101844135B

Abstract

The invention discloses a method for sorting postal letters by address, driven by an address information base. Each delivery address in the address information base includes at least a text expression containing the address information needed for transit sorting, local port sorting, and road-section delivery. The address text obtained from a letter by image acquisition and character recognition is matched against the delivery addresses in the address information base by traversal, and the sorting information of the letter is obtained from the matching degree, so that transit sorting, local port sorting, and road-section delivery after local port sorting can all be realized. The invention is applied in the recognition module of a letter sorting machine. When the address base is relatively accurate and complete and the recognition rate is basically ensured, the method can effectively analyze and correct the letter address recognition result to obtain accurate sorting information; that is, letters can be sorted relying entirely on the address recognition result.

Description

Method for sorting postal letters by address driven by an address information base
Technical field
The invention belongs to the field of postal technology, and in particular relates to a method for sorting postal letters by address driven by an address information base.
Background art
With economic and social development, letter volume keeps increasing and traditional manual sorting can no longer meet actual needs, so the use of automatic letter sorting machines has become a trend. An automatic letter sorting machine captures an image of the envelope, performs recognition on it, and then sorts the letter according to the recognition result. Existing automatic letter sorting machines mainly sort letters according to the postcode recognition result; some also use address recognition information to supplement and correct the postcode.
If the sorting target is to sort letters to the delivery office, the 6-digit postcode alone is sufficient, but sorting still fails when the envelope carries no postcode. If the sorting target is sorting to the road section, that is, letters within the same postcode range must be further sorted by road section, the requirement cannot be met by the postcode alone.
Summary of the invention
The purpose of the invention is to provide an address-base-driven method for automatically recognizing postal addresses and sorting letters by address, so as to solve the problems of poor efficiency and accuracy in current letter sorting technology.
The technical scheme of the invention is a method for sorting postal letters by address driven by an address information base. Each delivery address in the address information base includes at least a text expression containing the address information needed for transit sorting, local port sorting, and road-section delivery. The address text of a postal letter, obtained by image acquisition and character recognition, is matched against the delivery addresses in the address information base by traversal, and the sorting information of the letter is obtained from the matching degree, realizing transit sorting, local port sorting, and road-section delivery after local port sorting.
Further, the step of obtaining the address text of the postal letter by character recognition comprises:
Analyzing the letter image to obtain the addressee address region;
Segmenting the Chinese characters in the address region with a segmentation algorithm: first obtaining multiple lines of text, then dividing each line of text into individual characters;
Recognizing each individual character with a Chinese character recognition algorithm to obtain the address text of the postal letter.
Further, when the address text of the postal letter obtained by image acquisition and character recognition is matched against the delivery addresses in the address information base by traversal and the sorting information is obtained from the matching degree, the sorting information is verified against the postcode information of the postal letter.
The address information base drive of the invention works as follows: an address base containing standard address data entries is built as a model, and for every letter image a recognition result containing address information is obtained by pattern recognition. Each query address data entry in the address base is matched against the recognition result to find the entry with the highest matching degree. The matching degree information is then analyzed: if it meets the requirements, the sorting information corresponding to that entry is output; otherwise no sorting information is output.
The invention is applied in the recognition module of a letter sorting machine. When the address base is relatively accurate and complete and the recognition rate is basically ensured, the method can effectively analyze and correct the recognition result to obtain an accurate result; that is, letters can be sorted relying entirely on the postal address recognition result.
Description of drawings
Fig. 1 shows the structure of the transit address table in an embodiment of the invention.
Fig. 2 is a flow chart of address-base-driven address recognition in an embodiment of the invention.
Fig. 3 is a block diagram of the address base drive in an embodiment of the invention.
Fig. 4 is a flow chart of the transit address result decision in an embodiment of the invention.
Fig. 5 is a flow chart of local port address recognition in an embodiment of the invention.
Fig. 6 is a flow block diagram of verifying the postcode and affiliated district information of multiple road and unit matching results in an embodiment of the invention.
Fig. 7 is a flow chart of the mutual verification of delivery information results from road matching and unit matching in an embodiment of the invention.
Detailed description of the embodiments
The specific embodiments of the invention are described in detail below with reference to the accompanying drawings.
Depending on the destination of the letter, letter sorting is divided into two modes: local port sorting and transit sorting. Local port sorting handles letters destined for the area where the sorting machine is located and must be accurate to the delivery branch office or the delivery road section. Transit sorting handles letters destined for areas other than the one where the sorting machine is located and, depending on the sort plan, must be accurate to the provincial, prefecture, or county/district level. For both sorting modes, a corresponding address base must be established to drive Chinese address recognition for letters.
The storage format of transit addresses is the transit address table structure shown in Fig. 1. It consists of the three levels of national administrative divisions: 31 provincial administrative divisions (such as Zhejiang Province), the prefecture-level administrative divisions under each provincial division (such as Hangzhou City), and the county/district-level administrative divisions under each prefecture-level division (such as Yuhang District). Each administrative division is annotated with the range of postcodes within it. For example, the first four digits of the postcodes of Yuhang District are 3111 and the last two are not fixed, so the range is written as 3111xx, where x stands for any digit from 0 to 9. Note that the postcode annotated for a prefecture-level city is the postcode of its urban center. This yields a nationwide gazetteer accurate to county/district-level administrative divisions, together with the corresponding postcodes. In the invention, each address entry in the table is called a query address.
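The postcode range notation described above can be checked against a recognized postcode with a few lines of code. The following is a minimal sketch only; the function name `postcode_matches` is hypothetical and does not come from the patent.

```python
def postcode_matches(pattern: str, postcode: str) -> bool:
    """Check a 6-digit postcode against a range pattern such as '3111xx'.

    An 'x' in the pattern stands for any digit 0-9, as in the transit table.
    """
    if len(pattern) != 6 or len(postcode) != 6:
        return False
    if not (postcode.isascii() and postcode.isdigit()):
        return False
    # Every fixed digit of the pattern must agree; 'x' positions are free.
    return all(p in "xX" or p == c for p, c in zip(pattern, postcode))


print(postcode_matches("3111xx", "311106"))  # first four digits agree
print(postcode_matches("3111xx", "310012"))  # first four digits differ
```

Storing the range as a pattern string keeps the table human-readable; a numeric interval would work equally well for contiguous ranges.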
In the storage format of local port addresses, a local port address is one of two kinds: a road address or an organization (unit) address.
Road address information is represented as a road name combined with the associated house-number range. Organization information is more varied: it can be any address expression other than a road name, such as a residential quarter, company, building, government office, school, or minor place name. Redundant information must first be removed from these addresses. For example, for local port unit information in Hangzhou, Zhejiang, common words such as "Zhejiang", "Zhejiang Province", "Hangzhou", "Hangzhou City", "company", and "Co., Ltd." are treated as redundant. Removing them does not affect what the address expresses, so they are removed as long as the expression of the address is not affected. The expression after removal is called the abbreviation of the address. For example:
"Zhejiang Province People's Government" is abbreviated as "Provincial People's Government"
The abbreviation of "Zhejiang University" remains "Zhejiang University"
"Alibaba Hangzhou Co., Ltd." is abbreviated as "Alibaba"
The advantage of using abbreviations is as follows. Local port unit information is long, so exact matching is very difficult, and phrases such as "Zhejiang Province" occur so frequently in addresses that they seriously impair the discrimination between two similar addresses. Using abbreviations removes this interference to a large extent and improves the matching degree.
The address entry after redundancy removal is the address abbreviation. Next, a keyword is extracted from it for retrieval, in preparation for subsequent matching. A keyword is defined here as the word formed by three consecutive characters of an address whose frequency of occurrence in other addresses is the lowest.
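The keyword definition above (three consecutive characters that are rarest in the other addresses) can be illustrated as follows. This is a sketch under stated assumptions: the function name `extract_keyword`, the tie-breaking rule (leftmost candidate wins), and the sample abbreviations are all hypothetical, and frequency is counted as the number of other addresses containing the substring.

```python
def extract_keyword(address: str, other_addresses: list[str], size: int = 3) -> str:
    """Pick the `size`-character substring of `address` that is rarest elsewhere."""
    if len(address) <= size:
        return address  # too short to choose from; use the whole abbreviation
    candidates = [address[i:i + size] for i in range(len(address) - size + 1)]

    def frequency(gram: str) -> int:
        # Number of other addresses that contain this substring.
        return sum(gram in other for other in other_addresses)

    # min() returns the first candidate among ties, i.e. the leftmost one.
    return min(candidates, key=frequency)


others = ["浙江大学", "浙江工商大学"]
print(extract_keyword("浙江工业大学", others))
```

In the example, "浙江工" also occurs in "浙江工商大学", so the next substring, which occurs nowhere else, is chosen instead.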
The road address information is represented as follows:
Sequence number | Road name | Parity flag | Start number | Stop number | Postcode | Affiliated district | Delivery branch office | Road section number | Retrieval
1 | Chaowang Road | Odd | 7 | 53 | 210014 | Xiacheng District
2 | Desheng Road | Even | 328 | 10000 | 310015 | Gongshu District
"Road name": the name of the road, for example Zhongshan North Road, Century Avenue, Changqing Street, Zumiao Lane, or Taohua Alley. Note that no punctuation mark may appear in the name.
"Parity flag": indicates whether the delivery numbers of this road segment are odd numbers only, even numbers only, or all consecutive numbers.
"Start number": the first delivery house number of this road segment.
"Stop number": the last delivery house number of this road segment. If the end of the road segment is unknown, it is defined as "9999" (for odd or all numbers) or "9998" (for even numbers).
"Postcode": the postcode of this road segment.
"Affiliated district": the district where this road segment is located. Here, district means a municipal district, county-level city, county, and so on.
"Road section number": the number of the delivery road section to which this road segment belongs.
The organization (unit) information is represented as follows:
Each item of unit information is stored as one row with 9 attributes: "Unit name", "Postcode", "Affiliated district", "Actual address", "Abbreviation 1", "Abbreviation 2", "Abbreviation 3", "Road section number", and "Remarks". The delivery information of unit addresses is shown in the table below:
  | A | B | C | D | E | F | G | H | I
1 | Unit name | Postcode | Affiliated district | Actual address | Abbreviation 1 | Abbreviation 2 | Abbreviation 3 | Road section number | Remarks
2 | Xihu District People's Court of Hangzhou City | 310012 | Xihu District | No. 9, Wen'er West Road | Xihu District People's Court | Xihu District Court
"Unit name": the name of the unit, for example Zhejiang University, Hengli Building, Xihu District People's Court of Hangzhou City, or Zhongrong City Garden. The name must be written in full: for example, "Zhejiang Provincial Higher People's Court" must not be written as "Provincial High Court", but "Provincial High Court" may be entered in the "Abbreviation 1" column. Note that no punctuation mark may appear in the name.
"Postcode": the postcode of the place where the unit is located.
"Affiliated district": the district where the unit is located. Here, district means a municipal district, county-level city, county, and so on.
"Actual address": the actual address of the unit represented by this item of unit information. As shown in Figure 10.
"Abbreviation 1": an abbreviation of the unit represented by this item of unit information. Empty if there is none. As shown in Figure 10.
"Abbreviation 2": an abbreviation of the unit represented by this item of unit information. Empty if there is none.
"Abbreviation 3": an abbreviation of the unit represented by this item of unit information. Empty if there is none.
"Road section number": the number of the delivery road section to which the actual address of the unit belongs.
"Remarks": remark information.
The special cases are as follows:
[A] Residential areas named XX Li, XX Fang, XX Yuan, XX Village, XX Garden, XX Residential Quarter and the like are in most cases stored in the "unit" worksheet. If, however, a residential area is served by different delivery offices, it is stored in the "road" worksheet. For example, Baima Garden No. 1-20 belongs to the first delivery office and No. 21-40 belongs to the second delivery office, so Baima Garden is stored in the "road" worksheet rather than the "unit" worksheet. The storage format is as follows:
"Road" worksheet in the .xls file of the first delivery office
  | A | B | C | D | E | F | G | H
1 | Road name | Odd/even/all | Start number | Stop number | Postcode | Affiliated district | Road section number | Remarks
2 | Baima Garden | All | 1 | 20 | 111111 | Baiyun District
"Road" worksheet in the .xls file of the second delivery office
  | A | B | C | D | E | F | G | H
1 | Road name | Odd/even/all | Start number | Stop number | Postcode | Affiliated district | Road section number | Remarks
2 | Baima Garden | All | 21 | 40 | 111111 | Baiyun District
[B] Because no punctuation mark may appear in the "unit name", a name of the form "East Lake fragrant pavilion water bank" that is written with a punctuation mark between its parts must be stored without the mark, as "East Lake fragrant pavilion water bank". For a name of the form "Shahu Lake village (residential district, former Shahu Lake)", the brackets are removed and the annotation "residential district, former Shahu Lake" is placed in the corresponding "Remarks" column.
[C] Dedicated postal boxes are stored in the "road" worksheet in the format below. Other mailboxes must not be stored in the "road" worksheet.
Storage format for dedicated postal boxes
  | A | B | C | D | E | F | G | H
1 | Road name | Odd/even/all | Start number | Stop number | Postcode | Affiliated district | Road section number | Remarks
2 | XX City dedicated postal box | Odd | 1521 | 1521 | 111111 | Baiyun District
[D] Army units with an Arabic-numeral designation, for example "73022 army unit", are stored in the "road" worksheet in the format below.
Army units with an Arabic-numeral designation
  | A | B | C | D | E | F | G | H
1 | Road name | Odd/even/all | Start number | Stop number | Postcode | Affiliated district | Road section number | Remarks
2 | Army unit | Even | 73022 | 73022 | 111111 | Baiyun District
[E] Other army units, such as "the 8th Squadron of the People's Armed Police", are stored in the "unit" worksheet, as shown in the table below.
Army units without an Arabic-numeral designation
  | A | B | C | D | E | F | G | H | I
1 | Unit name | Postcode | Affiliated district | Actual address | Abbreviation 1 | Abbreviation 2 | Abbreviation 3 | Road section number | Remarks
2 | 8th Squadron of the Armed Police | 111111 | Baiyun District | No. 222, Baiyun Road
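The "road" worksheet rows described above (name, parity flag, start and stop numbers) lend themselves to a simple range lookup, which also covers the special cases of a residential area split between delivery offices, a dedicated postal box, and a numbered army unit. The sketch below is illustrative only: the class `RoadEntry`, the `office` field, and the function names are hypothetical, and the sample rows reuse the example values from the tables with English stand-in names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RoadEntry:
    road_name: str
    parity: str       # "odd", "even" or "all"
    start: int
    stop: int
    postcode: str
    district: str
    office: str = ""  # hypothetical: the delivery office whose worksheet holds the row


def covers(entry: RoadEntry, road_name: str, number: int) -> bool:
    """True if the entry's name, number range and parity flag all fit."""
    if entry.road_name != road_name:
        return False
    if not entry.start <= number <= entry.stop:
        return False
    if entry.parity == "odd":
        return number % 2 == 1
    if entry.parity == "even":
        return number % 2 == 0
    return True  # "all": consecutive numbers


def find_entry(table: list[RoadEntry], road_name: str, number: int) -> Optional[RoadEntry]:
    """Return the first worksheet row covering the given house number, if any."""
    for entry in table:
        if covers(entry, road_name, number):
            return entry
    return None


ROAD_TABLE = [
    RoadEntry("Baima Garden", "all", 1, 20, "111111", "Baiyun District", "first delivery office"),
    RoadEntry("Baima Garden", "all", 21, 40, "111111", "Baiyun District", "second delivery office"),
    RoadEntry("XX City dedicated postal box", "odd", 1521, 1521, "111111", "Baiyun District"),
    RoadEntry("Army unit", "even", 73022, 73022, "111111", "Baiyun District"),
]

hit = find_entry(ROAD_TABLE, "Baima Garden", 30)
print(hit.office if hit else "no sorting information")
```

Treating postal boxes and numbered army units as single-number "roads" lets one lookup routine serve all of these cases, which appears to be the motivation for storing them in the "road" worksheet.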
For a letter image, obtaining the final sorting information requires processing the image with methods such as image analysis, Chinese character recognition, and database query. Fig. 2 shows the basic steps of image information processing. First, the letter image is analyzed to obtain the addressee address region. The Chinese characters in the address region are then segmented by line to obtain multiple lines of text. Next, two Chinese character segmentation algorithms, the first and the second, are each used to divide every line of text into individual characters. The individual characters produced by the first algorithm are recognized with the L and W Chinese character recognition algorithms, and those produced by the second algorithm are recognized with the H Chinese character recognition algorithm. Finally, the address base drive algorithm combines the recognition results of the L, W, and H algorithms with the address base information to obtain the final sorting information. The first and second segmentation algorithms may each be any Chinese character segmentation algorithm, and L, W, and H may each be any Chinese character recognition algorithm.
The core of the technical scheme proposed by the invention is the address base drive. Its basic idea is to build an address base containing standard address data entries as a model and to obtain, for every letter image, a recognition result containing address information by pattern recognition. Each query address data entry in the address base is matched against the recognition result to find the entry with the highest matching degree. Information such as the matching degree is then analyzed: if it meets the requirements, the sorting information corresponding to that entry is output; otherwise there is no sorting information. The basic flow of the address base drive is shown in Fig. 3.
The inputs of the address base drive are three Chinese character recognition results (from the H, L, and W algorithms). As Fig. 2 shows, the L and W algorithms use the same character segmentation algorithm while the H algorithm uses a different one. The recognition result strings of L and W therefore have the same length, whereas the length of the H result string may differ from theirs. The three recognition results are thus first aligned to produce a character set D, in which each position holds 1 to 3 candidate characters (the recognition results of H, L, and W). For character set D: if transit sorting is required, the transit address table entries are matched against D and a decision yields the sorting information; if local port sorting is required, the local port address table is matched and a decision yields the local port sorting information; if mixed local and transit sorting is performed, transit address recognition is carried out first, and if the result is a local port letter, local port address recognition is then carried out.
The issues involved in recognizing the result are described in turn below.
1. Alignment of recognition results and construction of the recognition result character set
To facilitate address-base-driven matching and make full use of the recognition results of the three algorithms (H, L, and W), the three results are first combined into one optimized character set D. Each position of D has 1 to 3 candidate characters (the recognition results of the H, L, and W algorithms), sorted by priority. Let the recognition result strings of the H, L, and W algorithms be Hr, Lr, and Wr, with lengths Hl, Ll, and Wl; then Ll equals Wl, while Hl is not necessarily equal to them. To ensure that no useful information is discarded, the string length Dl after alignment is the maximum of Hl, Ll, and Wl, namely
Dl=max(Hl,Ll)
The Needleman-Wunsch algorithm is used here to align the recognition results. Since Lr and Wr are already aligned with each other, only Hr needs to be aligned with Lr or Wr: during matching, a character in Hr is considered to match the characters of Lr and Wr at a position as long as it equals either of the two characters at that position. The Needleman-Wunsch algorithm was modified accordingly; the modified algorithm is described below:
Initial conditions:
M(i,0)=M(0,j)=0 (0≤i≤Ll, 0≤j≤Hl)
Tx(i,0)=Tx(0,j)=0 (0≤i≤Ll, 0≤j≤Hl)
Ty(i,0)=Ty(0,j)=0 (0≤i≤Ll, 0≤j≤Hl)
Recurrence conditions:
[Recurrence formulas: four equation images, not available in this text]
Here M, Tx, and Ty are matrices of size (Ll+1) × (Hl+1). M is the matching score matrix. Tx and Ty are traceback matrices that record which adjacent cell each cell of M was derived from: Tx records the position in the x direction and Ty records the position in the y direction. σ is the scoring function: when Hr(j) equals either Lr(i) or Wr(i), the match score is Mat; otherwise the mismatch score is Mis. The penalty for inserting a gap is W. The value of each cell of M depends on the values of its left, upper-left, and upper neighbors. Here Mat is set to 2, Mis to -1, and W to -2. Tracing back from (Ll, Hl) to (0, 0) along the traceback matrices gives the aligned strings Hd, Wd, and Ld, which together form character set D with length Dl=max(Hl, Ll).
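The alignment step can be sketched as follows. The recurrence formulas themselves are not reproduced in this text, so the code assumes the textbook three-way maximum over the upper-left, upper, and left neighbors; only the zero initial conditions, the "equal to either Lr(i) or Wr(i)" match rule, and the values Mat=2, Mis=-1, W=-2 come from the description. The function name `align_results` and the use of a single back-pointer matrix in place of Tx and Ty are choices made for this sketch, and Latin letters stand in for recognized characters.

```python
MAT, MIS, GAP = 2, -1, -2  # Mat, Mis and gap penalty W from the description


def align_results(hr: str, lr: str, wr: str) -> list[list[str]]:
    """Align Hr against the already aligned pair (Lr, Wr); return character set D.

    Each element of D lists the distinct candidates at that position in the
    priority order H, L, W.
    """
    assert len(lr) == len(wr), "L and W share one segmentation"
    n, m = len(lr), len(hr)
    score = [[0] * (m + 1) for _ in range(n + 1)]     # zero borders, as in the text
    back = [[None] * (m + 1) for _ in range(n + 1)]   # plays the role of Tx, Ty
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = MAT if hr[j - 1] in (lr[i - 1], wr[i - 1]) else MIS
            options = [
                (score[i - 1][j - 1] + s, (i - 1, j - 1)),  # align both characters
                (score[i - 1][j] + GAP, (i - 1, j)),        # gap in H
                (score[i][j - 1] + GAP, (i, j - 1)),        # gap in L and W
            ]
            score[i][j], back[i][j] = max(options, key=lambda o: o[0])
    columns = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            pi, pj = back[i][j]
        elif i > 0:
            pi, pj = i - 1, j   # only L/W characters remain
        else:
            pi, pj = i, j - 1   # only H characters remain
        h = hr[j - 1] if pj < j else None
        l = lr[i - 1] if pi < i else None
        w = wr[i - 1] if pi < i else None
        columns.append(list(dict.fromkeys(c for c in (h, l, w) if c is not None)))
        i, j = pi, pj
    columns.reverse()
    return columns


print(align_results("ABCDE", "ABDE", "AXDE"))
```

In the example the H result has one extra character ("C") that the other segmentation missed; it survives as a single-candidate position, so no recognized character is discarded.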
2. Transit address recognition
Transit address recognition comprises two parts: matching against the transit address base and deciding on the transit matching result.
2.1. Matching against the transit address base
As Fig. 1 shows, the query addresses in the transit address table are of three types: provincial, prefecture-level, and county/district-level. Each query address can be parsed into two parts, called here the place name and the level name. In "Beijing City", for example, "Beijing" is the place name and "City" is the level name. The place name carries most of the information of an address, while the level name is shared by many addresses; in the transit table the level names are mainly "City", "Province", "Autonomous Region", "County", "District", and so on. For a recognition character set D, every query address generally has to be matched and its matching degree calculated. The Smith-Waterman algorithm is used here to compute the matching score. The query sequence input to the algorithm is an address from the transit table, and the database sequence is the character set D with up to three candidate characters per position, so the Smith-Waterman algorithm is modified accordingly.
The modified Smith-Waterman algorithm is introduced first. Let an address in the transit table be the string Q with length Ql. The formulas of the modified Smith-Waterman algorithm are:
Initial conditions:
M(i,0)=E(i,0)=F(i,0)=0 (0≤i≤Ql)
M(0,j)=E(0,j)=F(0,j)=0 (0≤j≤Dl)
Recurrence conditions:
E(i,j)=max{E(i,j-1)-r, M(i,j-1)-q-r, 0} (5)
F(i,j)=max{F(i-1,j)-r, M(i-1,j)-q-r, 0} (6)
M(i,j)=max{0, M(i-1,j-1)+σ(Q(i),D(j)), E(i,j), F(i,j)} (7)
Here M, E, and F are matrices of size (Ql+1) × (Dl+1), σ is the scoring function, q is the gap opening penalty, r is the gap extension penalty, Mat is the match score, and Mis is the mismatch score.
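Formulas (5) to (7) translate directly into code. In the sketch below, a query character is taken to match a position of D when it equals any candidate at that position, which is this sketch's reading of the modification. The values of Mat, Mis, q, and r are not stated for this step, so the defaults used here (2, -1, 1, 1) are assumptions, as is the function name `smith_waterman`.

```python
def smith_waterman(query: str, d: list[list[str]],
                   mat: int = 2, mis: int = -1, q: int = 1, r: int = 1) -> tuple[int, int]:
    """Local alignment of a query address against character set D.

    Returns (best score, 1-based end position of the matched segment R in D).
    The parameter defaults are assumed values, not taken from the description.
    """
    ql, dl = len(query), len(d)
    M = [[0] * (dl + 1) for _ in range(ql + 1)]
    E = [[0] * (dl + 1) for _ in range(ql + 1)]
    F = [[0] * (dl + 1) for _ in range(ql + 1)]
    best, best_end = 0, 0
    for i in range(1, ql + 1):
        for j in range(1, dl + 1):
            E[i][j] = max(E[i][j - 1] - r, M[i][j - 1] - q - r, 0)      # formula (5)
            F[i][j] = max(F[i - 1][j] - r, M[i - 1][j] - q - r, 0)      # formula (6)
            sigma = mat if query[i - 1] in d[j - 1] else mis
            M[i][j] = max(0, M[i - 1][j - 1] + sigma, E[i][j], F[i][j])  # formula (7)
            if M[i][j] > best:
                best, best_end = M[i][j], j
    return best, best_end


# Character set D with two candidates at the fifth position.
D = [["浙"], ["江"], ["省"], ["杭"], ["川", "州"], ["市"]]
print(smith_waterman("杭州市", D))
print(smith_waterman("上海市", D))
```

Because every candidate at a position counts, "杭州市" is matched in full even though the first-priority candidate at the fifth position is wrong, whereas "上海市" matches only the level name.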
For every query address in the transit address table, the Smith-Waterman computation yields a substring R of character set D that has the maximum matching degree with that query address, together with the position of R in D. Because the three address levels (province, prefecture-level city, county/district) are in a subordinate relationship, the matching flow of the transit address table shown in Fig. 4 is used to reduce the number of matches against the address table.
After matching against the transit address table, the addresses whose matching degree exceeds the set threshold form the set DA, which contains all query address entries of every level (province, prefecture-level city, county/district) that satisfy the threshold. Query addresses in DA that are in a subordinate relationship according to the transit address table are combined into a single entry. For example, if DA contains the three entries "Zhejiang Province", "Hangzhou City", and "Taizhou City", they are combined into the two entries "Hangzhou City, Zhejiang Province" and "Taizhou City". The set DA after combination by subordinate relationship is called the set DB.
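Combining DA into DB by subordinate relationship can be sketched with a parent lookup. The `PARENT` mapping, the function names, and the assumption that the "Taizhou City" of the example belongs to a province that is not in DA are all hypothetical and serve only to reproduce the example above.

```python
def ancestors(address: str, parent: dict[str, str]) -> list[str]:
    """All superior divisions of an address, nearest first."""
    chain = []
    current = parent.get(address)
    while current is not None:
        chain.append(current)
        current = parent.get(current)
    return chain


def combine_by_subordination(da: list[str], parent: dict[str, str]) -> list[tuple[str, ...]]:
    """Merge matched query addresses (DA) into address strings (DB)."""
    members = set(da)
    # Addresses that are a superior of another matched address are absorbed.
    covered = {p for a in da for p in ancestors(a, parent) if p in members}
    db = []
    for address in da:
        if address in covered:
            continue
        superiors = [p for p in reversed(ancestors(address, parent)) if p in members]
        db.append(tuple(superiors + [address]))
    return db


# Hypothetical excerpt of the transit table hierarchy.
PARENT = {
    "Hangzhou City": "Zhejiang Province",
    "Yuhang District": "Hangzhou City",
    "Taizhou City": "Jiangsu Province",
}

print(combine_by_subordination(["Zhejiang Province", "Hangzhou City", "Taizhou City"], PARENT))
```

Each resulting tuple is ordered from the highest level down, so it corresponds to one address string of 1 to 3 query addresses.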
2.2. Decision on the transit matching result
Each entry in the set DB is called an address string. An address string can consist of 1 to 3 query addresses; for example, "Beijing City", "Pudong New Area, Shanghai City", and "Yuhang District, Hangzhou City, Zhejiang Province" are address strings made up of 1, 2, and 3 query addresses respectively. DB is a set of one or more address strings, and to choose the correct one an evaluation model of the matching result must be set up for the decision. Each query address carries the following information: matching degree, matched position, and postcode. If the recognition result contains a postcode, the postcode recognition information can also be extracted.
The model first needs a scoring scheme for the matching degree. The steps are as follows:
[A] Split the query address into two parts, place name and level name, with lengths a1 and a2.
[B] Count the characters of the place name and of the level name that are matched in the matched string R (of length Rl); call these counts b1 and b2.
[C] Set the weights for a complete match of the place name and of the level name to c1 and c2, where c1=4 and c2=1.
[D] Compute the matching score
S1=(c1*b1/a1+c2*b2/a2)/(c1+c2) (9)
[E] Reward a complete match of the place name
[Formula for S2: equation image, not available in this text]
From the formula, S2 equals 1.0 when the query address is matched completely.
[F] Set the weights for a completely matched and an incompletely matched query address to m1 and m2 respectively, where m1=100 and m2=20.
[Formula for S3: equation image, not available in this text]
The weights for complete and incomplete matches are distinguished because a completely matched query address is considered to carry no ambiguity. S3 is the score of each query address, with a maximum of 100. From the formula, if the place name of a query address is completely matched in character set D, then S3 ≥ 16. Since the place name generally reflects the address information, the threshold MT1=16 is chosen, at or above which the query address is considered basically trustworthy. When the level name is completely matched (b2/a2=1) and the matching degree of the place name is b1/a1=0.5, for example when the content of character set D is "Hang Chuanshi of Zhejiang Province", the matching score of the query address "Hangzhou City" is S3=12. Such a query address is considered to contain partial address information, and other information such as the postcode, its association with superior and subordinate addresses, and the exclusiveness of the place name may confirm that "Hangzhou City" is correct. The threshold MT2=12 is therefore chosen, at or above which the query address is considered to contain usable address information.
[G] The query addresses with S3 at or above the threshold MT2 form the set DA, and combining DA according to the subordinate relationships of the query addresses gives the set DB. The model next scores each address string in DB. Let the score of an address string be S4, and let the scores of the up to three levels of query addresses it contains (provincial, prefecture-level, county/district-level) be ss1, ss2, and ss3 (0 when a level is absent). The following criteria apply:
(1) When the score of any query address in the address string is equal to or greater than MT1,
S4=ss1+ss2+ss3 (12)
(2) When all query address scores in the address string are less than MT1 (and those present are greater than or equal to MT2), S4 depends on whether the matched positions of the query addresses in D follow the way Chinese addresses are written, that is, in the order provincial, prefecture-level, county/district-level:
S4=ss1+ss2+ss3, if the matched positions follow the written order (13)
S4=max(ss1,ss2,ss3), if the matched positions do not follow the written order (14)
The above criteria give the score of each address string. For the example "Hang Chuanshi of Zhejiang Province" above, the score is S4=112 (100 for the completely matched provincial address plus 12 for "Hangzhou City").
[H] if there is not the postcode identifying information, S4 is namely the final score of address string; If the postcode identifying information exists, the postcode identifying information is added the score-system of address string.When the postcode identifying information exists, use the postcode of every grade of inquire address in the string of identification postcode and address to compare, obtain can the match is successful minimum one-level inquire address, can match the prefecture-level of " Hangzhou, Zhejiang province city " as postcode " 310001 ", and that " 320001 " can only match is provincial.For address string, if the match is successful for its certain grade inquire address postcode and postcode identifying information, its score is had the award of an additivity.The basic award value of certain grade of coupling is MW, according to the difference of postcode coupling rank and inquire address matching degree score S3, MW has been set the different weights of Pyatyi simultaneously.Because prefecture-level, counties and districts' level postcode coupling are 4 postcode couplings, and provincial postcode coupling is 2 postcode couplings, so prefecture-level, counties and districts' level matching ratio is provincial higher weights, and if obtain the checking of postcode for the inquire address of S3 〉=MT1, also should have higher weights.Specific rules is as follows:
When the match is at the provincial level:
Figure GSA00000114562200111
Figure GSA00000114562200112
When the match is at the prefecture level:
Figure GSA00000114562200113
Figure GSA00000114562200114
When the match is at the county/district level:
Figure GSA00000114562200121
Figure GSA00000114562200122
(20)
The value of MW is set according to the accuracy of postcode recognition. Here MW is set to 40; that is, a query address in set DB that is matched by the postcode is considered relatively credible.
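The level-matching step of the postcode check can be sketched as follows. This is a minimal illustration, assuming that each level of a query address carries a set of postcode prefixes (2 digits for a province, 4 digits for a prefecture or county/district). The function name, the data layout and the prefix values are hypothetical, and the weights applied to MW are not modelled.

```python
from typing import Dict, Optional, Tuple

# Administrative levels from highest to lowest.
LEVELS = ("province", "prefecture", "county")


def lowest_matching_level(postcode: str,
                          prefixes: Dict[str, Tuple[str, ...]]) -> Optional[str]:
    """Return the lowest level whose postcode prefix matches, or None.

    prefixes maps a level name to the postcode prefixes recorded for the
    query address at that level (illustrative layout, not the patent's).
    """
    matched = None
    for level in LEVELS:
        if any(postcode.startswith(p) for p in prefixes.get(level, ())):
            matched = level  # keep going: a lower level may also match
    return matched


# Illustrative prefixes for "Hangzhou City, Zhejiang Province" (assumed values).
hangzhou = {"province": ("31", "32"), "prefecture": ("3100",)}
```

Under these assumed prefixes, "310001" resolves to the prefecture level and "320001" only to the provincial level, which mirrors the example in the text.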
The above establishes the complete matching-result evaluation model: every address string in set DB receives a corresponding evaluation score. The next step is to decide which address string in set DB accurately represents the recipient address of the letter. The simplest decision method is chosen here: the address strings are sorted by their evaluation score, by the scores of the individual query addresses in each address string, by the matched position of the address string in character trail D, and so on; the 1 to 2 highest-ranked address strings are then analyzed to obtain the final result. The detailed flow is shown in Fig. 4.
Note: MT3 is the threshold on the evaluation score used in the final decision. It takes one of two values: MT3 = MT1 + 1 when the postcode recognition result is not incorporated into the evaluation model, and MT3 = MW + MT1 + 1 when it is.
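The threshold and the ranking step can be sketched as below. This is a simplified illustration: it ranks by evaluation score only and leaves out the secondary sort keys and the detailed flow of Fig. 4. The function names and the MT1 value used in the check are assumptions; MW = 40 is the value given in the text.

```python
from typing import Dict, List, Tuple


def final_threshold(mt1: int, mw: int = 40, postcode_used: bool = False) -> int:
    """MT3 = MT1 + 1 without postcode evidence, MW + MT1 + 1 with it."""
    return (mw if postcode_used else 0) + mt1 + 1


def top_candidates(scored: Dict[str, float], k: int = 2) -> List[Tuple[str, float]]:
    """Return the k address strings with the highest evaluation score."""
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)[:k]
```

For instance, with the scores of Example 1, `top_candidates({"Shanghai City": 100, "Fuzhou": 17})` places "Shanghai City" first.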
Several examples illustrate the decision process and the results of the above evaluation and decision scheme:
Example 1: "Shanghai City Fuzhou road". "Shanghai City" scores 100, which is greater than the score of 17 for "Fuzhou", so the result is "Shanghai City".
Example 2: "442000 Xiamen Utilities Electric Co.". "Xiamen" scores 16; because a postcode is present and does not match, the item is rejected.
Example 3: "Hang Chuanshi of Zhejiang Province". "Hangzhou City, Zhejiang Province" scores 112, so the result is "Hangzhou City, Zhejiang Province".
Example 4: "full mountain area, Shanghai City". "Jinshan District" scores 112 and "Baoshan District, Shanghai" also scores 112; the two candidates tie and agree only at the city level, so the result is "Shanghai City".
3. Local port address recognition
Local port address recognition uses the local port address table to match the recognition result, character trail D, and obtains the delivery suboffice or delivery road section that corresponds to the matched address in the local port table. The local port address table is stored as two tables, a road address table and an organization table, because road addresses and organization names are the two forms in which a recipient address is expressed. Local port address recognition likewise consists of two parts, matching and decision.
4. The basic flow of local port address recognition is shown in Fig. 5. It comprises matching against the road address table and matching against the organization table. Matching against each table is further divided into fuzzy matching and exact matching, and the multiple candidate results from the two tables are judged comprehensively, using the consistency of their delivery information and the postcode recognition information, to obtain the sorting information. Each step is described below.
4.1. Fuzzy matching
Matching here is done in two steps, fuzzy matching followed by exact matching. The main reason is that exact matching is very time-consuming and the road address table and the organization table contain a very large number of entries. To improve speed, a fast fuzzy matching algorithm was designed; it is used to produce a relatively small candidate set for exact matching. Before fuzzy matching, keywords must first be extracted from the road address table and the organization table. A keyword is a character string of length 3 extracted from a road name or unit name, and the extraction principle is that the keywords extracted from a table should be as dissimilar from one another as possible. Fuzzy matching uses the keywords to match the Chinese recognition result with a fast direct-search comparison algorithm, and selects the entries whose matching degree exceeds a certain threshold as the candidate set for exact matching. Applying fuzzy matching to the road address table and to the organization table yields two candidate sets, called the road fuzzy matching set and the unit fuzzy matching set.
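The fuzzy matching step can be sketched as follows. This is a minimal illustration under several assumptions: character trail D is modelled as a list of candidate-character sets, the matching degree is the best fraction of keyword characters found at consecutive positions, and the threshold of 0.5 is arbitrary. The function names are hypothetical, and the selection of mutually dissimilar keywords is not modelled.

```python
from typing import Dict, List, Sequence, Set


def keyword_match_degree(keyword: str, d: Sequence[Set[str]]) -> float:
    """Best fraction of keyword characters matched at consecutive positions of D."""
    n = len(keyword)
    best = 0
    # Direct search: slide the keyword over every start position in D.
    for start in range(len(d) - n + 1):
        hits = sum(1 for i, ch in enumerate(keyword) if ch in d[start + i])
        best = max(best, hits)
    return best / n if n else 0.0


def fuzzy_candidates(keywords: Dict[str, str],
                     d: Sequence[Set[str]],
                     threshold: float = 0.5) -> List[str]:
    """Keep the table entries whose keyword matches D above the threshold."""
    return [entry for entry, kw in keywords.items()
            if keyword_match_degree(kw, d) > threshold]
```

Because each position of D holds up to three candidate characters, a keyword character counts as matched when it equals any candidate at that position, which tolerates a wrong first-choice recognition result.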
4.2. Exact matching
Fuzzy matching uses keywords of length 3 in place of the actual road names or unit names, so it only selects two fuzzy candidate sets and does not give the matching degree of the real road names or unit names. Exact matching matches each entry in the fuzzy candidate sets against character trail D once more, using the improved Smith-Waterman algorithm introduced above. The local port address matching degree (Sl) is calculated with the following formula:
Sl = Match / max(Lin, Rl) (21)
Here Match is the number of matched characters, Lin is the string length of the road name or unit name, and Rl is the length of the matched character string R output by the Smith-Waterman algorithm. Because road names and unit names are highly diverse and often similar to one another, only entries that match completely (Sl = 1.0) are kept as results of exact matching. Matching against the two tables therefore yields zero or more road-name results and zero or more unit-name results. Multiple road-name results arise because character trail D may itself contain several road names: for example, "the intersection of People's Road and Zhongshan Road" contains both "People's Road" and "Zhongshan Road", and "Zhongshan Xi Lu" (Zhongshan West Road) contains both "Zhongshan Xi Lu" and "Shan Xi Lu" (Shanxi Road). The same situations occur for unit names. Ambiguity in the matching results also arises when the organization table contains several units with the same name located at different places in the same city, or when several roads share the same name.
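Formula (21) and the complete-match filter can be sketched as follows. The alignment itself (the improved Smith-Waterman step) is not reproduced; the sketch assumes that Match and Rl have already been obtained for each candidate, and the function names and tuple layout are hypothetical.

```python
from typing import List, Tuple


def local_match_degree(match: int, lin: int, rl: int) -> float:
    """Sl = Match / max(Lin, Rl), as in formula (21)."""
    denom = max(lin, rl)
    return match / denom if denom else 0.0


def exact_results(candidates: List[Tuple[str, int, int]]) -> List[str]:
    """Keep only the entries that match completely (Sl = 1.0).

    Each candidate is (name, Match, Rl); Lin is taken as len(name).
    """
    return [name for name, match, rl in candidates
            if local_match_degree(match, len(name), rl) == 1.0]
```

Dividing by the larger of Lin and Rl penalizes both a name that is only partly covered and an alignment that spans more characters than the name contains.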
For a matched road, different house numbers on the same road may belong to different delivery suboffices or delivery road sections, so the house number must be extracted. The house number is taken to be the digit string immediately following the road name. If a number is extracted, the result is road + number; otherwise the result is the road alone. Querying the road address table with a road + number result may return uniquely determined delivery information, or several different items of delivery information (when several roads share the same name). Querying the road address table with a road name alone may return unique delivery information (the road belongs to only one delivery suboffice or road section), several items of delivery information (several roads of the same name), or uncertain delivery information (the road belongs to several delivery suboffices or road sections). These outcomes are summarized as three kinds of result, called here the determined road matching result, the repeated road matching result, and the uncertain road matching result. A unit query has only two outcomes: the determined unit matching result and the uncertain unit matching result.
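House-number extraction and the three-way classification can be sketched as follows. This is an illustration only: the function names, the tolerance for whitespace before the number, and the dictionary that maps each same-named road entry to its delivery units are assumptions and not the patent's data layout.

```python
import re
from typing import Dict, Optional, Set


def extract_house_number(text: str, road_name: str) -> Optional[int]:
    """Return the digit string immediately following road_name, if any."""
    idx = text.find(road_name)
    if idx < 0:
        return None
    # Allow optional whitespace between the road name and the number.
    m = re.match(r"\s*(\d+)", text[idx + len(road_name):])
    return int(m.group(1)) if m else None


def classify_road_result(sections_by_road: Dict[str, Set[str]]) -> str:
    """Classify a road lookup as determined, repeated or uncertain.

    sections_by_road maps each distinct road entry carrying the matched
    name (for example keyed by district) to the delivery suboffices or
    road sections found for it.
    """
    if not sections_by_road:
        return "none"
    if len(sections_by_road) > 1:
        return "repeated"      # several roads share the same name
    (sections,) = sections_by_road.values()
    return "determined" if len(sections) == 1 else "uncertain"
```

A call such as `extract_house_number("Zhongshan North Road 3185", "Zhongshan North Road")` returns the number that follows the road name, and a lookup that finds one road served by two sections is classified as uncertain.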
4.3. Judgment of exact matching results
Because of the multiple situations analyzed above, the results produced by exact matching must be judged comprehensively, using information such as the postcode, the district to which the address belongs, and the matched position, to obtain the correct sorting information.
Fig. 6 shows the case in which exact matching against the road table or the unit table returns several results. Postcode matching, matching of the district to which the address belongs, comparison of the delivery suboffices or road sections of the matching results, and similar information are used to verify the results against one another, eliminate inaccurate or redundant information, and obtain a unique delivery suboffice or road section. After the information check of Fig. 6, one or more delivery suboffices or road sections have been obtained from the road address table and from the organization table respectively. If only one of the two matches (road name or unit name) produces a delivery suboffice or road section result, that sorting information is output if the delivery suboffice or road section is unique; otherwise no information is output. If both the road name match and the unit name match produce delivery suboffice or road section results, the final sorting information must be obtained by mutual verification, as shown in Fig. 7. The two are compared by delivery suboffice or road section: if they share a unique common delivery suboffice or road section, it is output as the sorting information; otherwise, if the delivery information matched from the road address alone is unique, that information is used as the sorting information; in all other cases the information is considered uncertain and the delivery suboffice or road section cannot be determined.
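The final decision described for Fig. 7 can be sketched as follows. This is a minimal illustration that assumes the check of Fig. 6 has already reduced each table's results to a set of delivery suboffices or road sections; the function name and the use of `None` for "no sorting information" are assumptions.

```python
from typing import Optional, Set


def decide_sorting_info(road: Set[str], unit: Set[str]) -> Optional[str]:
    """Combine road-table and unit-table results into one delivery unit."""
    if not road and not unit:
        return None
    if not road or not unit:
        # Only one table produced results: accept it only if unique.
        only = road or unit
        return next(iter(only)) if len(only) == 1 else None
    common = road & unit
    if len(common) == 1:
        return next(iter(common))   # both tables agree on exactly one unit
    if len(road) == 1:
        return next(iter(road))     # fall back to a unique road result
    return None                     # uncertain
```

The fallback prefers the road address over the unit name when the two disagree, which follows the order of checks given in the text.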
The above describes the address-base-driven method for automatic letter recognition and sorting, which is applied in the recognition module of a letter sorting machine. Practice shows that, when the address base is relatively accurate and complete and the recognition rate is basically guaranteed, the method can effectively analyze and correct the recognition result to obtain an accurate result. The key to successful use of the method is the accuracy of the address base, especially the completeness of the road address table and the selective entry of the organization table in the local port address base. Better results are also obtained when the letter image contains both a postcode and a complete address.

Claims (1)

1. A method for sorting postal letters by address, driven by an address information base, characterized in that each delivery address in the address information base comprises at least one textual representation containing address information for transit sorting, local port sorting, and road section delivery; the address textual representation of the postal letter, obtained by image acquisition and character recognition, is matched by traversal against the delivery addresses in the address information base; and the sorting information of the postal letter is obtained according to the matching degree, thereby realizing transit sorting of the postal letter, local port sorting, and road section delivery after local port sorting;
the step of obtaining the address textual representation of the postal letter by character recognition comprises:
analyzing the letter image to obtain the recipient address region;
segmenting the Chinese characters of the address region with a segmentation algorithm, first into multiple text lines and then each text line into a plurality of single characters;
recognizing each single character with a Chinese character recognition algorithm to obtain the address textual representation of the postal letter;
wherein
a first Chinese character segmentation algorithm and a second Chinese character segmentation algorithm are used to divide each text line into single characters; the single characters obtained by the first Chinese character segmentation algorithm are recognized by an L Chinese character recognition algorithm and by a W Chinese character recognition algorithm respectively, and the single characters obtained by the second Chinese character segmentation algorithm are recognized by an H Chinese character recognition algorithm;
the results of the H, L and W Chinese character recognition algorithms are aligned to produce a character trail D, each position of which has 1 to 3 candidate characters; for character trail D, if transit sorting is required, the entries of the transit address table are matched against D and a decision yields the sorting information; if local port sorting is required, the local port address table is used for matching and a decision yields the local port sorting information; if mixed local and transit sorting is performed, transit address recognition is carried out first and, if the result indicates a local letter, local port address recognition is then carried out.
CN2010101709498A 2010-05-11 2010-05-11 Method for sorting postal letters according to addresses driven by address information base Active CN101844135B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2010101709498A CN101844135B (en) 2010-05-11 2010-05-11 Method for sorting postal letters according to addresses driven by address information base

Publications (2)

Publication Number Publication Date
CN101844135A CN101844135A (en) 2010-09-29
CN101844135B true CN101844135B (en) 2013-05-08

Family

ID=42769059

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2010101709498A Active CN101844135B (en) 2010-05-11 2010-05-11 Method for sorting postal letters according to addresses driven by address information base

Country Status (1)

Country Link
CN (1) CN101844135B (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20191008

Address after: 200062 Putuo District, Zhongshan North Road, No. 3185,

Patentee after: China Post Science and Technology Co., Ltd.

Address before: 200062 Putuo District, Zhongshan North Road, No. 3185,

Patentee before: Shanghai Post Science Inst.

CP03 Change of name, title or address

Address after: No. 3185, Zhongshan North Road, Putuo District, Shanghai 200333

Patentee after: China Post Technology Co.,Ltd.

Address before: 200062 No. 3185, Putuo District, Shanghai, Zhongshan North Road

Patentee before: CHINA POST SCIENCE AND TECHNOLOGY Co.,Ltd.