CN104598887A - Recognition method for written Chinese address of non-specification format - Google Patents

Publication number
CN104598887A
Authority
CN
China
Prior art keywords
address
word
character
recognition
segmentation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510044955.1A
Other languages
Chinese (zh)
Other versions
CN104598887B (en)
Inventor
吕岳
韦箫华
吕淑静
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
East China Normal University
Original Assignee
East China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by East China Normal University
Priority to CN201510044955.1A
Publication of CN104598887A
Application granted
Publication of CN104598887B
Active
Anticipated expiration

Landscapes

  • Character Discrimination (AREA)

Abstract

The invention provides a method for recognizing handwritten Chinese addresses written in non-standard formats, and establishes a representation for addresses in the standard format. A word-level tree is proposed for storing the Chinese address base: each node stores one address word, and each path from the root node to a leaf node stores one address written in the standard format. Recognition of the whole address proceeds by building the word-level tree, building a character index table, over-segmenting the image, merging segmentation blocks, recognizing characters, generating candidate address words, and recognizing the standard-format address. A handwritten address in a non-standard format is thereby mapped to its corresponding standard-format address, which achieves recognition.

Description

Recognition method for handwritten Chinese addresses in non-standard formats
Technical field
The invention belongs to the technical field of handwritten Chinese address recognition, and relates in particular to the recognition of handwritten Chinese addresses written in non-standard formats.
Background
Chinese address recognition plays a key role in the automatic sorting of letters and parcels. Mail centers process and dispatch large batches of letters and parcels every day, so mail must be handled both quickly and accurately. Although research on Chinese address recognition has made great progress, recognizing handwritten addresses on real mail remains a problem that has not been solved well. For example, the number of Chinese characters is large, writing styles vary widely, and strokes within a character or between adjacent characters may be connected. In particular, address writing formats are variable and irregular, which greatly increases the difficulty of handwritten address recognition. Little existing work addresses this aspect specifically.
Traditional handwritten Chinese address recognition methods mainly aim to recognize every Chinese character in a given address image. They need an address list to provide contextual information for recognition. Each entry in the list is a complete address, and the entries are usually matched one by one against the recognition result of the input address image. To make address search more efficient and to reduce the storage required by the address list, methods that store the address information in a search tree have been proposed. In these trees each node stores one character, so they are called character-level trees. However, a character-level tree is sensitive to noise, because it requires all characters in the address image to be recognized in order. In addition, whether the candidate pattern blocks are matched accurately against the child nodes of the root node has a large effect on recognition performance. In short, address recognition based on a character-level tree depends on a predefined address list. If the address information in the list is incomplete, that is, it does not cover all the format variations in which an address may be written, or if the list provides too little address information, the recognition rate of these methods drops sharply in practical applications.
In general, an address consists of several address words, which are defined as basic administrative units. For example, the standard-format address "Zhongshan North Road, Putuo District, Shanghai" shown in Fig. 2(a) contains the address words "Shanghai City", "Putuo District", and "Zhongshan North Road". The last character of each address word is defined as its keyword, such as "province", "city", "district", or "road".
In practice, however, the way addresses are written on envelopes is very complicated, and people usually do not follow the standard address format. For example, Fig. 2(a) shows the standard written form of an address, while Fig. 2(b-e) show various non-standard ways of writing it; these non-standard forms are considered reasonable in practice.
In summary, manually collecting all of these non-standard written forms of addresses is almost impossible.
Summary of the invention
In view of the deficiencies of the prior art, the object of the invention is to propose a method based on a word-level tree structure that maps these non-standard handwritten Chinese addresses to the corresponding standard-format addresses and thereby recognizes them, overcoming the limitations of traditional methods in recognizing non-standard handwritten Chinese addresses.
The object of the invention is achieved as follows:
A method for recognizing handwritten Chinese addresses in non-standard formats, comprising the following steps:
Building a word-level tree, which represents and stores addresses written in the standard format;
Building a character index table, which represents the association between individual characters and address words;
Segmentation-recognition processing, which segments the image into blocks, merges the blocks, and performs character recognition on the candidate pattern blocks formed by merging;
Generating candidate address words, which obtains the candidate address words with higher confidence;
Standard-format address recognition, which maps the handwritten address to be recognized to its corresponding standard written form; wherein:
The word-level tree has a depth of 5. The 1st layer is the root node, and the 2nd to 5th layers store the address words representing "province", "city", "district", and "road" names respectively; each node stores one address word.
The character index table stores all characters contained in the address words and associates each character with all address words that contain it.
The segmentation-recognition processing further comprises:
Image over-segmentation, which splits the image into atomic blocks in order to separate overlapping or connected-stroke parts between handwritten Chinese characters;
Merging segmentation blocks, which merges consecutive atomic blocks into candidate pattern blocks in order to recover single characters, or characters with a left-right structure, that were split apart by over-segmentation;
Character recognition, which recognizes the candidate pattern blocks and computes the confidence of the recognition results;
The image over-segmentation applies connected component analysis, normalized overlap computation, and projection analysis to the image, and finally yields a series of atomic blocks;
The merging of segmentation blocks merges consecutive atomic blocks one by one to form candidate pattern blocks;
The character recognition further comprises:
A handwritten character classifier, which classifies the candidate pattern blocks;
Confidence transformation, which computes the confidence of the recognition results;
The generation of candidate address words prunes the word-level tree by combining the candidate pattern recognition results, the character index table, and the address words stored in the word-level tree.
The standard-format address recognition combines the candidate address words with the word-level tree, searches the tree bottom-up to assemble the candidate address words, and finally generates candidate addresses. The candidate address with the highest confidence is taken as the final address recognition result.
The invention overcomes the limitations of traditional methods in recognizing non-standard handwritten Chinese addresses. It proposes a method based on a word-level tree structure that maps an address written in a non-standard format to the corresponding standard-format address, and thereby recognizes addresses written in non-standard formats.
Brief description of the drawings
Fig. 1 is a flowchart of the invention;
Fig. 2 shows examples of different ways of writing the address "Zhongshan North Road, Putuo District, Shanghai";
Fig. 3 is a schematic diagram of the word-level tree for standard-format addresses;
Fig. 4 shows an example of the over-segmentation result for an address line image;
Fig. 5 shows an example of the candidate pattern block graph;
Fig. 6 is a schematic diagram of the generation of candidate address words;
Fig. 7 shows an example of the positions of the candidate address words in the candidate pattern block graph;
Fig. 8 is a flowchart of the path search in the word-level tree;
Fig. 9 shows an example of searching the word-level tree and generating candidate addresses;
Fig. 10 shows example recognition results for handwritten Chinese addresses in non-standard formats.
Detailed description
Fig. 1 shows the flowchart of an embodiment of the invention. The method comprises the following steps.
Building a word-level tree, which represents and stores addresses written in the standard format.
China's administrative address system is a top-down hierarchy, generally with 4 levels, which correspond to "province", "city", "district", and "road" names respectively. A tree of depth 5 is defined according to this structure. The root node is empty, and the 2nd to 5th layers store the address words representing "province", "city", "district", and "road" names respectively; each node stores one address word. In the word-level tree, each path from the root node to a leaf node corresponds to one address written in the standard format.
To handle addresses written with keywords omitted, the last character of each address word (except the character "road") is defined as optional. The resulting word-level tree is shown in Fig. 3, where the characters in parentheses are optional.
In this word-level tree, once a leaf node (that is, a road name) has been recognized, all candidate addresses containing that road name can be obtained. For example, if the address word "Zhongshan North Road" is recognized, a bottom-up search of the word-level tree yields the address words "Shanghai City", "Putuo District", "Zhejiang Province", "Hangzhou City", "Xiacheng District", and so on. The related candidate addresses, such as "Zhongshan North Road, Putuo District, Shanghai City" and "Zhongshan North Road, Xiacheng District, Hangzhou City, Zhejiang Province", are thereby obtained. Furthermore, if the address word "Putuo District" or "Shanghai City" is also recognized, the candidate address "Zhongshan North Road, Putuo District, Shanghai City" becomes more likely to be the recognition result, especially when both "Putuo District" and "Shanghai City" are recognized.
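The tree and the bottom-up lookup described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: the class `Node`, the helper names, and the two sample paths are hypothetical, and the sample Shanghai path simply omits a province-level word.

```python
from typing import Dict, List, Optional


class Node:
    """One node of the word-level tree; it stores a single address word."""

    def __init__(self, word: str, parent: Optional["Node"] = None) -> None:
        self.word = word
        self.parent = parent
        self.children: Dict[str, "Node"] = {}

    def child(self, word: str) -> "Node":
        # Reuse the child for this word if it exists, otherwise create it.
        if word not in self.children:
            self.children[word] = Node(word, parent=self)
        return self.children[word]


def build_tree(addresses: List[List[str]]) -> Node:
    """Insert each top-down list of address words as one root-to-leaf path."""
    root = Node("")  # the root node is empty
    for words in addresses:
        node = root
        for word in words:
            node = node.child(word)
    return root


def leaves_named(root: Node, road: str) -> List[Node]:
    """Return every leaf node whose address word equals `road`."""
    found, stack = [], [root]
    while stack:
        node = stack.pop()
        if not node.children and node.word == road:
            found.append(node)
        stack.extend(node.children.values())
    return found


def path_to_root(leaf: Node) -> List[str]:
    """Follow parent links bottom-up; return the words in top-down order."""
    words = []
    node = leaf
    while node.parent is not None:
        words.append(node.word)
        node = node.parent
    return words[::-1]


# Hypothetical sample data (not taken from the patent's address base).
tree = build_tree([
    ["Shanghai City", "Putuo District", "Zhongshan North Road"],
    ["Zhejiang Province", "Hangzhou City", "Xiacheng District",
     "Zhongshan North Road"],
])
for leaf in leaves_named(tree, "Zhongshan North Road"):
    print(" > ".join(path_to_root(leaf)))
```

Storing a parent link in every node is what makes the bottom-up walk cheap: recognizing one road name is enough to recover each full standard-format path that ends in it.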
Building a character index table, which represents the association between individual characters and address words.
As shown in Table 1, the character index table has 3 columns. The 2nd column lists all characters that appear in the address words, and the 1st column gives the GB2312-80 code of the character in the 2nd column. The 3rd column lists all address words that contain the character. When a character is recognized, all address words containing it can be retrieved and used to generate the final candidate address words.
Table 1
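A character index table of this shape can be sketched as a dictionary from character to address words. The function names and the four sample address words below are assumptions for illustration; the code column is produced with Python's standard `gb2312` codec, which is assumed here to correspond to the GB2312-80 coding the table refers to.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def build_char_index(address_words: Iterable[str]) -> Dict[str, List[str]]:
    """Map each character to all address words that contain it."""
    index = defaultdict(list)
    for word in address_words:
        # dict.fromkeys keeps each character once, in order of appearance.
        for ch in dict.fromkeys(word):
            index[ch].append(word)
    return dict(index)


def gb2312_code(ch: str) -> str:
    """Hexadecimal GB2312 byte code of one character (column 1)."""
    return ch.encode("gb2312").hex().upper()


# Hypothetical sample address words.
words = ["上海市", "普陀区", "中山北路", "中山路"]
index = build_char_index(words)
for ch in "中路市":
    print(gb2312_code(ch), ch, index[ch])
```

Looking up a recognized character is then a single dictionary access, which is what allows the address word list to be pruned quickly in the later steps.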
Image over-segmentation, which separates the overlapping or connected-stroke parts between handwritten Chinese characters.
First, connected component analysis is performed on the image. A normalized overlap is then computed for adjacent connected components to decide whether to merge them, because some of them may be different parts of the same character. Finally, projection analysis is used to determine whether a connected component contains connected strokes; if it does, the component is split. The overlapping parts of different characters and the connecting strokes between them are cut apart as far as possible, which finally yields a series of atomic blocks. Fig. 4 shows the segmentation result for Fig. 2(d). In Fig. 4 the atomic blocks are arranged from left to right, and each is labeled above with its sequence number.
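The merge decision for adjacent connected components can be illustrated as follows. The text does not give the normalization or the threshold, so this sketch assumes that the horizontal overlap is divided by the width of the narrower component and that components are merged above a hypothetical threshold of 0.5; components are reduced to their horizontal extent.

```python
from typing import List, Tuple

Box = Tuple[int, int]  # (left, right) column extent of a connected component


def normalized_overlap(a: Box, b: Box) -> float:
    """Horizontal overlap divided by the width of the narrower component."""
    overlap = min(a[1], b[1]) - max(a[0], b[0])
    narrower = min(a[1] - a[0], b[1] - b[0])
    return max(overlap, 0) / narrower if narrower > 0 else 0.0


def merge_overlapping(boxes: List[Box], threshold: float = 0.5) -> List[Box]:
    """Merge left-to-right neighbours whose overlap exceeds the threshold."""
    merged: List[Box] = []
    for box in sorted(boxes):
        if merged and normalized_overlap(merged[-1], box) > threshold:
            last = merged.pop()
            box = (min(last[0], box[0]), max(last[1], box[1]))
        merged.append(box)
    return merged


# The second component lies inside the first, so the two are merged.
print(merge_overlapping([(0, 10), (2, 8), (12, 20)]))
```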
Merging segmentation blocks, which recovers single characters, or characters with a left-right structure, that were split apart by over-segmentation.
After the image has been over-segmented, consecutive atomic blocks are merged to generate candidate pattern blocks, as shown in Fig. 5. All candidate pattern blocks form a set P = {p(1,1), p(1,2), p(1,3), p(2,1), p(2,2), p(2,3), ..., p(m,n), ..., p(l,q)}, where (m, n) is the atomic-block numbering of a pattern block (1 ≤ m ≤ l, 1 ≤ n ≤ q), l is the total number of atomic blocks, and q is the maximum number of atomic blocks that one candidate pattern block may contain. In this embodiment q is set to 3.
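The enumeration of candidate pattern blocks can be sketched as below. The reading of the index pair is an assumption: p(m, n) is taken to be the block that starts at atomic block m and merges n consecutive atomic blocks, and runs that would extend past the last atomic block are skipped.

```python
from typing import List, Tuple


def candidate_pattern_blocks(l: int, q: int = 3) -> List[Tuple[int, int]]:
    """Enumerate (m, n): start at atomic block m, merge n consecutive blocks."""
    blocks = []
    for m in range(1, l + 1):
        for n in range(1, q + 1):
            if m + n - 1 <= l:  # the merged run must stay inside the line
                blocks.append((m, n))
    return blocks


# With 4 atomic blocks and q = 3 there are 9 candidate pattern blocks.
print(candidate_pattern_blocks(4))
```

Because q is a small constant, the number of candidate pattern blocks grows only linearly with the number of atomic blocks.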
Character recognition, which recognizes the candidate pattern blocks and computes the confidence of the recognition results.
In the candidate pattern block graph, each candidate pattern block is recognized with a character classifier, which generates a series of candidate characters. For unconstrained handwritten Chinese characters with a large number of classes, MQDF is currently the most practical method, but it requires a large amount of storage for character features. The invention combines discriminative learning of MQDF with a shared distribution subspace, which reduces the storage occupied by character features without lowering the recognition rate.
The confidence of character recognition, that is, the posterior probability p(w|x) (where w is the recognized character and x is the image feature vector), is very important for character string recognition, but it cannot be obtained directly from the output of the MQDF classifier. A confidence transformation is therefore needed to convert the classifier output into a posterior probability. The invention applies the sigmoid function to the confidence transformation, so the posterior probability of a character recognition result can be expressed as
$$ p^{sg}(w_j \mid x) = \frac{\exp(-\alpha d_j(x) + \beta)}{1 + \exp(-\alpha d_j(x) + \beta)}, \quad j = 1, 2, \ldots, M \tag{1} $$
where M is the total number of character classes, $d_j(x)$ is the classifier's output score for class $w_j$, and α and β are confidence parameters, which are optimized by minimizing the cross-entropy (CE) loss function. Based on Dempster-Shafer (D-S) evidence theory, the character confidence can be expressed as:
$$ p^{ds}(w_j \mid x) = \frac{\exp(-\alpha d_j(x) + \beta)}{1 + \sum_{i=1}^{M} \exp(-\alpha d_i(x) + \beta)}, \quad j = 1, 2, \ldots, M \tag{2} $$
Finally, for each pattern block, the 20 candidate characters with the highest confidence are taken as the recognition result and sorted in descending order of confidence. Table 2 shows the recognition results for the candidate pattern blocks in Fig. 5. The recognition result of some pattern blocks is empty, because their shape, that is, their width-to-height ratio, directly shows that they are not reasonable characters; for example, pattern blocks p(10,1), p(15,1), p(17,1), and p(19,1) are not reasonable characters.
Table 2
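The confidence transformation can be sketched numerically. The sketch assumes that the classifier output d_j(x) is a distance-like score (smaller is better), so the exponent is −α·d_j(x) + β in both the sigmoid form and the Dempster-Shafer form; the function names and the parameter values used in the example are hypothetical, since in the method α and β are fitted by minimizing a cross-entropy loss.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid_confidence(d: float, alpha: float, beta: float) -> float:
    """Sigmoid confidence of one class from its classifier output d."""
    z = -alpha * d + beta
    return 1.0 / (1.0 + math.exp(-z))  # equals exp(z) / (1 + exp(z))


def ds_confidence(distances: Sequence[float], alpha: float,
                  beta: float) -> List[float]:
    """Dempster-Shafer style confidence for every class of one pattern block."""
    evidence = [math.exp(-alpha * d + beta) for d in distances]
    denominator = 1.0 + sum(evidence)
    return [e / denominator for e in evidence]


def top_candidates(labels: Sequence[str], distances: Sequence[float],
                   alpha: float, beta: float,
                   k: int = 20) -> List[Tuple[str, float]]:
    """Keep the k most confident candidate characters, best first."""
    confidences = ds_confidence(distances, alpha, beta)
    ranked = sorted(zip(labels, confidences), key=lambda t: t[1], reverse=True)
    return ranked[:k]


# Hypothetical classifier outputs for three candidate characters.
print(top_candidates(["a", "b", "c"], [3.0, 1.0, 2.0], alpha=1.0, beta=0.0, k=2))
```

The extra 1 in the denominator means the confidences of one pattern block sum to less than 1; the remainder can be read as the belief that the block is not a valid character at all.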
Generating candidate address words, which prunes the word-level tree by combining the candidate pattern recognition results, the character index table, and the address words stored in the word-level tree.
The word-level tree implicitly represents an address word list (AW_O) that stores all address words, as shown in Fig. 6(a). Candidate address words are generated from this list in 3 steps. First, AW_O is pruned using the association between the recognized candidate characters and the address words; the associated address words form a new address word list AW_R. Second, AW_R is pruned further by checking the position constraints between the recognized candidate characters of an address word and the matching candidate pattern blocks, which gives the list AW_P. Finally, the score of each address word in AW_P is computed, and the address words whose score exceeds a preset threshold are stored in the list AW_C; the address words in AW_C are the final candidate address words. The steps are described below:
(1) Generation of AW_R: based on the recognition results of the candidate patterns, every address word in AW_O that does not meet the condition on nr and nl is deleted (nr is the number of recognized characters in an address word, and nl is the number of characters the address word contains). The remaining address words form the list AW_R (Fig. 6(b)).
(2) Generation of AW_P: the position constraints between the recognized characters of a candidate address word and the corresponding pattern blocks in the image are considered. If the image positions of the pattern blocks matched by the recognized characters of a candidate address word in AW_R violate the position constraints, that candidate address word is deleted. The remaining candidate address words form the list AW_P (Fig. 6(c)).
(3) Generation of AW_C: the invention proposes a method for computing address word scores, which is applied to the address words in AW_P; the computation is described below. If the score of a candidate address word in AW_P is below a predefined empirical threshold, it is deleted. The remaining address words form the list AW_C, and each address word in AW_C is a final candidate address word (Fig. 6(d)).
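The position check of step (2) can be sketched as follows. The exact constraint is not spelled out in the text, so this sketch assumes the simplest reading: the pattern blocks matched by the recognized characters of an address word must appear left to right in the same order as the characters, without sharing atomic blocks. The function name and the span representation are hypothetical.

```python
from typing import Dict, Tuple

# (first atomic block, last atomic block) covered by a pattern block
Span = Tuple[int, int]


def positions_consistent(matches: Dict[int, Span]) -> bool:
    """Check that matched characters appear left to right without overlap.

    `matches` maps the index of a character inside the address word to the
    span of the pattern block that was recognized as that character.
    """
    previous_end = 0  # atomic blocks are numbered from 1
    for char_index in sorted(matches):
        start, end = matches[char_index]
        if start <= previous_end:
            return False  # out of order, or overlapping the previous block
        previous_end = end
    return True


# Characters 0, 1 and 3 of a word matched at increasing positions: kept.
print(positions_consistent({0: (1, 2), 1: (3, 3), 3: (5, 6)}))
# The second character was matched to the left of the first: deleted.
print(positions_consistent({0: (3, 4), 1: (1, 2)}))
```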
The score of an address word is computed by the following formula:
$$ MSF = \frac{nr}{nl} \cdot \sum_{k=1}^{nl} \left( p_k^{ds} + SC_k \right) + v \tag{3} $$
The formula takes two factors into account: the ratio of the number of recognized characters in the address word to the total number of characters it contains, and the confidence of the segmentation blocks. Here $p_k^{ds}$ is the confidence of an individual character, computed by formula (2), and nr/nl accounts for the proportion of recognized characters in the address word. In addition, if all characters of the address word are recognized and their relative positions when matched against the pattern blocks are reasonable, the confidence of the address word is increased; the added score is a constant v (1 ≤ v ≤ 4). SC is the confidence of a segmentation block, defined as
$$ SC = m \cdot \left( 1 - \frac{pw}{ph} \right) \tag{4} $$
where m is the number of consecutive atomic blocks that make up a pattern block, provided the combination of these atomic blocks contains no connected strokes, and pw/ph is the width-to-height ratio of the pattern block.
To reduce the recognition error rate, every address word in AW_P whose score is below a threshold ε is deleted. ε is defined as
where nl is the number of characters in the candidate address word, and the remaining parameter is an empirical value; repeated experiments showed that setting it to 2.5 gives the recognition system its best performance.
The positions of the resulting candidate address words in the candidate pattern block graph are shown in Fig. 7.
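The address word score can be worked through in a few lines. This is a sketch under two assumptions that the text leaves open: characters that were not recognized contribute nothing to the sum, and the bonus v is added only when every character of the address word was recognized. The function names and the numeric inputs are hypothetical.

```python
from typing import List, Optional, Tuple


def segmentation_confidence(m: int, pw: float, ph: float) -> float:
    """Segmentation-block confidence: SC = m * (1 - pw / ph)."""
    return m * (1.0 - pw / ph)


def address_word_score(chars: List[Optional[Tuple[float, float]]],
                       v: float = 1.0) -> float:
    """Address word score MSF.

    `chars` has one entry per character of the address word: a pair
    (character confidence, segmentation confidence) for a recognized
    character, or None for a character that was not recognized.
    """
    nl = len(chars)
    if nl == 0:
        return 0.0
    recognized = [c for c in chars if c is not None]
    nr = len(recognized)
    total = sum(p + sc for p, sc in recognized)
    bonus = v if nr == nl else 0.0  # assumed: bonus only for a full match
    return nr / nl * total + bonus


sc = segmentation_confidence(m=2, pw=30.0, ph=40.0)  # 2 * (1 - 0.75)
full = address_word_score([(0.9, sc), (0.8, sc)], v=2.0)
partial = address_word_score([(0.9, sc), None], v=2.0)
print(sc, full, partial)
```

The factor nr/nl penalizes partially recognized address words twice: they collect fewer terms in the sum, and the sum is then scaled down by the recognized fraction.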
Standard-format address recognition, which maps the handwritten address to be recognized to its corresponding standard written form.
When a handwritten Chinese address is recognized, each of its non-standard formats can be mapped to a path in the word-level tree. Once the candidate address words have been generated, the tree is searched according to their node relationships in the word-level tree to generate candidate addresses in the standard format. A bottom-up search is used, from the leaf nodes of the tree (road names) towards the root node. This step yields several candidate addresses; the score of each candidate address is the sum of the scores of the recognized candidate address words it contains. Finally, the candidate address with the highest score is taken as the recognition result. The detailed procedure is shown in Fig. 8.
In the invention, 4 lists store the address words representing "province", "city", "district", and "road" names; the lists are denoted PR, CI, DI, and RO respectively. In addition, a triple TN = {CN, PN, AS} represents a node of the search space, where CN points to the current node of the word-level tree, PN points to the parent node of CN, and AS is the address word score accumulated during the search. For a candidate word W, its leftmost pattern block lp(W) and rightmost pattern block rp(W) correspond to its first and last matched characters respectively. Whether the position constraint between the address word of a parent node and that of its child node is reasonable in the pattern block graph is judged by whether rp of the parent node is less than lp of the child node. Here the positions of address words are numbered in ascending order from left to right.
Before the search, list RO is checked. If RO is empty, that is, no road name has been recognized, then AS = 0, the search stops, and the result is a rejection. Otherwise, the search starts from each address word stored in RO in turn. First, CN points to an address word in RO, AS is initialized to the score of that address word, and PN points to the parent node of CN. The search then distinguishes two cases: PN ∈ DI or PN ∉ DI. If PN ∉ DI, the candidate address word that PN points to has not been recognized; in this case PN moves directly to the parent of the node it points to, and the search continues. If PN ∈ DI, the candidate address word that PN points to has been recognized. If the address words pointed to by PN and CN satisfy the position relation rp(PN) < lp(CN), AS becomes the sum of the scores of the two address words; CN then points to the word-level tree node of PN, and PN points to the parent node of CN. Otherwise, if the two words do not satisfy the position relation, PN moves directly to the parent of the node it points to, and the search continues. When PN reaches the root node of the tree, this search ends. The standard-format address obtained by the reverse search from the leaf node to the root node is taken as a candidate address result, and AS is its score. The search result is stored in a pair RS = {ξ, AS}, where ξ holds the standard-format candidate address obtained by the current search.
Fig. 9 illustrates the search in the word-level tree. For example, the search starts from the leaf node "Zhongshan North Road", and AS equals the score of this address word, 20.19. The address word "Putuo District" pointed to by PN has been recognized as a candidate address word, and rp("Putuo District") < lp("Zhongshan North Road"), so AS becomes 34.85 (= 20.19 + 14.66). Finally, the search along this path yields the candidate address "Zhongshan North Road, Putuo District, Shanghai" with a score of 48.41 (= 20.19 + 14.66 + 13.56), which is the highest score among all candidate addresses, so it is taken as the final recognition result. Through the path search, address words that were not recognized are also included in the candidate address given as the recognition result, but they contribute nothing to the accumulated score.
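The bottom-up search can be sketched as follows. This is an illustrative reading of the procedure, not the patented implementation: `TreeNode`, `search_from_leaf`, and `recognize` are hypothetical names, candidate address words are looked up by their text alone, and the pattern block positions in the example are invented. Only the three scores (20.19, 14.66, 13.56) come from the example in the text.

```python
from typing import Dict, List, Optional, Tuple


class TreeNode:
    """Word-level tree node; the root is the node whose parent is None."""

    def __init__(self, word: str, parent: Optional["TreeNode"] = None) -> None:
        self.word = word
        self.parent = parent


# A recognized candidate address word: (score, lp, rp), where lp and rp are
# the positions of its leftmost and rightmost matched pattern blocks.
Candidate = Tuple[float, int, int]


def search_from_leaf(leaf: TreeNode,
                     recognized: Dict[str, Candidate]) -> Tuple[List[str], float]:
    """Walk from a recognized road name towards the root, accumulating AS."""
    accumulated, cn_lp, _ = recognized[leaf.word]
    path = [leaf.word]
    pn = leaf.parent
    while pn is not None and pn.parent is not None:  # stop at the root
        path.append(pn.word)  # unrecognized words still join the address
        cand = recognized.get(pn.word)
        if cand is not None and cand[2] < cn_lp:  # rp(PN) < lp(CN)
            accumulated += cand[0]
            cn_lp = cand[1]  # CN moves up to PN
        pn = pn.parent
    return path[::-1], accumulated


def recognize(leaves: List[TreeNode],
              recognized: Dict[str, Candidate]) -> Optional[Tuple[List[str], float]]:
    """Return the best-scoring candidate address, or None as a rejection."""
    results = [search_from_leaf(leaf, recognized)
               for leaf in leaves if leaf.word in recognized]
    if not results:
        return None  # no road name was recognized
    return max(results, key=lambda r: r[1])


root = TreeNode("")
city = TreeNode("Shanghai City", root)
district = TreeNode("Putuo District", city)
road = TreeNode("Zhongshan North Road", district)
recognized = {
    "Zhongshan North Road": (20.19, 7, 12),  # positions are hypothetical
    "Putuo District": (14.66, 4, 6),
    "Shanghai City": (13.56, 1, 3),
}
print(recognize([road], recognized))
```

Starting from road names keeps the search small: only paths that end in a recognized road are ever walked, and each walk is at most as long as the tree is deep.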
Some candidate address words may overlap in position in the segmentation pattern graph (see Fig. 7). If the overlapping address words are at the same level, the tree search is not affected: they are not in a superior-subordinate relation and do not correspond to a parent node and child node in the tree, so they yield different paths in the pattern block graph, for example "Shanghai City" and "Shanghai", or "Putuo District" and "Putuo". Conversely, if the two address words are at different levels, they may correspond to a parent node and child node on the same path, for example "Putuo District" and "Putuo Road". In this case the word with the lower priority is skipped during the search, and its score is not added to the score of the path. In the invention, the priority of an address word increases with the layer number of its node, so address words representing road names have the highest priority.
After all address words in RO have been searched in the tree, several candidate addresses have been generated. Finally, only the candidate address with the highest score is taken as the recognition result. The recognition result is denoted S and defined as
$$ S = \arg\max_{\xi} \left( AS_i \mid i = 1, 2, \ldots, n \right) \tag{6} $$
where n is the total number of candidate addresses generated; in Fig. 7, n = 5. Clearly, $AS_i$ attains its maximum score of 48.41 at i = 3, so the corresponding standard-format address "Zhongshan North Road, Putuo District, Shanghai" is the final recognition result for Fig. 2(d).
Fig. 10 shows the recognition results for the handwritten Chinese address line images in Fig. 2. As Fig. 10 shows, the invention recognizes the addresses written in these three kinds of non-standard format as the standard-format address "Zhongshan North Road, Putuo District, Shanghai".

Claims (8)

1. A method for recognizing handwritten Chinese addresses in non-standard formats, characterized in that the method comprises the following steps:
Building a word-level tree, which represents and stores addresses written in the standard format;
Building a character index table, which represents the association between individual characters and address words;
Segmentation-recognition processing, which segments the image into blocks, merges the blocks, and performs character recognition on the candidate pattern blocks formed by merging;
Generating candidate address words, which obtains the candidate address words with higher confidence;
Standard-format address recognition, which maps the handwritten address to be recognized to its corresponding standard written form.
2. The recognition method according to claim 1, characterized in that the word-level tree has a depth of 5, the 1st layer is the root node, and the 2nd to 5th layers store the address words representing "province", "city", "district", and "road" names respectively, wherein each node stores one address word.
3. The recognition method according to claim 1, characterized in that the character index table stores all characters contained in the address words and associates each character with all address words that contain it.
4. The recognition method according to claim 1, characterized in that the segmentation-recognition processing comprises:
Image over-segmentation, which splits the image into atomic blocks in order to separate overlapping or connected-stroke parts between handwritten Chinese characters;
Merging segmentation blocks, which merges consecutive atomic blocks one by one to form candidate pattern blocks, in order to recover single characters, or characters with a left-right structure, that were split apart by over-segmentation;
Character recognition, which recognizes the candidate pattern blocks and computes the confidence of the recognition results.
5. The recognition method according to claim 4, characterized in that the image over-segmentation applies connected component analysis, normalized overlap computation, and projection analysis to the image and finally yields a series of atomic blocks.
6. The recognition method according to claim 4, characterized in that the character recognition further comprises:
A handwritten character classifier, which classifies the candidate pattern blocks;
Confidence transformation, which computes the confidence of the recognition results.
7. The recognition method according to claim 1, characterized in that the generation of candidate address words prunes the word-level tree by combining the candidate pattern recognition results, the character index table, and the address words stored in the word-level tree.
8. The recognition method according to claim 1, characterized in that the standard-format address recognition combines the candidate address words with the word-level tree, searches the tree bottom-up to assemble the candidate address words, and finally generates candidate addresses; the candidate address with the highest confidence is taken as the final address recognition result.
CN201510044955.1A 2015-01-29 2015-01-29 Recognition methods for non-canonical format handwritten Chinese address Active CN104598887B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510044955.1A CN104598887B (en) 2015-01-29 2015-01-29 Recognition methods for non-canonical format handwritten Chinese address

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510044955.1A CN104598887B (en) 2015-01-29 2015-01-29 Recognition methods for non-canonical format handwritten Chinese address

Publications (2)

Publication Number Publication Date
CN104598887A true CN104598887A (en) 2015-05-06
CN104598887B CN104598887B (en) 2017-11-24

Family

ID=53124660

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510044955.1A Active CN104598887B (en) 2015-01-29 2015-01-29 Recognition methods for non-canonical format handwritten Chinese address

Country Status (1)

Country Link
CN (1) CN104598887B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106503634A (en) * 2016-10-11 2017-03-15 讯飞智元信息科技有限公司 A kind of image alignment method and device
CN107133215A (en) * 2017-05-20 2017-09-05 复旦大学 A kind of Chinese canonical address recognition methods of offline handwriting
CN108369582A (en) * 2018-03-02 2018-08-03 福建联迪商用设备有限公司 A kind of address error correction method and terminal
CN108647263A (en) * 2018-04-28 2018-10-12 淮阴工学院 A kind of network address method for evaluating confidence crawled based on segmenting web page
CN109961259A (en) * 2019-03-28 2019-07-02 上海中通吉网络技术有限公司 Address Standardization processing method and equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6327386B1 (en) * 1998-09-14 2001-12-04 International Business Machines Corporation Key character extraction and lexicon reduction for cursive text recognition
CN101645134A (en) * 2005-07-29 2010-02-10 富士通株式会社 Integral place name recognition method and integral place name recognition device
CN102289467A (en) * 2011-07-22 2011-12-21 浙江百世技术有限公司 Method and device for determining target site
CN103678708A (en) * 2013-12-30 2014-03-26 小米科技有限责任公司 Method and device for recognizing preset addresses

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6327386B1 (en) * 1998-09-14 2001-12-04 International Business Machines Corporation Key character extraction and lexicon reduction for cursive text recognition
CN101645134A (en) * 2005-07-29 2010-02-10 富士通株式会社 Integral place name recognition method and integral place name recognition device
CN102289467A (en) * 2011-07-22 2011-12-21 浙江百世技术有限公司 Method and device for determining target site
CN103678708A (en) * 2013-12-30 2014-03-26 小米科技有限责任公司 Method and device for recognizing preset addresses

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LOU Zhengliang (娄正良): "Research on Chinese Postal Address Recognition" (中文邮政地址识别研究), China Doctoral Dissertations Full-text Database *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106503634A (en) * 2016-10-11 2017-03-15 讯飞智元信息科技有限公司 A kind of image alignment method and device
CN107133215A (en) * 2017-05-20 2017-09-05 复旦大学 A kind of Chinese canonical address recognition methods of offline handwriting
CN108369582A (en) * 2018-03-02 2018-08-03 福建联迪商用设备有限公司 A kind of address error correction method and terminal
WO2019165644A1 (en) * 2018-03-02 2019-09-06 福建联迪商用设备有限公司 Address error correction method and terminal
CN108369582B (en) * 2018-03-02 2021-06-25 福建联迪商用设备有限公司 Address error correction method and terminal
CN108647263A (en) * 2018-04-28 2018-10-12 淮阴工学院 A kind of network address method for evaluating confidence crawled based on segmenting web page
CN108647263B (en) * 2018-04-28 2022-04-12 淮阴工学院 Network address confidence evaluation method based on webpage segmentation crawling
CN109961259A (en) * 2019-03-28 2019-07-02 上海中通吉网络技术有限公司 Address Standardization processing method and equipment
CN109961259B (en) * 2019-03-28 2021-07-27 上海中通吉网络技术有限公司 Address standardization processing method and equipment

Also Published As

Publication number Publication date
CN104598887B (en) 2017-11-24

Similar Documents

Publication Publication Date Title
CN104598887A (en) Recognition method for written Chinese address of non-specification format
CN101719128B (en) Fuzzy matching-based Chinese geo-code determination method
CN104391942B (en) Short essay eigen extended method based on semantic collection of illustrative plates
CN106446208B (en) A kind of smart phone trip mode recognition methods considering road network compatible degree
CN108369582B (en) Address error correction method and terminal
CN101844135B (en) Method for sorting postal letters according to addresses driven by address information base
CN109165273B (en) General Chinese address matching method facing big data environment
CN101206121B (en) Placename retrieval device
CN106649597A (en) Method for automatically establishing back-of-book indexes of book based on book contents
CN103902591A (en) Decision tree classifier establishing method and device
CN106909611A (en) A kind of hotel's automatic matching method based on Text Information Extraction
CN103970842A (en) Water conservancy big data access system and method for field of flood control and disaster reduction
CN103412888A (en) Point of interest (POI) identification method and device
CN103324929B (en) Based on the handwritten Chinese recognition methods of minor structure study
CN109033225A (en) Chinese address identifying system
CN102360436B (en) Identification method for on-line handwritten Tibetan characters based on components
CN102428467A (en) Similarity-Based Feature Set Supplementation For Classification
CN101276327B (en) Address recognition device
CN108021683A (en) A kind of scale model retrieval implementation method based on three-dimensional labeling
CN105528411A (en) Full-text retrieval device and method for interactive electronic technical manual of shipping equipment
CN102004796B (en) Non-retardant hierarchical classification method and device of webpage texts
CN107291895A (en) A kind of quick stratification document searching method
CN111522892A (en) Geographic element retrieval method and device
CN114780680A (en) Retrieval and completion method and system based on place name and address database
CN106339459A (en) Method for pre-classifying Chinese webpages based on keyword matching

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant