CN104598887B - Recognition methods for non-canonical format handwritten Chinese address - Google Patents
- Publication number
- CN104598887B (application CN201510044955.1A)
- Authority
- CN
- China
- Prior art keywords
- address
- word
- candidate
- recognition
- character
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Character Discrimination (AREA)
Abstract
The present invention provides a recognition method for handwritten Chinese addresses written in non-canonical formats and establishes a representation for addresses in canonical format. The method stores the Chinese address library in a word-level tree, in which each node stores one address word and each path from the root node to a leaf node stores one address written in canonical format. The overall address recognition comprises: building the word-level tree; building the character index table; over-segmenting the image; merging segmentation blocks; character recognition; generating candidate address words; and canonical-format address recognition. The invention maps an address written in a non-canonical format to the corresponding canonical-format address, thereby recognizing it.
Description
Technical Field
The invention belongs to the technical field of handwritten Chinese address recognition, and particularly relates to recognition of a non-standard format handwritten Chinese address.
Background
Chinese address recognition plays a critical role in the automated sorting of letters and packages. A mail processing center handles and dispatches large batches of letters and packages each day, so mail handling must be both fast and accurate. Although research on Chinese address recognition has made great progress, handwritten address recognition on real letters is still not well solved. The number of Chinese characters is large, writing styles vary, and continuous strokes may exist between adjacent characters in an address. In particular, the variability and irregularity of the address writing format greatly increase the difficulty of recognizing a written address. Little existing work specifically addresses this aspect.
Traditional handwritten Chinese address recognition methods mainly aim to recognize all the Chinese characters on a given address image verbatim. They require an address list to provide context information for recognition. Each entry in the list is a complete address and is typically matched one by one against the recognition result of the input address image. To improve retrieval efficiency and reduce the storage space of the address list, methods based on a search tree structure have been proposed to store address information. In these trees each node stores one character, so they are referred to as character-level trees. On the one hand, character-level trees are relatively sensitive to noise, because they require all characters in an address image to be recognized in order. On the other hand, the accuracy of matching the candidate pattern blocks with the child nodes of the root node has a great influence on recognition performance. In short, address recognition based on character-level tree structures relies on a predefined address list; if the address list is incomplete, i.e. it does not include all writing-format variations of an address, or provides insufficient address information, the recognition rate of these methods in practical applications drops sharply.
Generally, an address is composed of address words, which are defined as basic administrative units. For example, the canonical-format address "Zhongshan North Road, Putuo District, Shanghai City" shown in FIG. 2(a) contains the address words "Shanghai City", "Putuo District", and "Zhongshan North Road". The last character of each address word is defined as a keyword, such as "province", "city", "district", "road", etc.
In practice, however, the way addresses are written on envelopes is very complicated, and people usually do not follow the canonical address format. For example, FIG. 2(a) is the canonical written form of an address, while FIGS. 2(b)-(e) show several non-canonical writings of it that are considered reasonable in practice.
In summary, manually gathering all of these non-canonical forms of address writing is an almost impossible task to accomplish.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a method based on a word-level tree structure to map the non-standard handwritten Chinese addresses into corresponding addresses of standardized writing, so as to realize the identification of the addresses; the method overcomes the limitation of the traditional method to the recognition of the non-standard handwritten Chinese address.
The purpose of the invention is realized as follows:
A recognition method for handwritten Chinese addresses in non-canonical format comprises the following steps:
constructing a word-level tree, which represents and stores addresses in the canonical writing format;
constructing a character index table, which represents the association between single characters and address words;
a segmentation-recognition process, which performs character segmentation and merging on the image and performs character recognition on the candidate pattern blocks into which the segmentation blocks are merged;
generating candidate address words, which yields the candidate address words with higher confidence;
canonical-format address recognition, which maps the handwritten address to be recognized to its corresponding canonical-format writing; wherein:
the constructed word-level tree has a depth of 5; layer 1 is the root node, layers 2 to 5 store address words representing the names of the "province", "city", "district" and "road" levels respectively, and each node stores one address word.
The constructed character index table is used for storing all characters contained in the address words and associating the characters with all address words containing the characters.
The segmentation-recognition process further includes:
image over-segmentation, which splits the image into atomic segmentation blocks in order to separate overlapping parts or continuous strokes between handwritten Chinese characters;
segmentation block merging, which merges consecutive atomic segmentation blocks into candidate pattern blocks, in order to recover single characters or characters with a left-right structure that were split apart by the over-segmentation process;
character recognition, which recognizes the candidate pattern blocks and computes the confidence of the recognition results;
the image over-segmentation applies connected component analysis, normalized overlap calculation and projection analysis to obtain a series of atomic segmentation blocks;
the segmentation block merging merges consecutive atomic segmentation blocks one by one to form candidate pattern blocks;
the character recognition further includes:
a handwritten character classifier, which classifies the candidate pattern blocks;
confidence conversion, which computes the confidence of the recognition results;
the candidate address words are generated by pruning the word-level tree using the candidate pattern block recognition results, the character index table and the address words stored in the word-level tree.
The canonical-format address recognition combines the candidate address words with the word-level tree and searches the tree bottom-up to generate candidate addresses. The candidate address with the highest confidence is taken as the final address recognition result.
The invention overcomes the limitation of the traditional method to the recognition of the non-standard handwritten Chinese address, provides a method based on a word-level tree structure, and can map the address written in the non-standard format to the corresponding address written in the standard format, thereby realizing the recognition of the address written in the non-standard format.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a diagram showing examples of different ways of writing the address "Zhongshan North Road, Putuo District, Shanghai City";
FIG. 3 is a diagram of the word-level tree for addresses in canonical written format;
FIG. 4 is a diagram of the result of over-segmentation of the address line image;
FIG. 5 is an example of a candidate pattern block diagram;
FIG. 6 is a schematic diagram of candidate address word generation;
FIG. 7 is a diagram illustrating an example of the positions of candidate address words in the candidate pattern block diagram;
FIG. 8 is a flow chart of a word-level tree path search;
FIG. 9 is an example diagram of searching in a word-level tree and generating candidate addresses;
FIG. 10 is a diagram illustrating an example of recognition results of a handwritten Chinese address in an irregular format.
Detailed Description
As shown in FIG. 1, a flowchart of an embodiment of the present invention, the method specifically includes the following.
Constructing a word-level tree, which represents and stores addresses in the canonical writing format.
Administrative addresses in China form a top-down hierarchy, typically with 4 levels. These 4 levels correspond to the names "province", "city", "district" and "road", respectively. A tree with a depth of 5 is defined according to this structure. The root node is empty, and layers 2 to 5 store address words representing the names of the "province", "city", "district" and "road" levels respectively, with each node storing one address word. In the word-level tree, a path from the root node to a leaf node corresponds to one address written in canonical format.
To handle the case where keywords are omitted in address writing, the last character of each address word (except for the "road" keyword) is defined as optional. The constructed word-level tree is shown in FIG. 3, in which the characters in parentheses are optional.
In this word-level tree, once a leaf node (i.e., a road name) is identified, all candidate addresses containing that road name become available. For example, if the address word "Zhongshan North Road" is recognized, the address words "Shanghai City", "Putuo District", "Zhejiang Province", "Hangzhou City", "Xiacheng District", etc. can be obtained by searching the word-level tree from bottom to top. The relevant candidate addresses "Zhongshan North Road, Putuo District, Shanghai City" and "Zhongshan North Road, Xiacheng District, Hangzhou City, Zhejiang Province", etc. can then be obtained. Further, if the address word "Putuo District" or "Shanghai City" is also recognized, the candidate address "Zhongshan North Road, Putuo District, Shanghai City" is more likely to be the recognition result, particularly when both "Putuo District" and "Shanghai City" are recognized.
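The tree and the bottom-up lookup described above can be illustrated with a minimal Python sketch. The class and method names (`Node`, `WordLevelTree`, `add_address`, `candidate_addresses`) are hypothetical and not taken from the patent, and optional keywords are not modelled. How a municipality such as Shanghai fills the four levels is not specified in this text, so the sketch simply accepts paths of varying length.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)
class Node:
    """One node of the word-level tree; it stores a single address word."""
    word: str
    parent: Optional["Node"] = None
    children: Dict[str, "Node"] = field(default_factory=dict)


class WordLevelTree:
    def __init__(self) -> None:
        self.root = Node(word="")                  # the root node is empty
        self.leaves: Dict[str, List[Node]] = {}    # road name -> leaf nodes

    def add_address(self, words: List[str]) -> None:
        """Insert one canonical address, listed from the top level down to the road."""
        node = self.root
        for word in words:
            if word not in node.children:
                node.children[word] = Node(word, parent=node)
            node = node.children[word]
        leaf_list = self.leaves.setdefault(node.word, [])
        if node not in leaf_list:                  # identity check, avoids duplicates
            leaf_list.append(node)

    def candidate_addresses(self, road: str) -> List[List[str]]:
        """Walk bottom-up from every leaf holding `road` and return the full paths."""
        results = []
        for leaf in self.leaves.get(road, []):
            path, node = [], leaf
            while node.parent is not None:         # stop at the empty root
                path.append(node.word)
                node = node.parent
            results.append(list(reversed(path)))
        return results


tree = WordLevelTree()
tree.add_address(["Shanghai City", "Putuo District", "Zhongshan North Road"])
tree.add_address(["Zhejiang Province", "Hangzhou City", "Xiacheng District",
                  "Zhongshan North Road"])
paths = tree.candidate_addresses("Zhongshan North Road")
```

Keeping a separate map from road names to leaf nodes is one way to make the leaf lookup cheap; a real address library would likely need a more compact representation.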
Constructing a character index table, which represents the association between single characters and address words.
As shown in Table 1, the character index table has 3 columns: column 2 lists all the characters that appear in the address words, column 1 gives the GB2312-80 code of each character in column 2, and column 3 lists all the address words that contain the character. When a character is recognized, all address words containing it can be retrieved and used to generate the final candidate address words.
TABLE 1
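As an illustration of the three columns described above, the following Python sketch builds a small character index from a list of address words. The function names and the sample address words are assumptions made for this example; the contents of the patent's Table 1 are not reproduced in this text.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def build_character_index(address_words: Iterable[str]) -> Dict[str, List[str]]:
    """Map every character to the address words that contain it (columns 2 and 3)."""
    index: Dict[str, List[str]] = defaultdict(list)
    for word in address_words:
        for ch in dict.fromkeys(word):      # each distinct character once, in order
            index[ch].append(word)
    return dict(index)


def gb2312_code(ch: str) -> str:
    """Return the GB2312 byte code of one character as hexadecimal text (column 1)."""
    return ch.encode("gb2312").hex().upper()


# Sample address words chosen for illustration only.
words = ["上海市", "普陀区", "下城区", "中山北路"]
index = build_character_index(words)
rows = [(gb2312_code(ch), ch, related) for ch, related in index.items()]
```

Looking up a recognized character such as "区" in `index` returns every address word that contains it, which is the association later used to prune the address word list.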
Image over-segmentation, which separates overlapping parts or continuous strokes between handwritten Chinese characters.
Connected component analysis is first performed on the image, and the normalized overlap of adjacent connected components is then calculated to decide whether to merge them, since some of them may be different parts of the same character. Finally, projection analysis determines whether a connected component contains continuous strokes, and if so the component is split. Overlapping parts of different characters and strokes connecting different characters are split as far as possible, finally yielding a series of atomic segmentation blocks. The segmentation result for FIG. 2(d) is shown in FIG. 4, where the atomic blocks are arranged from left to right and numbered sequentially above each block.
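The merging decision for adjacent connected components might look like the following Python sketch. The patent text does not give the formula for the normalized overlap or the merging threshold, so both are assumptions here: the overlap is taken as the horizontal overlap divided by the width of the narrower bounding box, and `threshold=0.5` is a hypothetical value. The projection-analysis step that splits continuous strokes is not shown.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom), right and bottom exclusive


def normalized_overlap(a: Box, b: Box) -> float:
    """Horizontal overlap of two boxes divided by the narrower box's width (assumed)."""
    overlap = min(a[2], b[2]) - max(a[0], b[0])
    narrower = min(a[2] - a[0], b[2] - b[0])
    return max(overlap, 0) / narrower if narrower > 0 else 0.0


def merge_components(boxes: List[Box], threshold: float = 0.5) -> List[Box]:
    """Merge horizontally overlapping components, scanning from left to right."""
    merged: List[Box] = []
    for box in sorted(boxes):                    # sorted by left edge
        if merged and normalized_overlap(merged[-1], box) >= threshold:
            last = merged[-1]
            merged[-1] = (min(last[0], box[0]), min(last[1], box[1]),
                          max(last[2], box[2]), max(last[3], box[3]))
        else:
            merged.append(box)
    return merged


# Two stacked strokes of one character, followed by a separate component.
components = [(0, 0, 10, 5), (2, 6, 9, 12), (12, 0, 20, 12)]
atoms = merge_components(components)
```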
Merging the segmentation blocks, which recovers single characters or characters with a left-right structure that were split apart by the over-segmentation process.
After over-segmentation, consecutive atomic blocks are combined to generate candidate pattern blocks, as shown in FIG. 5. All candidate pattern blocks are defined as the set $P = \{p_{(1,1)}, p_{(1,2)}, p_{(1,3)}, p_{(2,1)}, p_{(2,2)}, p_{(2,3)}, \ldots, p_{(m,n)}, \ldots, p_{(l,q)}\}$, where $(m, n)$ is the atomic block numbering ($1 \le m \le l$, $1 \le n \le q$), $l$ is the total number of atomic blocks, and $q$ is the maximum number of atomic blocks contained in one candidate pattern block; in this embodiment, $q$ is set to 3.
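The enumeration of candidate pattern blocks can be sketched as follows. This assumes that in the pair (m, n) the first index is the number of the first atomic block and the second is how many consecutive atomic blocks are merged; the text does not state this explicitly, and the boundary handling at the end of the line is likewise an assumption. The function name is hypothetical.

```python
from typing import List, Tuple


def candidate_pattern_blocks(l: int, q: int = 3) -> List[Tuple[int, int]]:
    """List (m, n) pairs: n consecutive atomic blocks starting at atomic block m."""
    blocks = []
    for m in range(1, l + 1):
        for n in range(1, q + 1):
            if m + n - 1 <= l:        # the block must not run past the last atom
                blocks.append((m, n))
    return blocks


blocks = candidate_pattern_blocks(l=4, q=3)
```

With 4 atomic blocks and q = 3 this yields 9 candidate pattern blocks, so the number of blocks passed to the classifier grows linearly with the number of atomic blocks.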
Character recognition, which recognizes the candidate pattern blocks and computes the confidence of the recognition results.
In the candidate pattern block diagram, each candidate pattern block is recognized by a character classifier to generate a series of candidate characters. For unconstrained handwritten Chinese characters with a large number of classes, the MQDF (modified quadratic discriminant function) method is currently the most practical, but it requires a large amount of storage for character features. The invention combines discriminative learning of the MQDF with a method of sharing distribution subspaces, reducing the storage occupied by character features without lowering the recognition rate.
The confidence of character recognition, i.e. the posterior probability $p(w \mid x)$ (where $w$ is the recognized character and $x$ is the image feature vector), is very important for character string recognition, but it cannot be obtained directly from the output of the MQDF classifier. A confidence conversion method is therefore needed to convert the classifier output into a posterior probability. The invention applies the sigmoidal function to the confidence conversion, and the posterior probability of the character recognition result can be expressed as
where $M$ is the total number of character classes, $d_j(x)$ is the classifier's output score for class $w_j$, and $\alpha$ and $\beta$ are confidence parameters to be optimized, which can be done by minimizing the cross-entropy (CE) loss function. Following Dempster-Shafer (D-S) theory, the character confidence can be expressed as:
Finally, for each pattern block, the 20 candidate characters with the highest confidence are taken as the recognition result and sorted in descending order of confidence. The recognition results of the candidate pattern blocks in FIG. 5 are shown in Table 2. Some pattern blocks have empty results because whether a block is a legitimate character can be judged directly from its shape, i.e. its aspect ratio; for example, pattern blocks $p_{(10,1)}$, $p_{(15,1)}$, $p_{(17,1)}$ and $p_{(19,1)}$ are not legitimate characters.
TABLE 2
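The patent's confidence formulas are not reproduced in this text, so the following Python sketch uses forms that are common in the handwriting recognition literature and should be read as an assumption, not as the patent's exact equations. It treats $d_j(x)$ as a distance-like score (smaller is better, as with MQDF), uses a sigmoid of $-\alpha d_j(x) + \beta$ for the per-class confidence, and normalizes with an extra constant 1 in the denominator for the D-S style confidence. The function names and the sample values of `alpha` and `beta` are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def sigmoid_confidence(d: float, alpha: float, beta: float) -> float:
    """Sigmoid confidence exp(-alpha*d + beta) / (1 + exp(-alpha*d + beta))."""
    return 1.0 / (1.0 + math.exp(alpha * d - beta))


def ds_confidences(scores: Dict[str, float], alpha: float, beta: float) -> Dict[str, float]:
    """D-S style confidence: each class evidence divided by 1 + sum of all evidences."""
    evidence = {w: math.exp(-alpha * d + beta) for w, d in scores.items()}
    denom = 1.0 + sum(evidence.values())
    return {w: e / denom for w, e in evidence.items()}


def top_candidates(conf: Dict[str, float], k: int = 20) -> List[Tuple[str, float]]:
    """Keep the k most confident candidate characters, in descending confidence."""
    return sorted(conf.items(), key=lambda kv: kv[1], reverse=True)[:k]


# Hypothetical classifier distances for three candidate characters.
scores = {"a": 1.0, "b": 2.0, "c": 5.0}
conf = ds_confidences(scores, alpha=1.0, beta=0.0)
best_two = top_candidates(conf, k=2)
```

Under this form the confidences of all classes sum to less than 1, leaving some mass for the case that the pattern block is not a valid character at all.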
Generating candidate address words, in which the word-level tree is pruned using the candidate pattern block recognition results, the character index table and the address words stored in the word-level tree.
Implicit in the word-level tree is an address word list (AW_O) storing all address words, as shown in FIG. 6(a). Candidate address words are generated from this list in 3 steps. First, the list AW_O is pruned using the association between the recognized candidate characters and the address words, and the associated address words form a new address word list AW_R. Next, AW_R is further pruned into the list AW_P by checking the positional constraints between the recognized candidate characters of an address word and the matching candidate pattern blocks. Finally, the score of each address word in AW_P is computed, and the address words whose score exceeds a preset threshold are stored in the list AW_C; these are the final candidate address words. The steps are described in detail below:
(1) Generation of AW_R: based on the recognition results of the candidate pattern blocks, all address words in AW_O that do not satisfy the required condition on nr and nl are deleted (nr is the number of characters already recognized in an address word, and nl is the number of characters the address word contains). The remaining candidate address words make up the list AW_R (FIG. 6(b)).
(2) Generation of AW_P: the positional constraint between the recognized characters of a candidate address word and the corresponding pattern blocks in the image must be considered. If the positions in the image of the pattern blocks matched by the recognized characters of a candidate address word in AW_R do not satisfy the positional constraint, the candidate address word is deleted. The remaining candidate address words make up the list AW_P (FIG. 6(c)).
(3) Generation of AW_C: the invention provides a method for calculating address word scores, which is applied to the address words in AW_P. If the score of a candidate address word in AW_P is less than a predefined empirical threshold, the candidate address word is deleted. The remaining address words form the list AW_C, and each address word in AW_C is a final candidate address word (see FIG. 6(d)).
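The first two pruning steps can be sketched in Python as follows. The exact condition on nr and nl and the exact positional constraint are not reproduced in this text, so the sketch substitutes two assumptions: a hypothetical minimum ratio `min_ratio` on nr/nl, and the requirement that the recognized characters of a word can be matched to pattern blocks in strictly left-to-right order. The function names and sample data are illustrative only; the scoring step (3) is omitted because its formula is not available here.

```python
from typing import Dict, List, Set


def prune_to_aw_r(index: Dict[str, List[str]], recognized: Set[str],
                  min_ratio: float = 0.5) -> List[str]:
    """Keep address words whose share of recognized characters reaches min_ratio."""
    related = {w for ch in recognized for w in index.get(ch, [])}
    kept = []
    for word in sorted(related):
        nr = sum(1 for ch in word if ch in recognized)   # recognized characters
        nl = len(word)                                   # all characters of the word
        if nr / nl >= min_ratio:
            kept.append(word)
    return kept


def satisfies_position_order(word: str, positions: Dict[str, List[int]]) -> bool:
    """Check that the word's recognized characters occur left to right in the image."""
    last = 0                                  # atomic block numbers start at 1
    for ch in word:
        if ch not in positions:
            continue                          # unrecognized character, no constraint
        later = [p for p in positions[ch] if p > last]
        if not later:
            return False
        last = min(later)                     # earliest feasible block keeps options open
    return True


index = {"上": ["上海市"], "海": ["上海市"], "市": ["上海市"],
         "普": ["普陀区"], "陀": ["普陀区"], "区": ["普陀区", "下城区"]}
recognized = {"上", "海", "普", "区"}
positions = {"上": [1], "海": [3], "普": [9], "区": [7, 11]}

aw_r = prune_to_aw_r(index, recognized)
aw_p = [w for w in aw_r if satisfies_position_order(w, positions)]
```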
The address word score is calculated by the following formula:
This formula takes two factors into account: the ratio of the number of recognized characters in the address word to the number of all characters it contains, and the confidence of the segmentation. The confidence of a single character can be calculated by formula (2). The term nr/nl accounts for the proportion of recognized characters in the address word. Furthermore, if all characters of the address word are recognized and their relative positions match the pattern blocks reasonably, the confidence of the address word is increased; the added score is represented by a constant v (1 ≤ v ≤ 4). SC is the confidence of the segmentation and is defined as
where m is the number of consecutive atomic blocks that make up a pattern block, provided the combination of these atomic blocks does not contain continuous strokes, and pw/ph is the aspect ratio of the pattern block.
To reduce the recognition error rate, all address words in AW_P whose score is below a threshold are deleted. The threshold is defined as
where nl is the number of characters contained in the candidate address word and the other parameter is an empirical threshold; over a number of tests, setting it to 2.5 gave the recognition system its best performance.
After this step, the positions of the generated candidate address words in the candidate pattern block diagram are as shown in FIG. 7.
Canonical-format address recognition, which maps the handwritten address to be recognized to its corresponding canonical-format writing.
When a handwritten Chinese address is recognized, all of its non-canonical formats can be mapped to a certain path of the word-level tree. After the candidate address words are generated, the tree can be searched using their node relationships in the word-level tree to generate candidate addresses written in canonical format. A bottom-up search is used, running from the leaf nodes of the tree (which correspond to road names) to the root node. Several candidate addresses may be obtained in this step; the score of each candidate address equals the accumulated scores of the recognized candidate address words it contains. Finally, the candidate address with the largest score is taken as the recognition result. The specific flow of this step is shown in FIG. 8.
In the present invention, address words representing the names of the "province", "city", "district" and "road" levels are stored in 4 lists, denoted PR, CI, DI and RO respectively. In addition, one node of the search space is represented by a triplet TN = {CN, PN, AS}, where CN points to the current node of the word-level tree, PN points to the parent node of CN, and AS is the address word score accumulated during the search. For a candidate word W, its leftmost pattern block lp(W) and rightmost pattern block rp(W) correspond to its first matched character and its last matched character, respectively. Whether the positional constraint in the pattern block diagram between the address word of a parent node and the address word of a child node is reasonable is judged by whether rp of the parent node is smaller than lp of the child node. Positions in the text are numbered in ascending order from left to right.
Before searching, the list RO is checked. If RO is empty, i.e. no road name has been recognized, then AS = 0, the search stops, and the recognition result is a rejection. Otherwise, the search starts from the address words stored in RO, one at a time. If the address words pointed to by PN and CN satisfy the positional relationship rp(PN) < lp(CN), AS becomes the accumulated score of the two address words; CN then moves to the word-level tree node pointed to by PN, and PN moves to the parent node of CN. Otherwise, if the two words do not satisfy the positional relationship, PN moves directly to its own parent node and the search continues. When PN points to the root node of the tree, the current search is finished. Finally, the canonical address obtained by tracing back from the leaf node to the root node is taken as a candidate address result, with AS as its score. The search result is stored in a two-element set RS, in which ξ holds the canonical candidate address obtained by the current search.
FIG. 9 illustrates the search process in the word-level tree. For example, when the search starts from the leaf node "Zhongshan North Road", AS equals the score of this address word, 20.19. The address word "Putuo District" pointed to by PN has been identified as a candidate address word, and rp("Putuo District") < lp("Zhongshan North Road"), so AS becomes 34.85 (= 20.19 + 14.66). Finally, the search along this path yields the candidate address "Zhongshan North Road, Putuo District, Shanghai City" with a score of 48.41 (= 20.19 + 14.66 + 13.56), which is the highest score among all candidate addresses, so it is taken as the final recognition result. Address words that were not recognized are also included in the candidate address by the path search, but their scores are not accumulated.
Some candidate address words may overlap in position in the candidate pattern block diagram (see FIG. 7). If the overlapping address words are at the same level, the tree search is not affected: such words are not in a superior-subordinate relationship and do not correspond to a parent-child relationship in the tree, so they lead to different paths in the pattern block diagram, for example "Shanghai City", "Shanghai", "Putuo District" and "Putuo". Conversely, if the two address words are at different levels, they may correspond to a parent and a child on the same path, for example "Putuo District" and "Putuo Road". In this case the lower-priority word is skipped during the search, and the path does not accumulate its score. In the invention, the priority of an address word increases with its layer number in the tree, so address words representing road names have the highest priority.
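The bottom-up search with the CN, PN and AS bookkeeping can be sketched in Python as follows. The three scores (20.19, 14.66, 13.56) come from the example in the text; the lp and rp positions, the class names and the function names are hypothetical values chosen only so that the positional checks pass. Priority handling for overlapping words at different levels is not modelled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(eq=False)
class Node:
    word: str
    parent: Optional["Node"] = None


@dataclass
class Candidate:
    score: float
    lp: int   # leftmost matched pattern block
    rp: int   # rightmost matched pattern block


def search_from_leaf(leaf: Node, candidates: Dict[str, Candidate]) -> Tuple[List[str], float]:
    """Trace one path from a recognized road name up to the root, accumulating AS."""
    cn, pn = leaf, leaf.parent
    accumulated = candidates[leaf.word].score            # AS starts at the road score
    path = [leaf.word]
    while pn is not None and pn.parent is not None:       # stop when PN is the root
        path.append(pn.word)              # unrecognized words still join the address
        cand = candidates.get(pn.word)
        if cand is not None and cand.rp < candidates[cn.word].lp:
            accumulated += cand.score     # positional check passed: add the score
            cn = pn                       # CN moves up only onto an accepted word
        pn = pn.parent
    return list(reversed(path)), accumulated


def recognize(leaves: List[Node], candidates: Dict[str, Candidate]):
    """Return the best (address, score) pair, or None as a rejection."""
    results = [search_from_leaf(leaf, candidates)
               for leaf in leaves if leaf.word in candidates]
    return max(results, key=lambda r: r[1]) if results else None


root = Node("")
shanghai = Node("Shanghai City", root)
putuo = Node("Putuo District", shanghai)
road_a = Node("Zhongshan North Road", putuo)
zhejiang = Node("Zhejiang Province", root)
hangzhou = Node("Hangzhou City", zhejiang)
xiacheng = Node("Xiacheng District", hangzhou)
road_b = Node("Zhongshan North Road", xiacheng)

candidates = {
    "Zhongshan North Road": Candidate(20.19, lp=8, rp=12),
    "Putuo District": Candidate(14.66, lp=4, rp=6),
    "Shanghai City": Candidate(13.56, lp=1, rp=3),
}
best = recognize([road_a, road_b], candidates)
```

The Zhejiang path is still produced as a candidate address, but only the road score is accumulated for it, so the Shanghai path wins.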
When all address words in RO have been searched in the tree, several candidate addresses have been generated. Only the candidate address with the highest score is taken as the recognition result. The recognition result is denoted by S and is defined as
$S = \arg\max_{\xi}\,(AS_i \mid i = 1, 2, \ldots, n)$ (6)
where n is the total number of generated candidate addresses; in FIG. 7, n = 5. Clearly $AS_i$ attains the maximum score of 48.41 when i = 3, so the corresponding canonical-format address "Zhongshan North Road, Putuo District, Shanghai City" is the final recognition result for FIG. 2(d).
FIG. 10 shows the recognition results for the handwritten Chinese address line images in FIG. 2. As can be seen from FIG. 10, these three types of non-canonical written addresses can all be recognized by the present invention as the canonical written address "Zhongshan North Road, Putuo District, Shanghai City".
Claims (3)
1. A recognition method for a handwritten Chinese address in an irregular format, characterized in that the method comprises the following steps:
step 1: constructing a word-level tree for representing and storing addresses in a standard writing format; the depth of the constructed word-level tree is 5, the 1 st layer is a root node, address words representing names of 'province', 'city', 'district' and 'road' are stored from the 2 nd layer to the 5 th layer respectively, and each node stores one address word;
step 2: constructing a character index table for representing the association between a single character and an address word; the constructed character index table is used for storing all characters contained in the address words and associating the characters with all the address words containing the characters;
step 3: segmentation-recognition processing for performing character segmentation and merging on the image and performing character recognition on the candidate pattern blocks into which the segmentation blocks are merged; the method specifically comprises the following steps:
over-segmenting the image, namely splitting the image into atomic segmentation blocks in order to separate overlapping parts or continuous strokes between handwritten Chinese characters;
merging the segmentation blocks, namely merging consecutive atomic segmentation blocks one by one to form candidate pattern blocks, in order to recover single characters or characters with a left-right structure that were split apart by the over-segmentation process;
character recognition, which recognizes the candidate pattern blocks and computes the confidence of the recognition results;
step 4: generating candidate address words for obtaining the candidate address words with higher confidence; the generated candidate address words are obtained by pruning the word-level tree using the candidate pattern block recognition results, the character index table and the address words stored in the word-level tree;
step 5: canonical-format address recognition, which maps the handwritten address to be recognized to its corresponding canonical-format writing; the canonical-format address recognition combines the candidate address words with the word-level tree and searches the word-level tree bottom-up to generate candidate addresses; the candidate address with the highest confidence is taken as the final address recognition result.
2. The recognition method of claim 1, wherein the image over-segmentation applies connected component analysis, normalized overlap calculation and projection analysis to the image to obtain a series of atomic segmentation blocks.
3. The recognition method of claim 1, wherein the character recognition further comprises:
a handwritten character classifier, which classifies the candidate pattern blocks;
confidence conversion, which computes the confidence of the recognition results.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510044955.1A CN104598887B (en) | 2015-01-29 | 2015-01-29 | Recognition methods for non-canonical format handwritten Chinese address |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510044955.1A CN104598887B (en) | 2015-01-29 | 2015-01-29 | Recognition methods for non-canonical format handwritten Chinese address |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104598887A (en) | 2015-05-06 |
CN104598887B (en) | 2017-11-24 |
Family
ID=53124660
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510044955.1A Active CN104598887B (en) | 2015-01-29 | 2015-01-29 | Recognition methods for non-canonical format handwritten Chinese address |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104598887B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106503634B (en) * | 2016-10-11 | 2020-02-14 | 讯飞智元信息科技有限公司 | Image alignment method and device |
CN107133215A (en) * | 2017-05-20 | 2017-09-05 | 复旦大学 | A kind of Chinese canonical address recognition methods of offline handwriting |
CN108369582B (en) * | 2018-03-02 | 2021-06-25 | 福建联迪商用设备有限公司 | Address error correction method and terminal |
CN108647263B (en) * | 2018-04-28 | 2022-04-12 | 淮阴工学院 | Network address confidence evaluation method based on webpage segmentation crawling |
CN109961259B (en) * | 2019-03-28 | 2021-07-27 | 上海中通吉网络技术有限公司 | Address standardization processing method and equipment |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6327386B1 (en) * | 1998-09-14 | 2001-12-04 | International Business Machines Corporation | Key character extraction and lexicon reduction for cursive text recognition |
CN101645134A (en) * | 2005-07-29 | 2010-02-10 | 富士通株式会社 | Integral place name recognition method and integral place name recognition device |
CN102289467A (en) * | 2011-07-22 | 2011-12-21 | 浙江百世技术有限公司 | Method and device for determining target site |
CN103678708A (en) * | 2013-12-30 | 2014-03-26 | 小米科技有限责任公司 | Method and device for recognizing preset addresses |
Non-Patent Citations (1)
Title |
---|
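中文邮政地址识别研究 (Research on Chinese Postal Address Recognition); 娄正良 (Lou Zhengliang); 《中国优秀博士学位论文全文数据库》 (China Doctoral Dissertations Full-text Database); 2007-02-15; pp. I139-72 * |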
Also Published As
Publication number | Publication date |
---|---|
CN104598887A (en) | 2015-05-06 |
Similar Documents
Publication | Title |
---|---|
CN104598887B (en) | Recognition methods for non-canonical format handwritten Chinese address |
CN107256262B (en) | Image retrieval method based on object detection |
US11080910B2 (en) | Method and device for displaying explanation of reference numeral in patent drawing image using artificial intelligence technology based machine learning |
CN110909725A (en) | Method, device and equipment for recognizing text and storage medium |
CN108369582B (en) | Address error correction method and terminal |
US20090144277A1 (en) | Electronic table of contents entry classification and labeling scheme |
Sadri et al. | A genetic framework using contextual knowledge for segmentation and recognition of handwritten numeral strings |
US20150199567A1 (en) | Document classification assisting apparatus, method and program |
US12118813B2 (en) | Continuous learning for document processing and analysis |
CN103984943A (en) | Scene text identification method based on Bayesian probability frame |
EP3483747A1 (en) | Preserving and processing ambiguity in natural language |
CN111291099B (en) | Address fuzzy matching method and system and computer equipment |
CN102360436B (en) | Identification method for on-line handwritten Tibetan characters based on components |
CN106155998B (en) | A kind of data processing method and device |
Zhu et al. | Deep residual text detection network for scene text |
CN114677695A (en) | Table analysis method and device, computer equipment and storage medium |
CN114780680A (en) | Retrieval and completion method and system based on place name and address database |
Yu et al. | Recognizing text in historical maps using maps from multiple time periods |
US12118816B2 (en) | Continuous learning for document processing and analysis |
CN113505190B (en) | Address information correction method, device, computer equipment and storage medium |
Machanavajjhala et al. | Collective extraction from heterogeneous web lists |
CN105447104A (en) | Knowledge map generating method and apparatus |
CN108845999B (en) | Trademark image retrieval method based on multi-scale regional feature comparison |
Blank et al. | A depth-first branch-and-bound algorithm for geocoding historic itinerary tables |
US20140181124A1 (en) | Method, apparatus, system and storage medium having computer executable instrutions for determination of a measure of similarity and processing of documents |
Legal Events
Code | Title |
---|---|
C06 | Publication |
PB01 | Publication |
C10 | Entry into substantive examination |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |