CN111079622A - Method for miniaturizing handwritten text recognizer under unified recognition framework - Google Patents
- Publication number
- CN111079622A
- Authority
- CN
- China
- Prior art keywords
- recognizer
- miniaturized
- word
- handwritten text
- natural language
- Prior art date
- Legal status
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/30—Writer recognition; Reading and verifying signatures
- G06V40/33—Writer recognition; Reading and verifying signatures based only on signature image, e.g. static signature recognition
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
- G06V30/413—Classification of content, e.g. text, photographs or tables
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
- G06F18/2132—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on discrimination criteria, e.g. discriminant analysis
- G06F18/21322—Rendering the within-class scatter matrix non-singular
- G06F18/21324—Rendering the within-class scatter matrix non-singular involving projections, e.g. Fisherface techniques
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Artificial Intelligence (AREA)
- Multimedia (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Mathematical Physics (AREA)
- Human Computer Interaction (AREA)
- Character Discrimination (AREA)
Abstract
The invention relates to the field of information input on mobile intelligent terminals such as smartphones, tablet computers, portable computers and navigators, and in particular to a method for miniaturizing a handwritten text recognizer under a unified recognition framework. The method comprises the following steps: (1) building a miniaturized single-character segmentation classifier, (2) building a miniaturized single-character recognizer, (3) building a miniaturized natural language model, and (4) optimizing the system parameters.
Description
Technical Field
The invention relates to the field of information input on mobile intelligent terminals such as smartphones, tablet computers, portable computers and navigators, and in particular to a method for miniaturizing a handwritten text recognizer under a unified recognition framework.
Background Art
With the convergence of mobile computing with wireless communication, networking, mobile technologies, cloud computing and mobile intelligent terminals, pen-based user interfaces have become a research hotspot. The rapid spread of mobile intelligent terminals such as smartphones, tablet computers, laptop computers and navigators has made handwritten information input popular and widely accepted. Current devices mainly rely on single-character handwriting (one character at a time) combined with candidate sets of associated words, which greatly limits the freedom and speed of input. Handwritten string input, in which a few characters (two or three) are written at a time, also exists, but its recognition rate and speed still need improvement. Handwritten text input, that is, writing several lines with several characters per line in one go, as people normally write, is the best way to further raise input speed and freedom and thus to make work and daily life more convenient. Although text-form handwriting is faster than single-character handwriting and matches human writing habits, recognizing handwritten text is challenging. The difficulty comes mainly from the uncertainty introduced by segmentation and recognition: first, the input device cannot determine which strokes or components make up a handwritten character; second, characters may be misrecognized. Handwritten text recognition therefore passes through a series of sub-expert modules. If each sub-expert module makes its decision independently in turn, errors made by an earlier module propagate to later modules and accumulate, which greatly reduces the text recognition rate.
In addition, because the recognition target is text rather than isolated characters, it has natural-language properties, so information fusion can resolve the ambiguity caused by uncertainty in the individual sub-expert modules. Letting the three sub-expert modules (segmentation, character recognition and the natural language model) recognize and judge the handwritten text jointly, that is, recognizing handwritten text under a unified recognition framework, is therefore the best way to ensure a high recognition rate. However, while the unified recognition framework ensures a high recognition rate, the memory required by the recognizer grows sharply as sub-expert modules are added. Mobile intelligent terminals that perform information input during mobile computing have relatively little memory and must respond in real time, so a handwritten text recognizer running on them must occupy relatively little memory. Because a high recognition rate and miniaturization are conflicting goals, existing research and applications have generally been unable to use all of the expert modules: modules are added to raise the recognition rate and compressed to shrink the recognizer. As a result, handwritten text input has not been deployed on mobile intelligent terminals.
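The joint decision described above can be illustrated with a small dynamic-programming search over segmentation hypotheses, in which no module commits to a decision on its own. This is a minimal sketch, not the patent's implementation. It assumes that the primitive stroke blocks are already known, that a hypothetical callback `candidates(start, end)` returns `(character, segmentation log-score, recognition log-score)` tuples for treating blocks `start..end-1` as one character, that the language model is a bigram log-probability function `lm_logprob`, and that the weights are the kind of integration parameters step (4) would tune. All names and scores are illustrative.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical callback type: candidates(start, end) -> [(char, seg_logp, rec_logp), ...]
Candidates = Callable[[int, int], List[Tuple[str, float, float]]]

def unified_search(n_blocks: int,
                   candidates: Candidates,
                   lm_logprob: Callable[[str, str], float],
                   weights: Sequence[float] = (1.0, 1.0, 1.0),
                   max_span: int = 3) -> Tuple[float, List[str]]:
    """Return the best (score, characters) path through the segmentation lattice.

    Sketch only: a path score is the weighted sum of segmentation, recognition
    and bigram language-model log-scores, so segmentation is decided jointly.
    Raises ValueError if no path covers all blocks.
    """
    w_seg, w_rec, w_lm = weights
    # lattice[pos][last_char] = (best score, path) over paths covering blocks [0, pos)
    lattice: List[Dict[str, Tuple[float, List[str]]]] = [dict() for _ in range(n_blocks + 1)]
    lattice[0]["<s>"] = (0.0, [])
    for end in range(1, n_blocks + 1):
        # a character may span up to max_span consecutive primitive blocks
        for start in range(max(0, end - max_span), end):
            for prev, (score, path) in lattice[start].items():
                for ch, seg, rec in candidates(start, end):
                    s = score + w_seg * seg + w_rec * rec + w_lm * lm_logprob(prev, ch)
                    # keep only the best path per (position, last character) for the bigram LM
                    if ch not in lattice[end] or s > lattice[end][ch][0]:
                        lattice[end][ch] = (s, path + [ch])
    return max(lattice[n_blocks].values(), key=lambda item: item[0])

# Toy lattice: two primitive blocks that may be two characters or one merged character.
toy = {
    (0, 1): [("亻", -0.5, -0.2)],
    (1, 2): [("木", -0.5, -0.3)],
    (0, 2): [("休", -0.4, -0.1)],
}
score, chars = unified_search(2, lambda s, e: toy.get((s, e), []), lambda prev, ch: -1.0)
print(chars)  # ['休']
```

With a flat language model the merged hypothesis wins (score -1.5 versus -3.5 for the split); a stronger language-model score for a particular character pair could overturn that, which is the point of deciding jointly rather than module by module.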
Disclosure of Invention
To solve the problems described in the background art and achieve both a high recognition rate and miniaturization, the invention provides a method for miniaturizing a handwritten text recognizer under a unified recognition framework.
A method for miniaturizing a handwritten text recognizer under a unified recognition framework comprises the following steps:
(1) Miniaturized single-character segmentation classifier
The single-character segmentation classifier uses multi-hypothesis segmentation. All strokes are first divided into indivisible primitive stroke blocks according to whether the bounding rectangles of adjacent strokes overlap. Geometric features are then extracted from the off-strokes (pen movements between strokes) connecting adjacent primitive blocks, and internal structural features are extracted from the primitive or combined blocks; together these form the single-character segmentation feature vector. Finally, feature selection based on Fisher linear discriminant analysis is applied to the extracted feature vectors, compressing the segmentation feature vector to about 10 dimensions;
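The two concrete operations in step (1), grouping strokes into primitive blocks and reducing the segmentation features to about 10 dimensions, can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation. It assumes a horizontal text line, so "overlapping bounding rectangles" is tested on the x-extent of each stroke against the current block. For the reduction, classical Fisher LDA on a two-class problem (segmentation point versus non-segmentation point) yields only one projection direction, so this sketch assumes that "feature selection" means ranking individual features by the two-class Fisher ratio and keeping the top `k`. The function names `primitive_blocks` and `fisher_select` and the 0/1 labels are hypothetical.

```python
import numpy as np
from typing import List

def primitive_blocks(strokes: List[np.ndarray]) -> List[List[int]]:
    """Group strokes, in writing order, into indivisible primitive blocks.

    Assumption: horizontal writing; a stroke joins the current block when its
    x-extent overlaps the x-extent of the block's bounding rectangle.
    """
    blocks: List[List[int]] = []
    block_right = -np.inf
    for i, pts in enumerate(strokes):          # pts: (n_points, 2) array of (x, y)
        left, right = pts[:, 0].min(), pts[:, 0].max()
        if blocks and left <= block_right:      # overlaps current block: merge
            blocks[-1].append(i)
            block_right = max(block_right, right)
        else:                                   # horizontal gap: start a new block
            blocks.append([i])
            block_right = right
    return blocks

def fisher_select(X: np.ndarray, y: np.ndarray, k: int = 10) -> np.ndarray:
    """Indices of the k features with the largest two-class Fisher ratio.

    X: (n_samples, n_features) segmentation feature vectors.
    y: hypothetical labels, 1 = segmentation point, 0 = non-segmentation point.
    """
    pos, neg = X[y == 1], X[y == 0]
    between = (pos.mean(axis=0) - neg.mean(axis=0)) ** 2
    within = pos.var(axis=0) + neg.var(axis=0) + 1e-12   # avoid division by zero
    return np.argsort(between / within)[::-1][:k]

strokes = [np.array([[0, 0], [2, 1]]), np.array([[1, 0], [3, 2]]), np.array([[5, 0], [6, 1]])]
print(primitive_blocks(strokes))  # [[0, 1], [2]]
```

At recognition time only the selected columns, `X[:, fisher_select(X, y)]`, need to be computed and stored, which is where the memory saving of this step comes from.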
(2) Miniaturized single-character recognizer
To build the miniaturized single-character recognizer, the online single-character recognizer LTM, which occupies only a few hundred KB of memory, is selected first, and the online LTM recognizer is then combined linearly with a miniaturized offline single-character recognizer on the basis of minimum classification error (MCE). The parameters of the feature compression matrix and of the offline recognizer are initialized by maximum likelihood estimation, and the recognizer's feature vectors are then clustered block by block to obtain a data dictionary (the cluster centers) used for data compression. Finally, the parameters of these three parts (the feature compression matrix, the offline recognizer and the data dictionary) are optimized step by step with a discriminant-analysis-based method, which keeps the offline single-character recognizer both small and accurate;
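The block clustering that produces the data dictionary can be sketched as split-dimension vector quantization: each stored vector is cut into short sub-vectors, the sub-vectors are clustered, and each one is replaced by a 1-byte index into the codebook of cluster centers. This is a minimal sketch under stated assumptions. The patent does not specify the sub-vector length, the codebook size or whether one codebook is shared across sub-vector positions; here one shared codebook, plain Lloyd's k-means, and the hypothetical names `split_vq` and `reconstruct` are used, and the random matrix stands in for real recognizer parameters.

```python
import numpy as np

def kmeans(X: np.ndarray, k: int, iters: int = 20, seed: int = 0):
    """Plain Lloyd's k-means; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dist.argmin(axis=1)
        for c in range(k):
            members = X[labels == c]
            if len(members):                    # keep the old center if a cluster empties
                centers[c] = members.mean(axis=0)
    dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return centers, dist.argmin(axis=1)

def split_vq(vectors: np.ndarray, sub_dim: int, k: int = 256):
    """Cut each vector into sub_dim-long blocks and replace every block with a
    1-byte index into one shared codebook (the 'data dictionary')."""
    n, dim = vectors.shape
    assert dim % sub_dim == 0 and k <= 256      # uint8 codes hold at most 256 centers
    blocks = vectors.reshape(-1, sub_dim)       # (n * dim / sub_dim, sub_dim)
    codebook, labels = kmeans(blocks, k)
    return codebook, labels.astype(np.uint8).reshape(n, dim // sub_dim)

def reconstruct(codebook: np.ndarray, codes: np.ndarray) -> np.ndarray:
    """Decode: look up each block's center and concatenate the blocks."""
    return codebook[codes].reshape(codes.shape[0], -1)

rng = np.random.default_rng(1)
params = rng.normal(size=(100, 16)).astype(np.float32)   # stand-in parameter vectors
codebook, codes = split_vq(params, sub_dim=4, k=16)
approx = reconstruct(codebook, codes)
```

In this toy setting the 100 x 16 float32 matrix takes 6400 bytes, while the codes take 400 bytes and the codebook 16 x 4 x 4 = 256 bytes, 656 bytes in total. The reconstruction error this introduces is what the subsequent step-by-step discriminative optimization is meant to compensate for.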
(3) Miniaturized natural language model
To build the miniaturized natural language model, the conventional approach of storing the occurrence probability of every word tuple (n-gram) is replaced by directly fitting a generalized probability-generating function; specifically, a natural language model smoothed by linear interpolation of unigram, bigram and trigram probabilities is adopted;
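The unigram/bigram/trigram linear interpolation named in step (3) can be written as $P(c \mid a, b) = \lambda_3 P_{ML}(c \mid a, b) + \lambda_2 P_{ML}(c \mid b) + \lambda_1 P_{ML}(c)$ with $\lambda_1 + \lambda_2 + \lambda_3 = 1$. Below is a minimal character-level sketch of that smoothing only. The "generalized probability-generating function" that replaces stored n-gram probabilities is not specified in the text, so it is not reproduced here. The class name, the weights (0.1, 0.3, 0.6) and the fallback for unseen histories are assumptions; in practice the weights would be tuned on held-out text.

```python
from collections import Counter
from typing import Sequence, Tuple

class InterpolatedTrigram:
    """Sketch of a character trigram model smoothed by linear interpolation:
    P(c | a, b) = l3 * P_ML(c | a, b) + l2 * P_ML(c | b) + l1 * P_ML(c)."""

    def __init__(self, text: Sequence[str], lambdas: Tuple[float, float, float] = (0.1, 0.3, 0.6)):
        assert abs(sum(lambdas) - 1.0) < 1e-9, "interpolation weights must sum to 1"
        self.l1, self.l2, self.l3 = lambdas     # hypothetical weights
        t = ["<s>", "<s>"] + list(text)          # pad so the first characters have a history
        self.tri = Counter(zip(t, t[1:], t[2:]))
        self.bi = Counter(zip(t[1:], t[2:]))
        self.uni = Counter(t[2:])
        self.total = sum(self.uni.values())
        self.hist2 = Counter()                    # count of each (a, b) history
        for (a, b, _), n in self.tri.items():
            self.hist2[(a, b)] += n
        self.hist1 = Counter()                    # count of each b history
        for (b, _), n in self.bi.items():
            self.hist1[b] += n

    def prob(self, a: str, b: str, c: str) -> float:
        p1 = self.uni[c] / self.total
        # assumption: an unseen history falls back to the lower-order estimate,
        # so the distribution over the training vocabulary still sums to 1
        p2 = self.bi[(b, c)] / self.hist1[b] if self.hist1[b] else p1
        p3 = self.tri[(a, b, c)] / self.hist2[(a, b)] if self.hist2[(a, b)] else p2
        return self.l3 * p3 + self.l2 * p2 + self.l1 * p1

lm = InterpolatedTrigram("我们是学生我们是老师")
print(round(lm.prob("我", "们", "是"), 2))  # 0.92
print(round(lm.prob("我", "们", "老"), 2))  # 0.01
```

The interpolation lets a continuation never seen after a given history, such as 老 after 我们, keep a small non-zero probability from the unigram term instead of being ruled out.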
(4) System parameter optimization
With each expert module miniaturized, the integration parameters that combine the modules in the unified recognition framework are optimized with the minimum classification error (MCE) algorithm.
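Step (4) states only that the integration parameters are optimized with the MCE algorithm. A minimal sketch of the common MCE formulation for linear combination weights is shown below, assuming that each of K modules outputs a per-class score, that the combined discriminant is $g_j = \sum_k w_k S_{k,j}$, that the misclassification measure is $d = g_{\text{rival}} - g_{\text{true}}$ against the best competing class, and that the loss is $\sigma(\alpha d)$. The function names, `alpha`, `lr`, `epochs` and the synthetic two-module data are illustrative, not taken from the patent.

```python
import numpy as np

def mce_loss_and_grad(w: np.ndarray, S: np.ndarray, y: np.ndarray, alpha: float = 1.0):
    """Mean MCE loss and its gradient with respect to the module weights w.

    Sketch only. S: (n, K, C) scores from K modules for C classes; y: (n,) labels.
    """
    n = len(y)
    idx = np.arange(n)
    g = np.einsum("k,nkc->nc", w, S)             # (n, C) combined discriminant
    g_rival = g.copy()
    g_rival[idx, y] = -np.inf
    rival = g_rival.argmax(axis=1)               # best competing class per sample
    d = g[idx, rival] - g[idx, y]                # misclassification measure (> 0 means error)
    loss = 0.5 * (1.0 + np.tanh(0.5 * alpha * d))  # numerically stable sigmoid(alpha * d)
    dd_dw = S[idx, :, rival] - S[idx, :, y]      # (n, K) derivative of d w.r.t. w
    grad = (alpha * loss * (1.0 - loss))[:, None] * dd_dw
    return loss.mean(), grad.mean(axis=0)

def train_mce(S, y, w_init, lr=0.1, epochs=200, alpha=1.0):
    """Full-batch gradient descent on the smoothed error count."""
    w = np.array(w_init, dtype=float)
    for _ in range(epochs):
        _, grad = mce_loss_and_grad(w, S, y, alpha)
        w -= lr * grad
    return w

# Synthetic data: module 0 is informative, module 1 is pure noise.
rng = np.random.default_rng(0)
n, C = 200, 5
y = rng.integers(0, C, size=n)
S = np.empty((n, 2, C))
S[:, 0, :] = rng.normal(size=(n, C))
S[np.arange(n), 0, y] += 2.0
S[:, 1, :] = 3.0 * rng.normal(size=(n, C))
w = train_mce(S, y, w_init=[0.5, 0.5])
```

Because the sigmoid turns the 0/1 error count into a differentiable function, gradient descent shifts weight toward the module that actually separates the classes; in a real system S would hold the segmentation, recognition and language-model scores of each candidate.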
Claims (1)
1. A method for miniaturizing a handwritten text recognizer under a unified recognition framework, characterized by comprising the following steps:
(1) Miniaturized single-character segmentation classifier
In constructing the single-character segmentation classifier, multi-hypothesis segmentation is adopted: all strokes are first divided into indivisible primitive stroke blocks according to the overlap of the bounding rectangles of adjacent strokes; geometric features extracted from the off-strokes between adjacent primitive blocks and internal structural features extracted from the primitive or combined blocks then form the single-character segmentation feature vector; and feature selection based on Fisher linear discriminant analysis finally compresses this feature vector to about 10 dimensions;
(2) Miniaturized single-character recognizer
In constructing the miniaturized single-character recognizer, an online single-character recognizer LTM occupying only a few hundred KB of memory is first selected; the online LTM recognizer and a miniaturized offline single-character recognizer are then combined linearly based on minimum classification error; the parameters of a feature compression matrix and of the offline single-character recognizer are initialized by maximum likelihood estimation; a data dictionary (cluster centers) for data compression is obtained by block clustering of the recognizer's feature vectors; and finally the parameters of these three parts are optimized step by step with a discriminant-analysis-based method, ensuring that the offline single-character recognizer is both miniaturized and highly accurate;
(3) Miniaturized natural language model
In constructing the miniaturized natural language model, the conventional approach of storing the occurrence probability of every word tuple is replaced by directly fitting a generalized probability-generating function, that is, a natural language model smoothed by linear interpolation of unigram, bigram and trigram probabilities is adopted;
(4) System parameter optimization
With each expert module miniaturized, the integration parameters of the modules in the unified recognition framework are optimized based on the minimum classification error algorithm.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911260983.1A (CN111079622A) | 2019-12-10 | 2019-12-10 | Method for miniaturizing handwritten text recognizer under unified recognition framework |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111079622A | 2020-04-28 |
Family ID: 70313652
Family Applications (1)
Application Number | Status | Priority Date | Filing Date | Title |
---|---|---|---|---|
CN201911260983.1A (CN111079622A) | Pending | 2019-12-10 | 2019-12-10 | Method for miniaturizing handwritten text recognizer under unified recognition framework |
Country Status (1)
Country | Link |
---|---|
CN | CN111079622A |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101930545A * | 2009-06-24 | 2010-12-29 | Sharp Corporation | Handwriting recognition method and device |
TW201201113A * | 2010-06-22 | 2012-01-01 | Sharp KK | Handwriting recognition method and device |
CN107209862A * | 2015-01-21 | 2017-09-26 | National University Corporation Tokyo University of Agriculture and Technology | Program, information storage medium and identifying device |
CN206497439U * | 2016-11-26 | 2017-09-15 | Huanghuai University | Instrument for quick recognition and input of handwritten Chinese characters |
Non-Patent Citations (6)
Title |
---|
Cuong Tuan Nguyen et al.: "A unified method for augmented incremental recognition of online handwritten Japanese and English text", International Journal on Document Analysis and Recognition (IJDAR), pages 53 - 72 * |
Jinfeng Gao et al.: "Building compact recognizer with recognition rate maintained for on-line handwritten Japanese text recognition", Pattern Recognition Letters, pages 169 - 177 * |
Jinfeng Gao et al.: "Complexity reduction with recognition rate maintained for online handwritten Japanese text recognition", Proc. SPIE 8297, Document Recognition and Retrieval XIX, vol. 8297, pages 1 - 82970 * |
Jinfeng Gao: "Development of a Robust and Compact On-Line Handwritten Japanese Text Recognizer for Hand-Held Devices", IEICE Transactions on Information and Systems, pages 927 - 938 * |
Chen Qingguo: "Application and development trends of computer technology in handwritten Chinese character recognition", Keji Chuanbo (Science and Technology Communication), pages 1 - 3 * |
Gao Jinfeng et al.: "Research on the miniaturization of online handwritten text recognizers", Keji Chengguo (Scientific and Technological Achievements), pages 1 - 5 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9417711B2 (en) | System and method for implementing sliding input of text based upon on-screen soft keyboard on electronic equipment | |
Wöllmer et al. | A multidimensional dynamic time warping algorithm for efficient multimodal fusion of asynchronous data streams | |
CN109740447A (en) | Communication means, equipment and readable storage medium storing program for executing based on artificial intelligence | |
EP2828793A1 (en) | Rotation-free recognition of handwritten characters | |
CN111753802B (en) | Identification method and device | |
CN112100337A (en) | Emotion recognition method and device in interactive conversation | |
Dai Nguyen et al. | Recognition of online handwritten math symbols using deep neural networks | |
Cojocaru et al. | Watch your strokes: improving handwritten text recognition with deformable convolutions | |
CN116110059A (en) | Offline handwriting mathematical formula identification method based on deep learning | |
Inunganbi et al. | Handwritten Meitei Mayek recognition using three‐channel convolution neural network of gradients and gray | |
CN101256624B (en) | Method and system for establishing HMM topological structure being suitable for recognizing hand-written East Asia character | |
CN104267835A (en) | Self-adaption gesture recognition method | |
CN116843155B (en) | SAAS-based person post bidirectional matching method and system | |
CN111079622A (en) | Method for miniaturizing handwritten text recognizer under unified recognition framework | |
CN111695450A (en) | Face rapid identification method based on IMobileNet | |
Ly et al. | Attention augmented convolutional recurrent network for handwritten Japanese text recognition | |
CN110990588B (en) | Method for miniaturizing natural language model of handwritten text recognizer under unified recognition framework | |
Mandal et al. | Exploration of CNN features for online handwriting recognition | |
Peng et al. | Temporal pyramid transformer with multimodal interaction for video question answering | |
CN113851113A (en) | Model training method and device and voice awakening method and device | |
CN114283791A (en) | Speech recognition method based on high-dimensional acoustic features and model training method | |
CN106570458A (en) | Recognition method for recognizing on-line handwritten Chinese and Japanese | |
Hai-Sheng et al. | Style transfer for QR code | |
CN113961701A (en) | Message text clustering method and device | |
Parwej | The state of the art recognize in arabic script through combination of online and offline |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |