CN111079622A

CN111079622A - Method for miniaturizing handwritten text recognizer under unified recognition framework

Info

Publication number: CN111079622A
Application number: CN201911260983.1A
Authority: CN
Inventors: 高金锋; 姚汝贤; 马贺红; 张瑜; 张俊明; 赖晗
Original assignee: Huanghuai University
Current assignee: Huanghuai University
Priority date: 2019-12-10
Filing date: 2019-12-10
Publication date: 2020-04-28

Abstract

The invention relates to the field of information input for mobile intelligent terminals such as smartphones, tablet computers, portable computers and navigators, and in particular to a method for miniaturizing a handwritten text recognizer under a unified recognition framework. The method comprises the following steps: (1) building a miniaturized single-character segmentation classifier, (2) building a miniaturized single-character recognizer, (3) building a miniaturized natural language model, and (4) optimizing the system parameters.

Description

Method for miniaturizing handwritten text recognizer under unified recognition framework

Technical Field

The invention relates to the field of information input for mobile intelligent terminals such as smartphones, tablet computers, portable computers and navigators, and in particular to a method for miniaturizing a handwritten text recognizer under a unified recognition framework.

Background Art

As mobile computing converges with wireless communication, networking, mobile technologies, cloud computing and mobile intelligent terminals, pen-based user interfaces have become a research hotspot. The rapid development of mobile intelligent terminals such as smartphones, tablet computers, laptop computers and navigators has made handwritten input popular and widely accepted. At present, such devices mainly rely on handwritten single-character input (one character at a time) combined with sets of predicted candidates, which greatly limits the freedom and speed of input. Handwritten string input, in which several characters (two or three at a time) are written together, also exists, but its recognition rate and speed still need further improvement. Handwritten text input, in which multiple lines with multiple characters per line are written at once just as in everyday writing, is the best choice for further improving input speed and freedom, and thus for bringing convenience to production and daily life.

Although text-form handwriting input is faster than single-character handwriting and matches people's writing habits, recognizing handwritten text is challenging. The challenge stems mainly from the uncertainty of character segmentation and character recognition: first, the input device cannot determine which strokes or parts make up a handwritten character; second, misrecognition occurs when the characters are recognized. Handwritten text recognition therefore has to pass through a series of sub-expert modules. If each sub-expert module makes its decision one after another, errors produced by an earlier module are passed on to the later ones; the errors accumulate and the recognition rate drops sharply.

In addition, what is recognized is text rather than isolated characters, so it has the characteristics of natural language, and the ambiguity caused by uncertainty in the sub-expert modules can be resolved by information fusion. For this reason, having the three sub-expert modules (character segmentation, character recognition and the natural language model) make the recognition decision jointly, that is, recognizing handwritten text under a unified recognition framework, is the best way to ensure a high recognition rate.

While the unified recognition framework ensures a high recognition rate, the memory required by the recognizer grows sharply as sub-expert modules are added. A mobile intelligent terminal used for information input in mobile computing has relatively little memory and demands real-time response, so a handwritten text recognizer running on it must occupy relatively little memory. Because a high recognition rate and miniaturization pull in opposite directions, existing research and applications have essentially been unable to use all expert modules: expert modules are added to raise the recognition rate and compressed to miniaturize the recognizer. As a result, handwritten text input has not been applicable on mobile intelligent terminals.

Disclosure of Invention

To solve the problems described in the background art and achieve both a high recognition rate and miniaturization, the invention provides a method for miniaturizing a handwritten text recognizer under a unified recognition framework.

A method for miniaturizing a handwritten text recognizer under a unified recognition framework specifically comprises the following steps:

(1) miniaturized single-character segmentation classifier

In constructing the single-character segmentation classifier, multi-hypothesis segmentation is adopted. All strokes are first grouped into indivisible stroke blocks (the original blocks) according to the overlap of the bounding rectangles of adjacent strokes. Geometric features are then extracted from the off-strokes between adjacent original blocks, and internal structural features are extracted at the same time from the original blocks or combined blocks; together these form the single-character segmentation feature vector. Feature selection is then performed on the extracted feature vectors by Fisher linear discriminant analysis, compressing the dimensionality of the single-character segmentation feature vector to about 10;
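By way of illustration only, the grouping of strokes into indivisible blocks described above might be sketched as follows in Python. The invention states only that the overlap of the bounding rectangles of adjacent strokes is used; measuring that overlap along the horizontal axis, the `min_overlap` threshold, and all function names are assumptions made for this sketch.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Stroke = List[Point]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def bounding_box(stroke: Stroke) -> Box:
    """Axis-aligned bounding rectangle of one stroke."""
    xs = [p[0] for p in stroke]
    ys = [p[1] for p in stroke]
    return (min(xs), min(ys), max(xs), max(ys))


def horizontal_overlap(a: Box, b: Box) -> float:
    """Length of the shared x-interval of two boxes (0.0 if they are disjoint)."""
    return max(0.0, min(a[2], b[2]) - max(a[0], b[0]))


def group_strokes(strokes: List[Stroke], min_overlap: float = 0.0) -> List[List[int]]:
    """Merge consecutive strokes whose boxes overlap horizontally.

    Returns one list of stroke indices per block. The threshold and the
    horizontal-only test are assumptions, not taken from the invention.
    """
    blocks: List[List[int]] = []
    block_box: Optional[Box] = None
    for i, stroke in enumerate(strokes):
        box = bounding_box(stroke)
        if block_box is not None and horizontal_overlap(block_box, box) > min_overlap:
            # The stroke overlaps the current block: absorb it and grow the box.
            blocks[-1].append(i)
            block_box = (min(block_box[0], box[0]), min(block_box[1], box[1]),
                         max(block_box[2], box[2]), max(block_box[3], box[3]))
        else:
            # No overlap: start a new block.
            blocks.append([i])
            block_box = box
    return blocks
```

Candidate characters would then be formed by combining one or more consecutive blocks, each combination being one segmentation hypothesis.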

(2) miniaturized single-character recognizer

In constructing the miniaturized single-character recognizer, the online single-character recognizer LTM, which occupies only a few hundred KB of memory, is selected first. The online LTM recognizer and the miniaturized offline single-character recognizer are then linearly integrated based on the minimum classification error criterion. The parameters of the feature compression matrix and of the offline single-character recognizer are initialized by maximum likelihood estimation, and block clustering is applied to the recognizer's feature vectors to obtain a data dictionary (the cluster centers) for data compression. Finally, the parameters of these three parts are optimized step by step using a discriminant analysis method, ensuring both the miniaturization and the high recognition rate of the offline single-character recognizer;
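As a rough illustration of how a data dictionary of cluster centers can compress a vector, the following Python sketch splits a vector into fixed-size blocks and stores, for each block, only the index of its nearest cluster center. It assumes the per-block codebooks have already been obtained by clustering; the function names, the squared Euclidean distance and the fixed `block_size` are assumptions of this sketch and are not specified by the invention.

```python
from typing import List, Sequence

Codebook = List[List[float]]  # cluster centers for one block position


def nearest_center(block: Sequence[float], codebook: Codebook) -> int:
    """Index of the cluster center closest to `block` (squared Euclidean distance)."""
    def sq_dist(center: Sequence[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(block, center))
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))


def encode(vector: Sequence[float], codebooks: List[Codebook], block_size: int) -> List[int]:
    """Replace each block of the vector by the index of its nearest center."""
    blocks = [vector[i:i + block_size] for i in range(0, len(vector), block_size)]
    return [nearest_center(b, cb) for b, cb in zip(blocks, codebooks)]


def decode(codes: Sequence[int], codebooks: List[Codebook]) -> List[float]:
    """Rebuild an approximate vector by concatenating the indexed centers."""
    out: List[float] = []
    for code, cb in zip(codes, codebooks):
        out.extend(cb[code])
    return out
```

Under these assumptions, each stored vector costs one small integer per block plus a shared codebook, instead of one floating-point value per dimension.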

(3) miniaturized natural language model

In constructing the miniaturized natural language model, the traditional approach of storing the occurrence probability of each vocabulary tuple is replaced by directly fitting a generalized probability generating function; that is, a natural language model based on linear interpolation smoothing of unigram, bigram and trigram models is adopted;
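For illustration, linear interpolation of unigram, bigram and trigram estimates can be sketched as below. This is a generic textbook form, not the fitted function of the invention: the class name, the fixed interpolation weights `lambdas` and the use of raw relative frequencies are all assumptions of this sketch.

```python
from collections import Counter
from typing import Sequence, Tuple


class InterpolatedTrigram:
    """P(w3 | w1, w2) = l1 * P(w3) + l2 * P(w3 | w2) + l3 * P(w3 | w1, w2)."""

    def __init__(self, corpus: Sequence[Sequence[str]],
                 lambdas: Tuple[float, float, float] = (0.2, 0.3, 0.5)):
        assert abs(sum(lambdas) - 1.0) < 1e-9, "weights must sum to 1"
        self.l1, self.l2, self.l3 = lambdas
        self.uni: Counter = Counter()
        self.bi: Counter = Counter()
        self.tri: Counter = Counter()
        for sent in corpus:
            for i, w in enumerate(sent):
                self.uni[w] += 1
                if i >= 1:
                    self.bi[(sent[i - 1], w)] += 1
                if i >= 2:
                    self.tri[(sent[i - 2], sent[i - 1], w)] += 1
        self.total = sum(self.uni.values())

    def prob(self, w1: str, w2: str, w3: str) -> float:
        """Interpolated probability; an unseen history contributes 0 for its term."""
        p1 = self.uni[w3] / self.total if self.total else 0.0
        p2 = self.bi[(w2, w3)] / self.uni[w2] if self.uni[w2] else 0.0
        p3 = self.tri[(w1, w2, w3)] / self.bi[(w1, w2)] if self.bi[(w1, w2)] else 0.0
        return self.l1 * p1 + self.l2 * p2 + self.l3 * p3
```

Because lower-order estimates back up the trigram term, an unseen trigram still receives a non-zero probability whenever its bigram or unigram has been observed.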

(4) system parameter optimization

After the expert modules have been miniaturized, the integration parameters of the modules in the unified recognition framework are optimized using a minimum classification error algorithm.
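A minimal sketch of a minimum classification error update for the integration weights is given below, assuming that each candidate path is scored as a weighted sum of per-module log-scores (for example segmentation, character recognition and language model). The scoring form, the sigmoid loss with slope `xi`, the learning rate `lr` and the function names are assumptions of this sketch; the invention does not give these details.

```python
import math
from typing import List, Sequence


def path_score(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of the per-module log-scores of one candidate path."""
    return sum(w * s for w, s in zip(weights, scores))


def mce_loss(correct: Sequence[float], rival: Sequence[float],
             weights: Sequence[float], xi: float = 1.0) -> float:
    """Sigmoid of the misclassification measure d = g(rival) - g(correct)."""
    d = path_score(rival, weights) - path_score(correct, weights)
    return 1.0 / (1.0 + math.exp(-xi * d))


def mce_step(correct: Sequence[float], rival: Sequence[float],
             weights: Sequence[float], xi: float = 1.0, lr: float = 0.1) -> List[float]:
    """One gradient-descent update of the integration weights on one sample."""
    loss = mce_loss(correct, rival, weights, xi)
    coeff = xi * loss * (1.0 - loss)  # derivative of the sigmoid with respect to d
    return [w - lr * coeff * (r - c) for w, c, r in zip(weights, correct, rival)]
```

Here `correct` holds the module scores of the ground-truth path and `rival` those of the strongest competing path; a module on which the two paths score equally leaves its weight unchanged.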

Detailed Description

(1) miniaturized single-character segmentation classifier

In constructing the single-character segmentation classifier, multi-hypothesis segmentation is adopted. All strokes are first grouped into indivisible stroke blocks (the original blocks) according to the overlap of the bounding rectangles of adjacent strokes. Geometric features are then extracted from the off-strokes between adjacent original blocks, and internal structural features are extracted at the same time from the original blocks or combined blocks; together these form the single-character segmentation feature vector. Feature selection is then performed on the extracted feature vectors by Fisher linear discriminant analysis, compressing the dimensionality of the single-character segmentation feature vector to about 10;
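The invention does not spell out how Fisher linear discriminant analysis is used to select features. As a simplified stand-in, the sketch below ranks each feature by its two-class Fisher ratio (squared difference of class means over the sum of class variances) and keeps the `k` best. The two-class labelling (1 for a valid segmentation point, 0 otherwise), the per-feature ranking and the function names are all assumptions of this sketch.

```python
from statistics import fmean, pvariance
from typing import List, Sequence


def fisher_scores(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[float]:
    """Per-feature Fisher ratio for a two-class problem with labels 0 and 1."""
    n_features = len(X[0])
    scores: List[float] = []
    for j in range(n_features):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        between = (fmean(pos) - fmean(neg)) ** 2
        within = pvariance(pos) + pvariance(neg)
        scores.append(between / (within + 1e-12))  # small term guards against zero variance
    return scores


def select_top_features(X: Sequence[Sequence[float]], y: Sequence[int],
                        k: int = 10) -> List[int]:
    """Indices of the k features with the largest Fisher ratio, in ascending order."""
    scores = fisher_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])
```

With `k = 10` this would reduce a segmentation feature vector to about the dimensionality stated above, at the cost of ignoring correlations between features that a full discriminant analysis would take into account.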

(2) miniaturized single-character recognizer

(3) miniaturized natural language model

(4) system parameter optimization

Claims

1. A method for miniaturizing a handwritten text recognizer under a unified recognition framework, characterized in that it comprises the following steps:

(1) miniaturized single-character segmentation classifier

in constructing the single-character segmentation classifier, multi-hypothesis segmentation is adopted: all strokes are first grouped into indivisible stroke blocks (the original blocks) according to the overlap of the bounding rectangles of adjacent strokes; geometric features are then extracted from the off-strokes between adjacent original blocks while internal structural features are extracted from the original blocks or combined blocks, together forming the single-character segmentation feature vector; and feature selection is then performed on the extracted feature vectors by Fisher linear discriminant analysis, so that the dimensionality of the single-character segmentation feature vector is compressed to about 10;

(2) miniaturized single-character recognizer

in constructing the miniaturized single-character recognizer, the online single-character recognizer LTM, which occupies only a few hundred KB of memory, is selected first; the online LTM recognizer and the miniaturized offline single-character recognizer are then linearly integrated based on the minimum classification error criterion; the parameters of the feature compression matrix and of the offline single-character recognizer are initialized by maximum likelihood estimation; block clustering is applied to the recognizer's feature vectors to obtain a data dictionary (the cluster centers) for data compression; and finally the parameters of these three parts are optimized step by step using a discriminant analysis method, ensuring both the miniaturization and the high recognition rate of the offline single-character recognizer;

(3) miniaturized natural language model

(4) system parameter optimization