CN1234094C - Character written-form judgement apparatus and method based on Bayes classification device - Google Patents

Publication number
CN1234094C
Authority
CN
China
Prior art keywords
feature
character
pca
font
swimming
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN 02157957
Other languages
Chinese (zh)
Other versions
CN1438604A (en)
Inventor
徐蔚然
刘刚
郭军
张洪刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications
Priority to CN 02157957
Publication of CN1438604A
Application granted
Publication of CN1234094C
Anticipated expiration
Expired - Fee Related

Landscapes

  • Character Discrimination (AREA)

Abstract

The present invention discloses a device, based on a Bayes classifier, for judging the font of characters, and a method thereof. The device comprises a character image input and front-end processing device, a feature extractor, a training sample memory, a PCA analyzer, a PCA converter, a classifier parameter estimator, a Bayes classifier, a confidence estimation device, a judgment result output device, and a control processor. The device learns automatically from training samples using statistical methods and thereby judges the character font accurately. It is well structured, simple to operate, and highly accurate. It remains accurate for characters heavily contaminated by seal stamps or background patterns, and it can judge the font accurately from only 4 to 5 Chinese characters. Few parameters are set manually, so the device does not depend on human experience; the complicated and error-prone character segmentation step is avoided; and images do not need to be binarized. The device is suitable for character recognition systems with strict accuracy requirements, such as bank check recognition systems, letter address recognition systems, and form recognition systems.

Description

Character font judgment device and method based on a Bayes classifier
Technical field
The present invention relates to the field of automatic Chinese character recognition, and in particular to a device and method for distinguishing handwritten from printed Chinese characters when the text to be recognized is heavily contaminated. The device is suitable for character recognition systems with strict accuracy requirements, such as check recognition systems, letter address recognition systems, and form recognition systems.
Background art
Font judgment is a basic problem in the field of character recognition. It matters in at least two respects: (1) it reduces multi-font character recognition to single-font recognition, which substantially improves recognition accuracy; (2) it preserves the font information of the original document, so that an automatic document processing (ADP) system can reproduce the original fonts in print. Discriminating handwritten from printed text is one kind of font discrimination, and it is a key technology for automatic character recognition systems, for example systems that automatically recognize the Chinese-character amount on bank bills. Banks handle a mixture of hand-filled and machine-printed checks. The recognition theory and methods for handwritten text and for printed text are entirely different, so a single classifier cannot recognize both kinds of text with high precision at the same time. In addition, the Chinese-character amount on a bank check is heavily contaminated by seal stamps and by the check's background pattern, so the recognition system must be able to remove this contamination. Because handwritten and printed characters are "written" in different ways, seals and background patterns interfere with the two kinds of text in different ways and to different degrees, and each kind requires its own decontamination method. Given the strict accuracy requirements of check recognition systems, accurately judging the font of the text to be recognized is a key technology for automatic character recognition.
Because font judgment is so important, it has been studied for many years in China and abroad, and many methods have been proposed:
1. Template matching based on clustering (J. Hochberg, P. Kelly, T. Thomas, L. Kerns, 1997, IEEE PAMI, "Automatic Script Identification From Document Images Using Cluster-Based Templates");
2. Font discrimination based on rotation-invariant texture features (T. N. Tan, 1998, IEEE PAMI, "Rotation Invariant Texture Features and Their Use in Automatic Script Identification");
3. Font recognition based on typographical features (A. Zramdini, R. Ingold, 1998, IEEE PAMI, "Optical Font Recognition Using Typographical Features");
4. Font discrimination based on global texture analysis (Y. Zhu, T. N. Tan, Y. H. Wang, 2001, IEEE PAMI, "Font Recognition Based on Global Texture Analysis");
5. Handwritten versus printed judgment for Japanese, based on gradient vectors, gray-level histograms, and a neural network (S. Imade, S. Tatsuta, 1993, Proc. 2nd Intl. Conf., "Segmentation and classification for mixed text/image documents using neural network");
6. Handwritten versus printed judgment for English, based on directional features, symmetry features, and a neural network (K. Kuhnke, 1995, Int. Conf. Document Analysis and Recognition 2, "A system for machine-written and hand-written character distinction");
7. Handwritten versus printed judgment for traditional Chinese characters, based on the spatial variance features of the character block layout (K. C. Fan, L. S. Wang, Y. T. Tu, 1997, Pattern Recognition, "Classification of machine-printed and handwritten texts using character block layout variance").
Although all of the above methods perform font discrimination, they differ from one another. Methods 1 to 4 mainly distinguish among different fonts of printed text. Methods 5 to 7 are designed specifically for handwritten versus printed judgment, but they target Japanese, English, and traditional Chinese characters respectively. No article or patent on handwritten versus printed judgment for simplified Chinese characters has been found so far. Moreover, although each method has its own strengths, they share the following shortcomings: they all process clean, uncontaminated text images; they all need a large amount of text, such as a whole paragraph, to judge the font; and they require many manually set parameters and depend on human experience.
Summary of the invention
The object of the invention is to solve the above problems of handwritten versus printed font judgment in character recognition, so that handwritten and printed text can be better distinguished in mixed document images. To this end the invention proposes a character font judgment device based on a Bayes classifier, and a method thereof. The device is realized by the following technical solution and comprises:
A character image input and front-end processing device, for inputting an image from outside and determining the position of the text to be recognized in the image;
A feature extractor, for extracting the features used to discriminate the font from the test window supplied by the character image input and front-end processing device;
A training sample memory, for storing the features of all training samples together, to support automatic learning by the Bayes classifier;
A PCA analyzer, for performing principal component analysis (PCA) on the features of all training samples held in the training sample memory, thereby determining the PCA transform;
A PCA converter, for applying the PCA transform to the features of a sample according to the parameters determined by the PCA analyzer;
A classifier parameter estimator, for automatically estimating all parameters of the Bayes classifier from the training samples supplied by the PCA converter;
A Bayes classifier, for discriminating the font according to the parameters determined by the classifier parameter estimator;
A confidence estimation device, for assessing the confidence of the result output by the Bayes classifier;
A judgment result output device, for outputting the analysis result of the device to other equipment;
A control processor, for controlling and coordinating each of the above devices so that the device can learn automatically and judge fonts automatically.
The character image input and front-end processing device comprises a character image input device and a test window locator.
The feature extractor comprises a layout feature extractor, a morphological feature extractor, a gray-gradient distribution feature extractor, and a texture feature extractor.
The layout feature extractor extracts features of the way the characters are arranged: the text height feature, the mean character width feature, the character-width absolute difference feature, the mean character spacing feature, and the maximum character spacing feature.
The morphological feature extractor extracts features of the shape of the character strokes: the vertical projection value feature, the vertical mean run-length feature, the horizontal mean run-length feature, the long-run dominance feature, and the long-run mean feature.
The gray-gradient distribution feature extractor extracts features from the gray-gradient two-dimensional histogram of the test window: the first two-dimensional histogram feature and the second two-dimensional histogram feature.
The texture feature extractor extracts texture features from the image.
The output result comprises the sequence number of the analyzed image, the font judgment result, and the confidence of the font judgment.
The character font judgment method based on a Bayes classifier is carried out under the control of the device's control processor and comprises the steps of:
Inputting an image from outside and determining the position of the text to be recognized in the image;
Extracting the features used to discriminate the font from the test window supplied by the character image input and front-end processing device;
Storing the features of all training samples together, to support automatic learning by the Bayes classifier;
Performing principal component analysis (PCA) on the features of all training samples held in the training sample memory, thereby determining the PCA transform;
Applying the PCA transform to the features of a sample according to the parameters determined by the PCA analyzer;
Automatically estimating all parameters of the Bayes classifier from the training samples supplied by the PCA converter;
Discriminating the font according to the parameters determined by the classifier parameter estimator;
Assessing the confidence of the result output by the Bayes classifier;
Outputting the result of the analysis to other equipment.
The features extracted to discriminate the font comprise layout features, morphological features, gray-gradient distribution features, and texture features.
The layout features describe the way the characters are arranged and comprise the text height feature, the mean character width feature, the character-width absolute difference feature, the mean character spacing feature, and the maximum character spacing feature. Their formulas are, respectively:
Figure C0215795700081
Figure C0215795700083
where the text to be recognized has N characters, W_i is the width of the i-th character, W_o is the standard width of a printed character, and S_i is the i-th character spacing obtained.
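The formula images for the layout features are not reproduced in this text, so the following is only a minimal sketch of how such features might be computed from the symbols defined above. The function name `layout_features`, the use of plain means, and the mean absolute deviation from W_o are assumptions for illustration, not the patent's formulas.

```python
def layout_features(widths, gaps, text_height, standard_width):
    """Five layout features from character widths W_i and spacings S_i.

    Assumed definitions (not taken from the patent's formula images):
    plain means, a mean absolute deviation from the standard printed
    width W_o, and a maximum over the spacings.
    """
    n = len(widths)
    return {
        "text_height": float(text_height),
        "mean_width": sum(widths) / n,
        "width_abs_diff": sum(abs(w - standard_width) for w in widths) / n,
        "mean_gap": sum(gaps) / len(gaps) if gaps else 0.0,
        "max_gap": float(max(gaps)) if gaps else 0.0,
    }


# Hypothetical measurements, in pixels, for a four-character test window.
features = layout_features(widths=[20, 22, 18, 20], gaps=[4, 6, 5],
                           text_height=24, standard_width=20)
```

Printed text would be expected to give a small `width_abs_diff` and nearly equal gaps, whereas handwriting would tend to vary more on both.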
The morphological features describe the shape of the character strokes and comprise the vertical projection value feature, the vertical mean run-length feature, the horizontal mean run-length feature, the long-run dominance feature, and the long-run mean feature. Their formulas are, respectively:
Figure C0215795700085
Figure C0215795700091
where P(i) is the vertical projection value of column i of the test window, T is a threshold, m_l and m_g are the numbers of runs of length l and g, N_l and N_g are the maximum run lengths in the horizontal and vertical directions respectively, and N_t is a threshold chosen empirically.
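As a rough illustration of the quantities named above, the sketch below computes a column projection and run-length statistics on a small gray-level window. The text does not say how runs are defined on a gray image, so the darkness threshold `dark_below`, the long-run cutoff `long_run` (standing in for N_t), and all function names are assumptions.

```python
def column_projection(gray):
    """P(i): summed darkness (255 - gray level) of column i."""
    return [sum(255 - row[i] for row in gray) for i in range(len(gray[0]))]


def run_lengths(line, dark_below):
    """Lengths of consecutive runs of pixels darker than dark_below."""
    runs, current = [], 0
    for value in line:
        if value < dark_below:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


def run_statistics(gray, dark_below, long_run):
    """Mean horizontal/vertical run length and two long-run statistics."""
    horizontal = [r for row in gray for r in run_lengths(row, dark_below)]
    columns = [[row[i] for row in gray] for i in range(len(gray[0]))]
    vertical = [r for col in columns for r in run_lengths(col, dark_below)]
    all_runs = horizontal + vertical
    long_runs = [r for r in all_runs if r >= long_run]

    def mean(values):
        return sum(values) / len(values) if values else 0.0

    return {
        "mean_horizontal_run": mean(horizontal),
        "mean_vertical_run": mean(vertical),
        "long_run_share": len(long_runs) / len(all_runs) if all_runs else 0.0,
        "mean_long_run": mean(long_runs),
    }


# A tiny window containing one horizontal and one vertical stroke.
window = [
    [255, 0, 0, 0, 255],
    [255, 0, 255, 255, 255],
    [255, 0, 255, 255, 255],
]
projection = column_projection(window)
stats = run_statistics(window, dark_below=128, long_run=3)
```

The intuition is that printed strokes are straight and regular, so they tend to produce more long runs than handwritten strokes do.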
The gray-gradient distribution features describe the gray-gradient two-dimensional histogram of the test window and comprise two features: the first two-dimensional histogram feature and the second two-dimensional histogram feature.
The formulas for these two features are, respectively:
Figure C0215795700092
where hist(x, y) is the value of the two-dimensional histogram at point (x, y).
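A gray-gradient two-dimensional histogram of this kind can be sketched as follows. The gradient operator (sum of absolute forward differences), the number of gradient bins, and the rectangular regions passed to `region_mass` are all assumptions; the patent's actual regions 1 and 2 are defined only in its figures.

```python
def gray_gradient_histogram(gray, gradient_bins=64):
    """hist[x][y]: count of pixels with gray level x and gradient bin y."""
    rows, cols = len(gray), len(gray[0])
    hist = [[0] * gradient_bins for _ in range(256)]
    for r in range(rows - 1):
        for c in range(cols - 1):
            dx = gray[r][c + 1] - gray[r][c]
            dy = gray[r + 1][c] - gray[r][c]
            magnitude = abs(dx) + abs(dy)  # ranges over 0..510
            g_bin = min(gradient_bins - 1, magnitude * gradient_bins // 511)
            hist[gray[r][c]][g_bin] += 1
    return hist


def region_mass(hist, gray_range, gradient_range):
    """Fraction of histogram mass inside a rectangular region."""
    total = sum(sum(row) for row in hist)
    inside = sum(hist[x][y]
                 for x in range(*gray_range)
                 for y in range(*gradient_range))
    return inside / total if total else 0.0


# Dark area with a sharp edge on the right-hand side.
window = [[0, 0, 255], [0, 0, 255], [0, 0, 255]]
hist = gray_gradient_histogram(window)
flat_dark = region_mass(hist, (0, 128), (0, 16))    # dark, low gradient
edge_dark = region_mass(hist, (0, 128), (16, 64))   # dark, high gradient
```

Because the histogram is built directly from gray levels, this kind of feature is consistent with the stated advantage that the image need not be binarized.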
The texture features describe the texture of the image.
The output result comprises the sequence number of the analyzed image, the font judgment result, and the confidence of the font judgment.
The invention is a device that learns automatically from training samples using statistical methods and thereby judges the character font accurately. The device is well structured, simple to operate, and highly accurate, and it remains accurate for text heavily contaminated by seal stamps and background patterns. Only 4 to 5 Chinese characters are needed to judge the font accurately; few parameters are set manually, so the device does not depend on human experience; the complicated and error-prone character segmentation step is avoided; and the image does not need to be binarized. The device is suitable for character recognition systems with strict accuracy requirements, such as check recognition systems, letter address recognition systems, and form recognition systems.
The substance of the technical solution of the invention can be better understood from the following detailed description taken together with the accompanying drawings, in which identical reference numerals denote identical devices.
Description of the drawings
Fig. 1 is a block diagram of the character font judgment device based on a Bayes classifier;
Fig. 2 is a schematic diagram of the character image input and front-end processing device;
Fig. 3 shows images of printed and handwritten text: Fig. 3(a) is printed text and Fig. 3(b) is handwritten text;
Fig. 4 shows the gray-gradient distributions corresponding to the text images of Fig. 3: Fig. 4(a) corresponds to the printed text and Fig. 4(b) to the handwritten text;
Fig. 5 shows the templates of each class of texture feature: Fig. 5(a) is the template of texture feature 1, Fig. 5(b) is the template of texture feature 2, and Fig. 5(c) is the template of texture feature 3;
Fig. 6 is a flowchart of the device's automatic training mode;
Fig. 7 is a flowchart of the device's automatic font judgment mode.
Embodiment
The character font judgment device based on a Bayes classifier proposed by the invention, and the method thereof, are realized by the following technical solution.
Fig. 1 is a block diagram of the character font judgment device based on a Bayes classifier. As shown in Fig. 1, the device comprises:
The character image input and front-end processing device 1, for inputting an image from outside and determining the position of the text to be recognized in the image. Its front end is connected to external equipment, which may be a scanner or an image input device with a similar function. As shown in Fig. 2, the character image input and front-end processing device 1 consists of two basic devices: the character image input device 11 and the test window locator 12. The character image input device 11 converts an image file of any format received from the external equipment into a 256-level gray image represented as a matrix. The gray image matrix output by the character image input device 11 contains both the text to be recognized and a large amount of irrelevant graphics. The function of the test window locator 12 is to determine the position of the test window so that the text to be recognized falls inside it.
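The conversion to a 256-level gray matrix performed by the character image input device 11 is not specified further in the text. One common way to do it, shown here only as an assumed sketch, is to weight the red, green, and blue channels with the ITU-R BT.601 luma coefficients; the function name `to_gray_matrix` and the nested-list image layout are hypothetical.

```python
def to_gray_matrix(rgb_image):
    """Convert rows of (R, G, B) tuples (0-255 each) to 0-255 gray levels.

    The BT.601 weights are an assumption; the source only states that the
    result is a 256-level gray image in matrix form.
    """
    gray = []
    for row in rgb_image:
        gray.append([
            min(255, int(round(0.299 * r + 0.587 * g + 0.114 * b)))
            for r, g, b in row
        ])
    return gray


# White, black, pure red and pure blue pixels.
image = [[(255, 255, 255), (0, 0, 0)],
         [(255, 0, 0), (0, 0, 255)]]
gray = to_gray_matrix(image)
```

A black-and-white or already gray input would map onto the same matrix form, which lets the later stages treat every input format identically.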
The feature extractor 2, for extracting the features used to discriminate the font from the test window supplied by the character image input and front-end processing device 1. Extracting and selecting sufficient, effective features is extremely important: the features this device extracts, and the way it extracts them, are the key to its high discrimination accuracy. The feature extractor 2 uses 4 sub-devices to extract 4 major classes of features, 15 features in total. The 4 sub-devices are the layout feature extractor, the morphological feature extractor, the gray-gradient distribution feature extractor, and the texture feature extractor. The layout feature extractor extracts features of the way the characters are arranged: the text height feature, the mean character width feature, the character-width absolute difference feature, the mean character spacing feature, and the maximum character spacing feature.
Their formulas are:
Figure C0215795700104
where the text to be recognized has N characters, W_i is the width of the i-th character, W_o is the standard width of a printed character, and S_i is the i-th character spacing obtained. The morphological feature extractor extracts features of the shape of the character strokes: the vertical projection value feature, the vertical mean run-length feature, the horizontal mean run-length feature, the long-run dominance feature, and the long-run mean feature. Their formulas are:
Figure C0215795700113
Figure C0215795700115
where P(i) is the vertical projection value of column i of the test window, T is a threshold, m_l and m_g are the numbers of runs of length l and g, N_l and N_g are the maximum run lengths in the horizontal and vertical directions respectively, and N_t is a threshold chosen empirically. The gray-gradient distribution feature extractor extracts features from the gray-gradient two-dimensional histogram of the test window: the first two-dimensional histogram feature and the second two-dimensional histogram feature.
Fig. 3 shows images of printed and handwritten text; the black frames in the figure mark the test windows. Fig. 4 shows the gray-gradient distributions corresponding to the text images of Fig. 3. Region 1 and region 2 in Fig. 4 are used to extract the first and the second two-dimensional histogram features respectively.
The two features are computed from the histogram values hist(x, y) in these regions, where hist(x, y) is the value of the two-dimensional histogram at point (x, y), that is, the image shown in Fig. 4. The texture feature extractor extracts texture features from the image, namely texture feature 1 to texture feature 3. The value of each texture feature is the number of occurrences of the corresponding texture feature template inside the test window. The templates of each class of texture feature are shown in Fig. 5.
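Counting how many times a small template occurs in the test window can be sketched as below. The actual templates appear only in Fig. 5, so the two templates used here, the 0/1 quantization of the window, and the function name `count_template` are purely hypothetical.

```python
def count_template(window, template):
    """Number of positions where template matches window exactly."""
    th, tw = len(template), len(template[0])
    count = 0
    for r in range(len(window) - th + 1):
        for c in range(len(window[0]) - tw + 1):
            if all(window[r + i][c + j] == template[i][j]
                   for i in range(th) for j in range(tw)):
                count += 1
    return count


# Hypothetical quantized window: 1 marks stroke pixels, 0 background.
window = [
    [1, 1, 0],
    [0, 1, 0],
    [0, 1, 1],
]
corner = [[1, 1],
          [0, 1]]      # illustrative stand-in for one template
vertical = [[1],
            [1]]       # illustrative stand-in for another template

corner_count = count_template(window, corner)
vertical_count = count_template(window, vertical)
```

Each count would then be one entry of the 15-dimensional feature vector passed on to the PCA converter.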
The training sample memory 3, for storing the features of all training samples together, to support automatic learning by the Bayes classifier 7;
The PCA analyzer 4, for performing principal component analysis (PCA) on the features of all training samples held in the training sample memory 3, thereby determining the PCA transform;
The PCA converter 5, for applying the PCA transform to the features of a sample according to the parameters determined by the PCA analyzer 4;
The classifier parameter estimator 6, for automatically estimating all parameters of the Bayes classifier 7 from the training samples supplied by the PCA converter 5;
The Bayes classifier 7, for discriminating the font according to the parameters determined by the classifier parameter estimator 6;
The confidence estimation device 9, for assessing the confidence of the result output by the Bayes classifier 7;
The judgment result output device 10, for outputting the analysis result of the device to other equipment;
The control processor 8, for controlling and coordinating each of the above devices so that the device can learn automatically and judge fonts automatically.
The output result comprises the sequence number of the analyzed image, the font judgment result, and the confidence of the font judgment.
The character font judgment device and method based on a Bayes classifier can be understood more clearly from the following description. The device consists of the 12 basic devices described above and has two operating modes: an automatic training mode and an automatic font judgment mode.
The automatic training mode has two tasks: to analyze the training samples and thereby determine all parameters of the PCA converter, and to estimate all parameters of the Bayes classifier from the training samples. The devices that take part in this mode are mainly device 1 to device 8.
Fig. 6 is a flowchart of the automatic training mode. The steps are as follows:
Step 61: read a training image from the input device. The input device may be a scanner or an image input device with a similar function; the image may be a color, gray, or black-and-white image, in a standard format such as BMP, TIF, JPG, or GIF;
Step 62: convert the input image into a 256-level gray matrix so that it can be processed;
Step 63: accurately determine the position of the test window; all the information used to judge the font is extracted from the test window;
Step 64: extract the 4 classes of features, 15 features in total, from the test window;
Step 65: store the features of this training sample in the training sample memory;
Steps 61 to 65 together extract the features of one training sample and save them in the training sample memory;
Step 66: check whether there is another training sample. If yes, go to step 61; if no, go to step 67. Steps 61 to 66 repeat until the features of all training samples have been saved in the training sample memory;
Step 67: perform principal component analysis on the features of all training samples in the training sample memory, thereby obtaining the PCA converter;
Step 68: apply the classifier parameter estimator to all training samples in the training sample memory to obtain the Bayes classifier, which completes the training process.
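Steps 67 and 68 can be sketched in plain Python as follows. The source does not state which class-conditional model the Bayes classifier uses, so this sketch assumes Gaussian components with per-class mean and variance, and it finds the principal axes by power iteration with deflation. All function names and the two-dimensional toy data are hypothetical.

```python
import math


def mean_vector(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def covariance(rows, mean):
    d = len(mean)
    return [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / len(rows)
             for j in range(d)] for i in range(d)]


def principal_axes(cov, k, iterations=500):
    """Top-k unit eigenvectors of cov by power iteration with deflation."""
    d = len(cov)
    work = [row[:] for row in cov]
    axes = []
    for _ in range(k):
        start = [1.0 / (i + 1) for i in range(d)]
        norm0 = math.sqrt(sum(x * x for x in start))
        v = [x / norm0 for x in start]
        for _step in range(iterations):
            w = [sum(work[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        eigenvalue = sum(v[i] * work[i][j] * v[j]
                         for i in range(d) for j in range(d))
        axes.append(v)
        for i in range(d):          # remove the component just found
            for j in range(d):
                work[i][j] -= eigenvalue * v[i] * v[j]
    return axes


def pca_transform(sample, mean, axes):
    centered = [x - m for x, m in zip(sample, mean)]
    return [sum(a * c for a, c in zip(axis, centered)) for axis in axes]


def fit_class_parameters(projected, labels):
    """Per-class prior, mean and variance of each PCA component."""
    params = {}
    for label in set(labels):
        rows = [p for p, l in zip(projected, labels) if l == label]
        mu = mean_vector(rows)
        var = [max(1e-6, sum((r[j] - mu[j]) ** 2 for r in rows) / len(rows))
               for j in range(len(mu))]
        params[label] = {"prior": len(rows) / len(labels),
                         "mean": mu, "var": var}
    return params


# Toy two-dimensional feature vectors standing in for the 15 real features.
printed = [[1.0, 2.0], [1.2, 2.1], [0.8, 1.9]]
handwritten = [[3.0, 4.0], [3.3, 4.2], [2.7, 3.8]]
samples = printed + handwritten
labels = ["printed"] * 3 + ["handwritten"] * 3

mean = mean_vector(samples)                                  # step 67
axes = principal_axes(covariance(samples, mean), k=1)
projected = [pca_transform(s, mean, axes) for s in samples]
params = fit_class_parameters(projected, labels)             # step 68
```

Estimating a separate variance per component, rather than a full covariance matrix, relies on the PCA components being treated as independent, which keeps the number of estimated parameters small.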
The task of the automatic font judgment mode is to use the knowledge acquired in the automatic training mode to judge the font of an input text image with the Bayes classifier. The devices that take part in this mode are mainly the character image input and front-end processing device 1, the feature extractor 2, the PCA converter 5, the Bayes classifier 7, the control processor 8, the confidence estimation device 9, and the judgment result output device 10. Because the classifier parameter estimator 6 does not take part, the output of the PCA converter 5 is passed directly to the Bayes classifier 7 as its input.
Fig. 7 is a flowchart of the automatic font judgment mode. The steps are as follows:
Step 71: read a text image from the input device. The input device may be a scanner or an image input device with a similar function; the image may be a color, gray, or black-and-white image, in a format such as BMP, TIF, or JPG;
Step 72: convert the input image into a 256-level gray matrix so that it can be processed;
Step 73: accurately determine the position of the test window; all the information used to judge the font is extracted from the test window;
Step 74: extract the 4 classes of features, 15 features in total, from the test window;
Step 75: apply the PCA transform to the features of this text image to obtain new features that are statistically independent of one another;
Step 76: judge the font of this text image with the Bayes classifier and estimate the confidence of the result;
Step 77: output the sequence number of this text image, the font judgment result, and the confidence of the result, which completes the font judgment for this image.
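Steps 76 and 77 might look like the sketch below. It assumes Gaussian class-conditional densities with independent components and takes the winning class's posterior probability as the confidence; the source does not define how confidence is computed, so that choice, the function name `judge_font`, and the parameter values are assumptions.

```python
import math


def judge_font(features, params):
    """Return (font label, confidence) for one PCA-transformed sample."""
    log_scores = {}
    for label, p in params.items():
        score = math.log(p["prior"])
        for x, mu, var in zip(features, p["mean"], p["var"]):
            score += -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
        log_scores[label] = score
    # Normalize in a numerically stable way to obtain posteriors.
    top = max(log_scores.values())
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    total = sum(weights.values())
    posteriors = {label: w / total for label, w in weights.items()}
    decision = max(posteriors, key=posteriors.get)
    return decision, posteriors[decision]


# Hypothetical parameters, as an estimator might produce after training.
params = {
    "printed": {"prior": 0.5, "mean": [-1.4], "var": [0.04]},
    "handwritten": {"prior": 0.5, "mean": [1.4], "var": [0.09]},
}

decision, confidence = judge_font([-1.2], params)
result = {"image_number": 1, "font": decision, "confidence": confidence}
```

A downstream recognition system could use the confidence to route low-confidence images to manual review instead of committing to one recognizer.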
The character font judgment device based on a Bayes classifier can be implemented in software on any operating system platform and in any programming language, or it can be implemented in suitable hardware. It is therefore readily realizable and can be flexibly integrated into other character recognition systems.
The above description presents the implementation of the invention only by way of an embodiment. It will be obvious to those skilled in the art that the invention is not limited to the implementation details given above and can be realized in other embodiments without departing from its features; for example, some parts of the embodiment may be split, merged, or implemented with a microprocessor. The embodiment given should therefore be regarded as illustrative rather than restrictive. The possibilities for realizing and using the invention are limited only by the appended claims, and the various options for realizing the invention as determined by the claims, including equivalent embodiments, also fall within the scope of the invention.

Claims (13)

1. A character font judgment device based on a Bayes classifier, characterized in that the device comprises:
A character image input and front-end processing device (1), for inputting an image from outside and determining the position of the text to be recognized in the image;
The character image input and front-end processing device (1) comprising a character image input device (11) and a test window locator (12);
A feature extractor (2), for extracting the features used to discriminate the font from the test window supplied by the character image input and front-end processing device (1);
A training sample memory (3), for storing the features of all training samples together, to support automatic learning by the Bayes classifier;
A PCA analyzer (4), for performing principal component analysis (PCA) on the features of all training samples held in the training sample memory, thereby determining the PCA transform;
A PCA converter (5), for applying the PCA transform to the features of a sample according to the parameters determined by the PCA analyzer (4);
A classifier parameter estimator (6), for automatically estimating all parameters of the Bayes classifier (7) from the training samples supplied by the PCA converter (5);
A Bayes classifier (7), for discriminating the font according to the parameters determined by the classifier parameter estimator (6);
A confidence estimation device (9), for assessing the confidence of the result output by the Bayes classifier (7);
A judgment result output device (10), for outputting the analysis result of the device to other equipment;
A control processor (8), for controlling and coordinating each of the above devices so that the device can learn automatically and judge fonts automatically.
2. The device according to claim 1, characterized in that the feature extractor comprises a layout feature extractor, a morphological feature extractor, a gray-gradient distribution feature extractor, and a texture feature extractor.
3. The device according to claim 2, characterized in that the layout feature extractor extracts features of the way the characters are arranged, these features comprising: the text height feature, the mean character width feature, the character-width absolute difference feature, the mean character spacing feature, and the maximum character spacing feature.
4. The device according to claim 3, characterized in that the morphological feature extractor extracts features of the shape of the character strokes, comprising: the vertical projection value feature, the vertical mean run-length feature, the horizontal mean run-length feature, the long-run dominance feature, and the long-run mean feature.
5. The device according to claim 4, characterized in that the gray-gradient distribution feature extractor extracts features from the gray-gradient two-dimensional histogram of the test window, comprising the first two-dimensional histogram feature and the second two-dimensional histogram feature.
6. The device according to claim 5, characterized in that the texture feature extractor extracts texture features from the image.
7. The device according to claim 1 or 5, characterized in that the output result comprises: the sequence number of the analyzed image, the font judgment result, and the confidence of the font judgment.
8. A character font judgment method based on a Bayes classifier, characterized in that, under the control of the control processor of the device, the method comprises the steps of:
Inputting an image from outside and determining the position of the text to be recognized in the image;
Extracting the features used to judge the character font from the detection window provided by the character image input device and front-end processing device;
Storing the features of all training samples together, for the automatic learning of the Bayes classifier;
Performing principal component analysis (PCA) on the features of all training samples held in the training sample memory, thereby obtaining the PCA transform;
Applying the PCA transform to the sample features according to the parameters determined by the PCA analyzer;
Automatically estimating all parameters of the Bayes classifier from the training samples supplied by the PCA converter;
Judging the font according to the parameters determined by the classifier parameter estimator;
Assessing the reliability of the result output by the Bayes classifier;
Outputting the judgment result to other equipment.
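The training and judgment steps of claim 8 can be illustrated with a minimal sketch. It assumes NumPy, a Gaussian class-conditional model with full covariance, and the posterior probability as the reliability measure; none of these choices is stated in the claim, and all function names and the synthetic data are hypothetical.

```python
import numpy as np

def fit_pca(X, k):
    """Estimate a PCA transform (mean and top-k principal axes) from training features."""
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]       # indices of the k largest
    return mean, eigvecs[:, order]

def apply_pca(X, mean, axes):
    """Project features onto the retained principal axes."""
    return (X - mean) @ axes

def fit_gaussian_bayes(Z, y):
    """Estimate prior, mean and covariance of each class in PCA space."""
    params = {}
    for c in np.unique(y):
        Zc = Z[y == c]
        # A small ridge keeps the covariance invertible (an assumption of this sketch).
        cov = np.atleast_2d(np.cov(Zc, rowvar=False)) + 1e-6 * np.eye(Z.shape[1])
        params[int(c)] = (len(Zc) / len(Z), Zc.mean(axis=0), cov)
    return params

def classify(z, params):
    """Return (predicted class, posterior probability used as reliability)."""
    classes = sorted(params)
    log_post = []
    for c in classes:
        prior, mu, cov = params[c]
        d = z - mu
        _, logdet = np.linalg.slogdet(cov)
        log_post.append(np.log(prior) - 0.5 * (logdet + d @ np.linalg.solve(cov, d)))
    log_post = np.array(log_post)
    post = np.exp(log_post - log_post.max())    # normalise in a numerically stable way
    post /= post.sum()
    best = int(np.argmax(post))
    return classes[best], float(post[best])

# Synthetic stand-in for stored training features: class 0 and class 1 (hypothetical labels).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 4)), rng.normal(5.0, 1.0, (50, 4))])
y = np.array([0] * 50 + [1] * 50)

mean, axes = fit_pca(X, k=2)
params = fit_gaussian_bayes(apply_pca(X, mean, axes), y)
label, reliability = classify(apply_pca(np.full((1, 4), 5.0), mean, axes)[0], params)
```

Here `label` and `reliability` play the role of the font judgment result and its reliability; a real system would replace the synthetic matrix `X` with the extracted feature vectors.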
9. The method according to claim 8, characterized in that: the features extracted to judge the character font comprise layout features, morphological features, gray-gradient distribution features and texture features.
10. The method according to claim 9, characterized in that: the layout features represent the way the characters are arranged, and comprise: a character height feature, an average character width feature, a character-width absolute difference feature, an average character spacing feature and a maximum character spacing feature, whose formulas are respectively:
Figure C021579570003C1
Figure C021579570003C3
where the text to be recognized contains N characters, W_i is the width of the i-th character, W_0 is the standard width of a printed character, and S_i is the i-th character spacing obtained.
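Because the formulas of claim 10 survive only as figure references, the following is a sketch of one plausible reading, not the patented formulas: plain averages over the N character widths W_i and the spacings S_i, with the width difference measured against W_0. The function name, the dictionary keys and the normalisation by N are assumptions; the character height feature is omitted because its symbols are not defined in the text.

```python
def layout_features(widths, gaps, w0):
    """Layout features from character widths W_i, spacings S_i and standard width W_0."""
    n = len(widths)
    return {
        "mean_width": sum(widths) / n,
        # Mean absolute deviation from the standard printed width (assumed form).
        "width_abs_diff": sum(abs(w - w0) for w in widths) / n,
        "mean_gap": sum(gaps) / len(gaps) if gaps else 0.0,
        "max_gap": max(gaps) if gaps else 0.0,
    }

# Four characters with three spacings between them (illustrative pixel values).
features = layout_features(widths=[18, 20, 22, 20], gaps=[4, 6, 5], w0=20)
```

Handwritten text would be expected to show larger width deviations and less regular spacing than printed text, which is why such features can help separate the two.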
11, according to the method for claim 10, it is characterized in that: described morphological feature is represented the modal feature of character stroke, comprise: the characteristics of mean of longitudinal projection's value tag, vertical average distance of swimming feature, the average distance of swimming feature of level, long distance of swimming advantageous characteristic and the long distance of swimming, their computing formula is respectively:
Figure C021579570004C4
Figure C021579570004C5
Figure C021579570004C6
where P(i) is the vertical projection value of column i of the detection window, T is a threshold, m_l and m_g are the numbers of runs of length l and g, N_l and N_g are the maximum lengths of the horizontal and vertical runs respectively, and N_l is a threshold set empirically.
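The quantities behind the features of claim 11 (column projections and horizontal and vertical runs) can be sketched as follows. The formulas themselves are not reproduced in the text, so this only computes the raw ingredients. The parameter `stroke_threshold` is a hypothetical gray-level cut-off for deciding which pixels belong to a stroke and is not necessarily the threshold T of the claim.

```python
def run_lengths(line):
    """Lengths of the maximal runs of stroke pixels in a 1-D sequence of booleans."""
    runs, current = [], 0
    for is_stroke in line:
        if is_stroke:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs

def morphology_features(gray, stroke_threshold=128):
    """Return (column projections, mean horizontal run, mean vertical run)."""
    mask = [[px < stroke_threshold for px in row] for row in gray]  # dark pixels = strokes
    columns = list(zip(*mask))
    projection = [sum(col) for col in columns]                      # stroke pixels per column
    h_runs = [r for row in mask for r in run_lengths(row)]
    v_runs = [r for col in columns for r in run_lengths(col)]
    mean = lambda xs: sum(xs) / len(xs) if xs else 0.0
    return projection, mean(h_runs), mean(v_runs)

# Tiny 3x4 grayscale window: 0 is ink, 255 is background.
window = [
    [255, 0, 0, 255],
    [255, 0, 255, 255],
    [255, 0, 0, 0],
]
projection, mean_h_run, mean_v_run = morphology_features(window)
```

The long-run features of the claim would be derived from the same `h_runs` and `v_runs` lists, for example by restricting them to runs above a chosen length.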
12, according to the method for claim 11, it is characterized in that: described shade of gray distribution characteristics is represented the feature in the gray scale-gradient two-dimensional histogram of testing window, comprises the first two-dimensional histogram feature and two features of the second two-dimensional histogram feature,
The computing formula of these two features is respectively:
Figure C021579570004C7
where hist(x, y) denotes the value of the two-dimensional histogram at point (x, y).
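A gray-gradient two-dimensional histogram hist(x, y) of the kind named in claim 12 can be built as in the sketch below. The gradient operator (forward differences with clamped borders), the bin counts and the function name are assumptions; the two features computed from the histogram are not reproduced because their formulas are not available in the text.

```python
def gray_gradient_histogram(gray, gray_bins=4, grad_bins=4, max_value=255):
    """Count pixels by (gray-level bin x, gradient-magnitude bin y)."""
    rows, cols = len(gray), len(gray[0])
    hist = [[0] * grad_bins for _ in range(gray_bins)]
    for r in range(rows):
        for c in range(cols):
            # Forward differences; the last row and column reuse their own value.
            dx = gray[r][min(c + 1, cols - 1)] - gray[r][c]
            dy = gray[min(r + 1, rows - 1)][c] - gray[r][c]
            grad = min(abs(dx) + abs(dy), max_value)
            x = min(gray[r][c] * gray_bins // (max_value + 1), gray_bins - 1)
            y = min(grad * grad_bins // (max_value + 1), grad_bins - 1)
            hist[x][y] += 1
    return hist

# Dark top row above a bright bottom row: a single sharp horizontal edge.
hist = gray_gradient_histogram([[0, 0], [255, 255]])
```

In this example the two dark pixels fall in the lowest gray bin and the highest gradient bin, while the two bright pixels fall in the highest gray bin and the lowest gradient bin.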
13. The method according to claim 8 or 12, characterized in that: the output result comprises: the serial number of the analyzed image, the font judgment result and the reliability of the font judgment.
CN 02157957 2002-12-23 2002-12-23 Character written-form judgement apparatus and method based on Bayes classification device Expired - Fee Related CN1234094C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 02157957 CN1234094C (en) 2002-12-23 2002-12-23 Character written-form judgement apparatus and method based on Bayes classification device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 02157957 CN1234094C (en) 2002-12-23 2002-12-23 Character written-form judgement apparatus and method based on Bayes classification device

Publications (2)

Publication Number Publication Date
CN1438604A CN1438604A (en) 2003-08-27
CN1234094C true CN1234094C (en) 2005-12-28

Family

ID=27672213

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 02157957 Expired - Fee Related CN1234094C (en) 2002-12-23 2002-12-23 Character written-form judgement apparatus and method based on Bayes classification device

Country Status (1)

Country Link
CN (1) CN1234094C (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100356393C (en) * 2005-08-18 2007-12-19 北大方正集团有限公司 Character recognition method predicted base on font
US7599556B2 (en) * 2005-08-25 2009-10-06 Joseph Stanley Czyszczewski Apparatus, system, and method for scanning segmentation
US7899253B2 (en) * 2006-09-08 2011-03-01 Mitsubishi Electric Research Laboratories, Inc. Detecting moving objects in video by classifying on riemannian manifolds
CN101315670B (en) 2007-06-01 2010-08-11 清华大学 Specific shot body detection device, learning device and method thereof
CN102521516A (en) * 2011-12-20 2012-06-27 北京商纳科技有限公司 Method and system for automatically creating error homework textbooks
CN103914680B (en) * 2013-01-07 2018-03-23 上海宝信软件股份有限公司 A kind of spray printing character picture identification and check system and method
CN103824373B (en) * 2014-01-27 2016-06-08 深圳辰通智能股份有限公司 A kind of bill images amount of money sorting technique and system
CN107220655A (en) * 2016-03-22 2017-09-29 华南理工大学 A kind of hand-written, printed text sorting technique based on deep learning
CN107945807B (en) * 2016-10-12 2021-04-13 厦门雅迅网络股份有限公司 Voice recognition method and system based on silence run
CN108009472B (en) * 2017-10-25 2020-07-21 五邑大学 Finger back joint print recognition method based on convolutional neural network and Bayes classifier
CN111027345A (en) * 2018-10-09 2020-04-17 北京金山办公软件股份有限公司 Font identification method and apparatus

Also Published As

Publication number Publication date
CN1438604A (en) 2003-08-27

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20051228

Termination date: 20100125