CN103810484B - Method for identifying printed documents based on analysis of print font libraries - Google Patents

Publication number
CN103810484B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310538041.1A
Other languages
Chinese (zh)
Other versions
CN103810484A (en
Inventor
姚勇
王韦桦
张东方
郭红艳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Application filed by Xidian University
Priority to CN201310538041.1A
Publication of CN103810484A
Application granted
Publication of CN103810484B

Landscapes

  • Other Investigation Or Analysis Of Materials By Electrical Means (AREA)
  • Character Discrimination (AREA)

Abstract

The invention relates to a method for identifying printed documents based on analysis of print font libraries, and belongs to the technical field of printed-document authentication. The main steps are as follows. Features are extracted from Chinese character images printed by sample printers of different models. Through a learning process, the samples are trained into the feature-value library used by the system, which holds the feature values of the same Chinese characters as printed by the different printer models. A character under test is matched in turn against this feature character library to complete a coarse classification of the character library. Building on that step, Hu moment feature values are then used to characterize the character more deeply, until the characterized feature information uniquely identifies the Chinese character in the image. The invention proposes a computer recognition method for character images that identifies the printer type from commonly used font libraries. The theory is clear, the procedure is short, and the method is easy to operate, fast, and accurate, which makes it effective and feasible. The invention integrates multiple techniques and useful information to improve identification accuracy, and provides public security and forensic evidence examination departments with an automated system for examining computer-printed documents.

Description

Method for identifying printed documents based on analysis of print font libraries
Technical field
The invention relates to a method for identifying printed documents based on analysis of print font libraries, and belongs to the technical field of printed-document authentication.
Background
With the spread of office automation, printers are widely used in daily life and work. Printed documents have become the principal form of written record, and the number of printed documents submitted for examination has risen sharply in criminal, civil, and administrative litigation alike. Examination requests fall mainly into three categories: determining whether a document is authentic, verifying its source (identifying the machine that produced it), and determining when it was produced. A computer-based method for quickly determining the printer type would provide important leads for case investigation, yet no such method has been reported so far, and research material on the subject is scarce.
Because printers differ in their operating principles and in the font libraries they print with, the printed text should differ as well. The printed font can therefore be extracted and recognized, and a classifier can then match the characters against font libraries to determine which printer produced the document under investigation. Font recognition is thus the key technology. Among existing font recognition techniques, some use wavelet analysis to extract the frequency-domain features of single Chinese characters as training samples and recognize single characters with a modified quadratic classifier. Others apply a BP neural network to features based on the proportion of wavelet energy distribution, achieving recognition that is independent of the text content. The recognition performance of these methods, however, drops markedly as the number of fonts grows. Another approach extracts Chinese font features by texture analysis, using Gabor filters for font recognition; it is fast and highly accurate, but its feature dimension is high and its computational cost is large. Font recognition based on empirical mode decomposition of the gray-level image has also been proposed; its feature dimension is low (only 9 dimensions), its computational cost is small, and its recognition rate is high. The above methods all extract features from single characters. A further single-character font recognition technique relies mainly on the wavelet transform to extract character features and achieves a high recognition rate on seven single-character Chinese fonts, but its feature dimension reaches 256, which seriously slows the recognizer and increases the amount of computation.
Chinese characters are logographic: they are numerous, rich in font variation, and structurally complex, with an average stroke count more than ten times that of English letters. They also come in many typefaces. The main printed Chinese typefaces are Songti, FangSong, Kaiti, Heiti, Lishu, and YouYuan, and they differ in several ways. In overall shape, Songti characters are square; FangSong imitates the typeface of Song-dynasty printed books and is slightly elongated; Kaiti resembles handwriting and is square; Heiti is upright and square; Lishu has a basically square structure with a flat, wide body; YouYuan is rounded and somewhat larger. The typefaces also differ in how stroke thickness varies, and character sizes differ considerably. Stroke decoration and direction differ too: the same basic stroke looks clearly different at its start and end in different typefaces, and the angle at which basic strokes are written also varies.
Chinese characters are built by arranging and combining basic strokes such as horizontal, vertical, left-falling, right-falling, dot, turning, and hook strokes. A Chinese character can therefore be characterized by its horizontal, vertical, left-falling, and right-falling strokes. The characteristics of each typeface are likewise embodied in its strokes, so a typeface can also be characterized by these four stroke types.
Summary of the invention
The object of the invention is to identify the printer type by analyzing the font library of the printed output, making full use of the available information while keeping the workload as low as possible. Common printer types are determined from commonly used typefaces and font sizes, which lays the groundwork for extending the scope of the research.
To achieve this object, the technical solution of the invention is as follows.
A method for identifying printed documents based on analysis of print font libraries comprises the following main steps. Features are extracted from Chinese character images printed by sample printers of different models, and through a learning process the samples are trained into the feature-value library used by the system. This library holds, for each printer model, the feature values of the same Chinese characters, namely the stroke features and the simplified Hu moment feature values. For a character under test, the total pixel count and the crossing counts of the character image are obtained first and matched in turn against the feature character library, which completes a coarse classification of the character library; the result serves as the matching object for the next recognition step. Then, building on that step, Hu moment feature values are used to characterize the character more deeply, until the characterized feature information uniquely identifies the Chinese character in the image.
The specific steps are: (1) font feature extraction and font library construction; (2) classifier design. In detail:
(1) Font feature extraction and font library construction: extracting the features of a Chinese character image means deriving, from the characteristics of the image, a code word that can represent it. Each code word corresponds to one Chinese character, and the code words of different character images must differ, so the character that a code word represents is unique. Then, through training and with the same feature representation, a Chinese character library of these features is built.
(1a) Extracting the stroke feature sequence: feature analysis of Chinese characters shows that stroke direction cues reflect the composition of a character comprehensively, accurately, and stably. By collecting stroke statistics from the Chinese character images in the document under test, different Chinese typefaces can be distinguished and the printer type they belong to can then be inferred. The specific steps are as follows:
Step 1: divide the Chinese character image into eight equal regions and, from left to right and top to bottom, count the black pixels (pixels whose value is 1) in each region. The black pixel counts of the eight regions give eight feature values.
Step 2: obtain feature values by stroke crossing. Two horizontal passes and two vertical passes are made through the image. The horizontal passes are made at the 1/3 and 2/3 positions, and the number of black points crossed is recorded; the vertical direction is handled in the same way. This yields four more feature values.
Step 3: count all black pixels in the image, which gives one feature value. Together with the eight feature values of Step 1 and the four of Step 2, this makes 13 feature values in total.
(1b) Extracting the moment features of the stroke sequence: after extracting the stroke feature sequence, we find that the sequence can in fact be regarded as N independent, identically distributed random variables. Once the stroke sequence is obtained, recognition can therefore be carried out by extracting moment features of the image. Image recognition with moment invariants is an important method in pattern recognition. In statistics, moments characterize the distribution of a random quantity; in mechanics, they describe the spatial distribution of matter. If a binary or gray-level image is regarded as a two-dimensional density distribution function, moment techniques can be applied to image analysis, so that moments describe the features of an image and yield features analogous to those of statistics and mechanics. In recent years the invariance of moment values computed from two- and three-dimensional images has attracted the attention of the imaging community. There are many kinds of moment techniques, and they have been applied to many aspects of image classification and recognition. For each stroke feature sequence, taking feature dimension and computing speed into account, we extract the first- and second-order moments of the discrete Hu moments as feature values. Practical image processing usually works with discrete functions, so the study of moment invariants in the discrete case is the more meaningful. Let f(x, y) be a two-dimensional image function. Its (p+q)-order origin moment and central moment are defined as:

$$m_{pq} = \sum_{x}\sum_{y} x^{p} y^{q} f(x, y), \qquad \mu_{pq} = \sum_{x}\sum_{y} (x - \bar{x})^{p} (y - \bar{y})^{q} f(x, y)$$

where $(\bar{x}, \bar{y}) = (m_{10}/m_{00},\; m_{01}/m_{00})$ is the centroid of the region. The normalized central moment is denoted $\eta_{pq}$ and defined as:

$$\eta_{pq} = \frac{\mu_{pq}}{\mu_{00}^{\gamma}}$$

where $\gamma = (p+q)/2 + 1$ and $p + q = 2, 3, \ldots$.
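Seven invariant moments can be derived from the second- and third-order normalized central moments:

$$
\begin{aligned}
M_1 &= \eta_{20} + \eta_{02} \\
M_2 &= (\eta_{20} - \eta_{02})^2 + 4\eta_{11}^2 \\
M_3 &= (\eta_{30} - 3\eta_{12})^2 + (3\eta_{21} - \eta_{03})^2 \\
M_4 &= (\eta_{30} + \eta_{12})^2 + (\eta_{21} + \eta_{03})^2 \\
M_5 &= (\eta_{30} - 3\eta_{12})(\eta_{30} + \eta_{12})\left[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2\right] \\
    &\quad + (3\eta_{21} - \eta_{03})(\eta_{21} + \eta_{03})\left[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right] \\
M_6 &= (\eta_{20} - \eta_{02})\left[(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right] + 4\eta_{11}(\eta_{30} + \eta_{12})(\eta_{21} + \eta_{03}) \\
M_7 &= (3\eta_{21} - \eta_{03})(\eta_{30} + \eta_{12})\left[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2\right] \\
    &\quad - (\eta_{30} - 3\eta_{12})(\eta_{21} + \eta_{03})\left[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right]
\end{aligned}
$$

The higher the order of a central moment, the more shape detail it reflects, but the more sensitive it is to noise and the more costly it is to compute. In the discrete case $M_1$ remains rotation-invariant, and it can be shown that the other six invariant moments are rotation-invariant as well. In this embodiment the invariants $M_1$, $M_2$, $M_3$, and $M_4$, whose computational cost is comparatively small, are selected. The invariant moments of an image do not change when the image is rotated, translated, or uniformly scaled, and $M_1$ through $M_4$ are inexpensive to compute, so they are suitable as invariant parameters for recognizing the target. We take $\varphi_1 = M_1$, $\varphi_2 = M_2$, $\varphi_3 = M_3$, $\varphi_4 = M_4$ as the first four feature quantities.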
(1c) Building the standard font library:
Features are extracted from Chinese character images printed by sample printers of different models, and through a learning process the samples are trained into the feature-value library used by the system, which holds the feature values of the same Chinese characters for the different printer models. The most commonly used standard Chinese characters are taken as the objects, in the common typefaces Songti, FangSong, Kaiti, Heiti, Lishu, and YouYuan and in font sizes No. 1 through No. 6. Simplified Hu moment feature values are chosen, and a two-level coding scheme is used for the Chinese character to be recognized. First, statistics such as the total pixel count and the crossing counts of the character image are obtained and matched in turn against the feature character library, which completes a coarse classification of the character library; the result serves as the matching object for the next recognition step. Second, building on that step, Hu moment feature values are used to characterize the character more deeply, until the characterized feature information uniquely identifies the Chinese character in the image.
(2) Classifier design: the classifier is designed to identify the printer type, that is, to determine the type of printer that produced a document by comparing the feature values of the characters under test with those in the standard font library. Owing to various practical constraints, a minimum distance classifier is still the usual choice for large-character-set recognition problems. A coarse-to-fine classification strategy based on confidence analysis is used to decide the class of the Chinese character to be recognized.
(2a) Coarse classification: the purpose of coarse classification is to quickly select, from a large character set, a relatively small subset of candidate characters, while keeping the probability that the candidate set contains the correct class of the character under test as high as possible. This requires the coarse classifier to be structurally simple and fast. We therefore designed a Euclidean distance classifier. Let $M_i$ be the i-th Hu moment feature value of the font to be recognized and $\bar{M}_i^{(k)}$ the mean of the i-th standard Hu moment feature of the k-th font. The font to be recognized is assigned to the $k_0$-th font when

$$\sum_{i}\left(M_i - \bar{M}_i^{(k_0)}\right)^2 = \min_{1 \le k \le G} \sum_{i}\left(M_i - \bar{M}_i^{(k)}\right)^2$$

where G is the number of font classes.
(2b) Fine classification: the Bayes classifier is the theoretically optimal statistical classifier, and in practice one tries to approximate it as closely as possible. When the character features follow a Gaussian distribution and all classes have equal prior probabilities, the Bayes classifier reduces to a Mahalanobis distance classifier. This condition is generally hard to satisfy in practice, however, and the performance of the Mahalanobis distance classifier degrades severely as errors arise in estimating the covariance matrix. We use the modified quadratic discriminant function (MQDF) as the fine classification measure. It is a variant of the Mahalanobis distance with the following functional form:

$$g_j(x) = \sum_{i=1}^{k} \frac{1}{\lambda_{ij}}\left[\varphi_{ij}^{T}(x - \mu_j)\right]^2 + \frac{1}{h^2}\left\{\lVert x - \mu_j \rVert^2 - \sum_{i=1}^{k}\left[\varphi_{ij}^{T}(x - \mu_j)\right]^2\right\} + \sum_{i=1}^{k} \ln \lambda_{ij} + (d - k)\ln h^2$$

Here x is the d-dimensional feature vector of the character under test and $\mu_j$ is the mean vector of class j. $\lambda_{ij}$ and $\varphi_{ij}$ are the i-th eigenvalue and eigenvector of the covariance matrix of the class j samples. k is the number of dominant eigenvectors retained, that is, the dimension of the principal subspace of the pattern class; its optimal value is determined experimentally. $h^2$ is an experimental estimate of the small eigenvalues. MQDF produces a quadratic decision surface. Because only the first k dominant eigenvectors of each class covariance matrix need to be estimated, the negative effect of estimation errors in the small eigenvalues is avoided. The MQDF discriminant distance can be regarded as a weighted sum of the Mahalanobis distance in the k-dimensional principal subspace and the Euclidean distance in the remaining (d-k)-dimensional space, with weighting factor $1/h^2$.
(2c) Confidence calculation: let the candidate set output by the coarse classifier be $\{(c_1, d_1), (c_2, d_2), \ldots, (c_n, d_n)\}$, ordered by increasing distance, where n is the size of the candidate set and $c_i$ and $d_i$ are the i-th candidate character and its coarse classification distance.
The role of the fine classifier is to re-rank the coarse candidate set by the recalculated distances and find the class to which the input character most probably belongs. If the coarse classification result is sufficiently reliable, in other words if $c_1$ is the correct class of the input character, fine classification is not needed at all. Whether to perform fine classification is decided from the confidence $f_{con}$ of the coarse result, which is calculated from the output distances as follows:

$$f_{con} = \frac{d_2 - d_1}{d_1} \tag{7}$$

When the confidence is below a certain threshold, the coarse candidate set is passed to the fine classifier; otherwise the coarse classification result is output directly.
The beneficial effects of the invention are as follows. The invention proposes a computer recognition method for character images that identifies the printer type from commonly used font libraries. The theory is clear, the procedure is short, and the method is easy to operate, fast, and accurate, which makes it effective and feasible. Although the scope of the invention is limited to printers, the method can be extended to fax machines, copiers, and similar devices, so its application prospects are broad. Printed documents also carry other text features, such as toner buildup, edge roughness, and toner spatter. Methods based on these features are not restricted by the content of the printed characters and apply more widely, so more of these features should be used and combined with the others to improve the accuracy of printed-document identification. The invention integrates multiple techniques and useful information to improve identification accuracy, and provides public security and forensic evidence examination departments with an automated system for examining computer-printed documents.
Brief description of the drawings
Fig. 1 is a structural block diagram of intelligent printer detection in an embodiment of the invention.
Fig. 2 is a schematic diagram of obtaining feature values by horizontal stroke crossing in an embodiment of the invention.
Fig. 3 is a schematic diagram of obtaining feature values by vertical stroke crossing in an embodiment of the invention.
Detailed description
Specific embodiments of the invention are described below with reference to the drawings, for a better understanding of the invention.
Embodiment
The structural block diagram of intelligent printer detection in this embodiment is shown in Fig. 1. The main steps are as follows. Features are extracted from Chinese character images printed by sample printers of different models, and through a learning process the samples are trained into the feature-value library used by the system. This library holds, for each printer model, the feature values of the same Chinese characters, namely the stroke features and the simplified Hu moment feature values. For a character under test, the total pixel count and the crossing counts of the character image are obtained first and matched in turn against the feature character library, which completes a coarse classification of the character library; the result serves as the matching object for the next recognition step. Then, building on that step, Hu moment feature values are used to characterize the character more deeply, until the characterized feature information uniquely identifies the Chinese character in the image.
The specific steps are: (1) font feature extraction and font library construction; (2) classifier design. In detail:
(1) Font feature extraction and font library construction: extracting the features of a Chinese character image means deriving, from the characteristics of the image, a code word that can represent it. Each code word corresponds to one Chinese character, and the code words of different character images must differ, so the character that a code word represents is unique. Then, through training and with the same feature representation, a Chinese character library of these features is built.
(1a) Extracting the stroke feature sequence: feature analysis of Chinese characters shows that stroke direction cues reflect the composition of a character comprehensively, accurately, and stably. By collecting stroke statistics from the Chinese character images in the document under test, different Chinese typefaces can be distinguished and the printer type they belong to can then be inferred. The specific steps are as follows:
Step 1: divide the Chinese character image into eight equal regions and, from left to right and top to bottom, count the black pixels (pixels whose value is 1) in each region. The black pixel counts of the eight regions give eight feature values.
Step 2: obtain feature values by stroke crossing. Two horizontal passes and two vertical passes are made through the image. The horizontal passes are made at the 1/3 and 2/3 positions, and the number of black points crossed is recorded; the vertical direction is handled in the same way. This yields four more feature values, as shown in Fig. 2 and Fig. 3.
Step 3: count all black pixels in the image, which gives one feature value. Together with the eight feature values of Step 1 and the four of Step 2, this makes 13 feature values in total.
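The three steps can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the text does not say how the eight regions are laid out, so a 2-row by 4-column grid is assumed here; the crossing features are taken to be the number of black pixels on each scan line; and the function name `stroke_features` is hypothetical.

```python
def stroke_features(img):
    """Return 13 stroke features of a binary character image.

    img is a list of equal-length rows; 1 marks a black pixel, 0 background.
    Assumption: the eight regions form a 2-row by 4-column grid.
    """
    h, w = len(img), len(img[0])
    feats = []
    # Step 1: black-pixel count in each of the eight regions,
    # visited left to right, then top to bottom.
    for r in range(2):
        for c in range(4):
            rows = range(r * h // 2, (r + 1) * h // 2)
            cols = range(c * w // 4, (c + 1) * w // 4)
            feats.append(sum(img[y][x] for y in rows for x in cols))
    # Step 2: black pixels met by scan lines at the 1/3 and 2/3 positions.
    for y in (h // 3, 2 * h // 3):
        feats.append(sum(img[y]))                 # horizontal pass
    for x in (w // 3, 2 * w // 3):
        feats.append(sum(row[x] for row in img))  # vertical pass
    # Step 3: total number of black pixels in the image.
    feats.append(sum(map(sum, img)))
    return feats
```

Applied to a binarized character image, the function returns the eight region counts, then the two horizontal and two vertical crossing counts, then the total pixel count, in that order.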
(1b) Extracting the moment features of the stroke sequence: after extracting the stroke feature sequence, we find that the sequence can in fact be regarded as N independent, identically distributed random variables. Once the stroke sequence is obtained, recognition can therefore be carried out by extracting moment features of the image. Image recognition with moment invariants is an important method in pattern recognition. In statistics, moments characterize the distribution of a random quantity; in mechanics, they describe the spatial distribution of matter. If a binary or gray-level image is regarded as a two-dimensional density distribution function, moment techniques can be applied to image analysis, so that moments describe the features of an image and yield features analogous to those of statistics and mechanics. In recent years the invariance of moment values computed from two- and three-dimensional images has attracted the attention of the imaging community. There are many kinds of moment techniques, and they have been applied to many aspects of image classification and recognition. For each stroke feature sequence, taking feature dimension and computing speed into account, we extract the first- and second-order moments of the discrete Hu moments as feature values. Practical image processing usually works with discrete functions, so the study of moment invariants in the discrete case is the more meaningful. Let f(x, y) be a two-dimensional image function. Its (p+q)-order origin moment and central moment are defined as:

$$m_{pq} = \sum_{x}\sum_{y} x^{p} y^{q} f(x, y), \qquad \mu_{pq} = \sum_{x}\sum_{y} (x - \bar{x})^{p} (y - \bar{y})^{q} f(x, y)$$

where $(\bar{x}, \bar{y}) = (m_{10}/m_{00},\; m_{01}/m_{00})$ is the centroid of the region. The normalized central moment is denoted $\eta_{pq}$ and defined as:

$$\eta_{pq} = \frac{\mu_{pq}}{\mu_{00}^{\gamma}}$$

where $\gamma = (p+q)/2 + 1$ and $p + q = 2, 3, \ldots$.
Seven invariant moments, $M_1$ through $M_7$, can be derived from the second- and third-order normalized central moments. The higher the order of a central moment, the more shape detail it reflects, but the more sensitive it is to noise and the more costly it is to compute. In the discrete case $M_1$ remains rotation-invariant, and it can be shown that the other six invariant moments are rotation-invariant as well. In this embodiment the invariants $M_1$, $M_2$, $M_3$, and $M_4$, whose computational cost is comparatively small, are selected. The invariant moments of an image do not change when the image is rotated, translated, or uniformly scaled, and $M_1$ through $M_4$ are inexpensive to compute, so they are suitable as invariant parameters for recognizing the target. We take $\varphi_1 = M_1$, $\varphi_2 = M_2$, $\varphi_3 = M_3$, $\varphi_4 = M_4$ as the first four feature quantities.
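As an illustration of how the first four invariants might be computed, the sketch below uses the conventional Hu definitions of $M_1$ to $M_4$ and the conventional normalization exponent (p+q)/2 + 1. Both are assumptions about what the embodiment intends, and the function name `hu_first_four` is hypothetical.

```python
def hu_first_four(img):
    """Return (M1, M2, M3, M4) for a 2-D image given as a list of rows."""
    # Keep only non-zero pixels as (x, y, value) triples.
    pts = [(x, y, v) for y, row in enumerate(img)
           for x, v in enumerate(row) if v]
    m00 = sum(v for _, _, v in pts)
    xc = sum(x * v for x, _, v in pts) / m00   # centroid x = m10 / m00
    yc = sum(y * v for _, y, v in pts) / m00   # centroid y = m01 / m00

    def eta(p, q):
        # Normalized central moment; mu_00 equals m00.
        mu = sum((x - xc) ** p * (y - yc) ** q * v for x, y, v in pts)
        return mu / m00 ** ((p + q) / 2 + 1)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    m1 = n20 + n02
    m2 = (n20 - n02) ** 2 + 4 * n11 ** 2
    m3 = (n30 - 3 * n12) ** 2 + (3 * n21 - n03) ** 2
    m4 = (n30 + n12) ** 2 + (n21 + n03) ** 2
    return m1, m2, m3, m4
```

Shifting the character inside a larger canvas or rotating it by 90 degrees should leave all four values unchanged up to floating-point error, which is a quick sanity check for any implementation.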
(1c) Building the standard font library:
Features are extracted from Chinese character images printed by sample printers of different models, and through a learning process the samples are trained into the feature-value library used by the system, which holds the feature values of the same Chinese characters for the different printer models. The most commonly used standard Chinese characters are taken as the objects, in the common typefaces Songti, FangSong, Kaiti, Heiti, Lishu, and YouYuan and in font sizes No. 1 through No. 6. Simplified Hu moment feature values are chosen, and a two-level coding scheme is used for the Chinese character to be recognized. First, statistics such as the total pixel count and the crossing counts of the character image are obtained and matched in turn against the feature character library, which completes a coarse classification of the character library; the result serves as the matching object for the next recognition step. Second, building on that step, Hu moment feature values are used to characterize the character more deeply, until the characterized feature information uniquely identifies the Chinese character in the image.
(2) Classifier design:
In this embodiment the classifier is designed to identify the printer type, that is, to determine the type of printer that produced a document by comparing the feature values of the characters under test with those in the standard font library. Owing to various practical constraints, a minimum distance classifier is still the usual choice for large-character-set recognition problems. This embodiment uses a coarse-to-fine classification strategy based on confidence analysis to decide the class of the Chinese character to be recognized.
(2a) Coarse classification: the purpose of coarse classification is to quickly select, from a large character set, a relatively small subset of candidate characters, while keeping the probability that the candidate set contains the correct class of the character under test as high as possible. This requires the coarse classifier to be structurally simple and fast. We therefore designed a Euclidean distance classifier. Let $M_i$ be the i-th Hu moment feature value of the font to be recognized and $\bar{M}_i^{(k)}$ the mean of the i-th standard Hu moment feature of the k-th font. The font to be recognized is assigned to the $k_0$-th font when

$$\sum_{i}\left(M_i - \bar{M}_i^{(k_0)}\right)^2 = \min_{1 \le k \le G} \sum_{i}\left(M_i - \bar{M}_i^{(k)}\right)^2$$

where G is the number of font classes.
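A minimal sketch of such a minimum-distance coarse classifier follows. It returns the n nearest classes rather than only the nearest one, so that the output can serve as a candidate set. The function name `coarse_candidates`, the dictionary layout, and the default `n=3` are illustrative assumptions, not details given in the text.

```python
import math

def coarse_candidates(features, class_means, n=3):
    """Rank font classes by Euclidean distance to their mean feature vector.

    class_means maps a class label to its mean feature vector. Returns the
    n nearest classes as (label, distance) pairs, closest first.
    """
    ranked = sorted(
        (math.dist(features, mean), label)
        for label, mean in class_means.items()
    )
    return [(label, d) for d, label in ranked[:n]]
```

The first pair in the returned list corresponds to the class that minimizes the Euclidean distance; the remaining pairs are the runner-up candidates.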
(2b) Fine classification: the Bayes classifier is the theoretically optimal statistical classifier, and in practice one tries to approximate it as closely as possible. When the character features follow a Gaussian distribution and all classes have equal prior probabilities, the Bayes classifier reduces to a Mahalanobis distance classifier. This condition is generally hard to satisfy in practice, however, and the performance of the Mahalanobis distance classifier degrades severely as errors arise in estimating the covariance matrix. We use the modified quadratic discriminant function (MQDF) as the fine classification measure. It is a variant of the Mahalanobis distance with the following functional form:

$$g_j(x) = \sum_{i=1}^{k} \frac{1}{\lambda_{ij}}\left[\varphi_{ij}^{T}(x - \mu_j)\right]^2 + \frac{1}{h^2}\left\{\lVert x - \mu_j \rVert^2 - \sum_{i=1}^{k}\left[\varphi_{ij}^{T}(x - \mu_j)\right]^2\right\} + \sum_{i=1}^{k} \ln \lambda_{ij} + (d - k)\ln h^2$$

Here x is the d-dimensional feature vector of the character under test and $\mu_j$ is the mean vector of class j. $\lambda_{ij}$ and $\varphi_{ij}$ are the i-th eigenvalue and eigenvector of the covariance matrix of the class j samples. k is the number of dominant eigenvectors retained, that is, the dimension of the principal subspace of the pattern class; its optimal value is determined experimentally. $h^2$ is an experimental estimate of the small eigenvalues. MQDF produces a quadratic decision surface. Because only the first k dominant eigenvectors of each class covariance matrix need to be estimated, the negative effect of estimation errors in the small eigenvalues is avoided. The MQDF discriminant distance can be regarded as a weighted sum of the Mahalanobis distance in the k-dimensional principal subspace and the Euclidean distance in the remaining (d-k)-dimensional space, with weighting factor $1/h^2$.
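The sketch below evaluates an MQDF distance for one class. It assumes the commonly cited form of MQDF, which adds log-eigenvalue terms to the weighted sum of the principal-subspace and residual distances, and it assumes the leading eigenvalues and unit eigenvectors of the class covariance matrix have already been computed elsewhere. The function name `mqdf` and its parameters are hypothetical.

```python
import math

def mqdf(x, mean, eigvals, eigvecs, h2):
    """Modified quadratic discriminant distance of x to one class.

    eigvals and eigvecs hold the k leading eigenvalues and unit
    eigenvectors of the class covariance matrix; h2 stands in for the
    remaining (small) eigenvalues.
    """
    d, k = len(x), len(eigvals)
    diff = [a - b for a, b in zip(x, mean)]
    # Squared projections of the difference onto the principal axes.
    proj2 = [sum(p * c for p, c in zip(v, diff)) ** 2 for v in eigvecs]
    # Mahalanobis-like term inside the k-dimensional principal subspace.
    principal = sum(p / lam for p, lam in zip(proj2, eigvals))
    # Squared Euclidean distance left over in the other d - k dimensions.
    residual = sum(c * c for c in diff) - sum(proj2)
    log_term = sum(math.log(lam) for lam in eigvals) + (d - k) * math.log(h2)
    return principal + residual / h2 + log_term
```

The class with the smallest returned value is the preferred one. Keeping k small limits the number of eigenvectors that have to be estimated and stored for each class.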
(2c) Confidence calculation: Let the candidate set output by the coarse classifier be $\{(c_1, d_1), (c_2, d_2), \ldots, (c_n, d_n)\}$, where n is the size of the candidate set and $c_n$ and $d_n$ are a candidate character and its coarse-classification distance.
The fine classifier re-sorts the coarse candidate set by the recomputed distances to find the most probable class of the input character. If the coarse result is sufficiently reliable, in other words if $c_1$ is already the correct class of the input character, fine classification is unnecessary. Whether to run fine classification is decided by the confidence $f_{con}$ of the coarse result, computed from the output distances as follows:
$$f_{con} = (d_2 - d_1)/d_1 \tag{7}$$
When the confidence is below a given threshold, the coarse candidate set is passed to the fine classifier for processing; otherwise the coarse classification result is output directly.
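The confidence gate of formula (7) can be sketched in a few lines of Python. This is only an illustration: the function names, the `fine_classifier` callback and the threshold value are hypothetical, and the handling of a zero best distance is an assumption the text does not address.

```python
def confidence(candidates):
    """Confidence of a coarse result, formula (7): (d2 - d1) / d1.

    `candidates` is a list of (label, distance) pairs sorted by ascending
    coarse distance, so candidates[0] is the best match.
    """
    d1 = candidates[0][1]
    d2 = candidates[1][1]
    if d1 == 0:
        # Assumption: an exact match is treated as fully confident.
        return float("inf")
    return (d2 - d1) / d1


def classify(candidates, fine_classifier, threshold):
    """Run the fine classifier only when the coarse result is not confident."""
    if confidence(candidates) < threshold:
        return fine_classifier(candidates)
    return candidates[0][0]
```

A large gap between the best and second-best distances gives a high confidence, so the more expensive fine classifier is run only for ambiguous inputs.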
The present embodiment effect is described:
To verify the validity of the method, we chose 6 common Chinese typefaces: Song (Songti), Kai (regular script), Hei (boldface), Fangsong (imitation Song), Lishu (clerical script) and Youyuan (rounded). Each typeface comes in 4 styles, namely regular, bold, italic and bold italic, giving 24 fonts in total. The training and test samples fall into two classes: computer-generated document images and document images obtained with a scanner. The computer-generated document images were produced with Photoshop CS4.0 at an image resolution of 72 pixels/inch in grayscale mode. The scanned images were obtained with an HP scanner at a scanning resolution of 96 dpi in grayscale mode. The number of sample sets used for training and testing for each typeface is given in Table 1; each sample set contains the 3755 level-1 Chinese characters of the national standard. We identified the fonts of the Chinese characters with the two classifiers, using sequence lengths of 30000 and 50000 respectively. The results are given in Table 2, where rows represent the true class of the samples and columns represent the recognition result.
Table 1. Number of sample sets used for training and testing
         | Song | Kai | Hei | Fangsong | Lishu | Youyuan
Training | 180  | 60  | 100 | 50       | 70    | 90
Test     | 20   | 35  | 40  | 20       | 10    | 30
Table 2. Source-printer identification results for printed documents
In this embodiment, the proposed method shows a certain advantage in overall recognition results. The experiments here are all based on the average statistical features of the random distribution of stroke features: the longer the sequence, the better the average statistical characteristics. Once the sequence length reaches a certain value, however, the recognition rate changes little. In addition, the Euclidean distance classifier performs much worse than the MQDF classifier. The reason is that its standard pattern for each class uses only the mean of that class's samples and makes no use of information such as the variance. The MQDF classifier, by truncating the small eigenvalues, emphasizes the main information and thereby substantially raises the recognition rate.
The above is the preferred embodiment of the present invention. It should be noted that those of ordinary skill in the art may make several improvements and modifications without departing from the principles of the invention, and these improvements and modifications are also regarded as falling within the protection scope of the invention.

Claims (1)

1. A printed-document identification method based on print font library analysis, characterized in that the steps include: extracting features of sample Chinese character images from printers of different models; through a learning process, training the samples into the feature-value library used by the system, which holds the features of the same Chinese characters for different printer models, namely stroke features and simplified Hu moment feature values; obtaining statistics of the Chinese character image such as the total pixel count and the crossing counts, and matching them in turn against the feature character library to complete a coarse classification of the character library, the result serving as the matching object for the next identification step; then, on the basis of the previous step, using Hu moment feature values to characterize the character in more depth, until the feature information uniquely identifies the Chinese character in the image;
The detailed process is:
(1) Font feature extraction and character library construction: extracting the features of a Chinese character image means studying, according to the characteristics of the image, a code word that can represent it, with each code word corresponding to one Chinese character; through training, and using the same feature representation, a Chinese character library for that type of feature is built. The steps are:
(1a) Extracting the stroke feature sequence: feature analysis of Chinese characters shows that stroke direction cues reflect the composition of a character comprehensively, accurately and stably. By computing stroke feature statistics of the Chinese character images in the document under examination, different Chinese fonts are distinguished, and the type of printer that produced them is inferred from this. The specific steps are as follows:
Step 1: the Chinese character image is divided equally into eight regions and, in order from left to right and top to bottom, the black pixels (pixels whose value is 1) in each region are counted; the black-pixel counts of the eight regions give eight feature values;
Step 2: feature values are obtained from stroke crossings, with two horizontal and two vertical passes: horizontal scan lines cross the image at 1/3 and 2/3 of its height and the number of black points crossed on each is recorded, and the same is done in the vertical direction, which gives four more feature values;
Step 3: all black pixels in the image are counted, which gives one feature value; together with the eight feature values of step 1 and the four feature values of step 2, there are 13 feature values in total;
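The three steps above can be sketched as follows. This is a minimal illustration, not the patented implementation: the text does not say how the eight regions are laid out, so a grid of 2 rows by 4 columns is assumed here, a "crossing" is taken to be the number of black pixels on a scan line, and the function name and the order of the 13 values in the output are hypothetical.

```python
import numpy as np


def stroke_features(img, rows=2, cols=4):
    """Return 13 stroke features of a binary character image (1 = black)."""
    img = (np.asarray(img) > 0).astype(int)
    h, w = img.shape
    feats = []
    # Step 1: black-pixel counts of the eight regions,
    # visited left to right, top to bottom (assumed 2 x 4 grid).
    for band in np.array_split(img, rows, axis=0):
        for cell in np.array_split(band, cols, axis=1):
            feats.append(int(cell.sum()))
    # Step 2: black pixels met on horizontal scan lines at 1/3 and 2/3
    # of the height, then on vertical scan lines at 1/3 and 2/3 of the width.
    feats.append(int(img[h // 3, :].sum()))
    feats.append(int(img[(2 * h) // 3, :].sum()))
    feats.append(int(img[:, w // 3].sum()))
    feats.append(int(img[:, (2 * w) // 3].sum()))
    # Step 3: total number of black pixels in the image.
    feats.append(int(img.sum()))
    return feats
```

Integer division is used for the scan-line positions so that the chosen row and column do not depend on floating-point rounding.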
(1b) Extracting moment features of the stroke sequence: for each stroke feature sequence, taking feature dimensionality and computation speed into account, the first- and second-order moments of the discrete Hu moments are extracted as feature values. Images are processed as discrete functions: let f(x, y) be a two-dimensional image function; its (p+q)-th order origin moment $m_{pq}$ and central moment $\mu_{pq}$ are defined as:
$$m_{pq} = \sum_{x=1}^{M} \sum_{y=1}^{N} x^p y^q f(x, y) \tag{1}$$
$$\mu_{pq} = \sum_{x=1}^{M} \sum_{y=1}^{N} (x - \bar{x})^p (y - \bar{y})^q f(x, y) \tag{2}$$
where $(\bar{x}, \bar{y})$ is the centroid of the region; the normalized central moment, denoted $\eta_{pq}$, is defined as:
$$\eta_{pq} = \mu_{pq} / \mu_{00}^{\gamma} \tag{3}$$
where $\gamma = (p+q)/2$;
A set of 7 invariant moments can be derived from the second- and third-order normalized central moments. The higher the order of a central moment, the more shape detail it reflects, but the more sensitive it is to noise and the more computation it requires, and in the discrete case only $M_1$ still retains rotation invariance. The invariants $M_1$, $M_2$, $M_3$, $M_4$ are selected. The invariant moments of an image remain unchanged when the image undergoes an affine transformation, that is, their values do not change when the image is rotated, translated or uniformly scaled. Since the computation required for $M_1$, $M_2$, $M_3$, $M_4$ is not too large, they are suitable as the invariant parameters for identifying the target, and $\phi_1 = M_1$, $\phi_2 = M_2$, $\phi_3 = M_3$, $\phi_4 = M_4$ are chosen as the first 4 feature quantities:
$$
\begin{cases}
M_1 = \eta_{20} + \eta_{02} \\
M_2 = (\eta_{20} - \eta_{02})^2 + 4\eta_{11}^2 \\
M_3 = (\eta_{30} - 3\eta_{12})^2 + (3\eta_{21} - \eta_{03})^2 \\
M_4 = (\eta_{30} + \eta_{12})^2 + (\eta_{21} + \eta_{03})^2 \\
M_5 = (\eta_{30} - 3\eta_{12})(\eta_{30} + \eta_{12})[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2] \\
\qquad + (3\eta_{21} - \eta_{03})(\eta_{21} + \eta_{03})[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2] \\
M_6 = (\eta_{20} - \eta_{02})[(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2] + 4\eta_{11}(\eta_{30} + \eta_{12})(\eta_{21} + \eta_{03}) \\
M_7 = (3\eta_{21} - \eta_{03})(\eta_{30} + \eta_{12})[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2] \\
\qquad + (3\eta_{12} - \eta_{30})(\eta_{21} + \eta_{03})[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2]
\end{cases} \tag{4}
$$
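As a rough illustration of formulas (1) to (4), the first four invariants can be computed with NumPy as below. The function name and the `gamma_offset` argument are hypothetical. By default the sketch uses the common textbook normalization exponent (p+q)/2 + 1, which is an assumption here; passing `gamma_offset=0.0` reproduces the exponent (p+q)/2 as written in the text. The third-order terms follow the standard Hu definitions.

```python
import numpy as np


def hu_first_four(img, gamma_offset=1.0):
    """Return M1..M4 of a 2-D image array (nonzero pixels carry the mass)."""
    f = np.asarray(img, dtype=float)
    ys, xs = np.mgrid[0:f.shape[0], 0:f.shape[1]]
    m00 = f.sum()                      # zeroth moment, equal to mu_00
    xbar = (xs * f).sum() / m00        # centroid x
    ybar = (ys * f).sum() / m00        # centroid y

    def eta(p, q):
        # Central moment mu_pq, then normalization by mu_00 ** gamma.
        mu_pq = ((xs - xbar) ** p * (ys - ybar) ** q * f).sum()
        return mu_pq / m00 ** ((p + q) / 2 + gamma_offset)

    m1 = eta(2, 0) + eta(0, 2)
    m2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    m3 = (eta(3, 0) - 3 * eta(1, 2)) ** 2 + (3 * eta(2, 1) - eta(0, 3)) ** 2
    m4 = (eta(3, 0) + eta(1, 2)) ** 2 + (eta(2, 1) + eta(0, 3)) ** 2
    return np.array([m1, m2, m3, m4])
```

Shifting a glyph inside the frame or rotating it by 90 degrees should leave the four values unchanged up to floating-point error, which is a quick sanity check for any implementation.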
(1c) Building the standard character library: features are extracted from sample Chinese character images produced by printers of different models and, through a learning process, the samples are trained into the feature-value library used by the system, which holds the features of the same Chinese characters for different printer models. The most commonly used standard Chinese characters are taken as the objects, in the common typefaces Song, Fangsong, Kai, Hei, Lishu and Youyuan and in type sizes No. 1 through No. 6, and simplified Hu moment feature values are chosen. A Chinese character to be identified is handled by two-level coding: first, statistics of the character image such as the total pixel count and the crossing counts are obtained and matched in turn against the feature character library, completing a coarse classification of the library, the result of which serves as the matching object for the next identification step; second, on the basis of the previous step, Hu moment feature values are used to characterize the character in more depth, until the feature information uniquely identifies the Chinese character in the image;
(2) Classifier design: the printer type of a document is identified by comparing the feature values of the characters under examination with those in the standard character library. Given the constraints of many factors, a minimum-distance classifier is chosen for the large-character-set recognition problem, and a two-stage strategy of coarse and fine classification based on confidence analysis is used to decide the class of the Chinese character to be identified:
(2a) Coarse classification: a Euclidean distance classifier is designed. Let $M_i$ be the i-th Hu moment feature value of the font to be identified and $M_i^k$ the mean of the i-th standard Hu moment feature of the k-th font. The font to be identified is taken to be the $k_0$-th font when the following condition holds, where G is the number of font classes;
$$k_0 = \arg\min_{1 \le k \le G} \left\{ \sum_{i=1}^{4} \left(M_i - M_i^k\right)^2 \right\} \tag{5}$$
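Formula (5) is a nearest-mean rule over the four Hu moment features. A minimal sketch, assuming the per-font means are stored as a G x 4 array; the function name and the returned candidate list are hypothetical, and class indices are 0-based here whereas the text counts from 1:

```python
import numpy as np


def coarse_classify(m, font_means):
    """Nearest-mean coarse classification, formula (5).

    m          : the 4 Hu moment features of the character to identify
    font_means : array of shape (G, 4), one row of mean features per font
    Returns the index k0 of the closest font and the full candidate list
    [(k, squared distance), ...] sorted by ascending distance.
    """
    m = np.asarray(m, dtype=float)
    means = np.asarray(font_means, dtype=float)
    dist = ((means - m) ** 2).sum(axis=1)   # squared Euclidean distance per font
    order = np.argsort(dist)
    candidates = [(int(k), float(dist[k])) for k in order]
    return candidates[0][0], candidates
```

Returning the whole sorted candidate list, not just the winner, keeps the two best distances available for a later confidence check.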
(2b) Fine classification: the modified quadratic discriminant function (MQDF) is used as the fine-classification measure. It is a variant of the Mahalanobis distance, with the functional form:
$$g_i(X) = \frac{1}{h^2}\left\{ \lVert X - M_i \rVert^2 - \sum_{j=1}^{K}\left(1 - \frac{h^2}{\lambda_{ij}}\right)\left[(X - M_i)^T \phi_{ij}\right]^2 \right\} + \ln\left(h^{2(d-K)} \prod_{j=1}^{K} \lambda_{ij}\right) \tag{6}$$
where $\lambda_{ij}$ and $\phi_{ij}$ are the j-th eigenvalue and eigenvector of the covariance matrix of the samples of class i, $M_i$ is the mean vector of class i, d is the feature dimension, K is the number of dominant eigenvectors retained, i.e. the dimension of the principal subspace of the pattern class, whose optimal value is determined experimentally, and $h^2$ is an experimental estimate of the small eigenvalues; MQDF produces a quadratic decision surface, and because only the first K dominant eigenvectors of each class covariance matrix need to be estimated, the negative effect of estimation errors in the small eigenvalues is avoided; the MQDF distance is regarded as a weighted sum of the Mahalanobis distance in the K-dimensional principal subspace and the Euclidean distance in the remaining (d-K)-dimensional space, with weighting factor $1/h^2$;
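The fine-classification distance can be sketched as follows. This follows the commonly published form of MQDF, in which the bracket multiplied by 1/h² contains the full squared Euclidean distance minus the principal-subspace correction and the logarithmic term is added outside. Treat it as an assumption-laden illustration: the function name is hypothetical, and the class covariance is passed in directly, although a real system would store precomputed eigenvectors per class.

```python
import numpy as np


def mqdf_distance(x, mean, cov, K, h2):
    """MQDF distance of feature vector x to one class.

    mean, cov : class mean vector and covariance matrix
    K         : number of dominant eigenvectors kept
    h2        : constant replacing the small eigenvalues
    """
    diff = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    d = diff.size
    eigvals, eigvecs = np.linalg.eigh(np.asarray(cov, dtype=float))
    top = np.argsort(eigvals)[::-1][:K]      # indices of the K largest eigenvalues
    lam = eigvals[top]
    phi = eigvecs[:, top]
    proj = phi.T @ diff                      # projections onto the principal axes
    quad = (diff @ diff - np.sum((1.0 - h2 / lam) * proj ** 2)) / h2
    log_term = (d - K) * np.log(h2) + np.sum(np.log(lam))
    return float(quad + log_term)
```

With K equal to d the expression reduces to the squared Mahalanobis distance plus the log-determinant of the covariance, which gives a simple way to check an implementation.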
(2c) Confidence calculation: let the candidate set output by the coarse classifier be $\{(c_1, d_1), (c_2, d_2), \ldots, (c_n, d_n)\}$, where n is the size of the candidate set and $c_n$ and $d_n$ are a candidate character and its coarse-classification distance; if $c_1$ is the correct class of the input character, fine classification is unnecessary; whether to run fine classification is decided by the confidence $f_{con}$ of the coarse result, computed from the output distances as follows:
$$f_{con} = (d_2 - d_1)/d_1 \tag{7}$$
When the confidence is below the set threshold, the coarse candidate set is passed to the fine classifier for processing; otherwise the coarse classification result is output directly.
CN201310538041.1A 2013-10-29 2013-10-29 The mimeograph documents discrimination method analyzed based on printing character library Active CN103810484B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310538041.1A CN103810484B (en) 2013-10-29 2013-10-29 The mimeograph documents discrimination method analyzed based on printing character library

Publications (2)

Publication Number Publication Date
CN103810484A CN103810484A (en) 2014-05-21
CN103810484B true CN103810484B (en) 2017-10-10

Family

ID=50707225

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310538041.1A Active CN103810484B (en) 2013-10-29 2013-10-29 The mimeograph documents discrimination method analyzed based on printing character library

Country Status (1)

Country Link
CN (1) CN103810484B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104468090B (en) * 2014-11-12 2017-07-28 辽宁大学 Character cipher coding method based on image pixel coordinates
US9613263B2 (en) * 2015-02-27 2017-04-04 Lenovo (Singapore) Pte. Ltd. Ink stroke grouping based on stroke attributes
CN104965928B (en) * 2015-07-24 2019-01-22 北京航空航天大学 One kind being based on the matched Chinese character image search method of shape
CN105825211B (en) * 2016-03-17 2019-05-31 世纪龙信息网络有限责任公司 Business card identification method, apparatus and system
CN106530317B (en) * 2016-09-23 2019-05-24 南京凡豆信息科技有限公司 A kind of scoring of simple picture computer and auxiliary painting methods
CN111027345A (en) * 2018-10-09 2020-04-17 北京金山办公软件股份有限公司 Font identification method and apparatus
CN110781727B (en) * 2019-09-12 2022-06-17 中国刑事警察学院 Laser printing file quantitative inspection method based on image physical measurement indexes
CN110837326B (en) * 2019-10-24 2021-08-10 浙江大学 Three-dimensional target selection method based on object attribute progressive expression
CN111507332A (en) * 2020-04-17 2020-08-07 上海眼控科技股份有限公司 Vehicle VIN code detection method and equipment
CN113761231B (en) * 2021-09-07 2022-07-12 浙江传媒学院 Text character feature-based text data attribution description and generation method

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1110801A (en) * 1994-09-27 1995-10-25 张志国 Processing system for script characters
CN1210296A (en) * 1997-08-29 1999-03-10 王伟 Korean language database under UCDOS platform and its inputting method

Also Published As

Publication number Publication date
CN103810484A (en) 2014-05-21

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant