CN106503706B - Method for discriminating the correctness of Chinese character glyph cutting results - Google Patents
- Publication number
- CN106503706B (application CN201610847230.0A)
- Authority
- CN
- China
- Prior art keywords
- font
- stroke
- component
- value
- cutting result
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Multimedia (AREA)
- Probability & Statistics with Applications (AREA)
- Character Discrimination (AREA)
Abstract
The invention discloses a method for discriminating the correctness of the results of a glyph cutting algorithm, and belongs to the field of automatic extraction of Chinese character strokes and components. The method comprises, in order, a discrimination process based on glyph reconstruction, a discrimination process based on component classification, a discrimination process based on glyph attributes, and a discrimination process based on the glyph skeleton. When any one of these processes judges the glyph cutting result under examination to be a wrong cutting result, that result is determined to be a glyph miscut. The method can identify more than 97% of wrong cutting results, so the invention can effectively detect miscuts.
Description
Technical field
The invention belongs to the field of automatic extraction of Chinese character strokes and components and relates to a method for discriminating Chinese character glyph cutting results, in particular to a method that uses four discrimination algorithms (glyph reconstruction, glyph attributes, component classification, and glyph skeleton) to judge the correctness of the cutting results produced by a glyph cutting algorithm.
Background art
Glyph cutting techniques for Chinese characters include the automatic extraction of strokes and the automatic extraction of components. Glyph cutting was first used as a preprocessing step in optical character recognition, where the cutting result assists character recognition. With the continued development of glyph computing, glyph cutting has become a core technology in research topics such as automatic generation of Chinese font libraries, handwriting verification, assisted Chinese character writing, and digital ink, and related research has developed rapidly.
Document Sun 2014 (Sun, Hao, Zhouhui Lian, Yingmin Tang, and Jianguo Xiao. "Non-rigid point set registration for Chinese characters using structure-guided coherent point drift." 2014 IEEE International Conference on Image Processing (ICIP), pp. 4752-4756. IEEE, 2014.) describes a glyph cutting method based on a non-rigid point set registration algorithm guided by structural information. The method cuts a glyph in four steps. In the first step, skeletons are extracted from the glyph to be cut and from its corresponding template glyph, which is taken from a regular-script database that carries component information; the two skeleton extraction results are called the data point set and the template point set. In the second step, the data point set, the template point set, and the component membership of the template point set are given as input to the structure-guided non-rigid point set registration algorithm, which outputs the component membership of the data point set. In the third step, the membership of the data point set is converted into the membership of the data contour segments; at this stage the contour segments of each component may be broken and not closed. In the fourth step, the contour segments produced in the previous step are correctly closed to obtain complete component results.
A sub-glyph obtained by cutting a glyph is called a "component". Because of the complexity of Chinese character glyphs, the variability of handwriting, and the limited accuracy of the algorithms, the cutting results produced by a glyph cutting method are not always completely correct. For example, the prior art has difficulty judging the correctness of the cutting results produced by the Sun 2014 glyph cutting method described above. A technique for judging the correctness of the cutting results of a glyph cutting method is therefore currently lacking.
Summary of the invention
To overcome the above deficiencies of the prior art, the present invention provides a method for discriminating the correctness of the results of a Chinese character glyph cutting algorithm. For the glyph cutting algorithm proposed in Sun 2014, it can judge the correctness of results that may contain cutting errors.
For convenience of description, the present invention adopts the following term definitions:
Component: a result obtained by cutting a glyph.
Average stroke width: the number of all black pixels in the Chinese character image divided by the number of edge points of the Chinese character image.
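The average stroke width definition above can be illustrated with a minimal sketch. The source does not say how an edge point is detected, so the test used here (a black pixel with at least one 4-neighbour that is background or outside the image) is an assumption, as are the function name and the list-of-rows image format.

```python
def average_stroke_width(img):
    """img: list of rows, 1 = black (ink) pixel, 0 = background."""
    h, w = len(img), len(img[0])
    black = 0
    edge = 0
    for y in range(h):
        for x in range(w):
            if img[y][x] != 1:
                continue
            black += 1
            # Assumed edge test: some 4-neighbour is background or off-image.
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if not (0 <= ny < h and 0 <= nx < w) or img[ny][nx] == 0:
                    edge += 1
                    break
    # Black pixel count divided by edge point count, as in the definition.
    return black / edge if edge else 0.0
```

For a solid 3 by 3 block, 9 black pixels and 8 edge points give a value of 9/8.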
The technical solution provided by the present invention is:
A method for discriminating the correctness of Chinese character glyph cutting results comprises, in order, a discrimination process based on glyph reconstruction, a discrimination process based on component classification, a discrimination process based on glyph attributes, and a discrimination process based on the glyph skeleton. Specifically:
1) The discrimination process based on glyph reconstruction:
Each Chinese character glyph is cut into components, which are stitched back together at their original positions. The stitched glyph is compared with the original glyph pixel by pixel and the difference pixel value is counted. A difference pixel value threshold is set, and the glyph is judged against it as either a wrong cutting result or a correct cutting result;
2) The discrimination process based on component classification:
First, a component dataset made up of correct components is built and a component classifier is trained on it. The trained classifier is then used to classify the glyph cutting result under examination, and the classification result indicates a correct component or an incorrect component;
3) The discrimination process based on glyph attributes:
Glyph attributes that correct components must satisfy are set; when a component does not satisfy the corresponding glyph attribute, the component is judged to be the product of a glyph miscut;
4) The discrimination process based on the glyph skeleton:
The middle section of the contour of each stroke in the glyph is tested for smoothness; when an abrupt contour change occurs in the middle section of a stroke, the glyph cutting result is judged to be wrong.
For the method for discrimination of above-mentioned Chinese character pattern cutting result correctness, further, rebuild described based on font
Differentiation process, the differentiation process based on part classification, the differentiation process based on font attribute and the differentiation based on glyph skeleton
It in the process, is mistake cutting for the judgement result that font cutting result to be discriminated is differentiated when any differentiation process
When as a result, the font cutting result to be discriminated is determined as font miscut.
For the method for discrimination of above-mentioned Chinese character pattern cutting result correctness, further, it is described based on font rebuild
During differentiation, the difference pixel value threshold value is set as square of stroke width.
For the method for discrimination of above-mentioned Chinese character pattern cutting result correctness, further, it is described based on font rebuild
Differentiation process specifically comprises the following steps:
11) according to the size of former font, a difference value matrix equal with former word size, the difference value matrix are generated
All elements be initialized as 0;
12) each component that traversal cutting obtains as a result, according to the position in original image, the image of component is corresponding
To one of difference value matrix and the equal sized region of image of component, then by the corresponding region of image of component and difference value matrix
Carry out the cumulative of pixel scale;The cumulative of all components is completed, difference value matrix is obtained;
13) an absent region matrix and an extraneous region matrix equal with former word size are generated respectively;It is described
The all elements of two matrixes are initialized as 0, while traversing each pixel of former font and the correspondence of the difference value matrix
Position pixel;Former font pixel value is set in two kinds of situation;The first situation be former font pixel value be 0, another situation is that
Former font pixel value is 1;For the first situation, when the difference value matrix respective pixel value is 1, by the extraneous region
The value of matrix corresponding position is set as 1, is otherwise set as 0;It, will when the difference value matrix respective pixel is 0 for second situation
The value of the absent region matrix corresponding position is set as 1, is otherwise set as 0;
14) connected region detection is carried out to the absent region matrix and extraneous region matrix respectively, obtains two matrixes
The number of pixels of all connected regions;Connected region pixel threshold is set, when the sum of all pixels of any one connected region is super
When crossing the connected region pixel threshold, which is determined as wrong cutting result, is otherwise determined as correct cutting result.
In the above method, further, in the discrimination process based on component classification, training the component classifier comprises the following steps:
21) Image preprocessing:
Scale the component image using non-rigid scaling, normalizing each component image to a square image whose side length is denoted L;
22) Select local sub-images and extract local features from them to obtain multiple local feature vectors; the number of local feature vectors is denoted num_lf;
23) Dictionary construction:
Randomly sample a number of the obtained local feature vectors to form the overall feature set. The number of sampled local features should be greater than 10000 and can be at most the total number of local feature vectors, num_lf. Use the K-means clustering algorithm to obtain num_k cluster centres, which serve as the sparse dictionary; num_k ranges from 256 to the total number of local features, num_lf;
24) Sparse representation:
Using the sparse dictionary obtained in the previous step, encode all local features of a component with a sparse coding algorithm, then combine all the encoded local features with max pooling to obtain a sparse representation feature of dimension num_k, equal to the number of cluster centres;
25) Classifier training: train a linear support vector machine on the sparse representation features to obtain the component classifier.
In the above method, further, in the discrimination process based on component classification, the classification specifically uses the classifier to compute the component class of the glyph cutting result under examination and checks whether it matches the class the component should belong to. When the classification result differs from the class the component should belong to, the glyph is judged to be a wrong cutting result; when they are the same, the glyph is judged to be a correct cutting result.
Further, the local feature extraction of step 22) specifically performs the following operations:
22a. Cut the local sub-image with a uniform n*n grid to obtain n*n regions. Let the side length of the local sub-image be L_sub; the side length of each region is then L_sub divided by n. The n*n regions divide the local sub-image evenly, do not overlap one another, and together make up exactly the local sub-image before cutting. Convolve each region with the Sobel operator to obtain the magnitude and phase;
22b. Divide the phase uniformly into n*n intervals; for each interval, sum the magnitudes of the pixels in the local sub-image whose phase falls in that interval, giving a local feature of n*n dimensions;
22c. Concatenate the n*n-dimensional local features of the n*n regions to obtain a multi-dimensional local feature.
In the above method, further, in the discrimination process based on glyph attributes, the glyph attributes include a component size attribute and a component area attribute.
In the above method, further, the specific steps of the discrimination process based on the glyph skeleton are as follows:
41) Let N be the number of strokes in a component obtained from a glyph cutting result. For each point on the component contour, obtain N values, which are the shortest distances from that contour point to each of the N stroke skeletons;
42) Take the minimum of the shortest distances between a contour point and all stroke skeletons. The stroke whose skeleton attains the minimum is the stroke the contour point belongs to, and the minimum is the distance between the contour point and the stroke skeleton;
43) Build the contour point sets: create N contour point sets, initialized as empty sets, where the i-th set represents the contour of the i-th stroke. Traverse all contour points and add each contour point that belongs to the i-th stroke to the i-th set. After the sets have been built, remove from each set the contour points whose nearest stroke skeleton point lies in the first M% or the last M% of the stroke; the remaining contour points are the contour points of the middle section of the stroke;
44) For each set, find the mode of the distances from its contour points to the skeleton of their stroke and take it as the average stroke width of that stroke section. When the distance from a contour point to its stroke skeleton exceeds K times the average stroke width, the contour point is judged to be an abrupt contour point. When the number of abrupt contour points exceeds a preset threshold, the component is judged to be a wrongly cut component.
Further, the shortest distance from a contour point to the N stroke skeletons in step 41) is computed as follows: for a contour point, traverse the N stroke skeletons; for each stroke skeleton, traverse all of its skeleton points, compute the distance between the contour point and each of them, and take the minimum as the shortest distance from the contour point to the stroke skeleton currently being traversed. In step 43), M ranges from 0 to 50. In step 44), K ranges from 0.8 to 3. The threshold on the number of abrupt contour points is X times the average stroke width, where X ranges from 0.7 to 3.
Compared with the prior art, the beneficial effects of the present invention are:
The present invention provides a method for discriminating the correctness of the results of a glyph cutting algorithm. The method comprises, in order, a discrimination process based on glyph reconstruction, a discrimination process based on component classification, a discrimination process based on glyph attributes, and a discrimination process based on the glyph skeleton. When any one of these processes judges the glyph cutting result under examination to be a wrong cutting result, that result is determined to be a glyph miscut. The glyph cutting result discrimination method provided by the invention can identify more than 97% of wrong cutting results, so the invention can effectively detect miscuts.
Description of the drawings
Fig. 1 shows examples of glyph cutting results;
where (a) is a correct glyph cutting result produced by the glyph cutting technique, and (b) is a wrong glyph cutting result produced by the glyph cutting technique.
Fig. 2 is a flow diagram of the glyph cutting result correctness discrimination method provided by the invention.
Fig. 3 illustrates the skeleton-based discrimination of a glyph cutting result in an embodiment of the invention;
where (a) is the component image under examination; (b) is the glyph skeleton of the component; (c) shows the edge points belonging to the fifth stroke of the component, a horizontal stroke; and (d) shows the edge points judged to be abrupt.
Specific embodiments
The invention is further described below through embodiments with reference to the drawings, without limiting the scope of the invention in any way.
After the glyph cutting algorithm has run, the following are available: the original character image, all component images cut from the character, the class each component belongs to, and the strokes each component contains. The present invention provides a method for discriminating the correctness of the results of a glyph cutting algorithm. The method comprises, in order, a discrimination process based on glyph reconstruction, a discrimination process based on component classification, a discrimination process based on glyph attributes, and a discrimination process based on the glyph skeleton. Fig. 2 is a flow diagram of the method; the detailed process is as follows:
1) The discrimination process based on glyph reconstruction
The discrimination process based on glyph reconstruction cuts a Chinese character glyph into components, stitches them back together at their original positions, and compares the stitched glyph with the original glyph pixel by pixel. The number of pixels that correspond in position but differ in value is counted and called the difference pixel value. A difference pixel value threshold is set and the glyph is judged against it. The threshold can be set to the square of the stroke width. The process can be divided into the following four steps:
11) According to the size of the original glyph, generate a difference value matrix of the same size as the original glyph image, with all elements initialized to 0.
12) Traverse each component obtained by cutting. According to its position in the original image, map the component image to a region of the difference value matrix of the same size as the component image, then accumulate the component image into that region pixel by pixel. After all components have been accumulated, the final difference value matrix is obtained.
13) According to the size of the original glyph, generate an absent region matrix and an extraneous region matrix, each of the same size as the original glyph image, with all elements initialized to 0. Traverse each pixel of the original glyph together with the pixel at the corresponding position of the difference value matrix, distinguishing two cases: in the first case the original pixel value is 0, in the second it is 1. In the first case, if the corresponding value of the difference value matrix is 1, set the value at the corresponding position of the extraneous region matrix to 1, otherwise to 0. In the second case, if the corresponding value of the difference value matrix is 0, set the value at the corresponding position of the absent region matrix to 1, otherwise to 0.
14) Perform connected-region detection on the absent region matrix and the extraneous region matrix separately to obtain the pixel count of every connected region in both matrices. Set a connected-region pixel threshold, which is a non-negative real number. If the pixel count of any connected region exceeds the threshold, the glyph is judged to be a wrong cutting result; otherwise it is judged to be a correct cutting result.
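Steps 11) to 14) can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: images are lists of rows with 1 for ink, each component is given as a hypothetical (top, left, image) tuple, connected regions use 4-connectivity (the text does not specify the connectivity), and any non-zero accumulated value is treated as covered, whereas the text tests for the value 1.

```python
from collections import deque

def region_sizes(mask):
    """Pixel counts of the 4-connected regions of 1-pixels in mask."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    sizes = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] != 1 or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            size = 0
            while queue:
                cy, cx = queue.popleft()
                size += 1
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            sizes.append(size)
    return sizes

def reconstruction_is_wrong(original, components, threshold):
    h, w = len(original), len(original[0])
    # Steps 11-12: accumulate every component at its original position.
    acc = [[0] * w for _ in range(h)]
    for top, left, comp in components:
        for dy, row in enumerate(comp):
            for dx, value in enumerate(row):
                acc[top + dy][left + dx] += value
    # Step 13: ink missing from the stitched glyph, and ink that should not be there.
    absent = [[1 if original[y][x] == 1 and acc[y][x] == 0 else 0 for x in range(w)]
              for y in range(h)]
    extraneous = [[1 if original[y][x] == 0 and acc[y][x] >= 1 else 0 for x in range(w)]
                  for y in range(h)]
    # Step 14: wrong if any connected region is larger than the threshold.
    return any(s > threshold for s in region_sizes(absent) + region_sizes(extraneous))
```

With the threshold set to the square of the stroke width, as the text suggests, isolated differences smaller than one "dot" are tolerated.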
2) The discrimination process based on component classification
The purpose of the discrimination algorithm based on component classification is to measure how similar a glyph cutting result is to the component class it should belong to. The similarity is measured by the classification result of a classifier: if the classifier's result is the same as the component class the cutting result should belong to, the similarity is considered high and the glyph cutting result is correct; if the classifier's result differs from that class, the similarity is considered low and the glyph cutting result is wrong. The basic idea of the algorithm is first to build a component dataset made up of correct components, which includes the class each component should belong to, then to train a component classifier on this dataset, and finally to classify the glyph cutting result under examination with the trained classifier. If the classification result differs from the class the component should belong to in the component dataset, the component is judged to be an incorrect component. The component classifier is trained as follows:
21) Image preprocessing;
Scale the component image using non-rigid scaling, that is, normalize each component image to a square image whose side length is denoted L. L is chosen so that the component image is not significantly distorted; in practice any integer between 64 and 256 can be used. One benefit of non-rigid scaling is that it indirectly corrects the component image.
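A minimal sketch of this normalization is shown below. It reads "non-rigid scaling" as scaling width and height independently so that any rectangle becomes an L by L square, which is an interpretation of the text; nearest-neighbour sampling and the function name are likewise assumptions.

```python
def resize_to_square(img, side):
    """Scale a list-of-rows image to side x side with nearest-neighbour sampling.

    Rows and columns are scaled by different factors, so the aspect ratio
    of the input is not preserved.
    """
    h, w = len(img), len(img[0])
    return [[img[y * h // side][x * w // side] for x in range(side)]
            for y in range(side)]
```

A 2 by 4 image passed with side=4 keeps its columns and repeats each row twice.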
22) Local feature extraction;
In local feature extraction, the local sub-image selected by this algorithm is a square whose side length is any value between one quarter and three quarters of the image size; in practice L divided by 2 can be used. Overall, edge points of the component image are first randomly sampled, with 200 to 600 sampling points selected per component. A local sub-image is extracted centred on each sampling point, and local features are then extracted from it as follows:
22a. Cut the local sub-image with a uniform 4-by-4 grid to obtain 16 regions. Let the side length of the local sub-image be L_sub; the side length of each region is then L_sub divided by 4. The 16 regions divide the local sub-image evenly, do not overlap one another, and together make up exactly the local sub-image before cutting. Each region is then convolved with the Sobel operator to obtain the magnitude and phase;
22b. Divide the phase uniformly into 16 intervals; for each interval, sum the magnitudes of the pixels in the local sub-image whose phase falls in that interval. After all pixels have been counted, a 16-dimensional local feature is obtained.
22c. Concatenate the 16-dimensional local features of the 16 regions to obtain a 256-dimensional local feature.
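Steps 22a to 22c can be sketched as below. Several details are assumptions rather than statements from the source: the gradient at each pixel is taken from the whole sub-image with zero padding at its border, the Sobel kernels are applied as a correlation (which differs from a true convolution only in sign), the phase is computed with atan2 over the full circle, the histogram is accumulated per region so that 16 regions of 16 bins give 256 values, and the sub-image side is assumed divisible by the grid size.

```python
import math

SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))

def gradient(patch, y, x):
    """Sobel response at (y, x); pixels outside the patch count as 0."""
    size = len(patch)
    gx = gy = 0
    for j in range(3):
        for i in range(3):
            yy, xx = y + j - 1, x + i - 1
            if 0 <= yy < size and 0 <= xx < size:
                gx += SOBEL_X[j][i] * patch[yy][xx]
                gy += SOBEL_Y[j][i] * patch[yy][xx]
    return gx, gy

def local_feature(patch, n=4, bins=16):
    """Concatenated phase histograms (weighted by magnitude) of an n x n grid."""
    cell = len(patch) // n
    feature = []
    for row in range(n):
        for col in range(n):
            hist = [0.0] * bins
            for y in range(row * cell, (row + 1) * cell):
                for x in range(col * cell, (col + 1) * cell):
                    gx, gy = gradient(patch, y, x)
                    magnitude = math.hypot(gx, gy)
                    phase = math.atan2(gy, gx)  # in [-pi, pi]
                    index = min(int((phase + math.pi) / (2 * math.pi) * bins), bins - 1)
                    hist[index] += magnitude
            feature.extend(hist)
    return feature
```

With the default n=4 and bins=16 the result has 4 * 4 * 16 = 256 entries, matching the dimension given in step 22c.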
23) Dictionary construction;
After local features have been extracted from all components in the component dataset, num_lf local feature vectors are available. A number of them are randomly sampled to form the overall feature set; the number sampled should be greater than 10000 and can be at most the total number of local features. The K-means clustering algorithm is then used to obtain num_k cluster centres as the sparse dictionary, where num_k ranges from 256 to the total number of local features, num_lf.
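The dictionary step can be illustrated with a plain K-means (Lloyd's algorithm) sketch. The initialisation by random sampling, the fixed iteration count, the seed, and the handling of empty clusters are all assumptions made for the example; the source only names K-means clustering.

```python
import math
import random

def kmeans(points, k, iterations=20, seed=0):
    """Return k cluster centres for a list of equal-length feature vectors."""
    rng = random.Random(seed)
    centres = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centres[j]))
            groups[nearest].append(p)
        for j, group in enumerate(groups):
            if group:  # an emptied cluster keeps its previous centre
                centres[j] = [sum(values) / len(group) for values in zip(*group)]
    return centres
```

In the setting described above, `points` would be the randomly sampled local feature vectors and `k` would be num_k; the returned centres play the role of the sparse dictionary.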
24) Sparse representation;
Using the sparse dictionary obtained in the previous step, encode all local features of a component with a sparse coding algorithm, then combine all the encoded local features with max pooling to obtain a sparse representation feature of dimension num_k, equal to the number of cluster centres.
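The source does not name the sparse coding algorithm, so the sketch below swaps in a simple stand-in: each local feature gets non-zero weights only for its `keep` nearest dictionary atoms. Only the max pooling step follows the text directly; `encode`, `keep`, and the weighting scheme are hypothetical.

```python
import math

def encode(feature, dictionary, keep=2):
    """Stand-in sparse code: normalized weights on the `keep` nearest atoms."""
    dists = [math.dist(feature, atom) for atom in dictionary]
    nearest = sorted(range(len(dictionary)), key=dists.__getitem__)[:keep]
    base = dists[nearest[0]]  # subtract the smallest distance to avoid underflow
    weights = {j: math.exp(-(dists[j] - base)) for j in nearest}
    total = sum(weights.values())
    code = [0.0] * len(dictionary)
    for j, weight in weights.items():
        code[j] = weight / total
    return code

def max_pool(codes):
    """Element-wise maximum over all codes of one component."""
    return [max(column) for column in zip(*codes)]

def component_descriptor(features, dictionary, keep=2):
    """One vector per component, with one entry per dictionary atom (num_k)."""
    return max_pool([encode(f, dictionary, keep) for f in features])
```

Whatever encoder is used, the pooled vector has one entry per cluster centre, which is why its dimension equals num_k.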
25) Classifier training;
A linear support vector machine is trained on the sparse representation features to obtain the component classifier.
3) The discrimination process based on glyph attributes;
Based on the characteristics of Chinese characters and the rules by which components are defined, the discrimination algorithm based on glyph attributes sets a series of glyph attributes that a correct component should have. When a component fails to satisfy any one of them, the component is considered to be the product of a glyph miscut. The glyph attributes are defined as follows:
Component size attribute: the width and height of a correct component must be at least one times the average stroke width of the glyph the component belongs to. A component as defined by the glyph cutting technique contains at least one stroke, so the size of a component image obtained by cutting is at least one times the average stroke width.
Component area attribute: the total pixel count of a connected region in the component image must be at least the square of one times the stroke width. The smallest stroke in a component is the "dot", so a connected region of a component image obtained by cutting is at least as large as one "dot" stroke. Here the size of the smallest "dot" is taken to be approximately the square of one times the stroke width.
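The two attributes can be checked with a short sketch. The comparison is written as greater than or equal, which is one reading of the bounds in the text, and the area attribute is applied to every 4-connected region of the component image; both choices, and the function names, are assumptions.

```python
def region_sizes(mask):
    """Pixel counts of the 4-connected regions of 1-pixels (iterative flood fill)."""
    h, w = len(mask), len(mask[0])
    seen = set()
    sizes = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] != 1 or (y, x) in seen:
                continue
            seen.add((y, x))
            stack = [(y, x)]
            size = 0
            while stack:
                cy, cx = stack.pop()
                size += 1
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] == 1 and (ny, nx) not in seen:
                        seen.add((ny, nx))
                        stack.append((ny, nx))
            sizes.append(size)
    return sizes

def satisfies_attributes(component, stroke_width):
    """True if the component passes both the size and the area attribute."""
    rows = [y for y, row in enumerate(component) if any(row)]
    cols = [x for x in range(len(component[0])) if any(row[x] for row in component)]
    if not rows:
        return False  # an empty component image cannot be a valid component
    height = rows[-1] - rows[0] + 1
    width = cols[-1] - cols[0] + 1
    size_ok = width >= stroke_width and height >= stroke_width
    area_ok = all(s >= stroke_width ** 2 for s in region_sizes(component))
    return size_ok and area_ok
```

A stray pixel left over from a bad cut fails the area attribute even when the bounding box of the component is large enough.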
4) based on the distinguished number of glyph skeleton;
It, will be every based on the distinguished number of glyph skeleton according to the stroke middle section relatively intrinsic feature of this smooth Chinese character
The profile middle section of a stroke carries out the detection of flatness, if there is profile catastrophe in stroke middle section, then it is assumed that font is cut
Cut result mistake.Specific step is as follows:
41) Let N be the number of strokes in a component obtained from a glyph cutting result. For each point on the component contour, N values are obtained, each representing the shortest distance from that contour point to one of the N stroke skeletons. They are calculated as follows: for a contour point, traverse the N stroke skeletons; for each stroke skeleton, traverse all of its skeleton points, compute the distance from the contour point to each of them, and take the minimum as the shortest distance from the contour point to that stroke skeleton.
42) Take the minimum of the shortest distances from a contour point to all stroke skeletons. The stroke skeleton that attains the minimum is the stroke to which the contour point belongs, and the minimum itself is the distance between the contour point and its stroke skeleton.
43) Create N contour point sets, each initialized to the empty set, where the i-th set represents the contour of the i-th stroke. Traverse all contour points and add each contour point that belongs to the i-th stroke to the i-th set. After the sets are built, remove from each set the contour points whose nearest stroke skeleton point lies in the first M% or the last M% of the stroke; what remains are the contour points of the middle section of the stroke. The value range of M is 0 to 50.
44) For each set, compute the mode of the distances from its contour points to the stroke skeleton they belong to, and take this value as the average stroke width of that stroke section. If the distance from a contour point to its stroke skeleton exceeds K times the average stroke width, the contour point is considered an abrupt contour point; the value range of K is 0.8 to 3. When the number of abrupt contour points exceeds a preset threshold, the component is judged to be a wrongly cut component. The threshold on the number of abrupt contour points is X times the average stroke width, and the value range of X is 0.7 to 3.
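The assignment of contour points to strokes and the trimming of stroke ends (steps 41 to 43) can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, points are assumed to be (x, y) tuples, and each skeleton is assumed to be an ordered list of points running from the start of the stroke to its end, so that a skeleton point's index approximates its position along the stroke.

```python
import math


def nearest_on_skeleton(point, skeleton):
    """Return (distance, index) of the skeleton point closest to `point`."""
    best = min(range(len(skeleton)), key=lambda j: math.dist(point, skeleton[j]))
    return math.dist(point, skeleton[best]), best


def middle_contour_points(contour, skeletons, m_percent=20):
    """Group contour points by nearest stroke, keeping only middle-section points.

    Returns one list per stroke; each entry is a (contour_point, distance) pair.
    """
    groups = [[] for _ in skeletons]
    for p in contour:
        # Steps 41-42: shortest distance to every stroke skeleton, then the
        # overall minimum decides which stroke the contour point belongs to.
        per_stroke = [nearest_on_skeleton(p, sk) for sk in skeletons]
        i = min(range(len(skeletons)), key=lambda s: per_stroke[s][0])
        dist, j = per_stroke[i]
        # Step 43: drop the point if its nearest skeleton point lies in the
        # first or last M% of the stroke (index-based, an assumption).
        n = len(skeletons[i])
        lo = n * m_percent / 100
        hi = n * (100 - m_percent) / 100
        if lo <= j < hi:
            groups[i].append((p, dist))
    return groups
```

The brute-force search mirrors the traversal described in the text; a spatial index would be faster but is not something the source mentions.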
5) Set the classification decision method. In this method, the glyph cutting result is judged wrong as soon as any one of the above algorithms judges it wrong.
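The decision rule amounts to a short-circuit cascade, which might be sketched as below. The name `is_cutting_correct` and the idea of passing the individual algorithms as callables are illustrative assumptions; each callable is expected to return True when it accepts the cutting result.

```python
def is_cutting_correct(glyph, checks):
    """Return True only if every discrimination algorithm accepts the result."""
    for check in checks:
        if not check(glyph):
            # A single rejection is enough; later checks are not run.
            return False
    return True
```

Ordering the checks from cheapest to most expensive keeps the average cost low, since a rejected result never reaches the later checks.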
Fig. 1 shows examples of glyph cutting results, where (a) is a correct glyph cutting result obtained by the glyph cutting technique and (b) is a wrong glyph cutting result obtained by the glyph cutting technique. The following embodiment uses the method provided by the present invention to judge the correctness of a glyph cutting result.
1. Using the algorithm based on glyph reconstruction, count the number of pixels in each connected region of the missing-region matrix and the extra-region matrix. If the total number of pixels of any connected region exceeds the square of one stroke width, the glyph is judged to be a wrong cutting result and the algorithm terminates. Otherwise, go to 2. This specifically comprises the following steps:
A. According to the size of the original glyph, generate a difference matrix of the same size as the original glyph, with all elements initialized to 0.
B. Traverse each component obtained by cutting. According to its position in the original image, map the component image to a region of the difference matrix of the same size as the component image, and then accumulate the component image onto that region pixel by pixel. After all components have been accumulated, the final difference matrix is obtained.
C. According to the size of the original glyph, generate a missing-region matrix and an extra-region matrix, each of the same size as the original glyph, with all elements of both matrices initialized to 0. Traverse each pixel of the original glyph together with the pixel at the corresponding position of the difference matrix. There are two cases: in the first the original glyph pixel value is 0, and in the second it is 1. In the first case, if the corresponding pixel value of the difference matrix is 1, set the value at the corresponding position of the extra-region matrix to 1; otherwise set it to 0. In the second case, if the corresponding pixel of the difference matrix is 0, set the value at the corresponding position of the missing-region matrix to 1; otherwise set it to 0.
D. Perform connected-region detection on the missing-region matrix and the extra-region matrix separately to obtain the pixel count of every connected region in the two matrices. According to the connected-region pixel threshold that has been set, if the total number of pixels of any connected region exceeds the square of one stroke width, the glyph is judged to be a wrong cutting result; otherwise it is judged to be a correct cutting result.
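Steps A to D can be illustrated with the following sketch in plain Python. It rests on several assumptions that the text does not fix: images are binary nested lists (1 for ink), each component is given as a hypothetical `(top, left, image)` triple, connected regions use 4-connectivity, and any positive accumulated value over a blank original pixel is treated as extra ink (the text tests for the value 1).

```python
from collections import deque


def difference_matrix(height, width, components):
    """Steps A-B: accumulate every component image at its original position."""
    diff = [[0] * width for _ in range(height)]
    for top, left, image in components:
        for r, row in enumerate(image):
            for c, value in enumerate(row):
                diff[top + r][left + c] += value
    return diff


def missing_and_extra(original, diff):
    """Step C: mark ink lost by the cut and ink absent from the original."""
    height, width = len(original), len(original[0])
    missing = [[0] * width for _ in range(height)]
    extra = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            if original[r][c] == 1 and diff[r][c] == 0:
                missing[r][c] = 1
            elif original[r][c] == 0 and diff[r][c] > 0:  # assumption: > 0
                extra[r][c] = 1
    return missing, extra


def region_sizes(mask):
    """Step D: pixel counts of the 4-connected regions of a binary mask."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    sizes = []
    for r in range(height):
        for c in range(width):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            seen[r][c] = True
            queue, size = deque([(r, c)]), 0
            while queue:  # breadth-first flood fill of one region
                y, x = queue.popleft()
                size += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            sizes.append(size)
    return sizes


def reconstruction_is_correct(original, components, stroke_width):
    """Accept the cut unless some missing or extra region exceeds stroke_width squared."""
    diff = difference_matrix(len(original), len(original[0]), components)
    missing, extra = missing_and_extra(original, diff)
    threshold = stroke_width ** 2
    return all(size <= threshold
               for mask in (missing, extra)
               for size in region_sizes(mask))
```

Small missing or extra regions below the threshold are tolerated, which matches the idea that only a defect larger than the smallest possible stroke indicates a wrong cut.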
2. Using the discrimination algorithm based on component classification, classify each component with the trained component classifier. If the predicted category of any component does not match the category to which it should belong, the glyph is judged to be a wrong cutting result and the algorithm terminates. Otherwise, go to 3.
3. Using the discrimination algorithm based on glyph attributes, if any component fails to satisfy any one of the glyph attributes, the glyph is judged to be a wrong cutting result and the algorithm terminates. Otherwise, go to 4.
4. Using the discrimination algorithm based on the stroke skeleton, the specific implementation is as follows:
A. As shown in Fig. 3, the component image contains 5 strokes, so 5 values are obtained for each point on the component contour, each representing the shortest distance from that contour point to one stroke skeleton. They are calculated as follows: for a contour point, traverse the 5 stroke skeletons; for each stroke skeleton, traverse all of its skeleton points, compute the distance from the contour point to each of them, and take the minimum as the shortest distance from the contour point to that stroke skeleton.
B. Take the minimum of the shortest distances from a contour point to all stroke skeletons. The stroke skeleton that attains the minimum is the stroke to which the contour point belongs, and the minimum itself is the distance between the contour point and its stroke skeleton. Create 5 contour point sets, each initialized to the empty set, where the i-th set represents the contour of the i-th stroke. Traverse all contour points and add each contour point that belongs to the i-th stroke to the i-th set. After the sets are built, remove from each set the contour points whose nearest stroke skeleton point lies in the first 20% or the last 20% of the stroke; what remains are the contour points of the middle section of the stroke.
C. For each set, compute the mode of the distances from its contour points to the stroke skeleton they belong to, and take this value as the average stroke width of that stroke section. If the distance from a contour point to its stroke skeleton exceeds 1 times the average stroke width, the contour point is considered an abrupt contour point, such as the abnormal contour part in Fig. 3. When the number of abrupt contour points exceeds 0.8 times the stroke width, the component is judged to be a wrongly cut component.
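With the embodiment's values (K = 1, X = 0.8), the test in step C might look like the sketch below. The function name is hypothetical, the input is assumed to be the list of contour-to-skeleton distances for the middle section of a single stroke, and distances are rounded to whole pixels before taking the mode, which is an assumption made so that a mode is well defined for real-valued distances. Whether the count is taken per stroke or over the whole component is not spelled out in the text; this sketch works per stroke.

```python
from collections import Counter


def stroke_has_abrupt_contour(distances, k=1.0, x=0.8):
    """Return True if the stroke's middle section has too many abrupt contour points."""
    if not distances:
        return False
    # The mode of the (rounded) distances stands in for the average stroke width.
    rounded = [round(d) for d in distances]
    avg_width = Counter(rounded).most_common(1)[0][0]
    # A contour point is abrupt when it lies farther than K times that width.
    abrupt = sum(1 for d in distances if d > k * avg_width)
    # The count threshold is X times the average stroke width.
    return abrupt > x * avg_width
```

For example, twenty distances of 3 followed by 6, 7 and 8 give a mode of 3, three abrupt points, and a threshold of 2.4, so the stroke is flagged.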
It should be noted that the embodiments are disclosed to help further understand the present invention, but those skilled in the art will understand that various substitutions and modifications are possible without departing from the spirit and scope of the present invention and the appended claims. Therefore, the present invention should not be limited to what is disclosed in the embodiments, and the scope of protection claimed by the present invention is defined by the claims.
Claims (10)
1. A method for judging the correctness of a Chinese character glyph cutting result, comprising, in order, a discrimination process based on glyph reconstruction, a discrimination process based on component classification, a discrimination process based on glyph attributes, and a discrimination process based on the glyph skeleton; specifically comprising:
1) the discrimination process based on glyph reconstruction:
each Chinese character glyph is cut to obtain components, which are spliced together again according to their original positions; the spliced glyph is compared with the original glyph pixel by pixel, and the difference pixel value is obtained by counting; a difference pixel value threshold is set, and the glyph is judged to be a wrong cutting result or a correct cutting result according to the set difference pixel value threshold;
2) the discrimination process based on component classification:
first, a component data set consisting of correct components is built, and a component classifier is trained on the component data set; then the trained component classifier is used to classify the glyph cutting result to be judged, yielding its category as a correct component or an incorrect component;
3) the discrimination process based on glyph attributes:
glyph attributes corresponding to correct components are set; when a component does not satisfy the corresponding glyph attributes, the component is judged to be the result of a wrong glyph cut;
4) the discrimination process based on the glyph skeleton:
the smoothness of the contour of the middle section of each stroke in the Chinese character glyph is checked; when the contour changes abruptly in the middle section of a stroke, the glyph cutting result is judged to be wrong.
2. the method for discrimination of Chinese character pattern cutting result correctness as described in claim 1, characterized in that be based on font described
The differentiation process of reconstruction, the differentiation process based on part classification, the differentiation process based on font attribute and based on glyph skeleton
During differentiation, when the judgement result that any differentiation process is differentiated for font cutting result to be discriminated is mistake
When cutting result, the font cutting result to be discriminated is determined as font miscut.
3. the method for discrimination of Chinese character pattern cutting result correctness as described in claim 1, characterized in that described to be based on font weight
During the differentiation built, the difference pixel value threshold value is set as square of stroke width.
4. the method for discrimination of Chinese character pattern cutting result correctness as described in claim 1, characterized in that described to be based on font weight
The differentiation process built specifically comprises the following steps:
11) according to the size of former font, a difference value matrix equal with former word size, the institute of the difference value matrix are generated
There is element to be initialized as 0;
12) traversal cutting obtain each component as a result, according to the position in original image, which is corresponded into difference
One and the equal sized region of image of component of different value matrix, then the corresponding region of image of component and difference value matrix is carried out
Pixel scale adds up;The cumulative of all components is completed, difference value matrix is obtained;
13) an absent region matrix and an extraneous region matrix equal with former word size are generated respectively;Two matrixes
All elements be initialized as 0, while traversing each pixel of former font and the corresponding position pixel of the difference value matrix;
Former font pixel value is set in two kinds of situation;The first situation is that former font pixel value is 0, another situation is that former font pixel
Value is 1;The extraneous region matrix is corresponded into position when the difference value matrix respective pixel value is 1 for the first situation
The value set is set as 1, is otherwise set as 0;For second situation, when the difference value matrix respective pixel is 0, by the missing area
The value of domain matrix corresponding position is set as 1, is otherwise set as 0;
14) connected region detection is carried out to the absent region matrix and extraneous region matrix respectively, it is all obtains two matrixes
Connected region number of pixels;Connected region number of pixels threshold value is set, when total pixel in any one connected region
When number is more than corresponding connected region number of pixels threshold value, which is determined as wrong cutting result, otherwise differentiates and is positive
True cutting result.
5. the method for discrimination of Chinese character pattern cutting result correctness as described in claim 1, characterized in that described based on component point
During the differentiation of class, the training of part classification device includes the following steps:
21) image preprocessing performs the following operations:
Image of component is zoomed in and out by the way of non-rigid scaling, each image of component is normalized to a square
Image, square side length are denoted as L;
22) Local Subgraphs picture is selected, the extraction of local feature is carried out to local subgraph, obtains multiple local feature vectors, it will
The number of local feature vectors is denoted as num_lf;
23) dictionary constructs:
In obtained local feature vectors, stochastical sampling obtains plurality of local feature as total characteristic set;Part
The number value range of feature should be greater than 10000;Maximum can be set to whole local feature vectors number num_lf;Using K
Means clustering algorithm obtains num_k cluster centre, as sparse dictionary;The value range of num_k is 256 special to all parts
The number num_lf of sign;
24) rarefaction representation:
According to sparse dictionary obtained in the previous step, one all local feature of component is compiled using sparse coding algorithm
Code;Then all local features are combined using maximum value pond algorithm, obtain the sparse table that a dimension is num_k
Show that feature, the quantity of dimension and the quantity of cluster centre are equal;
25) classifier training is trained the rarefaction representation feature using linear SVM algorithm, obtains component point
Class device.
6. the method for discrimination of Chinese character pattern cutting result correctness as claimed in claim 5, characterized in that the step 22) office
Portion's feature extraction, specifically performs the following operations:
22a. carries out uniform grid cutting to local subgraph, obtains multiple regions, is set as n*n;It sets a trap the side of portion's subgraph
A length of L_sub, the side length for obtaining each region are L_sub divided by n;Local Subgraphs picture is evenly dividing by the n*n region, mutually
Without intersection between phase, it is combined the Local Subgraphs picture constituted before cutting just;It is carried out in each region using Sobel operator
Convolution obtains the result of amplitude and phase;
Uniform phase is divided into n*n section by 22b., and each interval statistics obtain Local Subgraphs phase as in and fall in the section
The summation of the range value of pixel;Obtain the local feature of n*n dimension;
22c. splices the local feature of the respective dimensions in the region n*n, obtains the local feature of multidimensional.
7. the method for discrimination of Chinese character pattern cutting result correctness as described in claim 1, characterized in that described based on component point
During the differentiation of class, the classification is specifically: being obtained by calculating the font cutting result to be discriminated using classifier
Component categories and whether the component categories that should belong to identical differentiates;When the classification results are different from the classification that component should belong to
When, which is determined as wrong cutting result, and when the classification results are identical as the classification that component should belong to, which is determined as
Correct cutting result.
8. the method for discrimination of Chinese character pattern cutting result correctness as described in claim 1, characterized in that described to be based on font category
During the differentiation of property, the font attribute includes part dimension attribute and component area attribute.
9. the method for discrimination of Chinese character pattern cutting result correctness as described in claim 1, characterized in that described to be based on font bone
Specific step is as follows for the differentiation process of frame:
41) stroke number is set in the component that a font cutting result obtains as N, for the point on each component outline,
N number of value is obtained, minimum distance of the profile point apart from N number of stroke skeleton is respectively represented;
42) minimum value in minimum distance of the profile point apart from all stroke skeletons is taken, the stroke bone of minimum value will be got
The stroke that frame should belong to as profile point, using the minimum value got as profile point and the stroke skeleton for getting minimum value away from
From;
43) it constructs profile point set: setting N number of profile point set, be initialized as empty set, i-th of set represents i-th of stroke
Profile;All profile points are traversed, when profile point belongs to i-th of stroke, this profile point is added to i-th of set;It is complete
After the building of set, by stroke starting point and end end by the profile point in corresponding set remove a segment length, go
Except the ratio that the length of stroke accounts for stroke total length is M%, remaining profile point is exactly the profile point in stroke middle section;
44) profile point of each set is acquired to the middle number of affiliated stroke skeleton distance, as being averaged for corresponding stroke skeleton
Stroke width;When profile point has been more than K times of average stroke width to affiliated stroke skeleton distance, determine that the profile point is
It is mutated profile;When the quantity for being mutated profile point has been more than preset mutation profile point amount threshold, the component is sentenced
It Wei not the wrong component cut.
10. the method for discrimination of Chinese character pattern cutting result correctness as claimed in claim 9, characterized in that the step 41) wheel
The circular of minimum distance of the exterior feature point apart from N number of stroke skeleton is: for a profile point, N number of stroke skeleton is traversed,
Each stroke skeleton traverses the distance that profile point He these stroke skeletal points is calculated in all skeletal points, takes the minimum of distance
The minimum distance for the stroke skeleton that value is currently traversed as profile point distance;The value range of the step 43) M is 0 to 50;
The value range of the step 44) K is 0.8 to 3;The mutation profile point amount threshold is X times of average stroke width, X's
Value range is 0.7 to 3.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610847230.0A CN106503706B (en) | 2016-09-23 | 2016-09-23 | The method of discrimination of Chinese character pattern cutting result correctness |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106503706A CN106503706A (en) | 2017-03-15 |
CN106503706B true CN106503706B (en) | 2019-06-07 |
Family
ID=58291008
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610847230.0A Active CN106503706B (en) | 2016-09-23 | 2016-09-23 | The method of discrimination of Chinese character pattern cutting result correctness |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106503706B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107092917B (en) * | 2017-03-24 | 2020-06-02 | 北京大学 | Chinese character stroke automatic extraction method based on manifold learning |
CN108154167B (en) * | 2017-12-04 | 2021-08-20 | 昆明理工大学 | Chinese character font similarity calculation method |
CN110210476B (en) * | 2019-05-24 | 2021-04-09 | 北大方正集团有限公司 | Character component clustering method, device, equipment and computer readable storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101819683A (en) * | 2009-10-26 | 2010-09-01 | 杨光祥 | Method for reconstructing Chinese character font |
CN102968764A (en) * | 2012-10-26 | 2013-03-13 | 北京航空航天大学 | Chinese character image inpainting method based on strokes |
JP2013214188A (en) * | 2012-04-02 | 2013-10-17 | Sharp Corp | Character recognition processing device, character recognition processing method, character recognition processing program, and computer readable recording medium |
CN104182748A (en) * | 2014-08-15 | 2014-12-03 | 电子科技大学 | A method for extracting automatically character strokes based on splitting and matching |
CN104992161A (en) * | 2015-07-17 | 2015-10-21 | 北京航空航天大学 | Chinese character part dividing and structure determination method based on part identification |
Non-Patent Citations (2)
Title |
---|
Chinese Character Recognition Based on Character Reconstruction; Yun Li et al.; 《2009 International Conference on Communications, Circuits and Systems》; 20090918; pp. 460-463 |
Chinese character stroke classification method based on graphic recognition (基于图形识别的汉字笔画分类方法); Zhao Qing et al.; 《计算机技术与发展》 (Computer Technology and Development); 20091031; pp. 14-17 |
Also Published As
Publication number | Publication date |
---|---|
CN106503706A (en) | 2017-03-15 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||