CN104156730B - A skeleton-based, noise-resistant method for extracting Chinese character features - Google Patents

A skeleton-based, noise-resistant method for extracting Chinese character features

Info

Publication number
CN104156730B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410360498.2A
Other languages
Chinese (zh)
Other versions
CN104156730A (en)
Inventor
周元峰
朱东方
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University
Original Assignee
Shandong University
Application filed by Shandong University
Priority to CN201410360498.2A
Publication of CN104156730A
Application granted
Publication of CN104156730B

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a skeleton-based, noise-resistant method for extracting Chinese character features. A grayscale text image is smoothed and denoised, then binarized. The binary image is downsampled and converted into a point cloud model. Erosion is applied to the original binary image to obtain a coarse medial axis. PCA is performed along this axis to obtain a splitting result. The split results are merged, and the point cloud classes are post-processed after merging. B-spline curves are fitted to the classified point cloud to obtain the skeleton. Converting the Chinese character image into a point cloud model reduces the influence of noise and similar factors on skeleton extraction; fitting the skeleton with B-spline curves better preserves the features of the original character; and because the original character image is processed directly, without normalization preprocessing, skeleton extraction becomes less difficult and more efficient.

Description

A skeleton-based, noise-resistant method for extracting Chinese character features
Technical field
The present invention relates to the field of image processing and pattern recognition, and specifically to a robust method for automatically extracting skeleton-based features of Chinese characters.
Background
Chinese character recognition is a branch of text recognition. Because the Chinese character set is huge and characters take many forms, Chinese cannot be recognized with comparatively simple algorithms the way alphabetic scripts such as English can, so Chinese character recognition has long been a difficult area of applied research. It is usually divided into printed character recognition and handwritten character recognition. Printed characters have been studied more extensively, while handwritten characters vary from writer to writer, so their recognition rate is lower.
Feature extraction is one of the most important stages of a Chinese character recognition system. Extracting good features across different shapes and styles is one of the current research priorities. Traditionally, directional features have been widely used to extract Chinese character features, but they require direction normalization of the character and the construction of an elastic mesh, and they handle handwritten characters of varying shapes poorly. Feature extraction based on directional features alone cannot meet practical needs.
Another approach to feature extraction is based on the character skeleton. The skeleton characterizes the shape topology of a character well, preserves its geometric properties, and significantly reduces the difficulty of computation and of matching against a character library. Although the skeleton can represent character features, Chinese characters, and handwritten ones in particular, vary widely and are often of low quality, so extracting a high-quality skeleton remains an open problem. Many methods focus on extracting and processing the character contour; others use morphological erosion, which cannot handle low-quality characters well (noise, sparseness, broken strokes).
Summary of the invention
To overcome the shortcomings of the prior art, the invention discloses a skeleton-based, noise-resistant method for extracting Chinese character features. Given the variability of Chinese characters, low-quality ones especially, the character is represented by a point cloud model. A point cloud is sparse and unconnected, which helps reduce the influence of noise on skeleton extraction. The skeleton is extracted from the point cloud: PCA drives a "split-merge" classification, and curves are finally fitted with a minimum squared distance method. This reduces the influence of noise and similar factors on skeleton extraction, classifies and fits the strokes in a reasonable way, and yields a smoother skeleton feature.
To achieve this, the specific scheme of the invention is as follows:
A skeleton-based, noise-resistant method for extracting Chinese character features comprises the following steps:
Step 1: Preprocess the grayscale image of the text to be processed, which includes smoothing the grayscale image and binarizing it;
Step 2: Downsample the binarized image to generate point cloud model data;
Step 3: Apply erosion to the point cloud model data to obtain a coarse medial axis point set;
Step 4: Perform PCA-based splitting on the axis point set according to a splitting condition to obtain a splitting result;
Step 5: Merge the split results and post-process the intersections after merging;
Step 6: Fit B-spline curves to the point cloud data processed in Step 5 to obtain the skeleton that serves as the Chinese character feature.
Step 1 specifically comprises:
The scanned grayscale text image is smoothed, and the smoothed image is then binarized into a black-and-white image in which white pixels are the background and black pixels are the foreground characters. Smoothing applies a Gaussian filter over a neighborhood using OpenCV's cvSmooth function.
Step 2 specifically comprises:
The binarized image is downsampled. Sampling is performed only on black pixels, at a chosen sampling ratio, to convert the image into point cloud model data; the horizontal and vertical coordinates of each sampled pixel form the coordinates of one point in the cloud.
Step 3 specifically comprises:
An erosion kernel is used to apply an erosion set operation to the pixel point cloud of the binarized image until the erosion termination condition is reached, giving the final coarse medial axis point set;
The erosion termination condition is: the current point has eight neighboring points in the binarized image; determine whether any two of its neighboring black points are connected to each other. If they are not connected, the current point is an axis point; otherwise it is not.
The splitting condition in Step 4 is:
The splitting condition is set from the angle α between the two local principal directions obtained by applying PCA to the point cloud subsets inside two adjacent local circles.
The PCA-based splitting in Step 4 specifically comprises:
An unprocessed point is chosen arbitrarily from the coarse axis point set and its local principal direction (Main Local Direction, MLD) is computed. If comparing the current local principal direction Vi with the next point's local principal direction Vj gives -1 (that is, the angle between Vi and Vj exceeds θ, the predefined turning angle), the point is a turning point; otherwise it is not, and the search continues along the axis points for the next PCA center to process. If no further axis point is found, the current point is a turning point. For each class i, the first and last PCA centers are marked as the start point x(i) and the end point y(i), and the points between them are assigned to class i. The PCA centers Center(i) between the two endpoints, the radius R(i) and the local principal direction Vi are recorded for use in merging and fitting. Another point is then chosen arbitrarily from the coarse axis point set and PCA-based splitting is repeated until all points have been processed. After a finite number of iterations every point in the coarse axis point set has been handled, giving the final split set.
Step 5 specifically comprises:
Let MaxRadius be the largest PCA radius used during PCA. For a class i, denote its endpoints x(i) and y(i) and the PCA radii at those endpoints Rx(i) and Ry(i); dist(x(i), y(j)) returns the distance between any two endpoints. Merging takes place only at class endpoints: each endpoint is tested against the merging conditions below, and when any one of them is satisfied the merge is carried out.
The merging conditions are:
Condition one: For the two endpoints x(i) and y(i) of class i, if dist(x(i), y(i)) <= Rx(i) + Ry(i), the PCA centers of class i consist only of its two endpoints (its center count is 2, whereas other classes have at least 3), and an endpoint x(j) or y(j) of a second class j intersects the two characteristic circles, then classes i and j satisfy the merging condition;
Condition two: For any classes i and j and any pair of endpoints, one from each class (say endpoint x of class i and endpoint y of class j): if the angle between the vector Vij formed by the line joining x(i) and y(j) and the vector Vx at endpoint x is less than θ (the predefined turning angle), the points at the endpoints of classes i and j have a minimum spanning tree whose maximum step is no greater than RectSize/16 (RectSize is the Euclidean distance between the two farthest points in the point cloud), and angle(V(x(i)), V(y(j))) lies in [0, θ] or in [180-θ, 180], then classes i and j satisfy the merging condition;
Condition three: For any classes i and j and any pair of endpoints, one from each class (say endpoint x of class i and endpoint y of class j): if dist(x(i), y(j)) <= Rx(i) + Ry(j), y(j) does not intersect the x endpoint of its own class, x(i) does not intersect the y endpoint of its own class, neither endpoint intersects the PCA unit characteristic circle of a third class, and the points at the endpoints of classes i and j have a minimum spanning tree whose maximum step is no greater than RectSize/16, then classes i and j satisfy the merging condition.
Step 6 specifically comprises:
Using the least-squares fitting (SDM) method, the center point set S_Center(i) produced during PCA (this center point set consists of the other points of the cloud covered by the radius around each PCA center) is first taken as the initial B-spline control points. The number and positions of the control points are adjusted and the B-spline curves are fitted iteratively with SDM; the resulting curves serve as the skeleton feature of the character. Compared with other iterative B-spline fitting methods, SDM iterates faster and converges more stably.
Erosion is applied to the binarized image to obtain the coarse medial axis point set. In binary morphology, eroding a binary image is a set operation on its pixels. Applying the set operation with a specific erosion kernel (also called a template) shrinks the boundary inward, and after a finite number of erosions the medial axis is obtained. The erosion is defined as follows:
$S' = S - \sum_i (S \odot \varphi_i)$ (Formula 1)
Here S' is the pixel set after one erosion, S is the original pixel set, and $\varphi_i$ is the set covered by the erosion kernel at position i. The operation written $\odot$ here means: if, at the current position, the number of points where the kernel intersects S equals c, the intersection point is returned; otherwise 0 is returned. pt_i denotes the pixel of the binary image at point i. After a finite number of erosions the final medial axis is obtained.
PCA (principal component analysis) is a basic technique in multivariate analysis. The method proposed in this patent sets the splitting condition from the angle α between the two local principal directions obtained by applying PCA to the point sets inside two adjacent local circles; the mean center points from PCA are later used as control points when fitting B-spline curves. In this patent, applying PCA to the point set inside a circle of radius R is called a PCA unit. A PCA unit comprises the characteristic radius R, the mean center χ, the characteristic circle centered at χ with radius R, and the local direction V.
For the point set inside a circle of radius R centered on a point, PCA proceeds as follows:
(1) First compute the mean χ of the sample points, as in Formula 3:
$\chi = \frac{1}{N} \sum_{i=1}^{N} X_i$ (Formula 3)
where $X_i$ is a point inside the circle and N is the number of points inside the circle.
(2) Then compute the deviation matrix of the original vector matrix X from χ, $C = X - \chi$, and the covariance matrix $T = C C^{T}$.
(3) Finally compute the eigenvalues λ and eigenvectors M of the covariance matrix by singular value decomposition (SVD). The SVD of the matrix C is:
$C = M S V^{T}$ (Formula 4)
where the columns of M are the eigenvectors of the covariance matrix T, S is the diagonal matrix obtained from the decomposition, $V^{T}$ is a square matrix, and the columns of V are the eigenvectors of $C^{T} C$. Applying PCA to the point cloud data yields the local principal direction (Main Local Direction, MLD); this direction V is the main factor in setting the splitting condition.
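As a minimal sketch of the PCA unit described above, the following computes the mean center and the local principal direction for a small set of 2D points. It does not use SVD as the patent does: for 2D data the dominant eigenvector of the symmetric 2x2 covariance matrix has a closed form, which is used here instead. The function name local_principal_direction is an assumption made for illustration.

```python
import math

def local_principal_direction(points):
    """Return (mean, direction) for a list of 2D points.

    direction is a unit vector along the dominant eigenvector of the
    2x2 covariance matrix; its sign is arbitrary.
    """
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Entries of the (unnormalised) covariance matrix T = C C^T.
    sxx = sum((p[0] - mx) ** 2 for p in points)
    syy = sum((p[1] - my) ** 2 for p in points)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points)
    # Closed-form orientation of the dominant eigenvector (swapped in for SVD).
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (mx, my), (math.cos(angle), math.sin(angle))

# Points lying on the line y = x give a direction along the diagonal.
mean, v = local_principal_direction([(0, 0), (1, 1), (2, 2), (3, 3)])
```

Because a principal direction has no inherent sign, any later angle comparison between two directions has to treat V and -V as the same direction.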
The split results are merged and the intersections are post-processed after merging. Splitting marks every possible intersection, and every place where the target curve has high curvature, as a turning point. At high-curvature places and at intersections, however, some classes may actually belong to the same class, so "consistent" types should be merged to reduce the number of classes. It is assumed that the ideal fitted curve of any point cloud class is no shorter than its mean width.
After the point cloud types have been merged, errors may remain because the cloud was split under different radii, so the points at intersections must be post-processed. The processing method is a distance-weighted intersection relocation method.
Beneficial effects of the invention:
The invention smooths and denoises the text image, converts it to a binary image, downsamples the binary image into a point cloud model, performs PCA on that model, classifies the point cloud data through "split-merge" operations, and finally fits B-spline curves on the basis of the classification. The fitted curves serve as the skeleton feature of the character for recognition and classification. The advantages are:
(1) Recasting Chinese character image processing as point cloud processing reduces the influence of noise on skeleton extraction, and gives fairly good results even when gray levels change sharply.
(2) The fitted curves agree well with both the original point cloud data and the ideal curves, handle intersections well, and represent the skeleton feature of the character well.
(3) A reasonable skeleton can be extracted without normalizing the position of the character, and later stages can identify feature points on the skeleton and classify the character.
Brief description of the drawings
Fig. 1 is the overall flowchart of Chinese character feature extraction and recognition based on the invention;
Fig. 2 is a flow diagram of the feature extraction of the invention;
Fig. 3 is an example of Chinese character skeleton feature extraction according to the invention;
Fig. 4(a) is the X-type erosion kernel used in the erosion operation;
Fig. 4(b) is the cross-type erosion kernel used in the erosion operation;
Fig. 4(c) is the full eight-neighborhood erosion kernel used in the erosion operation.
Detailed description:
The invention is described in detail below with reference to the drawings:
As shown in Fig. 1, the overall procedure for Chinese character feature extraction and recognition based on the invention comprises the following steps:
A. Scan the text to be processed to obtain a grayscale image.
B. Preprocess the grayscale image, for example by smoothing and binarization, to obtain a binary image.
C. Extract features from the image containing the characters to obtain a set of feature vectors.
D. Compare and match this set of feature vectors against a prior Chinese character feature library to recognize the characters.
E. Post-process the recognized characters to obtain the final text.
The invention focuses on how to extract the skeleton feature of a Chinese character quickly and effectively, and uses it to represent the information and structural features of the character. As shown in Fig. 2, the specific steps of the invention are:
Step 1: Smooth and denoise the grayscale text image, then binarize it.
Step 2: Downsample the binary image and convert it into a point cloud model.
Step 3: Apply erosion on the point cloud model to obtain a coarse medial axis.
Step 4: Perform PCA along this axis to obtain a splitting result.
Step 5: Merge the split results and post-process the point cloud classes after merging.
Step 6: Fit B-spline curves to the classified point cloud to obtain the skeleton.
In Step 1, the grayscale text image is smoothed, denoised and binarized as follows:
The grayscale image containing the text is Gaussian-smoothed over a 5x5 neighborhood with OpenCV's cvSmooth function. The smoothed, denoised grayscale image is then converted into a binary image with a fixed threshold. The binary image contains only black and white pixels, and the black pixels mark the regions covered by text in the original grayscale image.
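The fixed-threshold binarization can be sketched in a few lines of plain Python. This is only an illustration: the threshold value 128 and the function name binarize are assumptions (the patent says only that the threshold is fixed), and the Gaussian smoothing that precedes it is not reproduced here.

```python
def binarize(gray, threshold=128):
    """Map a grayscale image (rows of 0..255 values) to 1 = black text, 0 = white background.

    The threshold of 128 is an assumed value, not one given in the patent.
    """
    return [[1 if value < threshold else 0 for value in row] for row in gray]

gray = [
    [250, 240, 30],
    [245, 20, 25],
    [10, 235, 250],
]
binary = binarize(gray)
```

Encoding black as 1 keeps the later steps simple, since every subsequent operation works only on the foreground pixels.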
In Step 2, the binary image is downsampled and converted into a point cloud model as follows:
The black pixels of the binary image obtained in Step 1 are downsampled. Different sampling ratios can be chosen, for example 1/5 or 1/8 of the pixels. The coordinates of the sampled black pixels are converted into point cloud model data, which characterizes the text well, as shown in Fig. 3.
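A minimal sketch of the downsampling step follows. The patent does not say how the subset of black pixels is chosen, so uniform random sampling with a fixed seed is assumed here; the function name to_point_cloud and its parameters are hypothetical.

```python
import random

def to_point_cloud(binary, ratio=1 / 5, seed=0):
    """Sample a fraction of the black pixels (value 1) and return them as (x, y) points.

    Random sampling is an assumption; the patent specifies only the ratio.
    """
    rng = random.Random(seed)
    black = [(x, y) for y, row in enumerate(binary) for x, v in enumerate(row) if v == 1]
    if not black:
        return []
    keep = max(1, round(len(black) * ratio))
    return sorted(rng.sample(black, keep))

# A fully black 10x10 image has 100 foreground pixels; a 1/5 ratio keeps 20 of them.
full = [[1] * 10 for _ in range(10)]
cloud = to_point_cloud(full, ratio=1 / 5)
```

A lower ratio gives a sparser cloud, which is cheaper to process but covers thin strokes less densely.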
In Step 3, erosion is applied on the point cloud model to obtain the coarse medial axis as follows:
An erosion set operation is applied to the pixel point cloud with specific erosion kernels (also called templates) to obtain a rough medial axis. Three erosion kernels, shown in Fig. 4(a) to Fig. 4(c), are used together; each is defined on the eight-neighborhood of the current point, and they are the cross type, the X type and the full eight-neighborhood type. Erosion is repeated in the order ABC-ABC until the erosion termination condition is reached. The advantage of using these three kernels is that the jagged boundaries produced by the binary image are eliminated effectively.
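To illustrate what one erosion pass with each kernel shape does, here is a sketch of standard binary erosion. It is not the patent's exact operator: it does not protect axis points, so repeated passes would eventually erase a stroke entirely. The kernel offsets and the name erode_once are assumptions.

```python
# Neighbour offsets (dx, dy) for the three kernel shapes named in the text.
CROSS = [(0, -1), (-1, 0), (1, 0), (0, 1)]
X_SHAPE = [(-1, -1), (1, -1), (-1, 1), (1, 1)]
FULL = CROSS + X_SHAPE

def erode_once(binary, kernel):
    """Standard binary erosion: a black pixel survives only if every kernel neighbour is black."""
    h, w = len(binary), len(binary[0])

    def is_black(x, y):
        return 0 <= x < w and 0 <= y < h and binary[y][x] == 1

    return [
        [
            1 if binary[y][x] == 1 and all(is_black(x + dx, y + dy) for dx, dy in kernel) else 0
            for x in range(w)
        ]
        for y in range(h)
    ]

# A 5x5 black square inside a 7x7 image shrinks to its 3x3 interior.
image = [[1 if 1 <= x <= 5 and 1 <= y <= 5 else 0 for x in range(7)] for y in range(7)]
eroded = erode_once(image, CROSS)
```

Cycling through kernels of different shapes removes boundary pixels from different directions in turn, which is one way to read the jagged-boundary remark above.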
The erosion termination condition, that is, the criterion for an axis point (core point), is defined as follows: the current point has eight neighboring points in the binary image; determine whether any two of its neighboring black points are connected to each other (connection here includes diagonal connection). If they are not connected, the point is an axis point; otherwise it is not. After a finite number of erosions the final medial axis is obtained.
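One possible reading of this criterion is sketched below: a pixel is treated as an axis point when its black neighbours do not all belong to one 8-connected group, so removing the pixel would disconnect them. This interpretation, the handling of pixels with fewer than two black neighbours, and the name is_axis_point are all assumptions.

```python
NEIGHBOURS = [(-1, -1), (0, -1), (1, -1), (-1, 0), (1, 0), (-1, 1), (0, 1), (1, 1)]

def is_axis_point(binary, x, y):
    """True if the black 8-neighbours of (x, y) are not all connected to each other."""
    h, w = len(binary), len(binary[0])
    black = [
        (dx, dy)
        for dx, dy in NEIGHBOURS
        if 0 <= x + dx < w and 0 <= y + dy < h and binary[y + dy][x + dx] == 1
    ]
    if len(black) < 2:
        return False  # assumed: with no pair of neighbours the test does not apply
    # Flood fill over the black neighbours; two of them touch if they are 8-adjacent.
    seen = {black[0]}
    stack = [black[0]]
    while stack:
        cx, cy = stack.pop()
        for nx, ny in black:
            if (nx, ny) not in seen and max(abs(nx - cx), abs(ny - cy)) <= 1:
                seen.add((nx, ny))
                stack.append((nx, ny))
    return len(seen) < len(black)

line = [[0, 0, 0], [1, 1, 1], [0, 0, 0]]   # one-pixel-wide stroke
block = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]  # solid region
```

For the one-pixel-wide stroke, the centre pixel separates its left and right neighbours and is kept; in the solid block, the eight neighbours form a closed ring, so the centre can still be eroded.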
In Step 4, PCA is performed along the axis and the splitting result is obtained as follows:
Based on the coarse skeleton obtained by erosion in Step 3, PCA is performed at the skeleton points. To estimate the PCA radius at a point, an initial radius R is given. Then, inside the circle of radius R centered on that axis point, if the ratio of white pixels to black pixels β = Σpt_i / Σpt_j (Σpt_i is the number of white pixels and Σpt_j is the number of black pixels) exceeds a threshold, for example 0.15, R is taken as the PCA radius; otherwise the radius is increased. The centers used for PCA are taken from the axis points obtained in Step 3. Once PCA can be performed at any axis point, it is possible to judge whether two adjacent principal directions V are consistent. Definitions follow for searching for the next PCA center Center(i+1) and for judging whether two principal directions Vi and Vj are consistent.
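The radius estimate can be sketched as a loop that grows the circle until enough background shows up inside it. The initial radius, the step of one pixel and the upper bound r_max are assumptions added so the loop terminates; only the 0.15 threshold comes from the text.

```python
def estimate_pca_radius(binary, cx, cy, r0=2, beta_min=0.15, r_max=50):
    """Grow the radius until white/black pixel ratio inside the circle exceeds beta_min.

    r0, the step of 1 and r_max are assumed values; binary uses 1 = black, 0 = white.
    """
    h, w = len(binary), len(binary[0])
    r = r0
    while r <= r_max:
        white = black = 0
        for y in range(max(0, cy - r), min(h, cy + r + 1)):
            for x in range(max(0, cx - r), min(w, cx + r + 1)):
                if (x - cx) ** 2 + (y - cy) ** 2 <= r * r:
                    if binary[y][x] == 1:
                        black += 1
                    else:
                        white += 1
        if black and white / black > beta_min:
            return r
        r += 1
    return r_max

# A horizontal stroke five pixels thick (rows 8..12) in a 21x21 image.
stroke = [[1 if 8 <= y <= 12 else 0 for x in range(21)] for y in range(21)]
radius = estimate_pca_radius(stroke, 10, 10)
```

The effect is that the radius adapts to the local stroke width: the circle stops growing shortly after it reaches past the stroke boundary.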
Define the axis point set S_axis, the current PCA center Center(i), the current PCA radius R(i), the current local principal direction vector Vi, and the candidate axis point set S_alt, any point of which is denoted Pt_j. Define angle(Vi, Vj) as the angle between the vectors Vi and Vj, and define a threshold angle θ with θ ∈ (0, 45). So that the local regions processed by PCA cover the point cloud data well, the distances dist(i, i+1) and dist(i, i-1) from the next forward candidate axis point Center(i+1) and the next backward candidate axis point Center(i-1) to the current center Center(i) must lie in [1, R(i) + 0.5*R(i) - 1].
Forward candidate point set (Formula 5):
S_pos_alt is the forward candidate point set, Pt(x) is a point in the candidate set, and angle((Pt(x) - Center(i)), Vi) is the acute angle between the point Pt(x) and the current PCA center Center(i).
Backward candidate point set:
$S_{neg\_alt} = \{\, Pt(x) \mid \mathrm{angle}(Pt(x) - Center(i),\ V_i) \in [180-\theta,\ 180],\ Pt(x) \in S_{axis} \,\}$ (Formula 6)
S_neg_alt is the backward candidate point set and Pt(x) is a point in the candidate set.
Forward next center (Formula 7):
Backward next center (Formula 8):
Processing function for any PCA center (Formula 9):
The algorithm for PCA processing and splitting of the axis points is as follows. Define a class counter m, initially 0.
Step 1. Increment m by 1. Take any point from the unprocessed axis point set and start PCA processing from it, labeling this class m.
Step 2. Process the current point according to Formula 9. If the result is -1, end the search for this class; processing in the current direction ends, and go to Step 1. If the result is 1 or 2, go to Step 3.
Step 3. Following Formulas 5, 6, 7 and 8, search for and process the next point recursively in both directions. If a next point is found, go to Step 2; otherwise stop searching in the current direction, processing in that direction ends, and go to Step 1.
After a finite number of iterations every point in the axis point set has been processed, giving the final split set.
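The core of the split decision can be sketched on an already ordered chain of PCA centers: a new class starts whenever the angle between consecutive local principal directions exceeds θ. This is a simplification under stated assumptions: it skips the candidate search of Formulas 5 to 9, it treats a direction and its opposite as the same (folding the angle into [0, 90]), and θ = 30 is a hypothetical value inside the stated range (0, 45).

```python
import math

def angle_deg(u, v):
    """Angle in degrees between two non-zero 2D vectors."""
    dot = u[0] * v[0] + u[1] * v[1]
    norm = math.hypot(*u) * math.hypot(*v)
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))

def split_by_direction(directions, theta=30.0):
    """Label consecutive PCA units; a turn sharper than theta starts a new class."""
    if not directions:
        return []
    labels = [0]
    for prev, cur in zip(directions, directions[1:]):
        a = angle_deg(prev, cur)
        a = min(a, 180.0 - a)  # assumed: PCA directions carry no sign
        labels.append(labels[-1] + (1 if a > theta else 0))
    return labels

# Two nearly horizontal units followed by two vertical ones: one turning point.
labels = split_by_direction([(1, 0), (1, 0.1), (0, 1), (0, 1)])
```

Each change of label corresponds to a turning point in the sense used above, so an L-shaped stroke would be split into two classes that the merge step may later reconsider.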
In Step 5, the split results are merged and the point cloud classes are post-processed after merging as follows:
Splitting marks every possible intersection, and every place where the target curve has high curvature, as a turning point. At high-curvature places and at intersections, however, some classes may actually belong to the same class, so "consistent" types should be merged to reduce the number of classes. It is assumed that the ideal fitted curve of any point cloud class is no shorter than its mean width. Let MaxRadius be the largest PCA radius used during PCA. For a class i, denote its endpoints x(i) and y(i) and the PCA radii at those endpoints Rx(i) and Ry(i); dist(x(i), y(j)) returns the distance between any two endpoints. Note that merging takes place only at class endpoints. "Consistency" is defined below.
Condition one: For the two endpoints x(i) and y(i) of class i, if dist(x(i), y(i)) <= Rx(i) + Ry(i), the PCA centers of class i consist only of its two endpoints, and an endpoint x(j) or y(j) of a second class j intersects the two characteristic circles, then class i is consistent with class j.
Condition two: For any classes i and j and any pair of endpoints, one from each class (here assumed to be endpoint x of class i and endpoint y of class j): if the angle between the vector Vij formed by the line joining x(i) and y(j) and the vector Vx at endpoint x is less than θ (the predefined turning angle), the points at the endpoints of classes i and j have a minimum spanning tree whose maximum step is no greater than RectSize/16 (expression 1), and angle(V(x(i)), V(y(j))) ∈ [0, θ] or ∈ [180-θ, 180] (expression 2), then class i is consistent with class j, where θ is the predefined turning angle.
Condition three: For any classes i and j and any pair of endpoints, one from each class (here assumed to be endpoint x of class i and endpoint y of class j): if dist(x(i), y(j)) <= Rx(i) + Ry(j) (expression 3), y(j) does not intersect the x endpoint of its own class, x(i) does not intersect the y endpoint of its own class, neither endpoint intersects the PCA unit characteristic circle of a third class (expression 4), and the points at the endpoints of classes i and j have a minimum spanning tree whose maximum step is no greater than avgRectSize, then class i is consistent with class j.
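Two of the geometric tests that recur in these conditions can be sketched directly: whether the characteristic circles at two endpoints intersect, and whether two endpoint directions are consistent. The function names and θ = 30 are assumptions; the full conditions combine these tests with further checks that are not reproduced here.

```python
import math

def circles_intersect(p, rp, q, rq):
    """True if dist(p, q) <= rp + rq, i.e. the two characteristic circles touch or overlap."""
    return math.dist(p, q) <= rp + rq

def directions_consistent(u, v, theta=30.0):
    """True if the angle between u and v lies in [0, theta] or [180 - theta, 180] degrees."""
    dot = u[0] * v[0] + u[1] * v[1]
    norm = math.hypot(*u) * math.hypot(*v)
    angle = math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))
    return angle <= theta or angle >= 180.0 - theta

near = circles_intersect((0, 0), 1.0, (1.5, 0), 1.0)    # centres 1.5 apart, radii sum to 2
far = circles_intersect((0, 0), 1.0, (3.0, 0), 1.0)     # centres 3 apart, radii sum to 2
same_line = directions_consistent((1, 0), (-1, 0.05))   # almost opposite directions
```

Accepting angles near 180 as well as near 0 reflects that two stroke segments facing each other across a gap have principal directions that may point either way.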
A brief analysis of why the three merging conditions are sufficient: because the maximum radius of a PCA unit is MaxRadius, any two class endpoints x(i) and y(j) that need to be tested for merging satisfy the inequality:
$\mathrm{dist}(x(i), y(j)) - Rx(i) - Ry(j) \le 2 \cdot MaxRadius$ (Formula 10)
For condition one: by the assumption that the ideal fitted curve of any point cloud class is no shorter than its mean width, any point cloud class has at least two PCA units. A class with fewer than two PCA units must therefore be an unreasonable class produced during splitting by noise on the axis, and must be merged with the nearest type.
Condition two merges types that run in the same direction through an intersection but were cut apart by another class. For any two classes at an intersection that need merging, the MLD angle at the two endpoints must be below the defined threshold (expression 2 of condition two), and the points of the two classes must be continuous (expression 1 of condition two).
Condition three merges classes that originally belonged to the same type but were split into several because the MLD angle between adjacent PCA units exceeded the threshold. Expression 3 states that the PCA unit circles at the two endpoints intersect; expression 4 states that neither class at the two endpoints is a self-loop and that the two classes do not intersect a third class. The continuity requirement on points from condition two must also hold.
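The continuity requirement (a minimum spanning tree whose longest step stays under a bound) can be sketched with Prim's algorithm on the points near the two endpoints. The helper names are assumptions, and the bound RectSize/16 is taken from condition two.

```python
import math

def mst_max_step(points):
    """Length of the longest edge in a Euclidean minimum spanning tree (Prim's algorithm)."""
    if len(points) < 2:
        return 0.0
    # best[i] is the shortest known distance from point i to the growing tree.
    best = {i: math.dist(points[0], points[i]) for i in range(1, len(points))}
    longest = 0.0
    while best:
        i = min(best, key=best.get)
        longest = max(longest, best.pop(i))
        for j in best:
            best[j] = min(best[j], math.dist(points[i], points[j]))
    return longest

def is_continuous(points, rect_size):
    """True if no step of the spanning tree exceeds rect_size / 16."""
    return mst_max_step(points) <= rect_size / 16

gap = mst_max_step([(0, 0), (1, 0), (2, 0), (10, 0)])  # tree edges have lengths 1, 1 and 8
```

A large longest edge means the two groups of points are separated by a real gap, so the classes should not be merged even if their directions agree.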
In the conditions above, "consistency" means that two classes can be merged: if classes i and j are consistent, class i is absorbed into class j. The merge operation combines consistent types into one.
The point cloud is post-processed after merging as follows:
For any intersection point pt(k), compute its minimum distance D_pdk to the initial B-spline of each class it belongs to (types i, j, and so on), and use the PCA radii Ri(k), Rj(k), and so on nearest to the point to estimate the mean radius at pt(k) of each of those classes. Suppose a point pt(k) belongs to two classes i and j, its minimum distances to the initial B-splines Curve(i) and Curve(j) are D_pdk(i) and D_pdk(j), and the radii of the nearest PCA centers in classes i and j are Ri(k) and Rj(k). The weighted minimum distance ratio is denoted λ_D:
$\lambda_D = \dfrac{D_{pdk}(i)}{D_{pdk}(j)} \times \dfrac{R_j(k)}{R_i(k)}$ (Formula 11)
If λ_D is less than the threshold ratio and greater than 1/ratio, the point belongs to both class i and class j. If λ_D is greater than ratio, the point belongs to class j; if λ_D is less than 1/ratio, it belongs to class i.
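The relocation rule can be written as a small function. The value ratio = 2.0 is a hypothetical threshold (the patent gives none), the distances and radii are assumed to be positive, and the function name is invented for illustration.

```python
def assign_intersection_point(d_i, d_j, r_i, r_j, ratio=2.0):
    """Return the set of classes an intersection point belongs to.

    d_i, d_j: minimum distances to the initial B-splines of classes i and j.
    r_i, r_j: radii of the nearest PCA centers in classes i and j.
    ratio is an assumed threshold greater than 1.
    """
    lam = (d_i / d_j) * (r_j / r_i)  # Formula 11
    if lam > ratio:
        return {"j"}  # relatively much closer to curve j
    if lam < 1.0 / ratio:
        return {"i"}  # relatively much closer to curve i
    return {"i", "j"}

shared = assign_intersection_point(1.0, 1.0, 1.0, 1.0)
only_j = assign_intersection_point(5.0, 1.0, 1.0, 1.0)
only_i = assign_intersection_point(1.0, 5.0, 1.0, 1.0)
```

Dividing each distance by the local radius means a point is judged relative to the stroke width of each class, so a thick stroke does not automatically claim every nearby point.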
Step 6 fits B-spline curves to the classified point cloud to obtain the skeleton used as the Chinese character feature. The process is as follows:
B-spline curves are fitted to the classification results using the SDM method. First, the center point set S_Center(i) produced during the preceding PCA analysis is taken as the initial control points, and the number and positions of the control points are adjusted. The B-spline curves are then fitted iteratively with the SDM method until the squared distance (SD) error falls below a threshold ε, or until the change in the SD error after several iterations is less than a threshold ζ. The final B-spline curves serve as the skeleton feature of the Chinese character.
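The two stopping tests (SD error below ε, or change in SD error below ζ) can be sketched around a placeholder update. This is not an SDM implementation: `step` is a hypothetical callback standing in for one SDM iteration, the curve is assumed to be a uniform cubic B-spline, the SD error is approximated by sampling the curve, and the default values of `eps`, `zeta`, `samples`, and `max_iter` are illustrative.

```python
def bspline_point(ctrl, t):
    """Evaluate a uniform cubic B-spline at t in [0, len(ctrl) - 3]."""
    seg = min(int(t), len(ctrl) - 4)  # index of the active segment
    u = t - seg                       # local parameter in [0, 1]
    basis = ((1 - u) ** 3 / 6.0,
             (3 * u ** 3 - 6 * u ** 2 + 4) / 6.0,
             (-3 * u ** 3 + 3 * u ** 2 + 3 * u + 1) / 6.0,
             u ** 3 / 6.0)
    x = sum(w * ctrl[seg + k][0] for k, w in enumerate(basis))
    y = sum(w * ctrl[seg + k][1] for k, w in enumerate(basis))
    return x, y


def sd_error(ctrl, points, samples=200):
    """Mean squared distance from each data point to the sampled curve."""
    n_seg = len(ctrl) - 3
    curve = [bspline_point(ctrl, n_seg * s / samples) for s in range(samples + 1)]
    total = 0.0
    for px, py in points:
        total += min((px - cx) ** 2 + (py - cy) ** 2 for cx, cy in curve)
    return total / len(points)


def fit(ctrl, points, step, eps=1e-2, zeta=1e-6, max_iter=50):
    """Iterate `step` until the SD error < eps or its change < zeta."""
    err = sd_error(ctrl, points)
    for _ in range(max_iter):
        if err < eps:
            break
        ctrl = step(ctrl, points)  # placeholder for one SDM update
        new_err = sd_error(ctrl, points)
        converged = abs(err - new_err) < zeta
        err = new_err
        if converged:
            break
    return ctrl, err


# Toy data: points on the line y = 0, control points starting on y = 1.
data = [(k / 10.0, 0.0) for k in range(10, 41)]
start = [(float(x), 1.0) for x in range(6)]
halve_y = lambda ctrl, pts: [(x, 0.5 * y) for x, y in ctrl]  # toy update, not SDM
initial_err = sd_error(start, data)
final_ctrl, final_err = fit(start, data, halve_y)
```

In an actual fitting loop the callback would solve the SDM linear system for new control point positions, and control points would be inserted or removed between iterations as described above.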
Although the above description, together with the result figures and flow chart, is detailed, it does not limit the scope of protection of the present invention. A person skilled in the art may modify or vary the algorithm on the basis of the present invention, and the results still fall within the scope of protection of the present invention.

Claims (1)

1. A skeleton-based, noise-resistant method for extracting Chinese character features, comprising the following steps:
Step 1: Preprocess the grayscale image of the text to be processed, including smoothing the grayscale image and binarizing it;
Step 2: Down-sample the binarized image to generate point cloud model data;
Step 3: Apply an erosion operation to the point cloud model data to obtain a coarse axis point set;
Step 4: Based on the axis point cloud, perform PCA analysis and splitting according to a splitting condition to obtain a splitting result;
Step 5: Merge the splitting result, and post-process the intersection points after merging;
Step 6: Fit B-spline curves to the point cloud processed in step 5 to obtain the skeleton used as the Chinese character feature;
Step 1 specifically comprises:
The scanned grayscale image of the text is smoothed, and the smoothed image is then binarized into a binary image containing only black and white, in which white pixels are the background and black pixels are the foreground Chinese characters; the smoothing is Gaussian smoothing over a neighborhood, performed with the OpenCV cvSmooth method;
Step 2 specifically comprises:
The binarized image is down-sampled, with sampling performed only on black pixels, at a set sampling ratio, to convert the image into point cloud model data; the horizontal and vertical coordinates of each sampled pixel form the coordinates of one point of the point cloud data;
Step 3 specifically comprises:
An erosion kernel is applied to the pixel point cloud of the binarized image to perform the erosion set operation until the erosion termination condition is reached, yielding the final coarse axis point set;
The erosion set operation is defined as follows:
$$S' = S - \sum_i S \bullet \varphi_i \qquad \text{(formula 1)}$$
In the above formula, $S'$ is the pixel set after one erosion pass, $S$ is the original pixel set, and $\varphi_i$ is the set covered by the erosion kernel at position i. The "●" operation means: if the number of points at which the erosion kernel at the current position intersects $S$ equals c, that intersection point is returned, otherwise 0 is returned; $pt_i$ denotes the pixel of the binary image at point i;
The PCA analysis and splitting in step 4 specifically comprises:
An unprocessed point is selected arbitrarily from the coarse axis point set and its local principal direction is computed. If the result computed from the current local principal direction Vi and the local principal direction Vj of the next point is -1, the point is a turning point; otherwise it is not, and the search continues along the axis points for the next PCA center point to process. If no further axis point is found, the current point is a turning point. Finally, the first and last PCA center points of any class i are marked as its start point x(i) and end point y(i), and the points between them are assigned to class i; the PCA center points Center(i), radii R(i), and local principal directions Vi between these two endpoints are recorded for use in merging and fitting. Another point is then selected arbitrarily from the coarse axis point set and PCA analysis and splitting is performed again, until all points have been processed. After a finite number of iterations every point in the coarse axis point set has been processed, giving the final split set. The splitting condition is set from the angle α between the two local principal directions obtained by applying PCA to the point clouds inside two adjacent local circles;
Step 5 specifically comprises:
Let MaxRadius be the maximum PCA radius in the PCA analysis. For a class i with endpoints x(i) and y(i), let Rx(i) and Ry(i) be the PCA radii at those endpoints, and let dist(x(i), y(j)) return the distance between any two endpoints. Merging occurs only at the endpoints of classes: each endpoint is checked against the merging conditions, and when any one of them is satisfied, the merge operation is carried out;
The merging conditions are:
Condition one: for the two endpoints x(i) and y(i) of class i, if dist(x(i), y(i)) <= Rx(i) + Ry(i), the only PCA center points of class i are its two endpoints (that is, the number of center points is 2), and an endpoint x(j) or y(j) of a second class j intersects both characteristic circles, then class i and class j satisfy the merging condition;
Condition two: for any classes i and j, take any two endpoints of the two classes, say endpoint x of class i and endpoint y of class j. If the angle between the vector Vij formed by the line joining x(i) and y(j) and the vector Vx at endpoint x is less than θ, where θ is the predefined turning angle; and there exists, between any two points at the endpoints of class i and class j, a minimum spanning tree whose maximum step is no greater than RectSize/16, where RectSize is the Euclidean distance between the two farthest-apart points in the point cloud; and the angle between V(x(i)) and V(y(j)) lies in [0, θ] or in [180-θ, 180]; then class i and class j satisfy the merging condition;
Condition three: for any classes i and j, take any two endpoints of the two classes, say endpoint x of class i and endpoint y of class j. If dist(x(i), y(j)) <= Rx(i) + Ry(j); endpoint y(j) does not intersect the x endpoint of its own class and x(i) does not intersect the y endpoint of its own class; neither endpoint intersects the PCA unit characteristic circle of any third class; and there exists, between any two points at the endpoints of class i and class j, a minimum spanning tree whose maximum step is no greater than avgRectSize, where avgRectSize is the mean Euclidean distance between pairs of points in the point cloud; then class i and class j satisfy the merging condition;
Step 6 specifically comprises:
Using the squared distance minimization (SDM) fitting method, the center point set S_Center(i) produced during the PCA analysis is first taken as the initial B-spline control points, the number and positions of the control points are adjusted, and the B-spline curves are fitted iteratively with the SDM method; the B-spline curves finally obtained serve as the skeleton feature of the Chinese character;
The erosion termination condition is: the current point has eight neighboring points in the binarized image; it is determined whether any two neighboring black points of the current point are connected to each other; if they are not connected, the current point is an axis point; otherwise it is not an axis point.
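One possible reading of the erosion termination test in the claim is sketched below: the black pixels among the eight neighbors of the current point are collected, and the point is treated as an axis point when those neighbors do not form a single connected group. The function name `is_axis_point`, the 1 = black encoding, the use of 8-adjacency between neighbors, and the handling of points with fewer than two black neighbors are assumptions made for this illustration.

```python
# Offsets of the eight neighbours, clockwise from the top-left pixel.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]


def is_axis_point(img, r, c):
    """img is a list of rows with 1 = black (foreground) and 0 = white."""
    h, w = len(img), len(img[0])
    black = [(r + dr, c + dc) for dr, dc in NEIGHBOURS
             if 0 <= r + dr < h and 0 <= c + dc < w
             and img[r + dr][c + dc] == 1]
    if len(black) < 2:
        return False  # assumption: no pair of black neighbours to compare

    # Flood fill restricted to the black neighbours (centre pixel excluded).
    seen = {black[0]}
    stack = [black[0]]
    while stack:
        pr, pc = stack.pop()
        for q in black:
            if q not in seen and max(abs(q[0] - pr), abs(q[1] - pc)) <= 1:
                seen.add(q)
                stack.append(q)

    # Some black neighbour is unreachable from the others: removing the
    # centre would disconnect them, so the centre is kept as an axis point.
    return len(seen) < len(black)


line = [[0, 0, 0],
        [1, 1, 1],
        [0, 0, 0]]
solid = [[1, 1, 1],
         [1, 1, 1],
         [1, 1, 1]]
on_line = is_axis_point(line, 1, 1)
in_block = is_axis_point(solid, 1, 1)
```

Here the center of the one-pixel-wide line is reported as an axis point, while the center of the solid block is not and could still be eroded.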
CN201410360498.2A 2014-07-25 2014-07-25 A kind of antinoise Research of Chinese Feature Extraction method based on skeleton Active CN104156730B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410360498.2A CN104156730B (en) 2014-07-25 2014-07-25 A kind of antinoise Research of Chinese Feature Extraction method based on skeleton

Publications (2)

Publication Number Publication Date
CN104156730A (en) 2014-11-19
CN104156730B (en) 2017-12-01

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant