CN104156730B - Skeleton-based noise-robust method for extracting Chinese character features - Google Patents
- Publication number: CN104156730B
- Application number: CN201410360498.2A
- Authority: CN (China)
- Legal status: Active (as listed by Google Patents; an assumption, not a legal conclusion)
- Landscapes: Image Analysis (AREA)
Abstract
The invention discloses a skeleton-based, noise-robust method for extracting Chinese character features. The grayscale text image is smoothed and denoised, then binarized. The binary image is down-sampled and converted into a point cloud model. Erosion applied to the original binary image yields a coarse medial axis. PCA is performed along this axis to obtain a split result. The split classes are merged, and the merged point cloud classes are post-processed. B-spline curves are fitted to the classified point cloud to obtain the skeleton. Converting the Chinese character image into a point cloud model reduces the influence of noise and similar factors on skeleton extraction; fitting the skeleton with B-spline curves better preserves the features of the original character; and because the original character image is processed directly, no normalization preprocessing is needed, which makes skeleton extraction easier and more efficient.
Description
Technical field
The present invention relates to the fields of image processing and pattern recognition, and specifically to a robust, skeleton-based method for automatically extracting Chinese character features.
Background art
Chinese character recognition is a branch of text recognition. Because the Chinese character set is very large and character shapes vary widely, Chinese characters cannot be handled by the comparatively simple recognition algorithms used for alphabetic scripts such as English, so Chinese character recognition has long been a difficult research area. It is usually divided into printed and handwritten Chinese character recognition. Printed characters have been studied more extensively, whereas handwritten characters vary greatly between writers, so their recognition rates remain comparatively low.
Feature extraction is one of the most important stages of a Chinese character recognition system, and extracting features that remain discriminative across different shapes and styles is a current research focus. In traditional approaches, directional features are widely used to describe Chinese characters. However, directional features require normalizing the character's orientation and building an elastic mesh, they handle handwritten characters of varying shape poorly, and feature extraction based on directional features alone cannot meet practical needs.
Another line of feature extraction is based on the Chinese character skeleton. The skeleton characterizes the shape and topological structure of a character well and preserves its geometric features, while also greatly reducing the cost of computation and of matching against the character database. Although the skeleton can represent Chinese character features, Chinese characters, and handwritten ones in particular, vary widely and are often of low quality, so extracting a high-quality skeleton remains an open problem. Many methods focus on extracting and processing the character contour, while others use morphological erosion; neither handles low-quality characters affected by noise, sparsity, or broken strokes well.
Summary of the invention
To overcome these shortcomings of the prior art, the invention discloses a skeleton-based, noise-robust method for extracting Chinese character features. To cope with the variability of Chinese characters, especially low-quality ones, the character is represented by a point cloud model. A point cloud is sparse and unconnected, which helps reduce the influence of noise on skeleton extraction. The skeleton is extracted from the point cloud model: PCA drives a "split-merge" classification, and curves are then fitted by squared distance minimization. This reduces the influence of noise and similar factors on skeleton extraction, classifies and fits the strokes sensibly, and finally yields a smoother skeleton feature.
To achieve the above object, the invention adopts the following scheme:
A skeleton-based, noise-robust method for extracting Chinese character features, comprising the following steps:
Step 1: preprocess the grayscale image of the text to be processed, including smoothing the grayscale image and binarizing it;
Step 2: down-sample the binarized image to generate point cloud model data;
Step 3: apply erosion to the point cloud model data to obtain a coarse medial-axis point set;
Step 4: perform PCA-based splitting on the axis point set according to the splitting condition to obtain a split result;
Step 5: merge the split result and post-process the points at intersections after merging;
Step 6: fit B-spline curves to the point cloud data processed in step 5 to obtain the skeleton used as the Chinese character feature.
Step 1 specifically includes:
The scanned grayscale text image is smoothed, and the smoothed image is then binarized into an image containing only black and white, where white pixels are the background and black pixels are the foreground (the Chinese character). Smoothing is performed by Gaussian smoothing over a neighborhood using OpenCV's cvSmooth function.
Step 2 specifically includes:
The binarized image is down-sampled, sampling only black pixels at a set sampling ratio, which converts the image into point cloud model data; the horizontal and vertical coordinates of each sampled pixel form the coordinates of one point of the point cloud.
Step 3 specifically includes:
Erosion kernels are applied to the pixel point cloud of the binarized image as set erosion operations until the erosion termination condition is reached, giving the final coarse medial-axis point set.
The erosion termination condition is as follows: in the binarized image the current point has eight adjacent points; determine whether any two black neighbors of the current point are connected to each other. If they are not connected, the current point is an axis point; otherwise it is not.
The splitting condition in step 4 is:
PCA is performed on the point cloud subsets inside two adjacent local circles, and the angle α between the two resulting local principal directions is used to set the splitting condition.
The PCA-based splitting in step 4 specifically includes:
An unprocessed point is selected arbitrarily from the coarse axis point set and its main local direction (MLD) is computed. If comparing the current local principal direction Vi with the local principal direction Vj of the subsequent point returns −1 (that is, the angle between Vi and Vj exceeds θ, a predefined turning angle), the point is a turning point; otherwise it is not, and the search continues along the axis points to process the next PCA central point. If no further axis point can be found, the current point is a turning point. Finally, for any class i, the first and last PCA central points are marked as start point x(i) and end point y(i), and the points between them are assigned to class i. The PCA central points Center(i) between the two endpoints, their radii R(i), and their local principal directions Vi are recorded for later merging and fitting. Another point is then selected arbitrarily from the coarse axis point set and PCA splitting is repeated until all points have been processed. After a finite number of iterations, every point in the coarse axis point set has been processed and the final split set is obtained.
Step 5 specifically includes:
Let MaxRadius denote the maximum PCA radius used during PCA, let x(i) and y(i) be the endpoints of a class i, let Rx(i) and Ry(i) be the PCA radii at these endpoints, and let dist(x(i), y(j)) return the distance between any two endpoints. Merging takes place only at class endpoints: each pair of endpoints is checked against the merging conditions below, and the final merge is performed when any of them is satisfied.
The merging conditions are:
Condition 1: for the two endpoints x(i) and y(i) of class i, if dist(x(i), y(i)) ≤ Rx(i) + Ry(i), class i has only its two endpoints as PCA central points (that is, it has 2 central points, whereas other classes have at least 3), and an endpoint x(j) or y(j) of a second class j intersects the two characteristic circles, then class i and class j satisfy the merging condition;
Condition 2: for any classes i and j, take any endpoint of each, say the x endpoint of class i and the y endpoint of class j. If the angle between the vector Vij along the line from x(i) to y(j) and the vector Vx at endpoint x is less than θ (the predefined turning angle), the points at the endpoints of classes i and j admit a minimum spanning tree whose maximum step length does not exceed RectSize/16 (RectSize is the Euclidean distance between the two farthest points of the point cloud), and the angle between V(x(i)) and V(y(j)) lies in [0, θ] or [180−θ, 180], then classes i and j satisfy the merging condition, where θ is the predefined turning angle;
Condition 3: for any classes i and j, take any endpoint of each, say the x endpoint of class i and the y endpoint of class j. If dist(x(i), y(j)) ≤ Rx(i) + Ry(j); y(j) does not intersect the x endpoint of its own class and x(i) does not intersect the y endpoint of its own class; neither endpoint intersects the PCA characteristic circle of a third class; and the points at the endpoints of classes i and j admit a minimum spanning tree whose maximum step length does not exceed RectSize/16, then classes i and j satisfy the merging condition.
Step 6 specifically includes:
Using squared distance minimization (SDM), the central point set S_Center(i) produced during PCA (the points of the cloud covered by the radius around each PCA central point) is first taken as the initial B-spline control points; the number and positions of the control points are adjusted, and B-spline curves are fitted iteratively with SDM. Compared with other iterative B-spline fitting methods, SDM iterates faster and converges more stably. The resulting B-spline curves are used as the skeleton feature of the Chinese character.
Erosion is applied to the binarized image to obtain the coarse medial-axis point set. Morphological erosion of a binary image is a set operation on its pixels: applying a specific erosion kernel (template) to the pixels makes the boundary shrink inward, and after a finite number of erosion steps the medial axis is obtained.
The set erosion is defined as follows:

$$S' = S - \sum_i \left( S \otimes \varphi_i \right) \qquad \text{(Formula 1)}$$

where S' is the pixel set after one erosion step, S is the original pixel set, and φ_i is the erosion kernel placed at position i. The operation S ⊗ φ_i returns the intersection if the number of intersection points between the kernel at the current position and S equals c, and returns 0 otherwise; pt_i denotes the pixel of the binary image at point i. After a finite number of erosion steps, the final medial axis is obtained.
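Formula 1 leaves the exact operator open. A minimal sketch, assuming the standard morphological reading in which one erosion step keeps a pixel only if every kernel offset around it is also black (so S − S' is the removed boundary), is shown below. The offset lists `CROSS`, `X_SHAPE` and `FULL8` are our own approximations of the kernels in Fig. 4, and `erode` is a hypothetical name.

```python
# Kernel offsets (dx, dy); each includes the origin (0, 0).
CROSS = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]
X_SHAPE = [(0, 0), (1, 1), (-1, -1), (1, -1), (-1, 1)]
FULL8 = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]


def erode(S, kernel):
    """One standard erosion step on a set of black pixels (x, y).

    A pixel p survives only if p + o is in S for every offset o of the kernel.
    """
    S = set(S)
    return {(x, y) for (x, y) in S
            if all((x + dx, y + dy) in S for dx, dy in kernel)}
```

For example, eroding a 5×5 block with `FULL8` leaves its 3×3 interior, and eroding a plus sign with `CROSS` leaves only its center.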
PCA (principal component analysis) is a basic technique of multivariate analysis. The proposed method runs PCA on the point cloud subsets inside two adjacent local circles and uses the angle α between the two resulting local principal directions to set the splitting condition; the mean central points from PCA are later used as control points for B-spline fitting. Here, PCA on the point cloud inside a circle of radius R is called a PCA unit. A PCA unit comprises the characteristic radius R, the mean central point χ, the characteristic circle centered at χ with radius R, and the local direction V.
PCA on the point cloud inside a circle of radius R centered on a point proceeds as follows:
(1) First compute the mean χ of the sampled points, as in Formula 3:

$$\chi = \frac{1}{N} \sum_{i=1}^{N} X_i \qquad \text{(Formula 3)}$$

where X_i are the points inside the circle and N is the number of points inside the circle.
(2) Then compute the deviation matrix C = X − χ of the point matrix X from χ, and the covariance matrix T = C C^T.
(3) Finally, compute the eigenvalues λ and eigenvectors M of the covariance matrix by singular value decomposition (SVD). The SVD of C is:

$$C = M S V^{T} \qquad \text{(Formula 4)}$$

where the columns of M are the eigenvectors of the covariance matrix T, S is the diagonal matrix obtained from the decomposition, V^T is a square matrix, and the columns of V are the eigenvectors of C^T C. PCA of the point cloud data thus yields the main local direction (MLD), i.e. the local direction V of the PCA unit, given by the column of M with the largest singular value; this direction is the main factor in setting the splitting condition.
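For two-dimensional point clouds, the PCA of one unit reduces to a 2×2 symmetric eigenproblem, so the sketch below solves it in closed form instead of calling an SVD routine; the eigenvector of T = C C^T with the largest eigenvalue is the same direction as the leading column of M in Formula 4. `local_pca` is a hypothetical name, the covariance is left unnormalized to match T = C C^T, and at least two distinct points are assumed.

```python
import math


def local_pca(points):
    """PCA of one 2-D PCA unit.

    Returns (mean, main_direction, (lambda1, lambda2)), where main_direction is
    the unit eigenvector of T = C C^T belonging to the larger eigenvalue.
    """
    n = len(points)
    mx = sum(p[0] for p in points) / n          # Formula 3, x component
    my = sum(p[1] for p in points) / n          # Formula 3, y component
    # Entries of T = C C^T, with C holding the centered points as columns
    sxx = sum((p[0] - mx) ** 2 for p in points)
    syy = sum((p[1] - my) ** 2 for p in points)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points)
    # Closed-form eigenvalues of the symmetric matrix [[sxx, sxy], [sxy, syy]]
    half_tr = (sxx + syy) / 2
    disc = math.sqrt(max(half_tr ** 2 - (sxx * syy - sxy ** 2), 0.0))
    l1, l2 = half_tr + disc, half_tr - disc
    # Eigenvector for l1; fall back to a coordinate axis if T is already diagonal
    if abs(sxy) > 1e-12:
        vx, vy = l1 - syy, sxy
    elif sxx >= syy:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)
    return (mx, my), (vx / norm, vy / norm), (l1, l2)
```

The sign of the returned direction is arbitrary, as with any eigenvector, so later angle comparisons should treat V and −V as the same line.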
The split result is then merged and the points at intersections are post-processed. Every possible intersection, and every place where the target curve has high curvature, may have been split and marked as a turning point. However, some classes at high-curvature places and intersections actually belong together, so "consistent" classes should be merged to reduce the number of classes. It is assumed that the length of the ideal fitted curve of any point cloud class is not less than its mean width.
After the point cloud classes are merged, errors may remain because the point cloud was split under different radii, so the point cloud at intersections must be post-processed; this is done by relocating intersection points based on distance weights.
Beneficial effects of the present invention:
The present invention smooths and denoises the text image, converts it into a binary image, down-samples the binary image into a point cloud data model, performs PCA on the point cloud model, classifies the point cloud through "split-merge" operations, and finally fits B-spline curves on the basis of the classification. The fitted curves serve as the skeleton feature of the Chinese character for recognition and classification. The method has the following advantages:
(1) Processing of the Chinese character image is converted into processing of a point cloud model, which better reduces the influence of noise on Chinese character skeleton extraction. Good results are also obtained when the gray level varies sharply.
(2) The fitted curves agree closely with the original point cloud data and with the ideal curves, intersections are handled well, and the skeleton features of Chinese characters are represented more faithfully.
(3) A reasonable skeleton can be extracted without normalizing the character position during preprocessing, and the skeleton can then be used in subsequent processing for feature-point identification and for classification and recognition of Chinese characters.
Brief description of the drawings
Fig. 1 is the overall flowchart of Chinese character feature extraction and recognition based on the present invention;
Fig. 2 is a flowchart of the feature extraction of the present invention;
Fig. 3 is an example of Chinese character skeleton features extracted according to the present invention;
Fig. 4(a) is the X-shaped erosion kernel used in the erosion operation;
Fig. 4(b) is the cross-shaped erosion kernel used in the erosion operation;
Fig. 4(c) is the full eight-neighborhood kernel used in the erosion operation.
Embodiment:
The present invention is described in detail below in conjunction with the accompanying drawings:
As shown in Fig. 1, the overall procedure of Chinese character feature extraction and recognition based on the present invention comprises the following steps:
A. Scan the text to be processed to obtain a grayscale image.
B. Preprocess the grayscale image (for example smoothing and binarization) to obtain a binary image.
C. Extract features from the image containing the characters to obtain a set of feature vectors.
D. Compare and match this feature vector set against a prior Chinese character feature library to recognize the characters.
E. Post-process the recognized characters to obtain the final text.
This disclosure focuses on extracting the skeleton features of Chinese characters quickly and effectively, and on using them to represent the structural information of the characters. As shown in Fig. 2, the specific steps of the invention are:
Step 1: smooth and denoise the grayscale text image, then binarize it.
Step 2: down-sample the binary image and convert it into a point cloud model.
Step 3: apply erosion on the point cloud model to obtain a coarse medial axis.
Step 4: perform PCA along the axis to obtain the split result.
Step 5: merge the split result and post-process the merged point cloud classes.
Step 6: fit B-spline curves to the classified point cloud to obtain the skeleton.
In step 1, the grayscale text image is smoothed, denoised, and binarized as follows:
The acquired grayscale image containing text is Gaussian-smoothed over a 5×5 neighborhood using the cvSmooth function in OpenCV. The smoothed, denoised grayscale image is then converted into a binary image using a fixed threshold. The binary image contains only black and white pixels, and the black pixels represent the region covered by text in the original grayscale image.
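As a rough illustration of step 1, the sketch below applies a 5×5 Gaussian smoothing followed by fixed-threshold binarization in plain Python. It is not the OpenCV call used in the embodiment: the function names, the σ = 1.1 default (the value OpenCV derives for a 5×5 kernel when no σ is given), the replicated-border handling, and the threshold 128 are assumptions for illustration.

```python
import math


def gaussian_kernel(size=5, sigma=1.1):
    """Normalized size x size Gaussian kernel."""
    half = size // 2
    k = [[math.exp(-(x * x + y * y) / (2 * sigma * sigma))
          for x in range(-half, half + 1)] for y in range(-half, half + 1)]
    total = sum(map(sum, k))
    return [[v / total for v in row] for row in k]


def smooth(img, size=5, sigma=1.1):
    """Convolve a grayscale image (list of rows) with a Gaussian, replicating borders."""
    h, w = len(img), len(img[0])
    k = gaussian_kernel(size, sigma)
    half = size // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(-half, half + 1):
                for dx in range(-half, half + 1):
                    yy = min(max(y + dy, 0), h - 1)   # clamp to the image edge
                    xx = min(max(x + dx, 0), w - 1)
                    acc += k[dy + half][dx + half] * img[yy][xx]
            out[y][x] = acc
    return out


def binarize(img, threshold=128):
    """Fixed threshold: 1 marks black (text) pixels, 0 marks white background."""
    return [[1 if v < threshold else 0 for v in row] for row in img]
```

A real pipeline would more likely call OpenCV's Gaussian blur and threshold functions; the pure-Python version only makes the arithmetic explicit.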
In step 2, the binary image is down-sampled and converted into a point cloud model as follows:
The black pixels of the binary image obtained in step 1 are down-sampled; different sampling ratios can be chosen, such as 1/5 or 1/8 of the pixels. The coordinates of the sampled black pixels are converted into point cloud model data, which characterizes the features of the text well, as shown in Fig. 3.
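A minimal sketch of the down-sampling, assuming a deterministic stride over the black pixels in raster order (the text fixes only the ratio, such as 1/5 or 1/8, not how the pixels are chosen). `to_point_cloud` is a hypothetical name, and its input is the 0/1 image from step 1 with 1 marking black pixels.

```python
def to_point_cloud(binary, step=5):
    """Keep every `step`-th black pixel in raster order (sampling ratio 1/step).

    Returns a list of (x, y) coordinates, where x is the column and y the row.
    """
    black = [(x, y) for y, row in enumerate(binary)
             for x, v in enumerate(row) if v == 1]
    return black[::step]
```

Random sampling at the same ratio would serve equally well; a stride simply keeps the result reproducible.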
In step 3, erosion is applied on the point cloud model to obtain a coarse medial axis as follows:
A specific erosion kernel (template) is applied to the pixel point cloud as a set erosion operation to obtain a rough medial axis. Three erosion kernels are used, as shown in Fig. 4(a) to Fig. 4(c), each defined on the eight-neighborhood of the current point: the cross type, the X type, and the full eight-neighborhood type. Erosion is applied repeatedly in the order A, B, C, A, B, C, … until the erosion termination condition is reached. The advantage of these three kernels is that they effectively eliminate the jagged boundaries produced by binarization.
The erosion termination condition, that is, the criterion for an axis point (core point), is defined as follows: in the binary image the current point has eight adjacent points; determine whether any two of its black neighbors are connected to each other. If they are not connected (connection here includes diagonal adjacency), the point is an axis point; otherwise it is not. After a finite number of erosion steps, the final medial axis is obtained.
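The axis-point test can be sketched as follows, under one reading of the text: the black 8-neighbors of the current pixel are grouped into connected components, with two neighbors counted as connected when they touch horizontally, vertically, or diagonally. If they form more than one component, removing the pixel would disconnect them, so it is kept as an axis point. `is_axial_point` is a hypothetical helper, and since the source does not say how stroke ends (a single neighbor) are treated, they are not axis points here.

```python
# The eight neighbor offsets of a pixel.
RING = [(-1, -1), (0, -1), (1, -1), (1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0)]


def is_axial_point(S, p):
    """True if the black 8-neighbors of p (within set S) are not all connected."""
    x, y = p
    nbrs = [(x + dx, y + dy) for dx, dy in RING if (x + dx, y + dy) in S]
    seen, components = set(), 0
    for start in nbrs:
        if start in seen:
            continue
        components += 1                      # found a new group of neighbors
        seen.add(start)
        stack = [start]
        while stack:                         # flood-fill within the neighbor ring
            cx, cy = stack.pop()
            for q in nbrs:
                if q not in seen and max(abs(q[0] - cx), abs(q[1] - cy)) == 1:
                    seen.add(q)
                    stack.append(q)
    return components > 1
```

The middle pixel of a one-pixel-wide line is an axis point, while the center of a solid 3×3 block is not and can still be eroded.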
Step 4 performs PCA along the axis to obtain the split result, as follows:
PCA is performed on the skeleton points of the coarse skeleton obtained by erosion in step 3. To estimate the PCA radius at a point, an initial radius R is given first. Within the circle of radius R centered on the axis point, the ratio β = Σpt_i / Σpt_j of white pixels to black pixels is computed (Σpt_i is the number of white pixels and Σpt_j the number of black pixels). If β exceeds a threshold, for example 0.15, the current radius is taken as the PCA radius R′; otherwise the radius is increased. The central points for PCA are taken from the axis points obtained in step 3. Once PCA can be performed at any axis point, it can be determined whether two adjacent principal directions V agree. The definitions for searching the next PCA central point Center(i+1) and for deciding whether two principal directions Vi and Vj agree are given below.
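The radius-estimation rule above can be sketched as follows. The starting radius `r0`, the unit growth step, the cap `r_max`, and ignoring pixels outside the image are assumptions; β is computed as white pixels over black pixels inside the circle, and 0.15 is the example threshold from the text. `estimate_radius` is a hypothetical name.

```python
def estimate_radius(binary, center, r0=2, beta_threshold=0.15, r_max=50):
    """Grow R from r0 until the white/black pixel ratio inside the circle exceeds the threshold.

    binary: list of rows with 1 for black and 0 for white; center: (x, y) axis point.
    """
    cx, cy = center
    h, w = len(binary), len(binary[0])
    r = r0
    while r <= r_max:
        white = black = 0
        for y in range(max(0, cy - r), min(h, cy + r + 1)):
            for x in range(max(0, cx - r), min(w, cx + r + 1)):
                if (x - cx) ** 2 + (y - cy) ** 2 <= r * r:   # inside the circle
                    if binary[y][x]:
                        black += 1
                    else:
                        white += 1
        if black and white / black > beta_threshold:
            return r                     # beta exceeds the threshold: use this R
        r += 1                           # otherwise enlarge the circle
    return r_max
```

On a horizontal stroke three pixels thick, a circle of radius 1 around a center pixel is entirely black, whereas radius 2 first reaches white pixels (β = 2/11), so the estimate stops at 2.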
Define the axis point set S_axis, the current PCA central point Center(i), the current PCA radius R(i), the current local principal direction vector Vi, and the candidate axis point set S_alt, whose elements are denoted Pt_j. Define angle(Vi, Vj) as the angle between vectors Vi and Vj, and a threshold angle θ with θ ∈ (0, 45) degrees. So that the point clouds processed by PCA cover the data well, the distances dist(i, i+1) and dist(i, i−1) from the current central point Center(i) to the next forward candidate axis point Center(i+1) and to the next reverse candidate axis point Center(i−1) are required to lie in [1, R(i) + 0.5·R(i) − 1].
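Forward candidate point set:

$$S_{\text{op\_alt}} = \{\, Pt(x) \mid \operatorname{angle}(Pt(x) - Center(i),\ V_i) \in [0, \theta],\ Pt(x) \in S_{axis} \,\} \qquad \text{(Formula 5)}$$

where S_op_alt is the forward candidate point set, Pt(x) is a candidate point, and angle(Pt(x) − Center(i), V_i) is the angle between the vector from the current PCA central point Center(i) to Pt(x) and the direction V_i.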
Reverse candidate point set:

$$S_{\text{neg\_alt}} = \{\, Pt(x) \mid \operatorname{angle}(Pt(x) - Center(i),\ V_i) \in [180 - \theta, 180],\ Pt(x) \in S_{axis} \,\} \qquad \text{(Formula 6)}$$
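where S_neg_alt is the reverse candidate point set and Pt(x) is a point in it. The next forward central point (Formula 7) and the next reverse central point (Formula 8) are then selected from these candidate sets, and each PCA central point is evaluated by a processing function (Formula 9).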
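The forward and reverse candidate sets of Formulas 5 and 6, together with the distance window [1, 1.5·R(i) − 1], can be sketched as follows, assuming the forward set uses the angle range [0, θ]. θ = 30° is an assumed value inside the stated range (0, 45), and `candidate_sets` and `angle_deg` are hypothetical names.

```python
import math


def angle_deg(u, v):
    """Unsigned angle between vectors u and v, in degrees within [0, 180]."""
    c = (u[0] * v[0] + u[1] * v[1]) / (math.hypot(*u) * math.hypot(*v))
    return math.degrees(math.acos(max(-1.0, min(1.0, c))))


def candidate_sets(axis_points, center, direction, radius, theta=30.0):
    """Split axis points near `center` into forward and reverse candidates."""
    fwd, rev = [], []
    lo, hi = 1.0, 1.5 * radius - 1.0              # allowed distance window
    for p in axis_points:
        d = (p[0] - center[0], p[1] - center[1])
        dist = math.hypot(*d)
        if not (lo <= dist <= hi):
            continue                              # too close or too far
        a = angle_deg(d, direction)
        if a <= theta:
            fwd.append(p)                         # Formula 5: ahead along V_i
        elif a >= 180 - theta:
            rev.append(p)                         # Formula 6: behind along V_i
    return fwd, rev
```

Because θ is below 45°, the two angle ranges never overlap, so each point lands in at most one set.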
The algorithm for PCA processing and splitting of the axis points is as follows. Define a class counter m, initialized to 0.
Sub-step 1: increment m, pick an arbitrary point from the unprocessed axis points to start PCA processing, and label this class m.
Sub-step 2: process the current point according to Formula 9. If the result is −1, stop searching this class in the current direction, finish processing that direction, and go to sub-step 1. If the result is 1 or 2, go to sub-step 3.
Sub-step 3: recursively search for and process subsequent points in both directions according to Formulas 5, 6, 7 and 8. If a subsequent point is found, go to sub-step 2; otherwise stop searching in the current direction, finish processing that direction, and go to sub-step 1.
After a finite number of iterations, every point in the axis point set has been processed and the final split set is obtained.
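A simplified sketch of the splitting idea: if the PCA units along one axis path are already ordered, a new class starts wherever the angle between consecutive local principal directions exceeds θ. This ignores the bidirectional search and the Formula 9 return codes, treats directions as undirected lines (the sign of a PCA eigenvector is arbitrary), and assumes θ = 30°; `split_by_turning` is a hypothetical name.

```python
import math


def direction_angle(u, v):
    """Angle in degrees between two undirected directions, within [0, 90]."""
    c = abs(u[0] * v[0] + u[1] * v[1]) / (math.hypot(*u) * math.hypot(*v))
    return math.degrees(math.acos(min(1.0, c)))


def split_by_turning(directions, theta=30.0):
    """Group an ordered chain of PCA units into classes, cutting at turning points.

    Returns lists of unit indices; the first and last index of each list play
    the role of the endpoints x(i) and y(i).
    """
    if not directions:
        return []
    classes = [[0]]
    for j in range(1, len(directions)):
        if direction_angle(directions[j - 1], directions[j]) > theta:
            classes.append([j])      # angle exceeds theta: start a new class
        else:
            classes[-1].append(j)    # directions agree: extend the current class
    return classes
```

For a chain that runs horizontally and then turns vertical, the units split into two classes at the turn.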
Step 5 merges the split result and post-processes the merged point cloud classes as follows:
Every possible intersection and every place where the target curve has high curvature is split and marked as a turning point. However, some classes at high-curvature places and intersections belong together, so "consistent" classes should be merged to reduce the number of classes. It is assumed that the length of the ideal fitted curve of any point cloud class is not less than its mean width. Let MaxRadius be the maximum PCA radius used during PCA, let x(i) and y(i) be the endpoints of a class i with PCA radii Rx(i) and Ry(i), and let dist(x(i), y(j)) return the distance between any two endpoints. Note that merging takes place only at class endpoints. "Consistency" is defined as follows.
Condition 1: for the two endpoints x(i) and y(i) of class i, if dist(x(i), y(i)) ≤ Rx(i) + Ry(i), class i has only its two endpoints as PCA central points, and an endpoint x(j) or y(j) of a second class j intersects the two characteristic circles, then classes i and j are consistent.
Condition 2: for any classes i and j, take any endpoint of each (assumed here to be the x endpoint of class i and the y endpoint of class j). If the angle between the vector Vij along the line from x(i) to y(j) and the vector Vx at endpoint x is less than θ (the predefined turning angle), the points at the endpoints of classes i and j admit a minimum spanning tree whose maximum step length does not exceed RectSize/16 ①, and angle(V(x(i)), V(y(j))) ∈ [0, θ] or ∈ [180−θ, 180] ②, then classes i and j are consistent, where θ is the predefined turning angle.
Condition 3: for any classes i and j, take any endpoint of each (assumed here to be the x endpoint of class i and the y endpoint of class j). If dist(x(i), y(j)) ≤ Rx(i) + Ry(j) ③; y(j) does not intersect the x endpoint of its own class, x(i) does not intersect the y endpoint of its own class, and neither endpoint intersects the PCA characteristic circle of a third class ④; and the points at the endpoints of classes i and j admit a minimum spanning tree whose maximum step length does not exceed avgRectSize, then classes i and j are consistent.
A brief analysis of the sufficiency of the three merging conditions: since the maximum radius of a PCA unit is MaxRadius, any two class endpoints x(i), y(j) that are candidates for merging must satisfy the inequality:

$$\operatorname{dist}(x(i), y(j)) - R_x(i) - R_y(j) \le 2 \cdot MaxRadius \qquad \text{(Formula 10)}$$
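The two distance tests used when pairing endpoints, the Formula 10 prefilter and the circle-overlap test dist(x(i), y(j)) ≤ Rx(i) + Ry(j) from condition three, are simple enough to state directly. The helper names are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math


def may_merge(p, r_p, q, r_q, max_radius):
    """Formula 10 prefilter: endpoints whose circles are too far apart are never merged."""
    return math.dist(p, q) - r_p - r_q <= 2 * max_radius


def circles_intersect(p, r_p, q, r_q):
    """Circle test of condition three: the two endpoint PCA circles touch or overlap."""
    return math.dist(p, q) <= r_p + r_q
```

Running the cheap prefilter first avoids evaluating the angle and spanning-tree tests for endpoint pairs that can never merge.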
For condition 1: by the assumption that the ideal fitted curve of any point cloud class is at least as long as the class's mean width, every point cloud class should contain at least two PCA units. A class with fewer than two PCA units must therefore be an unreasonable class produced during splitting by noise on the medial axis, and it must be merged with the nearest class.
Condition 2 merges classes whose directions agree at an intersection but which were cut apart by another class. For two classes that need to be merged at an intersection, the MLD angle at their endpoints must be below the defined threshold (formula ② in condition 2), and the points of the two classes must be continuous (formula ① in condition 2).
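The continuity requirement (a minimum spanning tree whose longest edge is at most RectSize/16) can be checked as sketched below. Every minimum spanning tree of a graph has the same longest edge, so building one tree with Prim's algorithm suffices. Which points count as being "at the endpoints" is not specified; the sketch assumes the caller passes the points covered by each endpoint's PCA circle. All helper names are hypothetical.

```python
import math


def rect_size(points):
    """Euclidean distance between the two farthest points of the cloud (brute force)."""
    return max((math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:]),
               default=0.0)


def mst_max_edge(points):
    """Prim's algorithm on the complete Euclidean graph; returns the longest MST edge."""
    n = len(points)
    if n < 2:
        return 0.0
    in_tree = [False] * n
    best = [math.inf] * n        # cheapest known edge linking each point to the tree
    best[0] = 0.0
    longest = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        longest = max(longest, best[u])
        for v in range(n):
            if not in_tree[v]:
                best[v] = min(best[v], math.dist(points[u], points[v]))
    return longest


def endpoints_continuous(points_i, points_j, cloud):
    """Continuity test: the MST over both endpoint neighborhoods has no step above RectSize/16."""
    return mst_max_edge(points_i + points_j) <= rect_size(cloud) / 16
```

Both helpers are quadratic in the number of points, which is acceptable for the small neighborhoods around two endpoints but worth replacing for large clouds.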
Condition 3 merges classes that originally belonged together but were split into several classes because the MLD angle between adjacent PCA units exceeded the threshold. Formula ③ states that the PCA unit circles at the two endpoints intersect; formula ④ states that neither class closes on itself at these endpoints and that the two classes do not intersect a third class; the continuity requirement of condition 2 must also be met.
In the conditions above, "consistent" means that the two classes can be merged: for example, if class i and class j are consistent, class i is merged into class j. The merging operation combines consistent classes into one.
The merged point cloud is post-processed as follows:
For any intersection point pt(k), compute its minimum distance D_pdk to the initial B-spline curve of each class it belongs to (classes i, j, and so on), and estimate the mean radius of each of these classes at pt(k) from the PCA radii Ri(k), Rj(k), … nearest to the point. Suppose pt(k) belongs to two classes i and j, its minimum distances to the initial B-spline curves Curve(i) and Curve(j) are D_pdk(i) and D_pdk(j), and the radii of the nearest PCA centers of classes i and j are Ri(k) and Rj(k). The weighted minimum distance ratio λ_D is defined as:

$$\lambda_D = \frac{D_{pdk}(i)}{D_{pdk}(j)} \times \frac{R_j(k)}{R_i(k)} \qquad \text{(Formula 11)}$$

With a threshold ratio greater than 1: if 1/ratio ≤ λ_D ≤ ratio, the point belongs to both classes i and j; if λ_D is greater than ratio, it belongs to class j; if λ_D is less than 1/ratio, it belongs to class i.
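The intersection reassignment rule of Formula 11 can be sketched as follows, assuming a threshold `ratio` greater than 1 (2.0 is an arbitrary example) and treating a zero distance to curve j as an exact hit on that curve. `assign_intersection` is a hypothetical name.

```python
def assign_intersection(d_i, d_j, r_i, r_j, ratio=2.0):
    """Classes that an intersection point pt(k) should belong to (Formula 11).

    d_i, d_j: minimum distances from pt(k) to the initial B-splines of classes i and j.
    r_i, r_j: radii of the nearest PCA centers of classes i and j.
    ratio: threshold, assumed greater than 1.
    """
    if d_j == 0:
        # The point lies on curve j (and on curve i too if d_i is also 0).
        return {"i", "j"} if d_i == 0 else {"j"}
    lam = (d_i / d_j) * (r_j / r_i)   # weighted minimum distance ratio lambda_D
    if lam > ratio:
        return {"j"}                  # relatively much closer to curve j
    if lam < 1 / ratio:
        return {"i"}                  # relatively much closer to curve i
    return {"i", "j"}                 # ambiguous: keep the point in both classes
```

Scaling each distance by the local radius means a point is judged relative to stroke width, so a thick stroke does not pull points away from a thin one merely because it is wider.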
Step 6 fits B-spline curves to the classified point cloud and obtains the skeleton used as the Chinese character feature, as follows:
B-spline curves are fitted to the classification result with the SDM method. The central point set S_Center(i) produced during PCA is first taken as the initial control points; the number and positions of the control points are adjusted, and B-spline curves are fitted iteratively with SDM until the squared distance (SD) error falls below a threshold ε, or until, after several iterations, the change in SD error is less than a threshold ζ. The final B-spline curves are the skeleton features of the Chinese character.
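The patent fits its curves with SDM, an iterative squared distance minimization whose update step is not given here. As a simpler baseline for comparison, the sketch below performs a one-shot least-squares fit of a clamped cubic B-spline with chord-length parameters (point distance minimization), which is a different technique from SDM. It fixes the number of control points rather than seeding them from S_Center(i), omits the ε/ζ stopping rule, and all names are hypothetical.

```python
import math


def clamped_knots(n_ctrl, degree=3):
    """Clamped uniform knot vector on [0, 1] for n_ctrl control points."""
    inner = n_ctrl - degree - 1
    return ([0.0] * (degree + 1) + [j / (inner + 1) for j in range(1, inner + 1)]
            + [1.0] * (degree + 1))


def basis(i, k, t, knots):
    """Cox-de Boor recursion for the i-th B-spline basis function of degree k."""
    if k == 0:
        if knots[i] <= t < knots[i + 1]:
            return 1.0
        # Close the last non-empty interval so that t == 1 is covered.
        return 1.0 if t == knots[-1] and knots[i] < knots[i + 1] == knots[-1] else 0.0
    left = right = 0.0
    if knots[i + k] > knots[i]:
        left = (t - knots[i]) / (knots[i + k] - knots[i]) * basis(i, k - 1, t, knots)
    if knots[i + k + 1] > knots[i + 1]:
        right = ((knots[i + k + 1] - t) / (knots[i + k + 1] - knots[i + 1])
                 * basis(i + 1, k - 1, t, knots))
    return left + right


def chord_params(points):
    """Chord-length parameters in [0, 1] for distinct ordered points."""
    acc = [0.0]
    for a, b in zip(points, points[1:]):
        acc.append(acc[-1] + math.dist(a, b))
    return [v / acc[-1] for v in acc]


def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system A x = b."""
    n = len(A)
    M = [row[:] + [bv] for row, bv in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_bspline(points, n_ctrl=5, degree=3):
    """Least-squares fit of a clamped B-spline to ordered 2-D points at fixed parameters."""
    knots = clamped_knots(n_ctrl, degree)
    N = [[basis(i, degree, t, knots) for i in range(n_ctrl)] for t in chord_params(points)]
    # Normal equations (N^T N) P = N^T D, solved separately for x and y.
    NtN = [[sum(row[a] * row[b] for row in N) for b in range(n_ctrl)] for a in range(n_ctrl)]
    ctrl_x = solve(NtN, [sum(row[a] * p[0] for row, p in zip(N, points)) for a in range(n_ctrl)])
    ctrl_y = solve(NtN, [sum(row[a] * p[1] for row, p in zip(N, points)) for a in range(n_ctrl)])
    ctrl = list(zip(ctrl_x, ctrl_y))
    # Mean squared residual at the fixed parameters (not the true point-to-curve distance).
    err = sum((sum(w * c[0] for w, c in zip(row, ctrl)) - p[0]) ** 2
              + (sum(w * c[1] for w, c in zip(row, ctrl)) - p[1]) ** 2
              for row, p in zip(N, points)) / len(points)
    return ctrl, knots, err
```

Points sampled from a straight line are reproduced exactly, because cubic B-splines contain all linear functions; an SDM-style refinement would then re-estimate parameters and control points iteratively.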
Although the above description has been given in detail with reference to result figures and flowcharts, it does not limit the scope of protection of the present invention. Those skilled in the art may modify or vary the algorithm on the basis of the present invention, and the results so obtained still fall within the scope of protection of the present invention.
Claims (1)
1. A skeleton-based noise-resistant Chinese character feature extraction method, comprising the following steps:
Step 1: Preprocess the grayscale image of the text to be processed, including smoothing the grayscale image and binarizing it;
Step 2: Down-sample the binarized image to generate point cloud model data;
Step 3: Perform erosion on the point cloud model data to obtain a coarse axis point set;
Step 4: Based on the axis point cloud, perform PCA analysis and splitting according to a splitting condition to obtain a split result;
Step 5: Merge the split results and post-process the intersection points after merging;
Step 6: Fit B-spline curves to the point cloud processed in step 5 to obtain the skeleton used as the Chinese character feature;
Step 1 specifically comprises:
Smoothing the grayscale image of the scanned text, then binarizing the smoothed image into a black-and-white binary image in which white pixels are the background and black pixels are the foreground Chinese character; the smoothing applies Gaussian smoothing over a neighborhood using OpenCV's cvSmooth method;
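A NumPy-only sketch of this preprocessing step follows. The text specifies OpenCV's legacy C function cvSmooth with Gaussian smoothing; in the OpenCV Python bindings `cv2.GaussianBlur` plays that role, but the sketch filters with NumPy so it stays self-contained. The kernel width (sigma = 1) and the fixed threshold of 128 are assumptions, since the text names neither a smoothing window nor a binarization method.

```python
import numpy as np


def gaussian_kernel(sigma):
    """Normalised 1-D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, int(3 * sigma + 0.5))
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()


def smooth_and_binarize(gray, sigma=1.0, threshold=128):
    """Gaussian-smooth a grayscale page, then map it to a boolean mask.

    True marks black (character) pixels, False marks white background.
    """
    img = np.asarray(gray, dtype=float)
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    padded = np.pad(img, r, mode="edge")  # replicate borders before filtering
    # The 2-D Gaussian is separable: filter rows, then columns.
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    smooth = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)
    return smooth < threshold
```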
Step 2 specifically comprises:
Down-sampling the binarized image, sampling black pixels only, at a preset sampling ratio, to convert the image into point cloud model data, where the horizontal and vertical coordinates of each sampled pixel form the coordinates of one point in the point cloud;
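Down-sampling black pixels into a point cloud might look like the sketch below. Uniform random sampling, the default ratio of 0.5, and the fixed seed are assumptions; the text only states that a preset sampling ratio is applied to black pixels and that each sampled pixel's horizontal and vertical coordinates become one point.

```python
import numpy as np


def binary_to_point_cloud(mask, sample_ratio=0.5, seed=0):
    """Sample black pixels of a binary image into an (N, 2) array of (x, y) points.

    mask: 2-D boolean array, True = black (character) pixel.
    sample_ratio: fraction of black pixels kept (uniform random sampling is an
    assumption; the text only says a preset sampling ratio is used).
    """
    rows, cols = np.nonzero(mask)  # only black pixels are candidates
    coords = np.column_stack([cols, rows]).astype(float)  # x = column, y = row
    if len(coords) == 0:
        return coords
    n_keep = max(1, int(round(sample_ratio * len(coords))))
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(coords), size=n_keep, replace=False)
    return coords[np.sort(idx)]
```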
Step 3 specifically comprises:
Applying an erosion kernel to the pixel point cloud of the binarized image in repeated erosion set operations until the erosion stop condition is reached, yielding the final coarse axis point set;
The erosion set operation is defined as:

$$S' = S - \sum_i \left(S \bullet \varphi_i\right) \qquad (1)$$

where S' is the pixel set after one erosion, S is the original pixel set, and φ_i is the set covered by the erosion kernel at position i. The operation "•" is defined as follows: if the number of intersection points between the kernel at the current position and S equals c, the point Pt_i is returned, otherwise 0 is returned, where Pt_i denotes the pixel at point i on the binary image;
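Formula 1 can be read as a single set-difference step, sketched below on a set of integer pixel coordinates. The 3x3 kernel centred on each pixel and the caller-supplied value of c are assumptions, because the text gives neither the kernel shape nor c; the sketch performs only one erosion pass and leaves the stop condition to the caller.

```python
def erode_once(S, c):
    """One erosion step S' = S - sum_i (S • phi_i) on a set of (x, y) pixels.

    Assumptions: phi_i is a 3x3 kernel centred on pixel i of S (centre included),
    and c, which the text leaves unspecified, is passed in by the caller.
    """
    removed = set()
    for (x, y) in S:
        # |phi_i ∩ S|: how many black pixels the kernel covers at this position
        count = sum((x + dx, y + dy) in S for dx in (-1, 0, 1) for dy in (-1, 0, 1))
        if count == c:          # S • phi_i returns the point itself
            removed.add((x, y))
    return S - removed          # all removals are decided on the original S
```

With c = 4 on a 3x3 block, only the four corner pixels cover exactly four black pixels, so one pass leaves a plus-shaped set.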
The PCA analysis and splitting in step 4 specifically comprises:
Arbitrarily selecting an unprocessed point from the coarse axis point set and computing its local principal direction. If the result for the current local principal direction Vi and the local principal direction Vj of the subsequent point is −1, the point is a turning point; otherwise it is not, and the search continues along the axis points to process the next PCA center. If no further axis point can be found, the current point is a turning point. Finally, the first and the last PCA centers of any class i are marked as its start point x(i) and end point y(i), the points between them are labeled as class i, and the PCA centers Center(i), radii R(i), and local principal directions Vi between the two end points are recorded for use in merging and fitting. A point is then again selected arbitrarily from the coarse axis point set for PCA analysis and splitting, until all points have been processed; after a finite number of iterations, every point in the coarse axis point set has been processed and the final split set is obtained. The splitting condition is set using the angle α between the principal directions of the two parts obtained by performing PCA on the point clouds in two adjacent local circles;
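The splitting test based on the angle α can be sketched with local PCA: the principal direction inside each circle is the eigenvector of the point covariance matrix with the largest eigenvalue. The circle radius, the use of undirected angles in [0°, 90°], and the function names are illustrative assumptions; the turning-point test on Vi and Vj is not reproduced here because the text does not state how its result of −1 is computed.

```python
import numpy as np


def local_principal_direction(points, center, radius):
    """Unit principal direction of the points inside the circle (center, radius)."""
    P = np.asarray(points, float)
    inside = P[np.linalg.norm(P - np.asarray(center, float), axis=1) <= radius]
    if len(inside) < 2:
        raise ValueError("need at least two points inside the circle")
    cov = np.cov(inside, rowvar=False)       # 2x2 covariance matrix
    _, eigvecs = np.linalg.eigh(cov)         # eigenvalues in ascending order
    return eigvecs[:, -1]                    # eigenvector of the largest eigenvalue


def direction_angle(v1, v2):
    """Angle in degrees between two undirected principal directions, in [0, 90]."""
    cos = abs(float(np.dot(v1, v2))) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return float(np.degrees(np.arccos(min(1.0, cos))))


def should_split(points, c1, c2, radius, alpha):
    """Split when the principal directions of two adjacent circles differ by more than alpha."""
    v1 = local_principal_direction(points, c1, radius)
    v2 = local_principal_direction(points, c2, radius)
    return direction_angle(v1, v2) > alpha
```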
Step 5 specifically comprises:
Let MaxRadius be the maximum PCA radius during the PCA analysis; let x(i) and y(i) be the end points of a class i, with corresponding PCA radii Rx(i) and Ry(i); and let dist(x(i), y(j)) return the distance between any two end points. Merging occurs only at the end points of a class: the end points are checked against each of the merging conditions below, and two classes are merged whenever any one condition is satisfied, until the final merge is complete;
The merging conditions are:
Condition one: for the two end points x(i) and y(i) of class i, if dist(x(i), y(i)) <= Rx(i) + Ry(i), class i has only its two end points as PCA centers (i.e. it has 2 center points), and an end point x(j) or y(j) of a second class j intersects both characteristic circles, then class i and class j satisfy the merging condition;
Condition two: for any classes i and j, take any pair of end points from the two classes, say the x end point of class i and the y end point of class j. If the angle between the vector Vij formed by the line from x(i) to y(j) and the direction vector Vx at end point x is less than θ, where θ is the predefined turning angle; a minimum spanning tree with maximum step length no greater than RectSize/16 exists over the points at the end points of classes i and j, where RectSize is the Euclidean distance between the two farthest points in the point cloud; and the angle between V(x(i)) and V(y(j)) lies in [0, θ] or [180−θ, 180] degrees, then class i and class j satisfy the merging condition;
Condition three: for any classes i and j, take any pair of end points from the two classes, say the x end point of class i and the y end point of class j. If dist(x(i), y(j)) <= Rx(i) + Ry(j); y(j) does not intersect the x end point of its own class and x(i) does not intersect the y end point of its own class; neither end point intersects the PCA unit characteristic circle of a third class; and a minimum spanning tree with maximum step length no greater than avgRectSize exists over the points at the end points of classes i and j, where avgRectSize is the average Euclidean distance between two points in the point cloud, then class i and class j satisfy the merging condition;
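Parts of merging conditions two and three can be expressed as small predicates, sketched below. The `End` record, the function names, and passing the points near the two end points as `endpoint_points` are hypothetical; condition one is omitted. The minimum-spanning-tree test relies on the fact that a spanning tree whose maximum step is within a bound exists exactly when the minimum spanning tree's longest edge is within that bound, so computing that longest edge with Prim's algorithm is sufficient.

```python
import math
from dataclasses import dataclass

import numpy as np


@dataclass
class End:
    """End point of a class: its PCA centre and the PCA radius there."""
    center: tuple
    radius: float


def touches(a, b):
    """True when the characteristic circles of two end points intersect."""
    return math.dist(a.center, b.center) <= a.radius + b.radius


def angle_deg(u, v):
    """Angle between two non-zero vectors, in degrees [0, 180]."""
    u, v = np.asarray(u, float), np.asarray(v, float)
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    if nu == 0 or nv == 0:
        raise ValueError("angle undefined for a zero vector")
    cos = float(np.dot(u, v) / (nu * nv))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))


def mst_max_step(points):
    """Longest edge of the Euclidean minimum spanning tree (Prim, O(n^2))."""
    pts = np.asarray(points, float)
    n = len(pts)
    if n < 2:
        return 0.0
    in_tree = np.zeros(n, dtype=bool)
    best = np.full(n, np.inf)  # cheapest edge linking each point to the tree
    best[0] = 0.0
    longest = 0.0
    for _ in range(n):
        k = int(np.argmin(np.where(in_tree, np.inf, best)))
        in_tree[k] = True
        longest = max(longest, float(best[k]))
        d = np.linalg.norm(pts - pts[k], axis=1)
        best = np.where(~in_tree & (d < best), d, best)
    return longest


def condition_two(x_i, y_j, v_x, v_y, endpoint_points, theta, rect_size):
    v_ij = np.subtract(y_j.center, x_i.center)  # line from x(i) to y(j)
    beta = angle_deg(v_x, v_y)                  # angle between V(x(i)) and V(y(j))
    return (angle_deg(v_ij, v_x) < theta
            and (beta <= theta or beta >= 180 - theta)
            and mst_max_step(endpoint_points) <= rect_size / 16)


def condition_three(x_i, y_i, x_j, y_j, third_class_ends, endpoint_points, avg_rect_size):
    return (touches(x_i, y_j)                   # dist(x(i), y(j)) <= Rx(i) + Ry(j)
            and not touches(y_j, x_j)           # y(j) clear of its own x end
            and not touches(x_i, y_i)           # x(i) clear of its own y end
            and not any(touches(e, c) for e in (x_i, y_j) for c in third_class_ends)
            and mst_max_step(endpoint_points) <= avg_rect_size)
```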
Step 6 specifically comprises:
Using the minimum squared-distance fitting method, taking the center point set S_Center(i) produced during the PCA analysis as the initial B-spline control points, adjusting the number and positions of the control points, and iteratively fitting B-spline curves with the SDM method; the B-spline curves finally obtained serve as the skeleton feature of the Chinese character;
The erosion stop condition is: considering the eight neighbors of the current point in the binarized image, determine whether any two adjacent black points of the current point are connected to each other; if they are not connected, the current point is an axis point; otherwise it is not an axis point.
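The erosion stop condition can be sketched as a neighbourhood-connectivity test. Interpreting "connected" as 8-connectivity among the black neighbours themselves, ignoring the current point, is an assumption, as is treating an isolated pixel (no black neighbours) as a non-axis point, a case the text does not cover.

```python
def is_axis_point(p, S):
    """Erosion-stop test for pixel p against the black-pixel set S.

    p is treated as an axis point when its black 8-neighbours do not all join
    into one 8-connected group once p itself is ignored.
    """
    x, y = p
    nbrs = {(x + dx, y + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
            if (dx, dy) != (0, 0) and (x + dx, y + dy) in S}
    if not nbrs:
        return False  # isolated pixel: not covered by the text, treated as non-axis
    # Flood-fill one group, stepping only between mutually adjacent neighbours.
    start = next(iter(nbrs))
    seen, stack = {start}, [start]
    while stack:
        a, b = stack.pop()
        for q in nbrs:
            if q not in seen and max(abs(q[0] - a), abs(q[1] - b)) == 1:
                seen.add(q)
                stack.append(q)
    return len(seen) < len(nbrs)  # more than one group: neighbours not connected
```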
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410360498.2A CN104156730B (en) | 2014-07-25 | 2014-07-25 | A kind of antinoise Research of Chinese Feature Extraction method based on skeleton |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104156730A CN104156730A (en) | 2014-11-19 |
CN104156730B true CN104156730B (en) | 2017-12-01 |
Family ID: 51882227
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106780412B (en) * | 2016-11-28 | 2020-04-14 | 西安精雕软件科技有限公司 | Method for generating machining path by utilizing handwritten body skeleton line |
CN108171144B (en) * | 2017-12-26 | 2020-12-11 | 四川大学 | Information processing method, information processing device, electronic equipment and storage medium |
CN109147469B (en) * | 2018-07-09 | 2021-07-30 | 安徽慧视金瞳科技有限公司 | Calligraphy practicing method |
CN109409211B (en) * | 2018-09-11 | 2020-09-18 | 北京语言大学 | Processing method, processing device and storage medium for Chinese character skeleton stroke segments |
CN109712147A (en) * | 2018-12-19 | 2019-05-03 | 广东工业大学 | A kind of interference fringe center line approximating method extracted based on Zhang-Suen image framework |
CN110246104B (en) * | 2019-06-13 | 2023-04-25 | 大连民族大学 | Chinese character image processing method |
CN113647212A (en) * | 2021-05-07 | 2021-11-16 | 天津理工大学 | Weeding robot and weeding method based on crop stem positioning |
CN114494704A (en) * | 2022-02-16 | 2022-05-13 | 重庆大学 | Method and system for extracting framework from binary image in anti-noise manner |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103186787A (en) * | 2011-12-31 | 2013-07-03 | 廖志武 | Low-quality Chinese character primary skeleton extraction algorithm based on point cloud model |
CN103268631A (en) * | 2013-05-23 | 2013-08-28 | 中国科学院深圳先进技术研究院 | Method and device for extracting point cloud framework |
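Non-Patent Citations (4)
Title |
---|
Control point adjustment for B-spline curve approximation; Huaiping Yang et al.; Computer-Aided Design; 2004-06-30; Vol. 36, No. 7; main text p. 641, left column, para. 3, lines 4-10 and para. 4, lines 1-2, and right column, para. 1 * |
Fitting multiple curves to point clouds with complicated topological structures; Dongfang Zhu et al.; Computer-Aided Design and Computer Graphics, IEEE; 2014-05-15; main text p. 63, left column, last paragraph, and right column, para. 1; Fig. 5(a)(b) * |
Research on skeleton extraction of low-quality Chinese characters (低质汉字骨架提取研究); Hou Xianling (侯显玲); China Master's Theses Full-text Database, Information Science and Technology; 2012-05-15 (No. 5); full text * |
Research and application of skeleton extraction algorithms (骨架提取算法研究与应用); Zhang Jing (张静); China Master's Theses Full-text Database, Information Science and Technology; 2014-07-15 (No. 7); full text * |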
Legal Events
Code | Title |
---|---|
C06 | Publication |
PB01 | Publication |
C10 | Entry into substantive examination |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |