CN104992161B - Hanzi component segmentation and structure determination method based on component recognition - Google Patents
- Publication number
- CN104992161B (CN201510424057.9A)
- Authority
- CN
- China
- Prior art keywords
- stroke
- chinese character
- hanzi component
- hanzi
- relation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/32—Digital ink
- G06V30/333—Preprocessing; Feature extraction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/32—Digital ink
- G06V30/36—Matching; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/28—Character recognition specially adapted to the type of the alphabet, e.g. Latin alphabet
- G06V30/287—Character recognition specially adapted to the type of the alphabet, e.g. Latin alphabet of Kanji, Hiragana or Katakana characters
Abstract
The invention discloses a Hanzi component segmentation and structure determination method based on component recognition. First, Hanzi component images of a specific font are processed to build the component models. Second, the input Chinese character is decomposed into parts; for each part, the set of stroke segments with the highest similarity is retrieved from the reference library, an optimal combination strategy yields the component recognition result for the input character, and the correspondence between the initial stroke segment sets detected in the input character and the matched components is also obtained. Then, a contour detection algorithm and a boundary tracing algorithm are applied; using the contour information stored in chain form, each contour point is matched to its nearest skeleton point, which completes the segmentation of the Hanzi components. Finally, the golden-grid theory of Chinese character composition is used to analyze the layout of the components and the structure of the character, which completes the structure determination and classification of the character. Compared with other image segmentation methods, the invention segments Hanzi components and determines character structure more effectively.
Description
Technical field
The invention belongs to the fields of virtual reality technology and computer vision, in particular to image processing for component recognition in Chinese character images and to pattern recognition for component segmentation and structure determination.
Background
Chinese characters differ considerably in structure from other scripts. As logographic characters, they are formed from components that carry different meanings, joined in different arrangements. For example, the two characters 林 ("woods") and 杉 ("China fir") both contain the component 木 ("wood"), which in both cases denotes a tree. Decomposing a Chinese character into components that carry specific semantic information is therefore an important part of learning Chinese. Correct segmentation of Hanzi components and determination of character structure also help promote the internationalization of Chinese characters and spread Chinese culture.
At present, component recognition in Chinese character images falls into two categories: statistics-based methods and structure-based methods. Statistics-based methods rest on the premise that patterns of the same class share the same attributes. They treat the character image as a whole, extract a feature vector that reflects the global information of the image, and recognize the character from that vector. The statistical methods in common use can be divided into four classes: distance methods, decision function methods, Bayesian decision methods, and model-based methods. Statistics-based methods make feature extraction easy and tolerate noise and character deformation, so they are robust and resistant to interference, but they distinguish visually similar characters poorly; for example, similar characters such as 千 ("thousand") and 干 ("dry") are easily confused. Structure-based methods interpret the character image as an assembly of smaller parts (called primitives); the number and types of the primitives and the relations among them constitute the structure of the character, and recognition is based on a representation of that structure. Current structure-based methods first require extracting the strokes of the character, for which there are two main strategies, top-down and bottom-up. Structure-based methods reflect structural features well and distinguish similar characters better than statistics-based methods, but they are not robust enough and are easily affected by noise; in addition, extracting structural primitives from an image stably and effectively is very difficult.
Segmenting Hanzi components according to the component recognition result first requires detecting the edges and contours of the component and then, using the skeleton obtained for the target component, matching edge points to the skeleton. Existing methods apply the Canny algorithm directly for edge detection; the result contains many false edges caused by noise or other factors, and the resulting edges may not be closed. Given the segmentation result, when the relative position of two components is a top-bottom or left-right relation, the character can simply be judged to have a top-bottom or left-right structure. For complex enclosing structures, however, a simple bounding-box method cannot distinguish these classes of characters well, so the structure of the components cannot be determined directly.
Summary of the invention
The technical problem to be solved by the invention is to overcome the deficiencies of the prior art and provide a Hanzi component segmentation and structure determination method based on component recognition that can effectively segment the components of a Chinese character and determine its structure.
The technical solution adopted by the invention is a Hanzi component segmentation and structure determination method based on component recognition, implemented in the following steps:
Step (1): build a statistical structure model of the components that describes the strokes and structural relations within each Hanzi component, and generate candidate strokes that match the labeled strokes of the component;
Step (2): from the candidate strokes obtained in step (1), use dynamic programming and an optimal combination strategy to generate the optimal combination of components, which is the Hanzi component recognition result;
Step (3): according to the recognition result obtained in step (2), segment the Hanzi components based on the correspondence between contour and skeleton;
Step (4): according to the segmentation result obtained in step (3), analyze the layout of the components and the character structure according to the golden-grid theory, and determine the structure of the Chinese character.
The specific steps are as follows:
Step (1), component statistical structure modeling and candidate stroke generation: The 687 Hanzi component images in the existing standard Hanzi component library are preprocessed by skeleton extraction. Stroke endpoints and the intersections between strokes are detected as feature points, and the lines connecting feature points give the initial stroke segments. The initial stroke segments are merged interactively to obtain the labeled strokes of each Hanzi component. According to the directional features of unit strokes, Gabor features are extracted from each unit stroke, which completes the statistical modeling of unit strokes. Using the maximum entropy principle, neighbor strokes are selected by means of approximate structural relations. When a target component is to be recognized, each stroke of that component is first retrieved and a set of possible solutions is generated for it. A solution may be a single initial stroke segment or a combination of initial stroke segments. Two segments may be combined if they join end to end and their directions differ by no more than 15°, or if one of them is sufficiently small; segments combined in this way form a possible stroke matching solution, which is added to the candidate stroke queue;
Step (2), component recognition based on the optimal combination principle: A search graph is built so that Hanzi component matching becomes a graph search. During the search, if a candidate stroke to be matched occupies initial stroke segments of the input character that conflict with a candidate stroke already selected, that candidate cannot be chosen. Over all feasible solutions found by searching the graph from the first column to the last, an optimal combination strategy selects the optimal combination as the recognition result for the input Hanzi components; this is solved by formulating it as a knapsack problem;
Step (3), component segmentation based on the contour-skeleton correspondence: For the optimal component recognition result obtained in step (2), the contour of the component is obtained with the Canny edge detection algorithm. The contour obtained directly carries no predecessor or successor links between edge points, so a boundary tracing algorithm is used to organize the contour points of each contour into a chain; the chained contour of the target is stored for use in the subsequent segmentation. The correspondence between contour points and the skeleton is then found, and the contour points corresponding to the component skeleton are extracted and joined where the contour is broken, forming the component segmentation result;
Step (4), structure determination according to the golden-grid theory: According to the relative positions of the components, common Chinese character structures can be divided into 13 types. The grid principle of character composition is used to describe the structural features of a character, and the character structure is analyzed with the golden-grid theory, an improvement on the nine-square grid theory. Structure determination criteria are constructed and converted into an algorithm that can run on a computer, and the relations between Hanzi components are determined quickly by an index value method.
Further, the component statistical structure modeling and candidate stroke generation in step (1) are as follows:
Step (A1): perform image thinning and skeleton extraction on the component images in the standard component library, detect stroke endpoints and the intersections between strokes as feature points, and obtain the initial stroke segments from the lines connecting feature points; merge the initial stroke segments interactively to form reasonably standard strokes of the Hanzi component;
Step (A2): according to the directional features of unit strokes, extract Gabor features from each unit stroke, obtaining the response at each point in the four directions 0, π/4, π/2, and 3π/4, which completes the statistical modeling of unit strokes; using the maximum entropy principle, select neighbor strokes by means of approximate structural relations, where the approximate structural relation approximates the structural relation between a stroke and all other strokes in the component by the structural relation between that stroke and its own neighbor, and structural relations are described by conditional probabilities;
Step (A3): compute the local features of each pair of strokes that are neighbors of each other to help recognize the input Hanzi component; the local features include the relative position of the centers, the angle difference, and the length ratio;
Step (A4): when a target component is to be recognized, first retrieve each stroke of that component and generate a set of possible solutions for it. A solution may be a single initial stroke segment or a combination of initial stroke segments. Two segments may be combined if they join end to end and their directions differ by no more than 15°, or if one of them is sufficiently small; segments combined in this way form a possible stroke matching solution, which is added to the candidate stroke queue.
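The combination rule for initial stroke segments can be sketched in a few lines of Python. This is a minimal illustration and not the patented implementation: the source fixes only the 15° limit, so the thresholds `SMALL_LENGTH` and `JOIN_TOL`, the function names, and the reading that a small segment must still touch its partner are all assumptions made here.

```python
import math

MAX_ANGLE_DEG = 15.0   # from the source text
SMALL_LENGTH = 3.0     # assumed: what counts as a "sufficiently small" segment
JOIN_TOL = 1.5         # assumed: how close two endpoints must be to count as joined

def length(seg):
    (x1, y1), (x2, y2) = seg
    return math.hypot(x2 - x1, y2 - y1)

def direction_deg(seg):
    """Undirected orientation of a segment, in degrees within [0, 180)."""
    (x1, y1), (x2, y2) = seg
    return math.degrees(math.atan2(y2 - y1, x2 - x1)) % 180.0

def angle_diff_deg(a, b):
    d = abs(direction_deg(a) - direction_deg(b))
    return min(d, 180.0 - d)

def joined_end_to_end(a, b):
    return any(math.dist(p, q) <= JOIN_TOL for p in a for q in b)

def can_merge(a, b):
    """True if two touching segments are nearly collinear or one is very short."""
    if not joined_end_to_end(a, b):
        return False
    if angle_diff_deg(a, b) <= MAX_ANGLE_DEG:
        return True
    return min(length(a), length(b)) <= SMALL_LENGTH

def candidate_strokes(segments):
    """Single segments plus every mergeable pair, as tuples of segment indices."""
    cands = [(i,) for i in range(len(segments))]
    for i in range(len(segments)):
        for j in range(i + 1, len(segments)):
            if can_merge(segments[i], segments[j]):
                cands.append((i, j))
    return cands
```

Each candidate is a tuple of segment indices, which makes it easy to check later whether two candidates occupy the same initial stroke segment.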
Further, component recognition based on the optimal combination principle in step (2) proceeds as follows:
Step (B1): build the search graph. Each column of the graph represents one labeled stroke of the component to be matched, and each row within a column represents a candidate stroke generated for that stroke from the initial stroke segments of the input character. The matching problem thus becomes a graph search: one node is chosen from every column, and among all feasible solutions running from the first column to the last, the one with the greatest similarity is sought;
Step (B2): the search in the constructed graph follows two rules. First, when a stroke is being matched, if a stroke that is its neighbor has already been selected, the similarity is computed from the conditional probability together with the previously stored local feature information: the relative position of the centers, the stroke length ratio, and so on are computed between the candidate stroke to be matched and the candidate stroke matched earlier, and compared with the stored local features to describe the similarity. Second, when a stroke is being matched, if the candidate stroke to be matched occupies initial stroke segments of the input character that conflict with a candidate stroke already selected, that candidate stroke cannot be selected;
Step (B3): from all possible solutions obtained in the search graph for each Hanzi component, find the optimal combination as the recognition result for the input Hanzi components. The problem is solved by dynamic programming. It is first formulated as a knapsack problem whose capacity is the number of initial stroke segments of the input character. Each possible component recognition solution has a flag array that records which initial stroke segments of the input character it occupies. The problem is then equivalent to choosing several mutually non-conflicting items to put into the knapsack so that the knapsack is filled as fully as possible.
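The knapsack formulation in step (B3) can be illustrated as follows. This is a sketch under stated assumptions: the flag array of each solution is encoded as a bitmask, the objective is simply the number of occupied segments, and the function name `best_combination` and the sample data are hypothetical. The source does not say how ties or similarity scores enter the objective.

```python
from functools import lru_cache

def best_combination(solutions, n_segments):
    """Pick mutually non-conflicting component solutions covering the most segments.

    solutions: list of (name, occupied_segment_indices).
    n_segments: number of initial stroke segments (the knapsack capacity).
    Returns (covered_count, [names of chosen solutions]).
    """
    masks = []
    for _, segs in solutions:
        mask = 0
        for s in segs:
            if not 0 <= s < n_segments:
                raise ValueError("segment index out of range")
            mask |= 1 << s          # flag array stored as a bitmask
        masks.append(mask)

    @lru_cache(maxsize=None)
    def solve(i, used):
        # Best result using solutions[i:], given the segments already occupied.
        if i == len(masks):
            return 0, ()
        best = solve(i + 1, used)                    # skip solution i
        if masks[i] & used == 0:                     # no segment is shared
            count, chosen = solve(i + 1, used | masks[i])
            cand = (count + bin(masks[i]).count("1"), (i,) + chosen)
            if cand[0] > best[0]:
                best = cand
        return best

    covered, chosen = solve(0, 0)
    return covered, [solutions[i][0] for i in chosen]
```

With nine hypothetical segments and the candidate components "mouth", "field", "wood", "person", and "soil" occupying overlapping segment sets, the call returns the pair that fills the knapsack completely.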
Further, the segmentation of Hanzi components based on the contour-skeleton correspondence in step (3) proceeds as follows:
Step (C1): for the optimal component recognition result obtained in step (2), obtain the contour of the component with the Canny edge detection algorithm, which comprises three stages: filtering and noise reduction, enhancement, and detection; double-threshold detection and edge linking are then applied to the result to complete the detection;
Step (C2): the contour obtained in step (C1) carries no predecessor or successor links between edge points, so a boundary tracing algorithm, here a contour tracing algorithm based on 8-connected regions, is used to reorganize the target edge into a chain of contour points; the chained contour is stored for use in the subsequent component segmentation;
Step (C3): after the correspondence between the skeleton of the input character and the target components has been obtained by the optimal combination principle of step (2), find the correspondence between contour points and the skeleton; using the chained edge point set, extract the contour points corresponding to the component skeleton and join them where the contour is broken, forming the component segmentation result.
Further, the structure determination according to the golden-grid theory in step (4) proceeds as follows:
Step (D1): Chinese characters are composed of various components. According to the relative positions of the components, common character structures can be divided into 13 types. The grid principle of character composition is used to describe the structural features of a character: with the golden-grid theory, an improvement on the nine-square grid theory, the original square is divided into 9 cells using the golden ratio, the character is laid out appropriately in the target square, and its structure is analyzed. Structure determination criteria are constructed and converted into an algorithm that can run on a computer, and the relations between Hanzi components are determined quickly by an index value method.
Compared with the prior art, the invention has the following advantages:
(1) In component recognition from images, the invention describes structural relations with conditional probabilities, computes the local features of pairs of mutually neighboring strokes to help recognize the input Hanzi components, and selects the optimal component recognition result with an optimal combination strategy, which improves the recognition rate and accuracy for components.
(2) The invention describes the structural features of Chinese characters with the golden-grid theory: the original square is divided into 9 cells using the golden ratio, and structure determination criteria that can be executed on a computer are constructed to determine the structure of the Hanzi components, so that the structure of a Chinese character can be determined accurately.
Brief description of the drawings
Fig. 1 is a schematic diagram of the overall process of the invention;
Fig. 2 shows the candidate strokes generated for a component;
Fig. 3 shows the Hanzi component recognition result obtained with the optimal combination principle;
Fig. 4 is a schematic diagram of the correspondence between contour points and skeleton points;
Fig. 5 shows the results of Hanzi component segmentation;
Fig. 6 shows the golden-grid division method and the grid cell weight settings;
Fig. 7 is a flow chart of Hanzi component merging and structure determination;
Fig. 8 shows the Hanzi structure determination results.
Detailed description
The invention is described in further detail below with reference to the drawings and examples.
The implementation of the invention comprises four main steps: component statistical structure modeling with candidate stroke generation, component recognition based on the optimal combination principle, Hanzi component segmentation, and Hanzi structure determination based on the golden-grid theory.
As shown in Fig. 1, the invention is implemented as follows:
Step 1: Hanzi component statistical structure modeling and candidate stroke generation:
Image thinning and skeleton extraction are performed on the component images in the standard component library. Stroke endpoints and the intersections between strokes are detected as feature points, and the lines connecting feature points give the initial stroke segments. The initial stroke segments are merged interactively to form reasonably standard strokes of the Hanzi component. In the invention, each stroke of a component is assumed to follow a 4-dimensional Gaussian distribution, written X ~ N(μ, Σ). This 4-dimensional vector is obtained by weighting the 4-dimensional vectors of the individual points on the stroke; Gabor filtering is used to detect the response at each point in the four directions 0, π/4, π/2, and 3π/4. For a Hanzi component C to be matched and an input Chinese character image S, the similarity between them is expressed by the joint probability in formula (1), where r_i and s_i denote a stroke in the component and a stroke in the input character, respectively.
Pr(S = C) ≡ Pr(s_1 = r_1, s_2 = r_2, ..., s_n = r_n)    (1)
The joint probability distribution is then expressed through conditional probabilities to obtain a way of computing component similarity, which converts formula (1) into formula (2).
The joint probability density can be computed from formula (2), but the computational complexity is high. The invention therefore uses the maximum entropy principle to take, for each stroke, the single stroke that influences it most as its neighbor, which describes the structural relations approximately. Formula (2) is thereby converted into formula (3), and the strongly coupled multivariate structure becomes a set of binary structural relations.
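Formulas (2) and (3) do not appear in this text. One reading consistent with the surrounding description is the standard chain-rule expansion of the joint probability in formula (1), followed by an approximation in which each stroke is conditioned on a single neighbor. The neighbor index N(i) and the ordering constraint N(i) < i are notation assumed here for illustration, not taken from the source.

```latex
% Chain-rule expansion of the joint probability in formula (1)
\Pr(S = C) = \Pr(s_1 = r_1)\,\prod_{i=2}^{n}
    \Pr\bigl(s_i = r_i \mid s_1 = r_1, \dots, s_{i-1} = r_{i-1}\bigr)

% Single-neighbor approximation: stroke i depends only on its neighbor N(i)
\Pr(S = C) \approx \Pr(s_1 = r_1)\,\prod_{i=2}^{n}
    \Pr\bigl(s_i = r_i \mid s_{N(i)} = r_{N(i)}\bigr), \qquad N(i) < i
```

Under this reading, each factor in the second expression involves only two strokes, which matches the statement that the multivariate structure reduces to a set of binary relations.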
When candidate strokes are generated, the local features of each pair of mutually neighboring strokes are computed to help recognize the input Hanzi component; the local features include the relative position of the centers, the angle difference, and the length ratio. When a target component is to be recognized, each stroke of that component is first retrieved and a set of possible solutions is generated for it. A solution may be a single initial stroke segment or a combination of initial stroke segments. Two segments may be combined if they join end to end and their directions differ by no more than 15°, or if one of them is sufficiently small; segments combined in this way form a possible stroke matching solution, which is added to the candidate stroke queue. Fig. 2 illustrates candidate stroke generation in the input character image "father" for the bold stroke of the target component "bar".
Step 2: Component recognition based on the optimal combination principle:
After the candidate stroke set for each stroke of the target component has been obtained in step 1, an optimal combination strategy selects one candidate from each candidate stroke set so that the similarity between the target component and the selected set of candidate strokes is greatest. This step is realized by building a search graph and applying the optimality principle. Each column of the search graph represents one labeled stroke of the component to be matched, and each row within a column represents a candidate stroke generated for that stroke from the initial stroke segments of the input character. Finding the optimal solution means choosing one node from each column and, among all possible solutions from the first column to the last, finding the one with the greatest similarity.
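The column-by-column search can be sketched as a depth-first search with a conflict check. This is an illustrative simplification: each candidate carries a fixed score (for example a log-probability), whereas the source computes similarity from conditional probabilities and local features of neighboring strokes. The function name and the data layout are assumptions.

```python
def search_best_match(columns):
    """Choose one candidate per column so that no initial segment is used twice.

    columns[k] is a list of (segment_indices, score) candidates for stroke k.
    Returns (row index chosen in each column, total score), or (None, -inf)
    if no conflict-free assignment exists.
    """
    best = {"score": float("-inf"), "path": None}

    def dfs(k, used, score, path):
        if k == len(columns):
            if score > best["score"]:
                best["score"], best["path"] = score, list(path)
            return
        for row, (segs, s) in enumerate(columns[k]):
            segs = frozenset(segs)
            if segs & used:            # conflicts with an already chosen candidate
                continue
            path.append(row)
            dfs(k + 1, used | segs, score + s, path)
            path.pop()

    dfs(0, frozenset(), 0.0, [])
    return best["path"], best["score"]
```

The exhaustive search is exponential in the worst case; it is shown only to make the two search rules concrete.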
During the search, when a stroke is being matched, if the candidate stroke to be matched occupies initial stroke segments of the input character that conflict with a candidate stroke already selected, that candidate cannot be selected. In addition, when a stroke is being matched, if its neighbor stroke has already been selected, the similarity is described through the probability computation together with the local features. Dynamic programming is used to find the optimal combination as the recognition result for the input Hanzi components. Fig. 3 illustrates how the optimal combination principle yields the final recognition result: for the input character 果 ("fruit") and the candidate components 口 ("mouth"), 木 ("wood"), 人 ("person"), 田 ("field"), and 土 ("soil"), all possible solutions are obtained, and the optimal combination strategy concludes that 田 and 木 are the optimal component recognition result for the input character 果.
Step 3: Component segmentation based on the contour-skeleton correspondence:
After the optimal component recognition result is obtained in step 2, the contour of the component is first obtained with the Canny contour detection algorithm, a process with three stages: filtering and noise reduction, enhancement, and detection. The original image is filtered by convolution; formulas (4) and (5) give the one-dimensional and two-dimensional Gaussian kernel functions used in the convolution.
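Formulas (4) and (5) do not appear in this text. The standard one-dimensional and two-dimensional Gaussian kernels, which are what Canny smoothing conventionally uses, are given below as a probable reading; the symbol σ for the standard deviation is assumed.

```latex
% One-dimensional Gaussian kernel
G(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{x^{2}}{2\sigma^{2}}\right)

% Two-dimensional Gaussian kernel
G(x, y) = \frac{1}{2\pi\sigma^{2}}\exp\!\left(-\frac{x^{2} + y^{2}}{2\sigma^{2}}\right)
```

The two-dimensional kernel is the product of two one-dimensional kernels, so the smoothing can be applied as two separable one-dimensional convolutions.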
The contour produced by the Canny algorithm carries no predecessor or successor links between edge points, so a boundary tracing algorithm based on 8-connected regions is used to organize the contour points into a chain. The procedure is: (1) Determine the starting point: find the uppermost, leftmost pixel of the image whose value is 1, store its coordinates, and denote it P0. (2) Starting from neighbor 0 of P0, examine its 8 neighbors counterclockwise, find the first pixel whose value is 1, store its coordinates, and denote it P1; P0 is called the first target neighbor of P1. (3) At each subsequent step, start the search from the neighbor that follows the first target neighbor of the current point, and find the next target neighbor. (4) Repeat step (3) until the first target neighbor of Pj is P0 and the first target neighbor of Pj+1 is P1, at which point tracing ends. After boundary tracing, the target edge has been reorganized into a chained contour, which is stored for use in the subsequent component segmentation.
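A compact way to see the tracing procedure is the classic textbook border-following scheme for 8-connected regions, sketched below. The neighbor numbering, the choice of search start direction, and the function name `trace_boundary` are assumptions; the source does not specify its exact numbering, so this is a stand-in for the described procedure rather than a transcription of it.

```python
# The 8 neighbors in counterclockwise order (rows grow downward):
# E, NE, N, NW, W, SW, S, SE, given as (row offset, column offset).
OFFSETS = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]

def trace_boundary(img):
    """Return the outer boundary of the first region as an ordered chain of (row, col)."""
    rows, cols = len(img), len(img[0])
    start = next(((r, c) for r in range(rows) for c in range(cols) if img[r][c]), None)
    if start is None:
        return []

    def step(p, d):
        # Begin the counterclockwise search just past the direction we came from.
        first = (d + 7) % 8 if d % 2 == 0 else (d + 6) % 8
        for k in range(8):
            nd = (first + k) % 8
            r, c = p[0] + OFFSETS[nd][0], p[1] + OFFSETS[nd][1]
            if 0 <= r < rows and 0 <= c < cols and img[r][c]:
                return (r, c), nd
        return None, d                      # isolated pixel

    chain = [start]
    cur, d = step(start, 7)
    if cur is None:
        return chain
    second = cur
    while True:
        chain.append(cur)
        nxt, d = step(cur, d)
        # Stop once the walk is about to repeat its first move.
        if cur == start and nxt == second:
            break
        cur = nxt
    chain.pop()                             # drop the repeated starting point
    return chain
```

The result is a list in which consecutive entries are neighboring boundary pixels, which supplies the predecessor and successor links that the raw edge map lacks.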
After the correspondence between the skeleton of the input character and the target components has been obtained by the optimal combination principle, the correspondence between contour points and the skeleton is found. Once this correspondence has been computed, the contour points corresponding to the component skeleton are extracted and joined where the contour is broken, forming the component segmentation result. The process is shown in Fig. 4, where (c) and (d) show the contour-to-skeleton correspondence for the two crossing parts to be segmented in "arrow" and "Yes", respectively. While searching for the correspondence, the algorithm labels each contour point with the number of the skeleton point nearest to it, and the whole contour is stored in chain form. After the traversal, a point at which a run of the same number breaks off is a contour point where a split is needed; it is joined to the next contour point at which the same number appears. Iterating this joining separates crossing parts well at intersections. Fig. 5 shows the component segmentation results for the four characters "fortune", "foot", "monarch", and "hindering".
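The labeling and splitting idea can be sketched as follows. The sketch assumes that every skeleton point carries a label identifying the stroke or component it belongs to, and that contour points inherit the label of the nearest skeleton point; the source speaks only of a "number", so this interpretation, like the function names, is an assumption.

```python
import math

def label_contour(contour, skeleton):
    """Give each contour point the label of its nearest skeleton point.

    contour: ordered list of (row, col) points forming a closed chain.
    skeleton: list of ((row, col), label) pairs.
    """
    labels = []
    for p in contour:
        _, lab = min(skeleton, key=lambda s: math.dist(p, s[0]))
        labels.append(lab)
    return labels

def break_points(labels):
    """Indices i where the label changes between point i and its successor."""
    n = len(labels)
    return [i for i in range(n) if labels[i] != labels[(i + 1) % n]]

def component_outline(contour, labels, target):
    """Contour points of one label, in chain order.

    Consecutive points that were not adjacent in the original chain are the
    places where the outline has to be joined across a gap.
    """
    return [p for p, lab in zip(contour, labels) if lab == target]
```

Treating the returned list as a closed polygon joins each break point to the next point with the same label, which mirrors the joining step described above.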
Step 4: Structure determination according to the golden-grid theory:
Chinese characters are composed of various components, and the relative positions of the components determine the structural features of the character. The relative positions of components are fixed, and the written form of the whole character shows strong layout regularities. The invention adopts the golden-grid theory, an improvement on the nine-square grid theory; from the analysis of a large number of character structures, a golden-section method for analyzing Hanzi structure was derived. The golden-grid division method and the grid cell weight settings are shown in Fig. 6; the invention uses the golden-section Hanzi structure analysis method of the "first four method".
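As a rough illustration of a golden-ratio grid, the sketch below cuts each side of the square at the two golden-section points and reports which of the nine cells a bounding box overlaps. The actual division and cell weights are defined in Fig. 6, which is not available here, so the cut positions, the row-major cell numbering, and the function names are all assumptions.

```python
import math

PHI = (math.sqrt(5) - 1) / 2      # golden ratio conjugate, about 0.618

def golden_cuts(side):
    """Positions of the two grid lines along one side of the square."""
    return (1 - PHI) * side, PHI * side

def cells_covered(bbox, side):
    """Indices 0..8 (row-major) of the grid cells that a bounding box overlaps.

    bbox = (x0, y0, x1, y1) inside a square of the given side length.
    """
    a, b = golden_cuts(side)
    edges = [0.0, a, b, float(side)]

    def bands(lo, hi):
        return [k for k in range(3) if lo < edges[k + 1] and hi > edges[k]]

    x0, y0, x1, y1 = bbox
    return {3 * r + c for r in bands(y0, y1) for c in bands(x0, x1)}
```

A component confined to the left column of cells and another confined to the remaining columns would, for instance, suggest a left-right structure; an index-based lookup of this kind is one way to read the "index value method" mentioned in the text.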
In the component segmentation described in step 3, the character has in effect been split into sub-parts, one per component. Structure determination is the recursive process of merging these sub-parts until a single whole is obtained; determining the structural relation of the two parts merged at each step yields the hierarchical structure of the character's composition. The invention recursively selects two sub-parts to merge, so the two most suitable ones must be chosen first. Observation shows that in most cases the parts to be merged have adjacent bounding boxes, and that when they are merged horizontally (vertically), the ratio of their bounding-box extents in the vertical (horizontal) direction is close to 1. Therefore, when parts are selected for merging, the adjacency distance between each pair of bounding boxes and the similarity of their lengths and widths are computed, and the most similar pair is merged. Fig. 7 is the flow chart of Hanzi component merging and structure determination. The recursion continues until all parts have been merged into one. Fig. 8 shows the structure determination results for the four characters "prisoner", "", "stubble", and "caye".
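The merging procedure can be sketched with axis-aligned bounding boxes. The cost function below, which adds the gap between two boxes to a weighted mismatch of their perpendicular extents, is an assumption made for illustration, as are the constant `WEIGHT`, the three coarse relation names, and the function names. The source distinguishes 13 structure types and notes that plain bounding boxes are not sufficient for complex enclosing structures.

```python
WEIGHT = 50.0   # assumed trade-off between adjacency gap and size mismatch

def gaps(a, b):
    """Signed gaps between boxes (x0, y0, x1, y1); positive means separated."""
    gap_x = max(a[0], b[0]) - min(a[2], b[2])
    gap_y = max(a[1], b[1]) - min(a[3], b[3])
    return gap_x, gap_y

def contains(a, b):
    return a[0] <= b[0] and a[1] <= b[1] and a[2] >= b[2] and a[3] >= b[3]

def relation(a, b):
    """Coarse structural relation between two boxes."""
    if contains(a, b) or contains(b, a):
        return "enclosing"
    gap_x, gap_y = gaps(a, b)
    return "left-right" if gap_x >= gap_y else "top-bottom"

def merge_cost(a, b):
    gap_x, gap_y = gaps(a, b)
    if gap_x >= gap_y:      # side by side: heights should be similar
        ea, eb, gap = a[3] - a[1], b[3] - b[1], gap_x
    else:                   # stacked: widths should be similar
        ea, eb, gap = a[2] - a[0], b[2] - b[0], gap_y
    ratio = min(ea, eb) / max(ea, eb)
    return max(gap, 0) + WEIGHT * (1.0 - ratio)

def union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def build_hierarchy(parts):
    """Repeatedly merge the cheapest pair; parts is a non-empty list of (name, box)."""
    nodes = list(parts)
    while len(nodes) > 1:
        pairs = [(i, j) for i in range(len(nodes)) for j in range(i + 1, len(nodes))]
        i, j = min(pairs, key=lambda ij: merge_cost(nodes[ij[0]][1], nodes[ij[1]][1]))
        (ta, a), (tb, b) = nodes[i], nodes[j]
        tree = (relation(a, b), ta, tb)
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [(tree, union(a, b))]
    return nodes[0][0]
```

The returned value is a nested tuple of the form (relation, first part, second part), so the merge order directly records the hierarchy of the character.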
Content not described in detail in this specification belongs to the prior art known to those skilled in the art.
The above is only a preferred embodiment of the invention. It should be noted that those of ordinary skill in the art may make improvements and modifications without departing from the principle of the invention, and such improvements and modifications shall also be regarded as falling within the scope of protection of the invention.
Claims (3)
1. A Hanzi component segmentation and structural determination method based on component recognition, characterized in that the steps are as follows:
Step (1): build a statistical structure model of components that describes the strokes in a Hanzi component and their structural relations, and generate candidate strokes that match the labeled strokes of the component;
Step (2): from the candidate strokes obtained in step (1), generate the optimal combination of components using dynamic programming and an optimal combination strategy, as the result of Hanzi component recognition;
Step (3): according to the Hanzi component recognition result obtained in step (2), segment the Hanzi components based on the correspondence between contour and skeleton;
Step (4): according to the Hanzi component segmentation result obtained in step (3), analyze the component layout and the Hanzi structure according to golden-grid theory, and determine the structure of the Chinese character;
wherein the recognition result in step (2) is obtained on the optimality principle as follows:
Step (B1): construct a search graph in which each column represents one labeled stroke of the component to be matched, and each row in a column represents a candidate stroke generated for that stroke from the initial stroke segments of the input Chinese character. The matching problem is thereby converted into a graph search: one point is chosen in every column, from the first column to the last, and among all feasible solutions the one with the maximum similarity is sought;
Step (B2): the graph search in step (B1) follows these rules: first, when matching a stroke, a candidate stroke cannot be selected if the initial stroke segments of the input Chinese character that it occupies conflict with those of a previously chosen candidate stroke; second, when matching a stroke, if a stroke that is a neighbour of this stroke has already been selected, the similarity is calculated by conditional probability using the previously stored local feature information: the relative center position and the stroke length ratio between the candidate stroke to be matched and the previously matched candidate stroke are computed and compared with the stored local feature information to describe the similarity;
Step (B3): from the feasible solutions obtained for each Hanzi component in step (B1), find the optimal combination as the recognition result for the components of the input Chinese character. This problem is solved by dynamic programming: it is first formulated as a knapsack problem in which the knapsack capacity is the number of initial stroke segments of the input Chinese character, and each possible component recognition solution corresponds to a flag array that marks which initial stroke segments of the input Chinese character that solution occupies;
and the Hanzi structure is determined in step (4) according to golden-grid theory as follows:
Step (D1): a Chinese character is composed of various components, and according to the relative positions of the components, common Hanzi structures can be divided into 13 kinds. The grid principle of Chinese character formation is used to describe the structural features of a Chinese character, and the Hanzi structure is analyzed using golden-grid theory, an improvement on nine-square-grid theory. Structural determination criteria are constructed, the determination rules are converted into an algorithm that can be executed on a computer, and the relations between Hanzi components are determined quickly by an index-value method.
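The knapsack formulation of step (B3) in claim 1 can be illustrated with a small dynamic program over occupancy masks. This is a hedged sketch only: the claim does not spell out the objective, so the sketch assumes that the best combination is the set of component solutions with disjoint flag arrays and the largest summed similarity. The function name `best_combination`, the tuple layout of `candidates` and the component names in the test data are hypothetical.

```python
def best_combination(n_segments, candidates):
    """Pick non-conflicting component solutions with maximum total similarity.

    candidates: list of (name, similarity, occupied_segment_indices).
    Each index list plays the role of the flag array: it marks which
    initial stroke segments of the input character the solution occupies.
    """
    masks = [sum(1 << i for i in occupied) for _, _, occupied in candidates]
    if any(m >> n_segments for m in masks):
        raise ValueError("a candidate occupies a segment outside the character")

    # DP state: bitmask of occupied segments -> (best score, chosen names).
    best = {0: (0.0, [])}
    for (name, similarity, _), mask in zip(candidates, masks):
        # Iterate over a snapshot so each candidate is used at most once.
        for used, (score, chosen) in list(best.items()):
            if used & mask:
                continue                      # occupancy conflict, skip
            new_state = used | mask
            new_score = score + similarity
            if new_state not in best or best[new_state][0] < new_score:
                best[new_state] = (new_score, chosen + [name])
    return max(best.values(), key=lambda entry: entry[0])
```

With five initial stroke segments, a "left" solution on segments 0 and 1 and a "right" solution on segments 2 to 4 can be combined, whereas a solution that shares a segment with either of them is excluded by the conflict check.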
2. The Hanzi component segmentation and structural determination method based on component recognition according to claim 1, characterized in that describing the strokes and structural relations in a Hanzi component and generating candidate strokes in step (1) proceeds as follows:
Step (A1): preprocess the 687 Hanzi component images in an existing standard Hanzi component library by skeleton extraction; detect the end points of strokes and the crossing points between strokes as feature points, and obtain initial stroke segments from the lines connecting feature points; merge the resulting initial stroke segments through interactive operation to obtain the labeled strokes of each Hanzi component;
Step (A2): according to the directional features of the unit strokes of Chinese characters, extract Gabor features from the unit strokes to complete their statistical modeling; using the principle of maximum entropy, select neighbour strokes by means of an approximate structural relation, in which the structural relation between a stroke and all other strokes in the Hanzi component is approximated by its structural relation to its own neighbours; describe the structural relation by conditional probability;
Step (A3): aid recognition of the input Hanzi component by computing the local features of pairs of strokes that are neighbours of each other, the local features comprising relative center position, angle difference and length ratio;
Step (A4): when recognizing a target component, first obtain each stroke of the corresponding component and generate for each stroke a set of possible solutions, each being an initial stroke segment or a combination of initial stroke segments; the rule for combining initial stroke segments is that two segments are joined end to end and differ in direction by no more than 15°, in which case the two segments are combined into a possible stroke-matching solution and added to the candidate stroke queue.
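The combination rule in step (A4) of claim 2 (segments joined end to end, direction difference of at most 15°) can be sketched as below. The segment representation as a pair of `(x, y)` end points, the requirement that shared end points be exactly equal, the restriction to pairwise combinations, and the function names are assumptions of this example.

```python
import math

def direction_difference(a, b):
    """Turning angle in degrees when passing from segment a into segment b
    through a shared end point; None if the segments do not touch."""
    for p, q in (a, a[::-1]):            # try both orientations of a
        for r, s in (b, b[::-1]):        # and both orientations of b
            if q == r:                   # a ends where b starts
                v = (q[0] - p[0], q[1] - p[1])
                w = (s[0] - r[0], s[1] - r[1])
                cross = v[0] * w[1] - v[1] * w[0]
                dot = v[0] * w[0] + v[1] * w[1]
                return abs(math.degrees(math.atan2(cross, dot)))
    return None

def candidate_strokes(segments, max_angle=15.0):
    """Single segments plus every pair that satisfies the combination rule."""
    candidates = [(i,) for i in range(len(segments))]
    for i in range(len(segments)):
        for j in range(i + 1, len(segments)):
            diff = direction_difference(segments[i], segments[j])
            if diff is not None and diff <= max_angle:
                candidates.append((i, j))
    return candidates
```

Each candidate is a tuple of segment indices, so a later matching stage can detect occupancy conflicts by checking whether two candidates share an index.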
3. The Hanzi component segmentation and structural determination method based on component recognition according to claim 1, characterized in that the Hanzi components are segmented in step (3), based on the correspondence between contour and skeleton, as follows:
Step (C1): for the optimal component recognition result obtained in step (2), obtain the component contour using the Canny edge detection algorithm;
Step (C2): the contour obtained in step (C1) carries no link information between predecessor and successor edge points, so a boundary-following algorithm is used to organize the contour points into chains; the resulting chained contour information is stored for use in the subsequent component segmentation;
Step (C3): after the correspondence between the skeleton of the input Chinese character and the target component has been obtained in step (2) on the optimality principle, find the correspondence between contour points and skeleton, extract the contour points corresponding to the component skeleton, and connect them at the positions where the contour is broken, forming the component segmentation result.
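Step (C3) of claim 3 can be illustrated with a simple nearest-neighbour assignment. The claim does not state how contour points are matched to the skeleton, so nearest skeleton point by Euclidean distance is an assumption of this sketch, as are the function name `contour_for_component` and the dictionary layout of `skeleton_labels`. Edge detection and boundary following (steps C1 and C2) are taken as already done, with the contour supplied as an ordered point chain.

```python
def contour_for_component(contour, skeleton_labels, target):
    """Extract the contour points that belong to one recognized component.

    contour: ordered list of (x, y) points from boundary following.
    skeleton_labels: dict mapping each skeleton point (x, y) to the name of
        the component it was assigned to during recognition.
    target: name of the component to cut out.
    """
    kept = []
    for cx, cy in contour:
        # Assumed correspondence: the closest skeleton point decides ownership.
        nearest = min(skeleton_labels,
                      key=lambda s: (s[0] - cx) ** 2 + (s[1] - cy) ** 2)
        if skeleton_labels[nearest] == target:
            kept.append((cx, cy))
    # The points keep their chain order, so treating the list as a closed
    # polygon joins consecutive points across the places where points
    # belonging to other components were dropped.
    return kept
```

Because the order of the chain is preserved, drawing the returned points as a closed polygon bridges the broken positions with straight lines, which is one simple way to close the cut contour.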
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510424057.9A CN104992161B (en) | 2015-07-17 | 2015-07-17 | A kind of Hanzi component segmentation and structural determination method based on part identification |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104992161A (en) | 2015-10-21 |
CN104992161B (en) | 2018-04-06 |
Family ID=54303974
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510424057.9A Active CN104992161B (en) | 2015-07-17 | 2015-07-17 | A kind of Hanzi component segmentation and structural determination method based on part identification |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104992161B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106503706B (en) * | 2016-09-23 | 2019-06-07 | 北京大学 | The method of discrimination of Chinese character pattern cutting result correctness |
CN108491444B (en) * | 2018-02-12 | 2019-03-12 | 龙马智芯(珠海横琴)科技有限公司 | The generation method and device of solution |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102968619A (en) * | 2012-11-13 | 2013-03-13 | 北京航空航天大学 | Recognition method for components of Chinese character pictures |
CN104182748A (en) * | 2014-08-15 | 2014-12-03 | 电子科技大学 | A method for extracting automatically character strokes based on splitting and matching |
CN104200210A (en) * | 2014-08-12 | 2014-12-10 | 合肥工业大学 | License plate character segmentation method based on parts |
Non-Patent Citations (3)
Title |
---|
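Design and implementation of a hierarchical Chinese character font library based on component reuse (基于部件复用的分级汉字字库的设计与实现); Feng Wanren (冯万仁); Wanfang Data Knowledge Service Platform (万方数据知识服务平台); 20110630; pp. 1-55 * |
Chinese character morphing technique based on skeleton graph matching (基于骨架图匹配的汉字变形技术); Liu Min et al. (刘敏等); Journal of Beijing University of Aeronautics and Astronautics (北京航空航天大学学报); 20150228; Vol. 41 (No. 2); pp. 364-368 * |
Research on handwritten Chinese character recognition (手写体汉字识别研究); He Jian (何坚); China Master's Theses Full-text Database, Information Science and Technology (中国优秀硕士学位论文全文数据库 信息科技辑); 20100415 (No. 04); pp. I138-524 * |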
Also Published As
Publication number | Publication date |
---|---|
CN104992161A (en) | 2015-10-21 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||