CN104992161B - Method for Hanzi component segmentation and structure determination based on component recognition - Google Patents

Method for Hanzi component segmentation and structure determination based on component recognition

Info

Publication number
CN104992161B
Authority
CN
China
Prior art keywords
stroke
chinese character
hanzi component
hanzi
relation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510424057.9A
Other languages
Chinese (zh)
Other versions
CN104992161A (en
Inventor
齐越
包永堂
于博文
胡山峰
沈方阳
丁志鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University
Priority to CN201510424057.9A
Publication of CN104992161A
Application granted
Publication of CN104992161B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/32 Digital ink
    • G06V30/333 Preprocessing; Feature extraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/32 Digital ink
    • G06V30/36 Matching; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/28 Character recognition specially adapted to the type of the alphabet, e.g. Latin alphabet
    • G06V30/287 Character recognition specially adapted to the type of the alphabet, e.g. Latin alphabet of Kanji, Hiragana or Katakana characters

Abstract

The invention discloses a method for Hanzi component segmentation and structure determination based on component recognition. First, Hanzi component images of a specific font are processed to build component models. Second, the input Chinese character is decomposed into stroke segments; for each component, the set of stroke segments with the greatest similarity is obtained from the component library, the component recognition result for the input character is obtained through an optimal combination strategy, and the relation between the initial stroke segments detected in the input character and the corresponding components is recorded. Then, using a contour detection algorithm and a boundary tracking algorithm, the contour information is stored in chain form, each contour point is matched to its nearest skeleton point, and the Hanzi components are segmented. Finally, the golden-grid theory of Chinese character composition is used to analyze the layout of the components and the structure of the character, which completes the structure determination and classification of the character. Compared with other image segmentation methods, the invention segments Hanzi components and determines character structure more effectively.

Description

Method for Hanzi component segmentation and structure determination based on component recognition
Technical field
The invention belongs to the fields of virtual reality and computer vision, and in particular to image processing for component recognition in Chinese character images and to pattern recognition for component segmentation and structure determination.
Background
Chinese characters differ considerably in structure from the scripts of other languages. As a logographic script, Chinese characters are built by combining components that carry different meanings in different arrangements. For example, the characters for "woods" (林) and "fir" (杉) both contain the component "wood" (木), which denotes a tree in each case. Decomposing Chinese characters into components that carry specific semantic information is therefore an important part of learning Chinese. Correct segmentation of Hanzi components and determination of character structure also help promote the internationalization of Chinese characters and the spread of Chinese culture.
At present, component recognition in Chinese character images falls into two main categories: statistics-based methods and structure-based methods. Statistics-based methods rest on the assumption that patterns of the same class share the same attributes. They treat the character image as a whole, extract a feature vector that reflects the global information of the image, and perform recognition on that vector. Commonly used statistical methods can be divided into four classes: distance methods, decision function methods, Bayesian decision methods, and model-based methods. Statistics-based methods make feature extraction easy and tolerate noise and character deformation, so they are robust and resistant to interference, but they distinguish similar-looking characters poorly; for example, "thousand" (千) and "dry" (干) are easily confused. Structure-based methods treat the character image as an assembly of smaller parts (called primitives); the number, types, and interrelations of the primitives make up the structure of the character, and recognition is based on a representation of that structure. Current structure-based methods first need to extract the strokes of the character, using either a top-down or a bottom-up strategy. Structure-based methods reflect the structural features of an object well and distinguish similar-looking characters better than statistics-based methods, but they are not robust enough and are easily affected by noise; moreover, extracting structural primitives from an image stably and effectively is very difficult.
Segmenting Hanzi components from the component recognition result first requires detecting the edges and contours of each component and then, using the skeleton of the target component, matching edge points to the skeleton. Existing methods apply the Canny algorithm directly for edge detection; the result contains many false edges caused by noise or other factors, and the resulting edges may not be closed. Given a segmentation result, when the relative position of two components is an up-down or left-right relation, the character can simply be judged to have an up-down or left-right structure. For complex enclosing structures, however, a simple bounding-box method does not distinguish these characters well, and the structure cannot be determined directly.
Summary of the invention
The technical problem solved by the invention is to overcome the deficiencies of the prior art by providing a method for Hanzi component segmentation and structure determination based on component recognition that can effectively segment the components of a Chinese character and determine its structure.
The technical solution adopted by the invention is a method for Hanzi component segmentation and structure determination based on component recognition, implemented in the following steps:
Step (1): build a statistical structure model of the components that describes the strokes and structural relations in each Hanzi component, and generate candidate strokes that match the annotated strokes of the component;
Step (2): from the candidate strokes obtained in step (1), use dynamic programming and an optimal combination strategy to generate the optimal combination of components, which is the Hanzi component recognition result;
Step (3): from the Hanzi component recognition result obtained in step (2), segment the Hanzi components based on the correspondence between contour and skeleton;
Step (4): from the Hanzi component segmentation result obtained in step (3), analyze the layout of the components and the structure of the character according to the golden-grid theory, and determine the structure of the character.
The specific steps are as follows:
Step (1), component statistical structure modeling and candidate stroke generation: the 687 Hanzi component images in an existing standard Hanzi component library are preprocessed by skeleton extraction; stroke endpoints and the intersections between strokes are detected as feature points, and initial stroke segments are obtained by connecting feature points. The initial stroke segments are merged interactively to obtain the annotated strokes of each Hanzi component. Based on the directional features of the unit strokes of Chinese characters, Gabor features are extracted from each unit stroke to complete the statistical modeling of the strokes. Using the maximum entropy principle, a neighbour stroke is selected for each stroke through an approximate structural relation. When a target component is to be recognized, each stroke of that component is taken in turn and a set of possible solutions is generated for it. A solution may be a single initial stroke segment or a combination of initial stroke segments. Two stroke segments may be combined if they are connected end to end and their directions differ by no more than 15°, or if one of them is sufficiently small; segments combined in this way form a possible stroke match and are added to the candidate stroke queue;
Step (2), component recognition based on the optimal combination principle: a search graph is built so that Hanzi component matching becomes a graph search. During the search, a candidate stroke cannot be chosen if the initial stroke segments of the input character that it occupies conflict with those of candidate strokes already chosen. From all feasible solutions found by searching the graph from the first column to the last, the optimal combination strategy selects the best combination as the recognition result for the input Hanzi components; the problem is formulated and solved as a knapsack problem;
Step (3), Hanzi component segmentation based on contour-skeleton correspondence: for the optimal component recognition result obtained in step (2), the contour of each component is obtained with the Canny edge detection algorithm. The contour obtained directly carries no predecessor or successor links between edge points, so a boundary tracking algorithm is used to organize the contour points into a chain; the chained contour is stored for use in the subsequent segmentation. The correspondence between contour points and the skeleton is then found, the contour points corresponding to a component's skeleton are extracted, and the contour is reconnected where it is broken, giving the component segmentation result.
Step (4), structure determination according to the golden-grid theory: according to the relative positions of components, common Chinese character structures can be divided into 13 kinds. The grid principle of character composition is used to describe the structural features of a character, and the character structure is analyzed with a golden-grid theory that improves on the nine-square grid theory. Structure determination criteria are established and converted into rules that can be executed on a computer, and the relation between Hanzi components is determined quickly by an index value method.
Further, the component statistical structure modeling and candidate stroke generation in step (1) comprise:
Step (A1): image thinning and skeleton extraction are applied to the component images in the standard component library; stroke endpoints and the intersections between strokes are detected as feature points, and initial stroke segments are obtained by connecting feature points; the initial stroke segments are merged interactively to form reasonably standard strokes of each Hanzi component;
Step (A2): based on the directional features of the unit strokes of Chinese characters, Gabor features are extracted from each unit stroke by computing the response of every point in the four directions 0, π/4, π/2, and 3π/4, which completes the statistical modeling of the strokes; using the maximum entropy principle, a neighbour stroke is selected for each stroke through an approximate structural relation, in which the structural relation between a stroke and all other strokes of the component is approximated by its relation to its own neighbour, and the structural relation is described by a conditional probability;
Step (A3): the local features of each pair of strokes that are neighbours of each other are computed to help recognize the input Hanzi components; the local features include relative centre position, angle difference, and length ratio;
Step (A4): when a target component is to be recognized, each stroke of that component is taken in turn and a set of possible solutions is generated for it. A solution may be a single initial stroke segment or a combination of initial stroke segments. Two stroke segments may be combined if they are connected end to end and their directions differ by no more than 15°, or if one of them is sufficiently small; segments combined in this way form a possible stroke match and are added to the candidate stroke queue.
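The four-direction Gabor response of step (A2) can be illustrated with a minimal pure-Python sketch. The kernel size, `sigma`, and `wavelength` values, the mean subtraction, and the function names are assumptions made for illustration and do not come from the patent; angles are measured in image coordinates.

```python
import math

# The four stroke directions named in the text.
DIRECTIONS = [0.0, math.pi / 4, math.pi / 2, 3 * math.pi / 4]

def gabor_kernel(direction, size=7, sigma=2.0, wavelength=4.0):
    """Even Gabor kernel tuned to strokes running along `direction` (radians)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Signed distance from the line through the origin along `direction`.
            u = -x * math.sin(direction) + y * math.cos(direction)
            envelope = math.exp(-(x * x + y * y) / (2 * sigma * sigma))
            row.append(envelope * math.cos(2 * math.pi * u / wavelength))
        kernel.append(row)
    # Subtract the mean so that flat image regions give zero response.
    mean = sum(map(sum, kernel)) / (size * size)
    return [[v - mean for v in row] for row in kernel]

def directional_responses(image, cx, cy, size=7):
    """Four-direction response vector at pixel (cx, cy); outside pixels count as 0."""
    half = size // 2
    responses = []
    for direction in DIRECTIONS:
        kernel = gabor_kernel(direction, size)
        total = 0.0
        for dy in range(-half, half + 1):
            for dx in range(-half, half + 1):
                yy, xx = cy + dy, cx + dx
                if 0 <= yy < len(image) and 0 <= xx < len(image[0]):
                    total += kernel[dy + half][dx + half] * image[yy][xx]
        responses.append(total)
    return responses

# A horizontal stroke should respond most strongly in direction 0.
horizontal = [[1 if r == 7 else 0 for c in range(15)] for r in range(15)]
vector = directional_responses(horizontal, 7, 7)
strongest = vector.index(max(vector))
```

Averaging such per-point vectors along a stroke would give one 4-dimensional feature per stroke; how the points are weighted is left open here.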
Further, component recognition based on the optimal combination principle in step (2) proceeds as follows:
Step (B1): a search graph is built in which each column represents one annotated stroke of the component to be matched, and each row within a column represents a candidate stroke generated for that stroke from the initial stroke segments of the input character. The matching problem thus becomes a graph search: one node is chosen in every column, and among all feasible solutions from the first column to the last, the one with the greatest similarity is sought;
Step (B2): the search in the graph obeys two rules. First, when a stroke is being matched and its neighbour stroke has already been matched, the similarity is computed from the conditional probability together with the stored local feature information: the relative centre position, stroke length ratio, and so on between the candidate stroke and the already matched candidate stroke are computed and compared with the stored local features. Second, when a stroke is being matched, a candidate stroke cannot be selected if the initial stroke segments of the input character that it occupies conflict with those of candidate strokes already chosen;
Step (B3): from all possible solutions obtained in the search graph for each Hanzi component, the optimal combination is found as the recognition result for the input Hanzi components. The problem is solved by dynamic programming. It is first formulated as a knapsack problem in which the knapsack capacity is the number of initial stroke segments of the input character, and each possible component recognition solution has a flag array that records which initial stroke segments of the input character it occupies. The problem is then equivalent to choosing items that do not conflict with one another so that the knapsack is filled as fully as possible.
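The knapsack formulation of step (B3) can be sketched as a small dynamic program over occupancy bitmasks. This is an illustration under stated assumptions, not the patented implementation: the function name, the use of a bitmask as the flag array, and the segment numbering in the example (nine hypothetical initial stroke segments for the character "fruit") are all invented for the sketch.

```python
def best_combination(solutions):
    """Pick non-conflicting component solutions that cover the most segments.

    `solutions` maps a component name to the set of initial stroke segment
    ids it occupies (the flag array of the text, stored here as a bitmask).
    Returns (chosen component names, number of segments covered).
    """
    # states: occupancy bitmask -> component names that produce it
    states = {0: []}
    for name, segments in solutions.items():
        mask = sum(1 << s for s in segments)
        # Iterate over a snapshot so each component is used at most once.
        for used, names in list(states.items()):
            if (used & mask) == 0 and (used | mask) not in states:
                states[used | mask] = names + [name]
    best = max(states, key=lambda m: bin(m).count("1"))
    return states[best], bin(best).count("1")

# Hypothetical occupancy data for an input character with 9 initial segments.
candidates = {
    "field": {0, 1, 2, 3, 4},
    "wood": {5, 6, 7, 8},
    "mouth": {0, 1, 2, 3},
    "person": {7, 8},
    "soil": {3, 4, 5},
}
chosen, covered = best_combination(candidates)
```

With this made-up data the only combination that fills all nine segments is "field" plus "wood", which mirrors the example the description gives for Fig. 3.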
Further, Hanzi component segmentation based on contour-skeleton correspondence in step (3) proceeds as follows:
Step (C1): for the optimal component recognition result obtained in step (2), the contour of each component is obtained with the Canny edge detection algorithm, which comprises three stages: filtering and noise reduction, enhancement, and detection; double-threshold detection and edge linking are then applied to complete the detection;
Step (C2): the contour obtained in step (C1) carries no predecessor or successor links between edge points, so a contour tracking algorithm based on 8-connected regions is used to reorganize the target edge into a chain of contour points; the chained contour is stored for use in the subsequent segmentation;
Step (C3): once the correspondence between the skeleton of the input character and the target components has been obtained through the optimal combination principle of step (2), the correspondence between contour points and the skeleton is found; using the chained edge point set, the contour points corresponding to a component's skeleton are extracted and reconnected where the contour is broken, giving the component segmentation result.
Further, structure determination according to the golden-grid theory in step (4) proceeds as follows:
Step (D1): Chinese characters are composed of components, and according to the relative positions of components, common character structures can be divided into 13 kinds. The grid principle of character composition is used to describe the structural features of a character. With a golden-grid theory that improves on the nine-square grid theory, the original square is divided into 9 cells by the golden ratio, the character is laid out in the target square, and its structure is analyzed. Structure determination criteria are established and converted into rules that can be executed on a computer, and the relation between Hanzi components is determined quickly by an index value method.
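A golden-ratio division of a square into 9 cells can be sketched as follows. The text does not state where the dividing lines fall, so this sketch assumes they lie at the 0.382 and 0.618 fractions of the width and height; the function names are hypothetical, and the cell weights of Fig. 6 are not modelled.

```python
GOLDEN = (5 ** 0.5 - 1) / 2  # about 0.618

def golden_grid(left, top, width, height):
    """Return a 3x3 list of cells (x0, y0, x1, y1) split at the golden fractions."""
    xs = [left, left + (1 - GOLDEN) * width, left + GOLDEN * width, left + width]
    ys = [top, top + (1 - GOLDEN) * height, top + GOLDEN * height, top + height]
    return [[(xs[c], ys[r], xs[c + 1], ys[r + 1]) for c in range(3)]
            for r in range(3)]

def cell_index(grid, x, y):
    """Return (row, column) of the first cell containing (x, y), or None."""
    for r, row in enumerate(grid):
        for c, (x0, y0, x1, y1) in enumerate(row):
            if x0 <= x <= x1 and y0 <= y <= y1:
                return r, c
    return None

grid = golden_grid(0, 0, 100, 100)
centre_cell = cell_index(grid, 50, 50)
```

Unlike an equal nine-square grid, the centre cell here is narrower than the outer cells, which is the property the golden-grid variant relies on.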
Compared with the prior art, the invention has the following advantages:
(1) In component recognition from images, the invention describes structural relations with conditional probabilities, computes the local features of pairs of mutually neighbouring strokes to help recognize the input Hanzi components, and selects the best component recognition result with an optimal combination strategy, which improves the recognition rate and accuracy for components.
(2) The invention describes the structural features of Chinese characters with the golden-grid theory, divides the original square into a 9-cell grid by the golden ratio, and establishes structure determination criteria that can be executed on a computer, so that the structure of a Chinese character can be determined accurately.
Brief description of the drawings
Fig. 1 is a schematic diagram of the overall process of the invention;
Fig. 2 shows the candidate strokes generated for a component;
Fig. 3 shows the Hanzi component recognition result obtained with the optimal combination principle;
Fig. 4 is a schematic diagram of the correspondence between contour points and skeleton points;
Fig. 5 shows the results of Hanzi component segmentation;
Fig. 6 shows the golden-grid division and the grid cell weight settings;
Fig. 7 is a flow chart of Hanzi component merging and structure determination;
Fig. 8 shows the results of Chinese character structure determination.
Detailed description of embodiments
The invention is described in further detail below with reference to the drawings and examples.
The implementation comprises four main steps: component statistical structure modeling with candidate stroke generation, component recognition based on the optimal combination principle, Hanzi component segmentation, and Chinese character structure determination based on the golden-grid theory.
As shown in Fig. 1, the invention is implemented as follows:
Step 1: Hanzi component statistical structure modeling and candidate stroke generation:
Image thinning and skeleton extraction are applied to the component images in the standard component library; stroke endpoints and the intersections between strokes are detected as feature points, and initial stroke segments are obtained by connecting feature points. The initial stroke segments are merged interactively to form reasonably standard strokes of each Hanzi component. The invention assumes that each stroke of a component follows a 4-dimensional Gaussian distribution, written X ~ N(μ, Σ). This 4-dimensional vector is a weighted combination of the 4-dimensional vectors of the individual points on the stroke, where the vector of each point consists of its Gabor filter responses in the four directions 0, π/4, π/2, and 3π/4. For a Hanzi component C to be matched and an input Chinese character image S, the similarity between them is expressed by the joint probability in formula (1), where r_i and s_i denote a stroke of the component and a stroke of the input character, respectively.
$$\Pr(S = C) \equiv \Pr(s_1 = r_1, s_2 = r_2, \ldots, s_n = r_n) \tag{1}$$
The joint probability distribution is then expressed through conditional probabilities to obtain a way of computing component similarity, which converts formula (1) into formula (2).
The joint probability can be computed from formula (2), but the computational cost is high. The invention therefore uses the maximum entropy principle to select, for each stroke, the single stroke that influences it most as its neighbour and describes the structural relation approximately; formula (2) is thereby converted into formula (3), which turns a strong multi-way structural relation into a set of pairwise structural relations.
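Formulas (2) and (3) are not reproduced in this text. As a sketch only, assuming formula (2) is the standard chain-rule factorization of formula (1) and formula (3) conditions each stroke on its single selected neighbour, they would take the following form. The neighbour index n(i) is a hypothetical notation introduced here.

```latex
% Assumed form of formula (2): chain rule applied to formula (1)
\Pr(S = C) = \Pr(s_1 = r_1) \prod_{i=2}^{n}
  \Pr\bigl(s_i = r_i \mid s_1 = r_1, \ldots, s_{i-1} = r_{i-1}\bigr)

% Assumed form of formula (3): each stroke conditioned only on its neighbour n(i)
\Pr(S = C) \approx \Pr(s_1 = r_1) \prod_{i=2}^{n}
  \Pr\bigl(s_i = r_i \mid s_{n(i)} = r_{n(i)}\bigr)
```

Under this reading, the cost drops from conditioning on up to n-1 strokes per factor to conditioning on one, which matches the stated move from a multi-way relation to pairwise relations.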
When candidate strokes are generated, the local features of pairs of mutually neighbouring strokes are computed to help recognize the input Hanzi components; the local features include relative centre position, angle difference, and length ratio. When a target component is to be recognized, each stroke of that component is taken in turn and a set of possible solutions is generated for it. A solution may be a single initial stroke segment or a combination of initial stroke segments. Two stroke segments may be combined if they are connected end to end and their directions differ by no more than 15°, or if one of them is sufficiently small; segments combined in this way form a possible stroke match and are added to the candidate stroke queue. Fig. 2 illustrates candidate stroke generation in the input character image "father" for the stroke shown in bold in the target component "bar".
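The segment combination rule can be written as a short predicate. This is a minimal sketch: the 15° limit is from the text, while the `small_len` threshold for "sufficiently small", the requirement that a short segment must also share an endpoint, and the function names are assumptions.

```python
import math

def direction_deg(segment):
    """Undirected orientation of a segment in degrees, in [0, 180)."""
    (x1, y1), (x2, y2) = segment
    return math.degrees(math.atan2(y2 - y1, x2 - x1)) % 180.0

def length(segment):
    (x1, y1), (x2, y2) = segment
    return math.hypot(x2 - x1, y2 - y1)

def can_merge(a, b, max_angle=15.0, small_len=3.0):
    """True if segments a and b may be combined into one candidate stroke.

    Segments are ((x1, y1), (x2, y2)). They must share an endpoint and either
    differ in direction by at most `max_angle` degrees, or one of them must be
    no longer than `small_len` (an assumed value for "sufficiently small").
    """
    if not set(a) & set(b):
        return False
    diff = abs(direction_deg(a) - direction_deg(b))
    diff = min(diff, 180.0 - diff)  # orientations wrap around at 180 degrees
    return diff <= max_angle or min(length(a), length(b)) <= small_len

base = ((0, 0), (10, 0))
nearly_straight = can_merge(base, ((10, 0), (20, 2)))   # about 11.3 degrees apart
right_angle = can_merge(base, ((10, 0), (10, 10)))      # 90 degrees, both long
```

Every pair that passes this test would be appended to the candidate stroke queue alongside the single initial segments.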
Step 2: Component recognition based on the optimal combination principle:
After step 1 has produced the candidate stroke set for each stroke of the target component, the optimal combination strategy selects one group of candidate strokes from these sets so that the similarity between the target component and the selected candidate strokes is greatest. This step is realized by building a search graph and applying an optimality principle. Each column of the search graph represents one annotated stroke of the component to be matched, and each row within a column represents a candidate stroke generated for that stroke from the initial stroke segments of the input character. Finding the optimal solution means choosing one node from each column, from the first column to the last, such that the resulting solution has the greatest similarity among all possible solutions.
When a stroke is being matched during the search, a candidate stroke cannot be selected if the initial stroke segments of the input character that it occupies conflict with those of candidate strokes already chosen. In addition, if the neighbour of the stroke being matched has already been matched, the similarity is computed from the conditional probability together with the local features. Dynamic programming is then used to find the optimal combination as the recognition result for the input Hanzi components. Fig. 3 illustrates how the optimal combination principle yields the final recognition result: for the input character "fruit" (果) and the components "mouth" (口), "wood" (木), "person" (人), "field" (田), and "soil" (土), all possible solutions are obtained, and the optimal combination strategy concludes that "field" and "wood" are the optimal component recognition result for "fruit".
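The column-by-column search with the segment conflict rule can be sketched as a backtracking search. This sketch assumes each candidate already carries a precomputed similarity score and simply sums the scores; the conditional-probability term that depends on an already matched neighbour is omitted, and all names are hypothetical.

```python
def best_match(columns):
    """Choose one candidate per column without reusing any initial segment.

    `columns[i]` lists the candidates for stroke i as (segment_ids, score),
    where segment_ids is a frozenset of initial stroke segment ids.
    Returns (total_score, chosen segment sets), or None if nothing fits.
    """
    best = None

    def search(col, used, total, chosen):
        nonlocal best
        if col == len(columns):
            if best is None or total > best[0]:
                best = (total, list(chosen))
            return
        for segments, score in columns[col]:
            if used & segments:
                continue  # conflicts with a candidate chosen for an earlier stroke
            chosen.append(segments)
            search(col + 1, used | segments, total + score, chosen)
            chosen.pop()

    search(0, frozenset(), 0.0, [])
    return best

columns = [
    [(frozenset({0}), 0.9), (frozenset({0, 1}), 0.8)],
    [(frozenset({1}), 0.7), (frozenset({2}), 0.4)],
]
result = best_match(columns)
```

Exhaustive backtracking is exponential in the worst case; it is used here only because it makes the conflict rule easy to see.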
Step 3: Hanzi component segmentation based on contour-skeleton correspondence:
After the optimal component recognition result has been obtained in step 2, the contour of each component is first obtained with the Canny edge detection algorithm, which comprises three stages: filtering and noise reduction, enhancement, and detection. The original image is filtered by convolution; formulas (4) and (5) give the one-dimensional and two-dimensional Gaussian kernels used in the convolution.
The contour produced by the Canny algorithm carries no predecessor or successor links between edge points, so a boundary tracking algorithm based on 8-connected regions is used to organize the contour points into a chain. The method is: (1) Determine the starting point: find the upper-left-most pixel of the image whose value is 1, store its coordinates, and denote it P_0. (2) Starting from neighbour 0 of P_0, examine its 8 neighbours counterclockwise, find the first pixel whose value is 1, store its coordinates, and denote it P_1; P_0 is called the first target neighbour of P_1. (3) In every subsequent step, start the search from the neighbour that follows the first target neighbour of the current point and find the next target neighbour. (4) Repeat step (3); tracking ends when the first target neighbour of P_j is P_0 and the first target neighbour of P_{j+1} is P_1. After boundary tracking, the target edge has been reorganized into a chained contour, which is stored for use in the subsequent component segmentation.
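Formulas (4) and (5) are not reproduced in this text. Assuming they are the usual Gaussian smoothing kernels with standard deviation σ (a symbol introduced here), they would read:

```latex
% Assumed form of formula (4): one-dimensional Gaussian kernel
G(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{x^{2}}{2\sigma^{2}}\right)

% Assumed form of formula (5): two-dimensional Gaussian kernel
G(x, y) = \frac{1}{2\pi\sigma^{2}} \exp\!\left(-\frac{x^{2} + y^{2}}{2\sigma^{2}}\right)
```

The two-dimensional kernel is the product of two one-dimensional kernels, so the filtering can be carried out as two one-dimensional convolutions.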
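The boundary tracking above can be sketched with a common 8-connected tracing scheme. The stopping test (the walk returns to P_0 and then steps to P_1 again) follows the text; the rule for choosing the starting search direction at each step is the textbook inner-boundary tracing rule and is an assumption, as are the function and variable names.

```python
# (d_row, d_col) for neighbours 0..7, counterclockwise starting from east.
DIRS = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]

def trace_boundary(img):
    """Return the boundary of the first object in a binary image as a point chain."""
    rows, cols = len(img), len(img[0])
    # Upper-left-most foreground pixel in row-major scan order.
    start = next(((r, c) for r in range(rows) for c in range(cols) if img[r][c]), None)
    if start is None:
        return []
    chain = [start]
    d = 7
    while True:
        r, c = chain[-1]
        # Assumed start-direction rule: back up relative to the last move.
        first = (d + 7) % 8 if d % 2 == 0 else (d + 6) % 8
        for k in range(8):
            nd = (first + k) % 8
            nr, nc = r + DIRS[nd][0], c + DIRS[nd][1]
            if 0 <= nr < rows and 0 <= nc < cols and img[nr][nc]:
                chain.append((nr, nc))
                d = nd
                break
        else:
            return chain  # isolated pixel: no foreground neighbour
        # Stop once the walk has returned to P0 and stepped to P1 again.
        if len(chain) >= 4 and chain[-1] == chain[1] and chain[-2] == chain[0]:
            return chain[:-2]

square = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
boundary = trace_boundary(square)
```

Because consecutive entries of the returned list are neighbours, each point has a well-defined predecessor and successor, which is the link information the raw edge map lacks.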
Once the correspondence between the skeleton of the input character and the target components has been obtained through the optimal combination principle, the correspondence between contour points and the skeleton is found. After this correspondence has been computed, the contour points corresponding to a component's skeleton are extracted and reconnected where the contour is broken, giving the component segmentation result. The process is shown in Fig. 4, where (c) and (d) show the correspondence between contour points and skeleton points for the two crossing parts to be separated in the characters "arrow" and "Yes", respectively. The algorithm labels each contour point while the correspondence is being found; the label equals the index of the skeleton point nearest to it, and the whole contour is stored as a chain. After traversal, the points at which a run of the same label breaks off are the contour points where the parts must be separated; each is connected to the contour point where the same label next appears. Iterating this connection separates crossing parts well at their intersections. Fig. 5 shows the component segmentation results for the four characters "fortune", "foot", "monarch", and "hindering".
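The labelling and splitting of the contour chain can be sketched as follows. For brevity this sketch assumes each skeleton point carries the id of the component it was assigned to during recognition and labels contour points with that id rather than with the index of the skeleton point itself; the function names and data layout are hypothetical.

```python
from itertools import groupby

def nearest_label(point, skeleton):
    """Label of the skeleton point nearest to `point`.

    `skeleton` is a list of ((x, y), label) pairs.
    """
    px, py = point
    return min(skeleton, key=lambda s: (s[0][0] - px) ** 2 + (s[0][1] - py) ** 2)[1]

def split_contour(chain, skeleton):
    """Cut a contour chain into runs of consecutive points with the same label."""
    labelled = [(nearest_label(p, skeleton), p) for p in chain]
    return [(label, [p for _, p in run])
            for label, run in groupby(labelled, key=lambda item: item[0])]

skeleton = [((0, 0), "A"), ((10, 0), "B")]
chain = [(0, 1), (2, 1), (8, 1), (10, 1), (1, -1)]
runs = split_contour(chain, skeleton)
```

The boundaries between runs are the break points described above; since the chain is closed, a first and last run with the same label belong together and would be joined when the contour is reconnected.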
Step 4: Structure determination according to the golden-grid theory:
Chinese characters are composed of components, and the relative positions of the components determine the structure of the character. The relative positions of components are fixed, and the written form of a whole character has strong layout features. The invention adopts a golden-grid theory that improves on the nine-square grid theory; from the analysis of a large number of character structures, a golden-section method for analyzing Chinese character structure has been derived. The golden-grid division and the grid cell weight settings are shown in Fig. 6. The invention uses the golden-section character structure analysis method of the "first four method".
In the component segmentation of step 3, the character has in effect been split into sub-parts, one per component. Structure determination is the recursive process of merging these sub-parts until a single whole is obtained; determining the structural relation of the two parts merged at each step yields the hierarchical structure of the character. The invention recursively selects two sub-parts to merge, so the most suitable pair must be chosen first. Observation shows that the parts to be merged mostly have bounding boxes that lie close together, and that for a horizontal (vertical) merge the bounding-box extents in the vertical (horizontal) direction have a ratio close to 1. When selecting parts to merge, the method therefore computes, for every pair, the adjacency distance between the bounding boxes and the similarity of their lengths and widths, and merges the most similar pair. Fig. 7 is the flow chart of Hanzi component merging and structure determination. The recursion continues until all parts have been merged into one. Fig. 8 shows the structure determination results for the four characters "prisoner", "", "stubble", and "caye".
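The pairwise merging can be sketched with bounding boxes alone. This is a simplified illustration: the cost function, its `weight` parameter, the three coarse relation labels, and the example boxes are assumptions, and the golden-grid index values that the method uses to separate the 13 structure types are not modelled.

```python
from itertools import combinations

def relation(a, b):
    """Coarse relation between two boxes given as (left, top, right, bottom)."""
    if a[2] <= b[0] or b[2] <= a[0]:
        return "left-right"
    if a[3] <= b[1] or b[3] <= a[1]:
        return "up-down"
    return "overlapping"

def merge_cost(a, b, weight=100.0):
    """Gap between the boxes plus a penalty for unequal perpendicular extents."""
    gap_x = max(0, max(a[0], b[0]) - min(a[2], b[2]))
    gap_y = max(0, max(a[1], b[1]) - min(a[3], b[3]))
    rel = relation(a, b)
    if rel == "left-right":
        ea, eb = a[3] - a[1], b[3] - b[1]   # compare heights
    elif rel == "up-down":
        ea, eb = a[2] - a[0], b[2] - b[0]   # compare widths
    else:
        ea = eb = 1
    return gap_x + gap_y + weight * (1 - min(ea, eb) / max(ea, eb))

def build_structure(parts):
    """Merge the cheapest pair repeatedly; returns a nested (relation, x, y) tree."""
    nodes = list(parts.items())
    while len(nodes) > 1:
        i, j = min(combinations(range(len(nodes)), 2),
                   key=lambda p: merge_cost(nodes[p[0]][1], nodes[p[1]][1]))
        (tree_a, a), (tree_b, b) = nodes[i], nodes[j]
        box = (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))
        merged = ((relation(a, b), tree_a, tree_b), box)
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [merged]
    return nodes[0][0]

# Hypothetical boxes: one wide part on top, two equal parts side by side below.
parts = {"top": (0, 0, 100, 30), "bl": (0, 35, 45, 100), "br": (55, 35, 100, 100)}
tree = build_structure(parts)
```

In this example the two lower parts merge first because their heights match, and the result is then combined with the upper part, which gives the hierarchy the recursive procedure is meant to recover.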
Content not elaborated in the description of the present invention belongs to the prior art known to those skilled in the art.
The above is only a preferred embodiment of the present invention. It should be noted that those of ordinary skill in the art may make improvements and modifications without departing from the principles of the present invention, and such improvements and modifications shall also be regarded as falling within the protection scope of the present invention.

Claims (3)

1. A Chinese character component segmentation and structure determination method based on component recognition, characterized in that the steps are as follows:
Step (1): component statistical structure modeling: describe the strokes and structural relations within a Chinese character component, and generate candidate strokes that match the labeled strokes of the component;
Step (2): from the candidate strokes obtained in step (1), use dynamic programming and an optimal-combination strategy to generate the optimal combination of components as the component recognition result;
Step (3): according to the component recognition result obtained in step (2), segment the components based on the correspondence between contour and skeleton;
Step (4): according to the component segmentation result obtained in step (3), analyze the component layout and the character structure using golden-grid theory, and determine the structure of the character;
wherein the steps in step (2) for obtaining the recognition result based on the optimality principle are as follows:
Step (B1): construct a search graph in which each column represents one labeled stroke of the component to be matched, and each row within a column represents a candidate stroke generated for that stroke from the initial stroke segments of the input character; the matching problem is thereby converted into a graph search in which one node is chosen in each column, and the solution with maximum similarity is sought among all feasible solutions running from the first column to the last column;
Step (B2): the graph search in step (B1) obeys the following rules: first, when matching a stroke, if the candidate stroke to be matched conflicts with previously chosen candidate strokes in its occupancy of the initial stroke segments of the input character, that candidate stroke cannot be selected; second, when matching a stroke, if a stroke that is a neighbor of this stroke has already been selected, the match is evaluated by conditional probability: using the previously stored local feature information, the center relative position and the stroke length ratio between the candidate stroke to be matched and the previously matched candidate stroke are computed and compared with the stored local feature information to describe the similarity;
Step (B3): for the feasible solutions of each component obtained in step (B1), find the optimal combination as the component recognition result for the input character; the problem is solved by dynamic programming: it is first described as a knapsack problem whose capacity is the number of initial stroke segments of the input character, and each possible component recognition solution corresponds to a flag array that marks which initial stroke segments of the input character that solution occupies;
the steps in step (4) for determining the character structure according to golden-grid theory are as follows:
Step (D1): Chinese characters are composed of components; according to the relative positions of the components, common character structures can be divided into 13 types; the grid principle of character composition is used to describe the structural features of the character, and the character structure is analyzed using the improved golden-grid theory built on nine-square-grid theory; structure determination criteria are constructed, the structure determination rules are converted into an algorithm executable on a computer, and the relation between components is quickly determined by an index-value method.
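The knapsack formulation of step (B3) can be illustrated with a minimal sketch. It rests on assumptions that the claim does not state: each candidate component solution is taken to carry a similarity score, the flag array is encoded as a bitmask over the initial stroke segments, and the objective is assumed to be the maximum total score over solutions that do not share any segment. The function name `best_combination` and the sample data are hypothetical.

```python
def best_combination(n_segments, solutions):
    """Pick non-conflicting component solutions with maximum total score.

    n_segments: number of initial stroke segments (the knapsack capacity).
    solutions:  list of (name, mask, score); bit i of mask is set when the
                solution occupies initial stroke segment i.
    Returns (total_score, [names of the chosen solutions]).
    """
    full = (1 << n_segments) - 1
    # best[used_mask] = (total score, chosen names) for that occupancy state.
    best = {0: (0.0, [])}
    for name, mask, score in solutions:
        if mask & ~full:
            raise ValueError(f"{name} refers to a segment outside the input")
        # Iterate over a snapshot so each solution is used at most once.
        for used, (total, chosen) in list(best.items()):
            if used & mask:
                continue  # conflict: a segment would be occupied twice
            new_used = used | mask
            candidate = total + score
            if new_used not in best or candidate > best[new_used][0]:
                best[new_used] = (candidate, chosen + [name])
    return max(best.values(), key=lambda entry: entry[0])


# Hypothetical input: 5 initial stroke segments and 4 candidate solutions.
candidates = [
    ("P", 0b00111, 0.90),
    ("Q", 0b00011, 0.80),
    ("R", 0b11000, 0.70),
    ("S", 0b11100, 0.85),
]
total, chosen = best_combination(5, candidates)
print(chosen, round(total, 2))
```

In this example P and S both claim segment 2, so they cannot be combined; the state table keeps the best score for every occupancy pattern, and the pair Q and S wins over P and R.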
2. The Chinese character component segmentation and structure determination method based on component recognition according to claim 1, characterized in that describing the strokes and structural relations within a component and generating candidate strokes in step (1) specifically comprises:
Step (A1): preprocess the 687 component images in an existing standard Chinese character component library by extracting their skeletons; detect stroke endpoints and the intersection points between strokes as feature points, and obtain the initial stroke segments from the lines connecting feature points; merge the resulting initial stroke segments through interactive operation to obtain the labeled strokes of the component;
Step (A2): according to the directional features of the component strokes, perform Gabor feature extraction on the component strokes to complete their statistical modeling; using the maximum-entropy principle, select neighbor strokes by means of an approximate structural relation, in which the structural relations between a stroke and all other strokes in the component are approximated by its structural relations with its own neighbors; the structural relations are described by conditional probability;
Step (A3): compute the local features of each pair of strokes that are neighbors of each other to help recognize the components of the input character; the local features include center relative position, angle difference and length ratio;
Step (A4): when recognizing a target component, first obtain each stroke of the corresponding component and generate a set of possible solutions for each stroke; these solutions are initial stroke segments or combinations of initial stroke segments; the rule for combining initial stroke segments is that two segments join end to end and their direction difference does not exceed 15°; two such segments are combined into a possible stroke-matching solution and added to the candidate stroke queue.
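The segment-combination rule of step (A4) can be illustrated with a minimal sketch. The representation of a stroke segment as a pair of endpoints, the endpoint tolerance `tol`, and the function names `can_combine` and `candidate_strokes` are assumptions made for illustration; only the end-to-end condition and the 15° limit come from the claim.

```python
import math
from itertools import combinations


def can_combine(seg_a, seg_b, max_angle_deg=15.0, tol=1e-6):
    """True if two segments join end to end and continue in nearly the same direction.

    Each segment is ((x0, y0), (x1, y1)). Both orientations of each segment are
    tried, so the order in which the endpoints are stored does not matter.
    """
    for a_start, a_end in (seg_a, seg_a[::-1]):
        for b_start, b_end in (seg_b, seg_b[::-1]):
            if math.dist(a_end, b_start) > tol:
                continue  # these two ends do not touch
            ax, ay = a_end[0] - a_start[0], a_end[1] - a_start[1]
            bx, by = b_end[0] - b_start[0], b_end[1] - b_start[1]
            # Unsigned angle between the two direction vectors, in degrees.
            angle = abs(math.degrees(math.atan2(ax * by - ay * bx, ax * bx + ay * by)))
            if angle <= max_angle_deg:
                return True
    return False


def candidate_strokes(segments):
    """Initial segments plus every admissible pair, as tuples of segment indices."""
    queue = [(i,) for i in range(len(segments))]
    for i, j in combinations(range(len(segments)), 2):
        if can_combine(segments[i], segments[j]):
            queue.append((i, j))
    return queue


segments = [
    ((0, 0), (10, 0)),    # 0: horizontal segment
    ((10, 0), (20, 2)),   # 1: continues segment 0 at about 11.3 degrees
    ((10, 0), (10, 10)),  # 2: turns by 90 degrees at the shared endpoint
    ((30, 0), (40, 0)),   # 3: same direction as 0 but not connected
]
print(candidate_strokes(segments))
```

Orienting both segments through the shared endpoint before measuring the angle means that a segment folding back on its predecessor is rejected, which a comparison of undirected line orientations alone would not guarantee.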
3. The Chinese character component segmentation and structure determination method based on component recognition according to claim 1, characterized in that the steps in step (3) for segmenting the components based on the correspondence between contour and skeleton are as follows:
Step (C1): for the optimal component recognition result obtained in step (2), obtain the contour of the component using the Canny edge detection algorithm;
Step (C2): the contour obtained in step (C1) does not actually contain predecessor and successor link information between edge points, so a boundary-following algorithm is used to find the chained organization of the contour points; the resulting chained target contour information is stored for the subsequent component segmentation;
Step (C3): after the correspondence between the skeleton of the input character and the target component has been obtained in step (2) by the optimality principle, find the correspondence between contour points and the skeleton, extract the contour points corresponding to the component skeleton, and connect them at the positions where the contour is broken, forming the component segmentation result.
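The contour-to-skeleton correspondence in step (C3) can be illustrated with a minimal sketch. The claim does not say how the correspondence is found; the sketch assumes a nearest-neighbor rule, in which each contour point is assigned to the component of the closest labeled skeleton point. The function name `contour_of_component` and the sample coordinates are hypothetical, and the final reconnection of the contour at the break positions is not shown.

```python
import math


def contour_of_component(contour, skeleton_labels, target):
    """Select the contour points whose nearest skeleton point belongs to `target`.

    contour:         ordered list of (x, y) contour points (chained, as in step (C2)).
    skeleton_labels: dict mapping each skeleton point (x, y) to a component name,
                     as produced by the component recognition of step (2).
    The contour order is preserved, so gaps in the result mark the break
    positions that still need to be connected.
    """
    selected = []
    for point in contour:
        nearest = min(skeleton_labels, key=lambda s: math.dist(point, s))
        if skeleton_labels[nearest] == target:
            selected.append(point)
    return selected


# Hypothetical data: two skeleton points belonging to two components.
skeleton = {(2, 5): "left", (8, 5): "right"}
outline = [(1, 4), (3, 6), (7, 4), (9, 6)]
print(contour_of_component(outline, skeleton, "left"))
```

The linear scan over all skeleton points is quadratic overall; for full-size glyph images a spatial index or a distance transform would be a more practical choice.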
CN201510424057.9A 2015-07-17 2015-07-17 A kind of Hanzi component segmentation and structural determination method based on part identification Active CN104992161B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510424057.9A CN104992161B (en) 2015-07-17 2015-07-17 A kind of Hanzi component segmentation and structural determination method based on part identification

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510424057.9A CN104992161B (en) 2015-07-17 2015-07-17 A kind of Hanzi component segmentation and structural determination method based on part identification

Publications (2)

Publication Number Publication Date
CN104992161A CN104992161A (en) 2015-10-21
CN104992161B true CN104992161B (en) 2018-04-06

Family

ID=54303974

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510424057.9A Active CN104992161B (en) 2015-07-17 2015-07-17 A kind of Hanzi component segmentation and structural determination method based on part identification

Country Status (1)

Country Link
CN (1) CN104992161B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106503706B (en) * 2016-09-23 2019-06-07 北京大学 The method of discrimination of Chinese character pattern cutting result correctness
CN108491444B (en) * 2018-02-12 2019-03-12 龙马智芯(珠海横琴)科技有限公司 The generation method and device of solution

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102968619A (en) * 2012-11-13 2013-03-13 北京航空航天大学 Recognition method for components of Chinese character pictures
CN104182748A (en) * 2014-08-15 2014-12-03 电子科技大学 A method for extracting automatically character strokes based on splitting and matching
CN104200210A (en) * 2014-08-12 2014-12-10 合肥工业大学 License plate character segmentation method based on parts

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
基于部件复用的分级汉字字库的设计与实现;冯万仁;《万方数据知识服务平台》;20110630;第1-55页 *
基于骨架图匹配的汉字变形技术;刘敏等;《北京航空航天大学学报》;20150228;第41卷(第2期);第364-368页 *
手写体汉字识别研究;何坚;《中国优秀硕士学位论文全文数据库 信息科技辑》;20100415(第04期);第I138-524页 *

Also Published As

Publication number Publication date
CN104992161A (en) 2015-10-21

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant