CN104992161A - Chinese character part dividing and structure determination method based on part identification - Google Patents

Chinese character part dividing and structure determination method based on part identification

Info

Publication number
CN104992161A
Authority
CN
China
Prior art keywords
stroke
chinese character
component
hanzi
relation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510424057.9A
Other languages
Chinese (zh)
Other versions
CN104992161B (en)
Inventor
齐越
包永堂
于博文
胡山峰
沈方阳
丁志鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University
Priority to CN201510424057.9A
Publication of CN104992161A
Application granted
Publication of CN104992161B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/32 Digital ink
    • G06V30/333 Preprocessing; Feature extraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/32 Digital ink
    • G06V30/36 Matching; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/28 Character recognition specially adapted to the type of the alphabet, e.g. Latin alphabet
    • G06V30/287 Character recognition specially adapted to the type of the alphabet, e.g. Latin alphabet of Kanji, Hiragana or Katakana characters

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Character Discrimination (AREA)

Abstract

The invention discloses a method for segmenting Chinese character components and determining character structure based on component recognition. First, component images in a specific font are processed to build the component models. Second, an input Chinese character is decomposed into stroke segments; for each component in the reference library, the set of stroke segments with the maximum similarity is obtained, and an optimal combination strategy then yields the component recognition result for the input character together with the correspondence between the initial stroke segments detected in the input character and the recognized components. Third, a contour detection algorithm and an edge tracking algorithm are applied, and the contour information, stored in chain form, is used to find the correspondence between each contour point and its nearest skeleton point, which completes the segmentation of the Chinese character components. Finally, the layout of the components and the structure of the character are analyzed using the golden-section grid theory of Chinese character form, which completes the structure determination and classification of the character. Compared with other image segmentation methods, the method segments Chinese character components and determines their structure well.

Description

A method for Chinese character component segmentation and structure determination based on component recognition
Technical field
The invention belongs to the fields of virtual reality technology and computer vision, and in particular to the image processing field of component recognition in Chinese character images and the pattern recognition field of component segmentation and structure determination.
Background
Chinese characters differ considerably in structure from other writing systems. As pictographic characters, they are formed by combining components that carry different meanings in different arrangements. For example, the two characters "woods" and "China fir" both contain the component "wood", and both denote trees. Decomposing a Chinese character into components that carry concrete semantic information is therefore an important part of learning Chinese. Correct component segmentation and structure determination also help promote the internationalization of Chinese characters and the spread of Chinese culture.
At present, component recognition in Chinese character images falls broadly into two kinds of methods: statistics-based and structure-based. Statistics-based methods treat samples with the same attributes as one pattern class, process the Chinese character image as a whole, extract a feature vector that reflects information about the entire image, and recognize the character from that feature vector. Commonly used statistical methods can be divided into four classes: distance methods, decision-function methods, Bayesian decision methods, and model-based methods. Statistics-based methods extract features easily, tolerate noise and deformation of the character, and have good robustness and interference resistance, but they distinguish similar-looking characters poorly; similar-looking characters such as "thousand" and "do" easily cause recognition errors. Structure-based methods interpret a Chinese character image as a combination of smaller parts (called primitives); the number, types, and mutual relations of the primitives constitute the structure of the character, and recognition is based on a representation of that structure. Current structure-based recognition methods first need to extract the strokes of the character, for which there are two main strategies, top-down and bottom-up. Structure-based methods reflect the structural features of the object better and distinguish more similar-looking characters than statistics-based methods, but they are not robust enough and are easily affected by noise; moreover, extracting structural primitives from an image stably and effectively is very difficult.
Segmenting Chinese character components according to the component recognition result first requires detecting the edges and contours of the components and then, using the skeleton obtained for the target component, establishing the correspondence between edge points and the skeleton. Existing methods apply the Canny algorithm directly for edge detection; the result contains many false edges caused by noise or other factors, and the resulting image edges may not be closed. Given the component segmentation result, when the relative position of two components is top-bottom or left-right, the character can simply be judged to have a top-bottom or left-right structure. For the more complex enclosing structures, however, a simple bounding-box method cannot distinguish such characters well, and the structure of the components cannot be determined directly.
Summary of the invention
The technical problem to be solved by the invention is to overcome the deficiencies of the prior art by providing a Chinese character component segmentation and structure determination method based on component recognition that can effectively segment the components of a Chinese character and determine its structure.
The technical solution adopted by the invention to solve the above problem is a Chinese character component segmentation and structure determination method based on component recognition, implemented in the following steps:
Step (1): build a statistical structure model of the components, describing the strokes in each Chinese character component and their structural relations, and generate candidate strokes that match the labeled strokes of the component;
Step (2): from the candidate strokes obtained in step (1), use dynamic programming and an optimal combination strategy to generate the optimal combination of components, which serves as the component recognition result;
Step (3): according to the component recognition result obtained in step (2), segment the Chinese character components based on the correspondence between contour and skeleton;
Step (4): according to the component segmentation result obtained in step (3), analyze the layout of the components and the structure of the character using the golden-section grid theory, and determine the structure of the Chinese character.
The specific steps are as follows:
Step (1), component statistical structure modeling and candidate stroke generation: the 687 component images in an existing standard Chinese character component library are preprocessed by skeleton extraction; stroke endpoints and intersections between strokes are detected as feature points, and initial stroke segments are obtained from the lines connecting the feature points. The initial stroke segments are merged through interactive operation to obtain the labeled strokes of each component. According to the directional features of Chinese character unit strokes, Gabor features are extracted from the unit strokes to complete their statistical modeling. Using the maximum entropy principle, a neighbor stroke is chosen for each stroke by means of an approximate structural relation. When a target component is to be recognized, each stroke of that component is taken in turn and a set of possible solutions is generated for it; a solution may be a single initial stroke segment or a combination of initial stroke segments. Two stroke segments may be combined if they are connected end to end and their directions differ by no more than 15°, or if one of the segments is sufficiently small; the combined segments form a possible stroke-matching solution and are added to the candidate stroke queue;
Step (2), obtaining the recognized components based on the principle of optimality: a search graph is built, turning the component matching problem into a graph search. During the search, if a candidate stroke to be matched occupies initial stroke segments of the input character that conflict with those of a previously chosen candidate stroke, that candidate cannot be chosen. From all feasible solutions found by searching the graph from the first column to the last, an optimal combination strategy finds the optimal combination, which is taken as the recognition result for the input character's components; this is solved by formulating the problem as a knapsack problem;
Step (3), segmenting the components based on the correspondence between contour and skeleton: for the optimal component recognition result obtained in step (2), the Canny edge detection algorithm is used to obtain the contour of the components. The contour obtained directly carries no information linking each edge point to its predecessor and successor, so a boundary tracing algorithm is used to organize the contour points into a chain; the chained contour information is stored for use in the subsequent component segmentation. The correspondence between contour points and the skeleton is then found, the contour points corresponding to a component's skeleton are extracted and connected where they are broken, and the component segmentation result is formed;
Step (4), determining the character structure according to the golden-section grid theory: according to the relative positions of the components, common Chinese character structures can be divided into 13 kinds. Using the grid principle of Chinese character composition to describe the structural features of a character, the golden-section grid theory, an improvement on the nine-square grid theory, is adopted to analyze the structure. Structure determination criteria are built, the determination rules are converted into an algorithm that can be run on a computer, and the relation between components is determined quickly by an index-value method.
Further, the component statistical structure modeling and candidate stroke generation in step (1) are as follows:
Step (A1): image thinning and skeleton extraction are performed on the component images in the standard component library; stroke endpoints and intersections between strokes are detected as feature points, and initial stroke segments are obtained from the lines connecting the feature points. The initial stroke segments are merged through interactive operation to form the strokes of the standard reference components;
Step (A2): according to the directional features of Chinese character unit strokes, Gabor features are extracted from the unit strokes, giving the response at each point in the four directions 0, π/4, π/2, and 3π/4, which completes the statistical modeling of the unit strokes. Using the maximum entropy principle, a neighbor stroke is chosen by means of an approximate structural relation: the structural relation between a stroke and all other strokes in the component is approximated by its structural relation with its own neighbor, and the structural relation is described by a conditional probability;
Step (A3): the local features of each pair of mutually neighboring strokes are calculated to help recognize the input component; the local features include the relative position of the centers, the angle difference, the length ratio, and so on;
Step (A4): when a target component is to be recognized, each stroke of that component is taken in turn and a set of possible solutions is generated for it; a solution may be a single initial stroke segment or a combination of initial stroke segments. Two stroke segments may be combined if they are connected end to end and their directions differ by no more than 15°, or if one of the segments is sufficiently small; the combined segments form a possible stroke-matching solution and are added to the candidate stroke queue.
Further, the recognized components are generated based on the principle of optimality in step (2) as follows:
Step (B1): a search graph is built in which each column represents one labeled stroke of the component to be matched, and each row within a column represents a candidate stroke generated for that stroke from the initial stroke segments of the input character. The matching problem thus becomes a graph search: one node is chosen from each column, and among all feasible solutions found from the first column to the last, the one with the maximum similarity is selected;
Step (B2): the search in the graph follows two rules. First, when a stroke is matched, if a neighbor stroke has been selected for it, the similarity is computed by conditional probability using the local feature information stored earlier: the relative center position, stroke length ratio, and so on between the candidate stroke to be matched and the previously matched candidate stroke are computed and compared with the stored local feature information to describe the similarity. Second, when a stroke is matched, if the candidate stroke to be matched conflicts with a previously chosen candidate stroke in its occupation of the initial stroke segments of the input character, that candidate stroke cannot be selected;
Step (B3): among all possible solutions obtained in the search graph for each component, the optimal combination is found and taken as the component recognition result for the input character. Dynamic programming is used to solve this problem. The problem is first formulated as a knapsack problem in which the knapsack capacity is the number of initial stroke segments of the input character, and each possible component recognition solution corresponds to a flag array that marks which initial stroke segments of the input character it occupies. The problem is then equivalent to choosing several mutually non-conflicting items to put into the knapsack so that the knapsack is filled as fully as possible.
Further, the components are segmented based on the correspondence between contour and skeleton in step (3) as follows:
Step (C1): for the optimal component recognition result obtained in step (2), the Canny edge detection algorithm is used to obtain the contour of the components; this comprises three steps, namely filtering and denoising, enhancement, and detection. Double-threshold detection and edge linking are applied to the result of the Canny processing to complete the detection;
Step (C2): the contour obtained in step (C1) carries no information linking each edge point to its predecessor and successor, so a boundary tracing algorithm is used to organize the contour points into a chain. A contour tracing algorithm based on 8-connected regions reorganizes the target edges to obtain chained contour information, which is stored for use in the subsequent component segmentation;
Step (C3): after the correspondence between the skeleton of the input character and the target components has been obtained in step (2) through the principle of optimality, the correspondence between contour points and the skeleton is found. Using the edge point set in chain form, the contour points corresponding to a component's skeleton are extracted and connected where they are broken, forming the component segmentation result.
Further, the character structure is determined according to the golden-section grid theory in step (4) as follows:
Step (D1): Chinese characters are formed by combining components, and according to the relative positions of the components, common Chinese character structures can be divided into 13 kinds. Using the grid principle of Chinese character composition to describe the structural features of a character, the golden-section grid theory, an improvement on the nine-square grid theory, is adopted: the original grid is divided into 9 cells using the golden ratio, the character is laid out appropriately within the target box, and its structure is analyzed. Structure determination criteria are built, the determination rules are converted into an algorithm that can be run on a computer, and the relation between components is determined quickly by an index-value method.
Compared with the prior art, the invention has the following advantages:
(1) In component recognition on images, the invention describes structural relations by conditional probability, calculates the local features of mutually neighboring strokes to help recognize the input component, and selects the best component recognition result with an optimal combination strategy, which improves the recognition rate and accuracy for components.
(2) The invention uses the golden-section grid theory to describe the structural features of Chinese characters, divides the original grid into 9 cells using the golden ratio, and builds structure determination criteria that can be run on a computer, so that the structure of a Chinese character can be determined accurately.
Brief description of the drawings
Fig. 1 is a schematic diagram of the overall process of the invention;
Fig. 2 shows the candidate strokes generated for a component;
Fig. 3 shows the component recognition result obtained based on the principle of optimality;
Fig. 4 is a schematic diagram of the correspondence between contour points and skeleton points;
Fig. 5 shows the results of component segmentation;
Fig. 6 shows the golden-section grid division and the grid cell weight settings;
Fig. 7 is a flowchart of component merging and structure determination;
Fig. 8 shows the structure determination results.
Detailed description of the embodiments
The invention is described in further detail below with reference to the drawings and examples.
The implementation of the invention comprises four main steps: component statistical structure modeling with candidate stroke generation, generation of the recognized components based on the principle of optimality, component segmentation, and structure determination based on the golden-section grid theory.
As shown in Fig. 1, the invention is implemented as follows:
Step 1: component statistical structure modeling and candidate stroke generation:
Image thinning and skeleton extraction are performed on the component images in the standard component library; stroke endpoints and intersections between strokes are detected as feature points, and initial stroke segments are obtained from the lines connecting the feature points. The initial stroke segments are merged through interactive operation to form the strokes of the standard reference components. The invention assumes that every stroke of a component follows a 4-dimensional Gaussian distribution, written $X \sim N(\mu, \Sigma)$. This 4-dimensional vector is obtained as a weighted combination of the 4-dimensional vectors of the points on the stroke, where Gabor filtering is used to measure the response at each point in the four directions 0, π/4, π/2, and 3π/4. For a component C to be matched and an input Chinese character image S, the similarity between them is expressed by the joint probability in formula (1), where $r_i$ and $s_i$ denote a stroke of the component and a stroke of the input character, respectively.
$$P_r(S = C) \equiv P_r(s_1 = r_1, s_2 = r_2, \ldots, s_n = r_n) \tag{1}$$
The joint probability distribution is then expressed with conditional probabilities, which gives a way to compute the component similarity; formula (1) becomes formula (2):
$$P_r(S) = P_r(s_1, s_2, \ldots, s_n) = \cdots = \prod_{i=1}^{n} P_r(s_i \mid s_1, s_2, \ldots, s_{i-1}) \tag{2}$$
The joint probability density can be computed from formula (2), but the computational complexity is high. Using the maximum entropy principle, the invention takes for each stroke the one stroke that influences it most as its neighbor and describes the structural relation approximately. Formula (2) is thereby further converted into formula (3), and the strongly coupled multi-way structural relation becomes a set of binary structural relations.
$$P_r(S) \approx \prod_{i=1}^{n} P_r(s_i \mid \mathrm{nei}(s_i)) \tag{3}$$
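As a worked illustration of formula (3), the product of per-stroke conditional probabilities can be accumulated in log form to avoid underflow. This is a minimal sketch: the names `log_similarity`, `neighbor`, and `cond_prob`, and the toy probability table in the usage lines, are hypothetical and do not come from the patent, which does not specify how the conditional probabilities are stored.

```python
import math

def log_similarity(strokes, neighbor, cond_prob):
    """Log of formula (3): sum over strokes of log P(s_i | nei(s_i)).

    strokes   : iterable of stroke identifiers (hypothetical representation)
    neighbor  : dict mapping a stroke to its chosen neighbor (missing -> None)
    cond_prob : callable (stroke, neighbor_or_None) -> probability in [0, 1]
    """
    total = 0.0
    for s in strokes:
        p = cond_prob(s, neighbor.get(s))
        if p <= 0.0:
            return float("-inf")  # one impossible stroke rules out the match
        total += math.log(p)
    return total

# Toy usage with made-up probabilities (assumption, for illustration only).
table = {("a", None): 0.5, ("b", "a"): 0.5, ("c", "b"): 0.25}
score = log_similarity(["a", "b", "c"], {"b": "a", "c": "b"},
                       lambda s, n: table[(s, n)])
```

Exponentiating `score` recovers the product 0.5 × 0.5 × 0.25 from the toy table.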
When candidate strokes are generated, the local features of each pair of mutually neighboring strokes are calculated to help recognize the input component; the local features include the relative position of the centers, the angle difference, the length ratio, and so on. When a target component is to be recognized, each stroke of that component is taken in turn and a set of possible solutions is generated for it; a solution may be a single initial stroke segment or a combination of initial stroke segments. Two stroke segments may be combined if they are connected end to end and their directions differ by no more than 15°, or if one of the segments is sufficiently small; the combined segments form a possible stroke-matching solution and are added to the candidate stroke queue. Fig. 2 shows, for the stroke drawn in bold in the target component "bar", the candidate strokes generated from the input character image "father".
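The combination rule for initial stroke segments can be sketched as a small predicate. The sketch rests on several assumptions that the text does not settle: segments are represented as pairs of endpoints, "connected end to end" means sharing an endpoint within a tolerance, the direction difference is measured between undirected lines, connectivity is required in both branches of the rule, and the `small_len` threshold for "sufficiently small" is an arbitrary placeholder.

```python
import math

def direction_deg(seg):
    (x0, y0), (x1, y1) = seg
    return math.degrees(math.atan2(y1 - y0, x1 - x0))

def angle_diff_deg(a, b):
    """Smallest difference between two undirected line directions, in [0, 90]."""
    d = abs(a - b) % 180.0
    return min(d, 180.0 - d)

def length(seg):
    (x0, y0), (x1, y1) = seg
    return math.hypot(x1 - x0, y1 - y0)

def can_merge(seg_a, seg_b, max_angle=15.0, small_len=3.0, tol=1e-6):
    """True if two stroke segments may be combined into one candidate stroke."""
    # Assumption: "end to end" means the segments share an endpoint.
    share_endpoint = any(math.dist(p, q) <= tol for p in seg_a for q in seg_b)
    if not share_endpoint:
        return False
    # Rule 1: directions differ by no more than 15 degrees.
    if angle_diff_deg(direction_deg(seg_a), direction_deg(seg_b)) <= max_angle:
        return True
    # Rule 2: one of the segments is sufficiently small (threshold is a placeholder).
    return min(length(seg_a), length(seg_b)) <= small_len
```

Pairs that pass `can_merge` would be appended to the candidate stroke queue alongside the single initial segments.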
Step 2: generating the recognized components based on the principle of optimality:
After the candidate stroke set for each stroke of the target component has been obtained in step 1, an optimal combination strategy selects one group of candidate strokes from the candidate stroke sets so that the similarity between the target component and the selected candidate strokes is maximal. This step is realized by building a search graph and applying the principle of optimality. In the search graph, each column represents one labeled stroke of the component to be matched, and each row within a column represents a candidate stroke generated for that stroke from the initial stroke segments of the input character. Finding the optimal solution means choosing one node from each column and, among all possible solutions from the first column to the last, selecting the one with the maximum similarity.
When a stroke is matched during the search, if the candidate stroke to be matched conflicts with a previously chosen candidate stroke in its occupation of the initial stroke segments of the input character, that candidate stroke cannot be selected. In addition, when a stroke is matched, if a neighbor stroke has been selected for it, the similarity is described by computing the probability and taking the local features into account. Dynamic programming is used to find the optimal combination, which is taken as the component recognition result for the input character. Fig. 3 is a schematic diagram of how the principle of optimality produces the final recognition result. For the input character "really", all possible solutions can be obtained for the components "mouth", "wood", "people", "field", and "soil"; the optimal combination strategy then shows that "field" and "wood" are the optimal component recognition result for the input character "really".
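The final selection step, described elsewhere in this document as a knapsack problem over the initial stroke segments, can be sketched as a memoized search over occupancy bitmasks. This is a simplified illustration, not the patented procedure: it only maximizes the number of occupied stroke segments subject to the no-conflict rule and ignores similarity scores, and the segment ids assigned to each component in the usage lines are invented toy data.

```python
from functools import lru_cache

def best_combination(candidates):
    """candidates: list of (name, set of occupied segment ids).

    Returns (covered_count, names) for the conflict-free subset that
    occupies the most initial stroke segments.
    """
    masks = []
    for _, segs in candidates:
        m = 0
        for s in segs:
            m |= 1 << s          # one bit per initial stroke segment
        masks.append(m)

    @lru_cache(maxsize=None)
    def solve(i, used):
        if i == len(candidates):
            return 0, ()
        best = solve(i + 1, used)                 # option 1: skip candidate i
        if masks[i] & used == 0:                  # option 2: take it if no conflict
            covered, names = solve(i + 1, used | masks[i])
            take = (covered + bin(masks[i]).count("1"),
                    (candidates[i][0],) + names)
            if take[0] > best[0]:
                best = take
        return best

    return solve(0, 0)

# Toy data: a character with 9 initial stroke segments (ids are hypothetical).
toy = [("mouth", {0, 1, 2, 3}), ("wood", {5, 6, 7, 8}), ("people", {7, 8}),
       ("field", {0, 1, 2, 3, 4}), ("soil", {3, 4, 5})]
covered, chosen = best_combination(toy)
```

With this toy data the only combination that occupies all nine segments is "field" plus "wood", which mirrors the example in the text.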
Step 3: segmenting the components based on the correspondence between contour and skeleton:
After the optimal component recognition result has been obtained in step 2, the Canny contour detection algorithm is first used to obtain the contour of the components; this process comprises three steps, namely filtering and denoising, enhancement, and detection. The original image is filtered by convolution; formulas (4) and (5) give the one-dimensional and two-dimensional Gaussian kernel functions used in the convolution.
$$K = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{x^2}{2\sigma^2}} \tag{4}$$
$$K = \frac{1}{2\pi\sigma^2}\, e^{-\frac{x^2 + y^2}{2\sigma^2}} \tag{5}$$
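The two Gaussian kernels can be sampled on an integer grid as follows. This is a minimal sketch in pure Python; the function names, the `radius` parameter, and the final normalization so that the discrete weights sum to 1 are assumptions (normalizing is common practice for smoothing filters but is not stated in the text).

```python
import math

def gaussian_1d(sigma, radius):
    """Sampled one-dimensional Gaussian kernel on x = -radius..radius."""
    k = [math.exp(-(x * x) / (2 * sigma * sigma)) / (math.sqrt(2 * math.pi) * sigma)
         for x in range(-radius, radius + 1)]
    total = sum(k)
    return [v / total for v in k]          # assumption: normalize to sum 1

def gaussian_2d(sigma, radius):
    """Sampled two-dimensional Gaussian kernel on a (2*radius+1)^2 grid."""
    k = [[math.exp(-(x * x + y * y) / (2 * sigma * sigma)) / (2 * math.pi * sigma * sigma)
          for x in range(-radius, radius + 1)]
         for y in range(-radius, radius + 1)]
    total = sum(map(sum, k))
    return [[v / total for v in row] for row in k]

k1 = gaussian_1d(1.0, 3)
k2 = gaussian_2d(1.0, 3)
```

Because the two-dimensional kernel is separable, each entry of `k2` equals the product of the corresponding entries of `k1`, so the filter can be applied as two one-dimensional passes.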
The contour produced by the Canny algorithm carries no information linking each edge point to its predecessor and successor, so a boundary tracing algorithm based on 8-connected regions is used to organize the contour points into a chain. The method is as follows: (1) Determine the starting point: find the upper-leftmost point in the image whose pixel value is 1, take it as the starting point, store its coordinates, and denote it $P_0$. (2) Starting from neighborhood position 0 of $P_0$, examine the values of its 8 neighbors counterclockwise, find the first point whose pixel value is 1, store its coordinates, and denote it $P_1$; $P_1$ is called the first target neighbor of $P_0$. (3) At each subsequent step, the search starts from the neighborhood position that follows the first target neighbor of the point just found, and finds the next target neighbor. (4) Repeat step (3) until the first target neighbor of $P_j$ is $P_0$ and the first target neighbor of $P_{j+1}$ is $P_1$, at which point tracing ends. After boundary tracing, the target edges have been reorganized into chained contour information, which is stored for use in the subsequent component segmentation.
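The tracing loop can be sketched with a standard 8-connected inner-boundary tracing scheme. The sketch is an illustration under assumptions rather than the exact procedure of the text: the direction numbering (0 = east, increasing counterclockwise on screen, with y growing downward), the rule for where each search starts, and the function name `trace_boundary` are choices made here; the stopping test follows the criterion above (the start point is reached again and its successor is again the second point).

```python
# Neighbor offsets (dx, dy), counterclockwise on screen starting from east.
OFFSETS = [(1, 0), (1, -1), (0, -1), (-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1)]

def trace_boundary(img):
    """Return the outer boundary of the first object in a binary image
    (list of rows of 0/1) as an ordered chain of (x, y) points."""
    h, w = len(img), len(img[0])
    start = next(((x, y) for y in range(h) for x in range(w) if img[y][x]), None)
    if start is None:
        return []

    def step(p, d):
        # Start the counterclockwise sweep just behind the last move direction.
        first = (d + 7) % 8 if d % 2 == 0 else (d + 6) % 8
        for k in range(8):
            nd = (first + k) % 8
            x, y = p[0] + OFFSETS[nd][0], p[1] + OFFSETS[nd][1]
            if 0 <= x < w and 0 <= y < h and img[y][x]:
                return (x, y), nd
        return None, d                      # isolated pixel

    chain = [start]
    p, d = step(start, 7)
    if p is None:
        return chain
    first_next = p                          # P1 in the notation of the text
    while True:
        chain.append(p)
        q, d = step(p, d)
        if p == start and q == first_next:  # back at P0 and heading to P1 again
            chain.pop()                     # drop the repeated start point
            return chain
        p = q

square = [[0, 0, 0, 0, 0],
          [0, 1, 1, 1, 0],
          [0, 1, 1, 1, 0],
          [0, 1, 1, 1, 0],
          [0, 0, 0, 0, 0]]
chain = trace_boundary(square)
```

For the 3×3 square the chain visits the eight border pixels in order and skips the interior pixel, which is the predecessor and successor information the raw edge map lacks.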
After the principle of optimality has yielded the correspondence between the skeleton of the input character and the target components, the correspondence between contour points and the skeleton is found. Once this correspondence has been computed, the contour points corresponding to a component's skeleton are extracted and connected where they are broken, forming the component segmentation result. In Fig. 4, panels (c) and (d) show the correspondence between contour points and skeleton points for the intersecting parts to be segmented in the characters "arrow" and "Yes", respectively. When the algorithm establishes the correspondence, it labels each contour point with the index of its nearest skeleton point, and the whole contour is stored as a chain. During traversal, a point at which a run of the same label breaks off is a point where the contour should be separated, and it is connected to the next contour point at which that label appears again. Iterating these connections separates intersecting components well at their intersections. Fig. 5 shows the component segmentation results for the four characters "fortune", "foot", "monarch", and "hindering".
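The labeling and reconnection idea can be sketched on a chained contour as follows. The sketch makes simplifying assumptions: nearest skeleton points are found by brute-force Euclidean distance, a component is given as a set of skeleton point indices, and "connecting where broken" is reduced to collecting the runs of contour points that belong to the component (consecutive runs would then be joined end to start). All function names are hypothetical.

```python
def label_contour(chain, skeleton):
    """Label each contour point with the index of its nearest skeleton point."""
    def nearest(p):
        return min(range(len(skeleton)),
                   key=lambda i: (skeleton[i][0] - p[0]) ** 2
                               + (skeleton[i][1] - p[1]) ** 2)
    return [nearest(p) for p in chain]

def component_runs(chain, labels, component_ids):
    """Maximal runs of consecutive contour points whose label belongs to
    the component; the contour is treated as cyclic."""
    runs, current = [], []
    for p, lab in zip(chain, labels):
        if lab in component_ids:
            current.append(p)
        elif current:
            runs.append(current)
            current = []
    if current:
        runs.append(current)
    # The chain is closed, so a run at the end continues at the beginning.
    if len(runs) > 1 and labels[0] in component_ids and labels[-1] in component_ids:
        runs[0] = runs.pop() + runs[0]
    return runs

contour = [(x, 0) for x in range(7)]
skeleton = [(0, 1), (3, 1), (6, 1)]
labels = label_contour(contour, skeleton)
outer = component_runs(contour, labels, {0, 2})
inner = component_runs(contour, labels, {1})
```

In the toy data the contour points labeled 0 and 2 form one run that wraps around the end of the chain, while the points labeled 1 form a separate run that would be cut out and closed on its own.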
Step 4: determining the character structure according to the golden-section grid theory:
Chinese characters are formed by combining components, and the relative positions of the components determine the structure of the character. The relative positions of the components are fixed, and the written form of the whole character shows strong layout regularities. The invention adopts the golden-section grid theory, an improvement on the nine-square grid theory, and, by analyzing a large number of character structures, summarizes a golden-section method for analyzing Chinese character structure. The golden-section grid division and the grid cell weight settings are shown in Fig. 6; the invention uses the "first four methods" golden-section approach to analyzing Chinese character structure.
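One way to picture a golden-section grid is to cut the character's bounding box at the golden-ratio positions along each axis, which gives 9 cells of unequal size. The sketch below is only a guess at the geometry: the text does not give the cut positions or the cell weights of Fig. 6, so the cuts at about 0.382 and 0.618 of each side, the row-major cell numbering 0 to 8, and all function names are assumptions.

```python
PHI_INV = (5 ** 0.5 - 1) / 2          # 1/phi, about 0.618

def golden_cuts(lo, hi):
    """Two cut positions dividing [lo, hi] by the golden ratio (assumed layout)."""
    span = hi - lo
    return lo + (1 - PHI_INV) * span, lo + PHI_INV * span

def band(v, lo, hi):
    """0, 1 or 2: which of the three bands along one axis contains v."""
    a, b = golden_cuts(lo, hi)
    return 0 if v < a else (1 if v < b else 2)

def cell_of(x, y, box):
    """Row-major cell index (0..8) of point (x, y) in box = (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return 3 * band(y, y0, y1) + band(x, x0, x1)

def cells_covered(part, box):
    """Set of grid cells touched by a component bounding box."""
    px0, py0, px1, py1 = part
    x0, y0, x1, y1 = box
    cols = range(band(px0, x0, x1), band(px1, x0, x1) + 1)
    rows = range(band(py0, y0, y1), band(py1, y0, y1) + 1)
    return {3 * r + c for r in rows for c in cols}

box = (0, 0, 100, 100)
left = cells_covered((0, 0, 30, 100), box)
right = cells_covered((40, 0, 100, 100), box)
```

A component that covers only the left column next to one that covers the middle and right columns is the kind of pattern a left-right rule could be keyed on; how the patent's index-value method actually maps cell occupancy to one of the 13 structures is not described here.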
The component segmentation described in step 3 in effect splits the character into sub-parts, one per component. Structure determination is the process of recursively merging these sub-parts until a single whole is obtained; determining the structural relation of each pair of parts as they are merged yields the hierarchical structure of the character. The invention recursively selects two sub-parts to merge, which first requires choosing the most suitable pair. Observation shows that parts to be merged mostly have neighboring bounding boxes, and that when two parts are merged horizontally (vertically), their bounding-box extents in the vertical (horizontal) direction have a ratio close to 1. When selecting parts to merge, the adjacency distance between bounding boxes and the similarity of bounding-box length and width are therefore computed for every pair, and the closest pair is merged. Fig. 7 is the flowchart of component merging and structure determination. The recursion continues until all parts have been merged into one. Fig. 8 shows the structure determination results for the four characters "prisoner", "", "stubble", and "caye".
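The merging procedure can be sketched as a greedy loop over bounding boxes. Several points are assumptions made for illustration: boxes are `(x0, y0, x1, y1)` tuples with positive width and height, the pair cost simply adds the pixel gap between boxes to one minus the ratio of their extents across the merge direction, only left-right and top-bottom relations are distinguished (enclosing structures are not handled), and the recursion of the text is written as an equivalent loop.

```python
from itertools import combinations

def union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def pair_cost(a, b):
    """Return (cost, relation) for merging boxes a and b; lower cost is better."""
    gx = max(a[0], b[0]) - min(a[2], b[2])    # > 0: separated along x
    gy = max(a[1], b[1]) - min(a[3], b[3])    # > 0: separated along y
    wa, ha, wb, hb = a[2] - a[0], a[3] - a[1], b[2] - b[0], b[3] - b[1]
    if gx >= gy:   # side by side: small gap, heights with ratio near 1
        return max(gx, 0) + 1 - min(ha, hb) / max(ha, hb), "left-right"
    return max(gy, 0) + 1 - min(wa, wb) / max(wa, wb), "top-bottom"

def merge_all(parts):
    """parts: dict name -> box. Returns a nested (relation, first, second) tree."""
    nodes = list(parts.items())
    while len(nodes) > 1:
        i, j = min(combinations(range(len(nodes)), 2),
                   key=lambda ij: pair_cost(nodes[ij[0]][1], nodes[ij[1]][1])[0])
        rel = pair_cost(nodes[i][1], nodes[j][1])[1]
        axis = 0 if rel == "left-right" else 1
        # Put the left (or top) part first in the tree node.
        (ta, a), (tb, b) = sorted((nodes[i], nodes[j]), key=lambda n: n[1][axis])
        merged = ((rel, ta, tb), union(a, b))
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [merged]
    return nodes[0][0]

tree = merge_all({"A": (0, 0, 40, 40), "B": (42, 0, 100, 40), "C": (0, 44, 100, 100)})
```

The resulting tree records the hierarchy: the two upper parts are merged left to right first, and the result is then stacked on top of the lower part.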
Content not described in detail in this specification belongs to the prior art known to those skilled in the art.
The above is only the preferred embodiment of the present invention. It should be noted that those of ordinary skill in the art may make improvements and modifications without departing from the principles of the invention, and such improvements and modifications shall also be regarded as falling within the scope of protection of the present invention.

Claims (5)

1. A Chinese character component segmentation and structure determination method based on component recognition, characterized by the following steps:
Step (1): component statistical structure modeling: describe the strokes in Chinese character components and their structural relations, and generate candidate strokes that match the labeled strokes of a component;
Step (2): from the candidate strokes obtained in step (1), generate the optimal combination of components using dynamic programming and an optimal combination strategy, as the result of Chinese character component recognition;
Step (3): based on the component recognition result obtained in step (2), segment the Chinese character components using the correspondence between contour and skeleton;
Step (4): based on the component segmentation result obtained in step (3), analyze the layout of the components and the structure of the character according to the golden-grid theory, and determine the structure of the Chinese character.
2. The Chinese character component segmentation and structure determination method based on component recognition according to claim 1, characterized in that describing the strokes in Chinese character components and their structural relations and generating candidate strokes in step (1) comprises:
Step (A1): perform skeleton-extraction preprocessing on the 687 Chinese character component images in an existing standard Chinese character component library; detect stroke endpoints and the intersections between strokes as feature points, and obtain initial stroke segments from the lines connecting the feature points; merge the initial segments through interactive operation to obtain the labeled strokes of each component;
Step (A2): according to the directional characteristics of Chinese character strokes, extract Gabor features from the strokes to complete the statistical modeling of the strokes; using the maximum entropy principle, select neighbor strokes by means of an approximate structural relation, in which the structural relation between a stroke and all other strokes in the component is approximated by its structural relation to its own neighbors; the structural relations are described by conditional probabilities;
Step (A3): compute the local features of each pair of mutually neighboring strokes to help recognize the input component; the local features comprise relative center position, angle difference and length ratio;
Step (A4): when recognizing a target component, first obtain each stroke of the corresponding component and generate a set of possible solutions for each stroke. A solution may be a single initial stroke segment or a combination of initial segments. The rule for combining initial segments is that two segments are connected end to end and their directions differ by no more than 15°, or one of the segments is sufficiently small; two such segments are combined into a possible stroke match and added to the candidate stroke queue.
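The combination rule of step (A4) and the local features of step (A3) can be sketched as below. This is only an illustration under assumptions: a segment is modeled as a pair of endpoints, the `min_len` threshold is hypothetical (the claim does not quantify "sufficiently small"), and the sketch assumes that segments must share an endpoint in both branches of the rule.

```python
import math


def length(seg):
    (x0, y0), (x1, y1) = seg
    return math.hypot(x1 - x0, y1 - y0)


def direction(seg):
    """Undirected orientation of a segment in degrees, in [0, 180)."""
    (x0, y0), (x1, y1) = seg
    return math.degrees(math.atan2(y1 - y0, x1 - x0)) % 180.0


def angle_diff(a, b):
    d = abs(direction(a) - direction(b))
    return min(d, 180.0 - d)


def connected(a, b, tol=1e-6):
    """True when the two segments share an endpoint (within tol)."""
    return any(math.dist(p, q) <= tol for p in a for q in b)


def can_combine(a, b, max_angle=15.0, min_len=3.0):
    """Step (A4) rule; min_len is an assumed threshold for a 'small' segment."""
    if not connected(a, b):
        return False
    return angle_diff(a, b) <= max_angle or min(length(a), length(b)) <= min_len


def local_features(a, b):
    """Step (A3) features of stroke b relative to stroke a."""
    ca = ((a[0][0] + a[1][0]) / 2, (a[0][1] + a[1][1]) / 2)
    cb = ((b[0][0] + b[1][0]) / 2, (b[0][1] + b[1][1]) / 2)
    return {
        "center_offset": (cb[0] - ca[0], cb[1] - ca[1]),
        "angle_diff": angle_diff(a, b),
        "length_ratio": length(a) / length(b),
    }


base = ((0, 0), (10, 0))
nearly_straight = ((10, 0), (20, 2))   # about 11.3 degrees off: combinable
corner = ((10, 0), (10, 10))           # 90 degrees off and long: not combinable
spur = ((10, 0), (10, 2))              # 90 degrees off but short: combinable
```

Orientation is taken modulo 180° so that a segment and its reverse compare as equal, which matters because skeleton segments have no inherent drawing direction.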
3. The Chinese character component segmentation and structure determination method based on component recognition according to claim 1, characterized in that obtaining the recognition result based on the optimal combination strategy in step (2) comprises:
Step (B1): construct a search graph in which each column represents one labeled stroke of the component to be matched, and each row within a column represents a candidate stroke generated for that stroke from the initial segments of the input character. The matching problem thus becomes a graph search: one node is selected in every column, and among all feasible solutions running from the first column to the last, the one with the greatest similarity is sought;
Step (B2): the graph search in step (B1) follows these rules. First, when matching a stroke, if a candidate stroke occupies initial segments of the input character that conflict with those of candidate strokes already chosen, that candidate cannot be selected. Second, when matching a stroke, if a neighbor of that stroke has already been selected, the similarity is computed by conditional probability: using the previously stored local feature information, the relative center position and stroke length ratio between the candidate to be matched and the previously matched candidate strokes are computed and compared with the stored local features to describe the similarity;
Step (B3): from the possible solutions obtained for each Chinese character component in step (B1), find the optimal combination as the recognition result for the input components. Dynamic programming is used to solve this problem. The problem is first formulated as a knapsack problem whose capacity is the number of initial segments of the input character. Each possible component recognition solution has a flag array marking which initial segments of the input character it occupies. The problem is then equivalent to choosing a set of mutually non-conflicting items to place in the knapsack so that the knapsack is filled as fully as possible.
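The selection problem of step (B3) can be illustrated with a small dynamic program over segment-occupancy bitmasks. This is a sketch under assumptions rather than the patented procedure: each flag array is encoded as an integer bitmask, the component names and segment indices in the example are hypothetical, and the objective is simply the number of initial segments covered.

```python
def best_combination(solutions):
    """Pick non-conflicting component solutions covering the most segments.

    solutions: list of (name, segment_indices) pairs, where segment_indices
    lists the initial segments of the input character that a solution occupies.
    Returns (number_of_segments_covered, names_of_chosen_solutions).
    """
    # reachable maps an occupancy bitmask to one choice of solutions producing it
    reachable = {0: ()}
    for name, segments in solutions:
        mask = 0
        for index in segments:
            mask |= 1 << index
        # iterate over a snapshot so each solution is used at most once
        for state, chosen in list(reachable.items()):
            if state & mask == 0:          # no initial segment is used twice
                reachable.setdefault(state | mask, chosen + (name,))
    best = max(reachable, key=lambda m: bin(m).count("1"))
    return bin(best).count("1"), reachable[best]


# Hypothetical input character with six initial segments (indices 0 to 5).
candidates = [("X", [0, 1, 2]), ("Y", [2, 3]), ("Z", [3, 4, 5]), ("W", [0, 1])]
covered, chosen = best_combination(candidates)
print(covered, chosen)   # 6 ('X', 'Z')
```

The number of reachable masks can grow exponentially in the worst case, so this form is only practical for the small segment counts of a single character; a production implementation would also weigh each solution by its similarity score.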
4. The Chinese character component segmentation and structure determination method based on component recognition according to claim 1, characterized in that segmenting the Chinese character components based on the correspondence between contour and skeleton in step (3) comprises:
Step (C1): for the optimal component recognition result obtained in step (2), use the Canny edge detection algorithm to obtain the contour of the components;
Step (C2): the contour obtained in step (C1) carries no predecessor or successor link information between edge points, so a boundary-following algorithm is used to organize the contour points into a chain; the resulting chained contour information is stored for use in the subsequent component segmentation;
Step (C3): after the correspondence between the skeleton of the input character and the target component has been obtained through the optimal combination strategy in step (2), find the correspondence between contour points and the skeleton, extract the contour points corresponding to the component skeleton, and connect them where the contour is broken, forming the component segmentation result.
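The contour-to-skeleton correspondence of step (C3) might look like the following sketch. It assumes the ordered contour chain from steps (C1) and (C2) is already available as a list of points, and it assigns each contour point to the component of its nearest skeleton point; the nearest-point rule, the function names and the sample coordinates are all assumptions, since the claim does not specify how the correspondence is computed.

```python
import math


def split_contour(contour, skeleton_labels):
    """Group contour points by the component of their nearest skeleton point.

    contour: ordered list of (x, y) points from boundary following.
    skeleton_labels: dict mapping skeleton point (x, y) to a component name.
    Returns a dict mapping component name to its contour points, in chain order.
    """
    parts = {}
    for point in contour:
        nearest = min(skeleton_labels, key=lambda s: math.dist(point, s))
        parts.setdefault(skeleton_labels[nearest], []).append(point)
    return parts


def close_chain(points):
    """Reconnect an extracted chain by linking its last point back to its first."""
    return points + points[:1]


# Hypothetical data: one skeleton point per component and a rectangular contour.
skeleton = {(2, 5): "left", (8, 5): "right"}
outline = [(0, 0), (4, 0), (6, 0), (10, 0), (10, 10), (6, 10), (4, 10), (0, 10)]
pieces = split_contour(outline, skeleton)
left_outline = close_chain(pieces["left"])
```

Splitting a shared outline leaves each component with an open chain, which is why the extracted points are reconnected at the break; here that is reduced to closing the chain end to start.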
5. The Chinese character component segmentation and structure determination method based on component recognition according to claim 1, characterized in that determining the Chinese character structure according to the golden-grid theory in step (4) comprises:
Step (D1): Chinese characters are formed by combining components. According to the relative positions of the components, common Chinese character structures can be divided into 13 types. The grid principle of Chinese character composition is used to describe the structural features of a character, and the golden-grid theory, an improvement on the nine-square-grid theory, is adopted to analyze character structure. Structure determination criteria are built, the determination rules are converted into an algorithm that can be executed on a computer, and the relation between components is judged quickly by an index value method.
CN201510424057.9A 2015-07-17 2015-07-17 Chinese character component segmentation and structure determination method based on component recognition Active CN104992161B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510424057.9A CN104992161B (en) 2015-07-17 2015-07-17 Chinese character component segmentation and structure determination method based on component recognition

Publications (2)

Publication Number Publication Date
CN104992161A (en) 2015-10-21
CN104992161B (en) 2018-04-06

Family

ID=54303974

Country Status (1)

Country Link
CN (1) CN104992161B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102968619A (en) * 2012-11-13 2013-03-13 北京航空航天大学 Recognition method for components of Chinese character pictures
CN104182748A (en) * 2014-08-15 2014-12-03 电子科技大学 A method for extracting automatically character strokes based on splitting and matching
CN104200210A (en) * 2014-08-12 2014-12-10 合肥工业大学 License plate character segmentation method based on parts

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
何坚: "手写体汉字识别研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *
冯万仁: "基于部件复用的分级汉字字库的设计与实现", 《万方数据知识服务平台》 *
刘敏等: "基于骨架图匹配的汉字变形技术", 《北京航空航天大学学报》 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106503706A (en) * 2016-09-23 2017-03-15 北京大学 The method of discrimination of Chinese character pattern cutting result correctness
CN106503706B (en) * 2016-09-23 2019-06-07 北京大学 The method of discrimination of Chinese character pattern cutting result correctness
CN108491444A (en) * 2018-02-12 2018-09-04 龙马智芯(珠海横琴)科技有限公司 The generation method and device of solution

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant