CN101930299B - Method for intelligently generating Chinese character without character library - Google Patents


Info

Publication number
CN101930299B
Authority
CN
China
Prior art keywords
chinese character
boundary rectangle
primitive
chinese
character
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201010263032.2A
Other languages
Chinese (zh)
Other versions
CN101930299A (en)
Inventor
皮佑国
段骋森
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT
Priority to CN201010263032.2A
Publication of CN101930299A
Application granted
Publication of CN101930299B
Legal status: Expired - Fee Related
Anticipated expiration

Landscapes

  • Controls And Circuits For Display Device (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The invention relates to a method for intelligently generating Chinese characters without a character library. The system stores no character library. It stores only a library of primitives (basic elements).

By analyzing how Chinese characters are composed, the task of deriving the mapping knowledge for each primitive is decomposed into five steps:
- inferring the structure tree of a character from its input code;
- traversing the structure tree to decompose a character with a multi-level structure into a series of characters with a single-level structure;
- analyzing the bounding-rectangle mosaic of the character's components to determine the size and position of each component's bounding rectangle;
- comparing the standard bounding rectangle of each primitive with its bounding rectangle in the character to obtain the primitive's transformation coefficients;
- applying a topological transformation to each primitive to obtain the character.

In this way the character is generated intelligently. The method solves a prior-art problem: the position and size of the bounding rectangle of each component could not be determined from the character's input code. As a result, the Chinese characters that can be entered into a computer are no longer limited by a character library.

Description

Method for intelligently generating Chinese characters without a character library
Technical field
The invention belongs to the field of information technology. It relates to a method by which a computer intelligently generates Chinese characters without a character library.
Background art
At present, Chinese information processing in computer operating systems relies on Chinese character libraries. This approach has five drawbacks:
(1) it is difficult to establish a long-term, stable standard for Chinese character information;
(2) it does not follow the rules by which Chinese characters are formed, which hinders cultural transmission;
(3) it is disconnected from Chinese character teaching, wasting considerable social resources;
(4) it cannot fully meet the needs of society, since many Chinese characters cannot be entered into a computer;
(5) its information entropy is high, making it the least efficient text information system.
Many researchers have explored ways to address these drawbacks and have studied Chinese character generation techniques. Several methods that generate characters from strokes or components have emerged.
Chinese invention patent CN 1277377A, published on December 20, 2000, discloses an "automatic Chinese character font forming method and device".

That invention represents Chinese characters with components as operands. It designs 512 components and divides them into four classes: independent, left-right, top-bottom, and surrounding. The structural relations between components are expressed as operator-based mathematical expressions using ten operators: lr, ud, ld, lu, ru, le, re, ue, de, and we.

The component codes and structural relations are stored on a dedicated control card. The method proceeds as follows:
- read the working-environment parameters and the character expression from the card;
- extract the stroke endpoint coordinates of the 512 character units with a stroke-extraction method;
- determine the stroke types and generate a character skeleton;
- generate the strokes with B-spline functions to form the glyph.

That patent has three weaknesses:
- It classifies the structure of the components themselves (four classes) and then separately describes the relations (structures) between components. This neither matches the way Chinese characters are learned nor avoids tedious repetition.
- Generating characters from strokes is complex and cumbersome.
- The method still depends on a character library.
Chinese invention patent CN 1294357A, published on May 9, 2001, discloses a "stroke-based method for generating Chinese characters". It follows the word-formation rules of phonetic scripts and the principle that Chinese characters consist of five basic strokes. Using a recursively defined algorithm, it reduces Chinese characters to 52 "Chinese alphabet letters". It then automatically generates various fonts and typefaces from holographic Chinese character expressions.

However, the 52 letters have no basis in the culture of Chinese writing, and the stroke-composition theory is hard to apply. The recursive algorithm and holographic expressions can describe Chinese characters, but they are complex and difficult to popularize. The method also still requires a character library.
The granted invention patent "Combiner word-formation method in Chinese character electronization" (CN1253781C) discloses a method of intelligent character formation. It holds that a Chinese character is composed of components, and that every component is a topological transformation of a primitive within a particular character structure. Primitives are the fundamental elements of Chinese characters. They consist mainly of pictographic and indicative characters and their symbols, and they reflect the ideographic nature of Chinese characters.

Building on CN1253781C, another invention patent application by the same inventor, "A Chinese character coding input method based on structure and primitives" (CN101551711A), solved the problem of encoding character structures and primitives. It proposed that Chinese characters can be divided into 17 structures, such as whole and left-right.

Because Chinese is an ideographic script, a bounding rectangle can be drawn for a character and for each of its primitives. The geometric arrangement of the bounding rectangles of a character's components (each level of structure and each primitive) constitutes the structure of the character. In an alphabetic script, a letter has the same size and shape in every word. In Chinese, by contrast, the same primitive may differ in size, position, and shape from one character to another. Topological transformation can handle these differences. However, CN101551711A did not solve how to infer, from the code, the transformation parameters that intelligent character formation requires.
Summary of the invention
The object of the invention is to overcome these shortcomings of the prior art by providing a method for intelligently generating Chinese characters without a character library.

The method builds on "A Chinese character coding input method based on structure and primitives" (CN101551711A). It solves a problem left open there: determining, from an input character code, the position and size of the bounding rectangle of each component of the character. It then applies topological transformations to the primitives specified by the code to form the character.

Chinese is a large and growing character set, and new characters will keep appearing. The invention infers the primitive-transformation knowledge for new characters from that of existing characters. The system therefore needs no character library, only a primitive library, and generates characters automatically from their input codes. This achieves intelligent character formation in the true sense.
The invention achieves this object through the following technical solution. The method generates a Chinese character automatically from its code in five steps:
S1. Obtain the structure tree of the character from the input character code.
S2. Based on the structure tree, construct a mosaic composed of the bounding rectangles of each level of structure and of each primitive of the character.
S3. Determine the position and shape of the bounding rectangle of each level of structure, and the position and shape of the bounding rectangle of each primitive in the target character.
S4. Determine the topological transformation coefficients from the position and shape of each primitive's bounding rectangle in the target character, as obtained in step S3.
S5. Using a topological transformation method and the coefficients from step S4, transform each primitive into the target character. This completes the transformation of every level of structure and every primitive in the target character.
In step S3, the bounding rectangle of each primitive in the target character is the normalized bounding rectangle $\bar{R} = \left(\frac{x}{W}, \frac{y}{H}, \frac{w}{W}, \frac{h}{H}\right)$, where W and H are the width and height of the character. The coordinate system is defined as follows:
- the origin is the upper-left corner of the character;
- the x axis points right and the y axis points down;
- x and y are the coordinates of the primitive's upper-left vertex;
- w and h are the primitive's width and height.
In step S3, the position and shape of each primitive's bounding rectangle in the target character are determined in two stages. First, determine the maximum possible bounding rectangle of the primitive in the character. Then, taking that rectangle as the reference, adjust the primitive's size and aspect ratio and place the primitive in the middle of the maximum possible bounding rectangle.
The topological transformation method in step S5 is an affine transformation, defined as:
$$AW + t = \begin{pmatrix} a_A & b_A \\ c_A & d_A \end{pmatrix} W + t = \begin{pmatrix} a_A & b_A \\ c_A & d_A \end{pmatrix} \begin{pmatrix} W_x \\ W_y \end{pmatrix} + \begin{pmatrix} t_x \\ t_y \end{pmatrix}$$
The terms are:
- the matrix A is the linear transformation;
- the main-diagonal elements $a_A$ and $d_A$ are the scale factors of the source image in the x and y directions;
- the off-diagonal elements $b_A$ and $c_A$ are the rotation factors in the x and y directions;
- the components $t_x$ and $t_y$ of the vector t are the translations in the x and y directions of the mapped space.
The character code in step S1 consists of primitive codes and structure codes. Each primitive is assigned to a key value formed by combining two keys, and that key value serves as its primitive code. The structure code is derived from the basic structures of Chinese characters, which are:
- the whole structure;
- the upper-left, lower-left, and upper-right enclosing structures, the upper, lower, and left three-sided enclosing structures, and the fully enclosing structure;
- the frame-embedding and mutual-embedding structures;
- the 品-shaped and double-stacked overlapping structures;
- the left-right, left-center-right, and multi-column structures;
- the top-bottom, top-middle-bottom, and multi-row structures.
In this method, a primitive is transformed from its standard bounding rectangle $R_0 = (x_0, y_0, w_0, h_0)$ to its bounding rectangle $R = (x, y, w, h)$ in the target character, as described in step S3. Its topological transformation coefficients are:
$$\mathit{xo} = x - x_0 \cdot \frac{w}{w_0}, \quad \mathit{xs} = \frac{w}{w_0}, \quad \mathit{yo} = y - y_0 \cdot \frac{h}{h_0}, \quad \mathit{ys} = \frac{h}{h_0}$$
The coefficients are:
- xo, the translation along the x axis;
- xs, the width scaling (compression) factor;
- yo, the translation along the y axis;
- ys, the height scaling (compression) factor.
Based on an analysis of the structural characteristics of Chinese characters, the invention decomposes the task of solving the primitive mapping into five steps:
1. Infer the structure tree of a character from its input code.
2. Traverse the structure tree to decompose a multi-level character into a series of single-level characters.
3. Analyze the bounding-rectangle mosaic of the character's components (its primitives) to determine the size and position of each component's bounding rectangle.
4. Obtain each primitive's transformation coefficients from the transformation of its standard bounding rectangle to its bounding rectangle in the target character.
5. Apply topological transformations to the primitives to obtain the character, achieving intelligent character formation.

Compared with the prior art, the invention has the following advantages and effects:
1. The structure tree of a character is inferred from its input code and traversed to obtain the size and position of each component's bounding rectangle. The transformation knowledge for the primitives of a newly formed character is then inferred from three inputs: the primitive sizes, the target bounding-rectangle sizes, and the existing topological-transformation knowledge of primitives. A topological transformation then forms the character. The method does not rely on a character library: a finite primitive library suffices to produce an almost unlimited number of Chinese characters.
2. Characters are "formed" from pictographic and indicative characters and their symbols, used as primitives (fundamental elements). This highlights the ideographic nature of Chinese characters and transmits the culture of Chinese writing.
3. Character formation is essentially "writing" rather than "selecting characters". It is a writing process that uses electronic tools and media as its instrument and medium, and it transmits the culture of Chinese writing.
4. The topological transformation is performed by the computer in the background. Entering characters by hand relies entirely on knowledge acquired in learning to read and write, so the method connects seamlessly with literacy education.
5. There are only 1085 primitives and 18 character structures, so a long-term, stable standard can be established. As Chinese characters develop, the structures, primitives, and coding standard remain unchanged. The invention is therefore stable and adapts well to the development of Chinese characters.
Description of the drawings
Fig. 1 shows the tree structure of a Chinese character and its symbolic representation;
Fig. 2 shows the bounding-rectangle mosaic of the Chinese character "mushroom";
Fig. 3 is a schematic diagram of the bounding rectangle of a primitive;
Fig. 4 shows the tree structure of a whole (independent) character;
Fig. 5 shows the tree structure of a forward-enclosing character;
Fig. 6 shows the tree structure of an overlapping character;
Fig. 7 shows the tree structure of a horizontal/vertical (line tandem) character;
Fig. 8 shows a Chinese character structure tree;
Fig. 9 shows the bounding-rectangle mosaic of the Chinese character [image];
Fig. 10 shows the third-level structure derivation of the Chinese character [image];
Fig. 11 shows the second-level structure derivation of the Chinese character [image];
Fig. 12 shows the first-level structure derivation of the Chinese character [image];
Fig. 13 shows the final derivation result for the Chinese character [image];
Fig. 14 shows the derivation results for several other Chinese characters.
Detailed description of the embodiments
The invention is described in further detail below with reference to an example and the drawings, but the embodiments of the invention are not limited to this example.
Example
The invention extends the 17 basic Chinese character structures defined in CN101551711A ("A Chinese character coding input method based on structure and primitives") to 18:
- the whole structure;
- 7 enclosing structures: upper-left, lower-left, upper-right, upper three-sided, lower three-sided, left three-sided, and full enclosure;
- 2 embedding structures: frame embedding and mutual embedding;
- 2 overlapping structures: 品-shaped and double-stacked;
- 3 horizontal structures: left-right, left-center-right, and multi-column;
- 3 vertical structures: top-bottom, top-middle-bottom, and multi-row.

These basic structures combine into the hierarchical structure of a character. A character may have one or more levels of structure, each level contains one or more primitives, and the whole structure is uniquely determined by the character code. The code consists of structure codes and primitive codes. Each primitive is assigned to a key value formed by combining two keys, and that key value serves as its primitive code.

The input of the invention is a Chinese character code and the output is the corresponding character. The method is implemented in the following steps.
Step 1: obtain the structure tree of the character from the input code.
The character code completely determines three things: the number of structure levels, the structure at each level, and the primitives each structure contains. The structure tree of the character can therefore be drawn from the code. For example, Fig. 1 shows the symbolic tree structure of the character "mushroom". A depth-first, pre-order (root-first) traversal of the structure tree yields the character's code. Conversely, the structure tree can be recovered from the code.
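The correspondence between the structure tree and the code can be sketched in Python as a pre-order traversal and its inverse. This is a minimal sketch under stated assumptions:
- The structure codes `LR` and `UD`, their arities, and the use of the characters themselves as primitive codes are hypothetical placeholders. The real codes are the two-key combinations defined in CN101551711A.
- The example tree for 胡 follows the decomposition used later in this description.

```python
from dataclasses import dataclass, field
from typing import List, Union

# Hypothetical structure codes mapped to their number of components.
ARITY = {"LR": 2, "UD": 2, "LCR": 3, "UMD": 3}


@dataclass
class Node:
    structure: str                                   # structure code, e.g. "LR"
    children: List["Tree"] = field(default_factory=list)


Tree = Union[Node, str]  # a leaf is a primitive code


def encode(tree: Tree) -> List[str]:
    """Depth-first pre-order traversal: a node's code precedes its children."""
    if isinstance(tree, str):
        return [tree]
    tokens = [tree.structure]
    for child in tree.children:
        tokens.extend(encode(child))
    return tokens


def decode(tokens: List[str]) -> Tree:
    """Rebuild the structure tree from a pre-order token sequence."""
    pos = 0

    def parse() -> Tree:
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token in ARITY:
            return Node(token, [parse() for _ in range(ARITY[token])])
        return token  # primitive leaf

    tree = parse()
    if pos != len(tokens):
        raise ValueError("code has trailing tokens")
    return tree


# 胡: left-right of (古 = 十 over 口) and 月
hu = Node("LR", [Node("UD", ["十", "口"]), "月"])
print(encode(hu))  # ['LR', 'UD', '十', '口', '月']
```

Decoding works without brackets because each structure code has a known number of components. A variable-arity structure such as multi-column would need an explicit count in the code.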
Step 2: based on the structure tree, construct a mosaic composed of the bounding rectangles of each level of structure and of each primitive.
Because Chinese is an ideographic script, a character and its primitives can be assembled into a glyph from their bounding rectangles. For example, Fig. 2 shows the bounding-rectangle mosaic of the character "mushroom".
Step 3: determine the position and shape of the bounding rectangle of each level of structure, and the position and shape of the bounding rectangle of each primitive in the target character.
Let W and H be the width and height of the character. A rectangular coordinate system is established with the upper-left corner of the character as the origin, the x axis pointing right and the y axis pointing down. All primitives are cut from Chinese characters, so the same coordinate system also applies to them. The bounding rectangle of a primitive is defined as a four-tuple:
$$R = (x, y, w, h), \quad x \in [0, W],\; y \in [0, H],\; w \in [0, W - x],\; h \in [0, H - y]$$
Here x and y are the coordinates of the upper-left vertex of the primitive's bounding rectangle, and w and h are its width and height. Fig. 3 shows the bounding rectangle of a primitive.
Within a character, the bounding rectangle of a primitive changes with the font size, but the primitive's topological structure within the character does not. To allow uniform analysis, the bounding rectangle is therefore normalized. Normalizing the bounding rectangle in Fig. 3 gives $\bar{R} = \left(\frac{x}{W}, \frac{y}{H}, \frac{w}{W}, \frac{h}{H}\right)$, called the normalized bounding rectangle of the primitive. Unless otherwise specified, "bounding rectangle" below means the normalized bounding rectangle.
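Normalization is what makes bounding rectangles independent of font size. The minimal sketch below assumes that normalization divides x and w by the character width W, and y and h by the character height H. The `Rect` type and the pixel values are hypothetical.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x: float  # left edge; x axis points right
    y: float  # top edge; y axis points down
    w: float  # width
    h: float  # height


def normalize(r: Rect, W: float, H: float) -> Rect:
    """Scale a bounding rectangle by the character's width W and height H."""
    return Rect(r.x / W, r.y / H, r.w / W, r.h / H)


# The same primitive rendered at 64 px and at 128 px normalizes identically.
small = normalize(Rect(16, 8, 32, 48), W=64, H=64)
large = normalize(Rect(32, 16, 64, 96), W=128, H=128)
print(small)           # Rect(x=0.25, y=0.125, w=0.5, h=0.75)
print(small == large)  # True
```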
The topological transformation coefficients of a primitive are obtained as follows. Suppose the standard bounding rectangle of a primitive is $R_0 = (x_0, y_0, w_0, h_0)$ and its bounding rectangle after transformation is $R = (x, y, w, h)$. The topological transformation coefficients from $R_0$ to $R$ are then:
$$\mathit{xo} = x - x_0 \cdot \frac{w}{w_0}, \quad \mathit{xs} = \frac{w}{w_0}, \quad \mathit{yo} = y - y_0 \cdot \frac{h}{h_0}, \quad \mathit{ys} = \frac{h}{h_0} \tag{1}$$
The coefficients are:
- xo, the translation along the x axis;
- xs, the width scaling factor;
- yo, the translation along the y axis;
- ys, the height scaling factor.

A primitive's standard bounding rectangle is obtained by scanning its standard image, which is manually screened and prepared in advance.
Thus, once a primitive's standard bounding rectangle and its bounding rectangle in a given character are known, its topological transformation coefficients in that character can be determined. The standard bounding rectangle is always available, so finding the coefficients reduces to finding the primitive's bounding rectangle in the character.
The bounding rectangle of a primitive in a character can be found by pattern matching between the primitive image and character images. This yields the transformation data of each primitive in each character. These data are organized and generalized into primitive-transformation knowledge, from which a knowledge base and an inference engine are built. In use, the bounding rectangle of each primitive in the character is derived from the structure and primitive information in the input code. Formula (1) then gives the primitive's topological transformation coefficients.
The invention determines the position and shape of a primitive's bounding rectangle in the target character (that is, the rectangle's position and dimensions) in two steps:
1. Determine the maximum possible bounding rectangle of the primitive in the character.
2. Taking that rectangle as the reference, adjust the primitive's size and aspect ratio and place it in the middle of the maximum possible bounding rectangle.
The aspect ratio of a primitive in a character is the ratio of its height to its width. It is an important parameter reflecting the primitive's shape in the character. Suppose the bounding rectangle of a primitive is $R = (x, y, w, h)$ and its aspect ratio is to be adjusted to r. The adjusted bounding rectangle $\hat{R}$ can be obtained in three ways:
Keep the width unchanged:
$$\hat{R} = (x, y, w, w \cdot r) \tag{2}$$
Keep the height unchanged:
$$\hat{R} = (x, y, h / r, h) \tag{3}$$
Shrink the larger dimension:
$$\hat{R} = \begin{cases} (x, y, w, w \cdot r), & h/w \ge r \\ (x, y, h/r, h), & h/w < r \end{cases} \tag{4}$$
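The three aspect-ratio adjustments can be written as small functions. One assumption: this sketch takes the second branch of (4) to be the keep-height rectangle of (3), $(x, y, h/r, h)$. With that reading the result always fits inside the original rectangle. The function names are illustrative.

```python
from typing import Tuple

Rect = Tuple[float, float, float, float]  # (x, y, w, h); aspect ratio = h / w


def keep_width(R: Rect, r: float) -> Rect:
    """Formula (2): keep w, set h so that h / w == r."""
    x, y, w, h = R
    return (x, y, w, w * r)


def keep_height(R: Rect, r: float) -> Rect:
    """Formula (3): keep h, set w so that h / w == r."""
    x, y, w, h = R
    return (x, y, h / r, h)


def shrink_larger(R: Rect, r: float) -> Rect:
    """Formula (4): shrink whichever side is too large for the target ratio."""
    _, _, w, h = R
    return keep_width(R, r) if h / w >= r else keep_height(R, r)


square = (0.0, 0.0, 1.0, 1.0)
print(shrink_larger(square, 0.5))  # (0.0, 0.0, 1.0, 0.5): flat target, height shrinks
print(shrink_larger(square, 2.0))  # (0.0, 0.0, 0.5, 1.0): tall target, width shrinks
```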
The centering operation moves one bounding rectangle to the middle of another along a given coordinate axis. Take two bounding rectangles $R_1$ and $R_2$. Centering $R_1$ in $R_2$ along the x axis gives a new bounding rectangle R with parameters:
$$x = x_2 + \frac{|w_1 - w_2|}{2}, \quad y = y_1, \quad w = w_1, \quad h = h_1 \tag{5}$$
Centering $R_1$ in $R_2$ along the y axis gives:
$$x = x_1, \quad y = y_2 + \frac{|h_1 - h_2|}{2}, \quad w = w_1, \quad h = h_1 \tag{6}$$
Centering $R_1$ in $R_2$ along both axes is equivalent to two successive operations: first center $R_1$ in $R_2$ along the x axis, then center the result in $R_2$ along the y axis.
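Formulas (5) and (6) translate directly into code. This sketch follows the formulas literally, including the absolute value. The absolute value yields a centered position only when $R_1$ fits inside $R_2$ ($w_1 \le w_2$, $h_1 \le h_2$), which is the situation when a primitive is placed inside its maximum possible rectangle. The function names are illustrative.

```python
from typing import Tuple

Rect = Tuple[float, float, float, float]  # (x, y, w, h)


def center_x(r1: Rect, r2: Rect) -> Rect:
    """Formula (5): center r1 inside r2 along the x axis."""
    x1, y1, w1, h1 = r1
    x2, _, w2, _ = r2
    return (x2 + abs(w1 - w2) / 2, y1, w1, h1)


def center_y(r1: Rect, r2: Rect) -> Rect:
    """Formula (6): center r1 inside r2 along the y axis."""
    x1, y1, w1, h1 = r1
    _, y2, _, h2 = r2
    return (x1, y2 + abs(h1 - h2) / 2, w1, h1)


def center_xy(r1: Rect, r2: Rect) -> Rect:
    """Center along x first, then along y."""
    return center_y(center_x(r1, r2), r2)


print(center_xy((0.0, 0.0, 0.25, 0.25), (0.0, 0.0, 1.0, 0.5)))  # (0.375, 0.125, 0.25, 0.25)
```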
The tiling operation stretches or compresses one bounding rectangle along a given coordinate axis so that it fills another.
Take two bounding rectangles $R_1$ and $R_2$. Tiling $R_1$ into $R_2$ along the x axis gives the new bounding rectangle:
$$R = (x_2, y_1, w_2, h_1) \tag{7}$$
Tiling $R_1$ into $R_2$ along the y axis gives:
$$R = (x_1, y_2, w_1, h_2) \tag{8}$$
Tiling $R_1$ into $R_2$ along both axes gives:
$$R = R_2 \tag{9}$$
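The tiling operations (7) to (9) can be sketched in code, together with one possible chaining of the two-step placement described earlier:
1. Tile the primitive into its maximum possible rectangle.
2. Shrink the larger side to the target aspect ratio, as in (4).
3. Center the result inside the maximum rectangle.

The `place` function is a hypothetical illustration, not the patent's exact algorithm. In the patent, the maximum rectangle itself comes from the knowledge base, which is not modeled here.

```python
from typing import Tuple

Rect = Tuple[float, float, float, float]  # (x, y, w, h)


def tile_x(r1: Rect, r2: Rect) -> Rect:
    """Formula (7): stretch or compress r1 to span r2 along x."""
    return (r2[0], r1[1], r2[2], r1[3])


def tile_y(r1: Rect, r2: Rect) -> Rect:
    """Formula (8): stretch or compress r1 to span r2 along y."""
    return (r1[0], r2[1], r1[2], r2[3])


def tile_xy(r1: Rect, r2: Rect) -> Rect:
    """Formula (9): tiling along both axes yields r2 itself."""
    return tile_y(tile_x(r1, r2), r2)


def place(r_prim: Rect, r_max: Rect, aspect: float) -> Rect:
    """Hypothetical two-step placement: fill r_max, shrink the larger side to
    reach aspect = h / w, then center the result inside r_max."""
    x, y, w, h = tile_xy(r_prim, r_max)
    if h / w >= aspect:
        w_new, h_new = w, w * aspect
    else:
        w_new, h_new = h / aspect, h
    return (x + (w - w_new) / 2, y + (h - h_new) / 2, w_new, h_new)


# A square primitive whose maximum rectangle is the right half of the character.
print(place((0.1, 0.1, 0.8, 0.8), (0.5, 0.0, 0.5, 1.0), aspect=1.0))
# (0.5, 0.25, 0.5, 0.5)
```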
In the intelligent character-formation system, characters are divided by structure into single-level and multi-level characters. In a multi-level character, each substructure can be treated as a new character. The task of solving the topological transformation coefficients of every primitive in a multi-level character therefore decomposes into solving a series of single-level characters.

Take 胡 as an example. It has two levels of structure: the first level is left-right and the second is top-bottom. The method above decomposes solving 胡 into two parts:
- solving 古, which consists of 十 (ten) and 口 (mouth);
- solving 胡, which consists of 古 and 月 (moon).

However, the bounding rectangles of 十 and 口 found while solving 古 are relative to 古, not to 胡. They must therefore be transformed into bounding rectangles relative to 胡.
Take two bounding rectangles $R_1$ and $R_2$. If $x_1 \le x_2$, $y_1 \le y_2$, $x_1 + w_1 \ge x_2 + w_2$, and $y_1 + h_1 \ge y_2 + h_2$, then $R_1$ is said to contain $R_2$, or equivalently $R_2$ is contained in $R_1$. If $R_2$ is contained in $R_1$, the bounding rectangle of $R_2$ relative to $R_1$ is:
$$R_{21} = \left(\frac{x_2 - x_1}{w_1}, \frac{y_2 - y_1}{h_1}, \frac{w_2}{w_1}, \frac{h_2}{h_1}\right) \tag{10}$$
Suppose $R_1$ contains $R_2$ and $R_2$ contains $R_3$. Let $R_{21}$ be the bounding rectangle of $R_2$ relative to $R_1$, and $R_{32}$ that of $R_3$ relative to $R_2$. The four parameters of the bounding rectangle of $R_3$ relative to $R_1$ are then:
$$x_{31} = x_{21} + x_{32} \times w_{21}, \quad y_{31} = y_{21} + y_{32} \times h_{21}, \quad w_{31} = w_{32} \times w_{21}, \quad h_{31} = h_{32} \times h_{21} \tag{11}$$
Thus, given the bounding rectangles of 十 and 口 relative to 古, and that of 古 relative to 胡, formula (11) gives the bounding rectangles of 十 and 口 relative to 胡.
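Formulas (10) and (11) can be sketched as two functions: one expresses a rectangle relative to its container, and the other chains two relative rectangles. The sketch also checks that chaining agrees with computing the relative rectangle directly. The layout numbers for 胡 are hypothetical, chosen for easy arithmetic. They are not measured from a real glyph.

```python
from typing import Tuple

Rect = Tuple[float, float, float, float]  # (x, y, w, h)


def contains(r1: Rect, r2: Rect) -> bool:
    """True if r1 contains r2 (the condition stated before formula (10))."""
    x1, y1, w1, h1 = r1
    x2, y2, w2, h2 = r2
    return x1 <= x2 and y1 <= y2 and x1 + w1 >= x2 + w2 and y1 + h1 >= y2 + h2


def relative(r2: Rect, r1: Rect) -> Rect:
    """Formula (10): bounding rectangle of r2 relative to r1."""
    if not contains(r1, r2):
        raise ValueError("r2 must be contained in r1")
    x1, y1, w1, h1 = r1
    x2, y2, w2, h2 = r2
    return ((x2 - x1) / w1, (y2 - y1) / h1, w2 / w1, h2 / h1)


def compose(r21: Rect, r32: Rect) -> Rect:
    """Formula (11): combine R21 (R2 in R1) and R32 (R3 in R2) into R31."""
    x21, y21, w21, h21 = r21
    x32, y32, w32, h32 = r32
    return (x21 + x32 * w21, y21 + y32 * h21, w32 * w21, h32 * h21)


# Hypothetical layout for 胡: 古 fills the left half; 十 and 口 split 古 vertically.
gu_in_hu = (0.0, 0.0, 0.5, 1.0)
shi_in_gu = (0.0, 0.0, 1.0, 0.5)
kou_in_gu = (0.0, 0.5, 1.0, 0.5)
print(compose(gu_in_hu, shi_in_gu))  # (0.0, 0.0, 0.5, 0.5)
print(compose(gu_in_hu, kou_in_gu))  # (0.0, 0.5, 0.5, 0.5)
```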
Step 4: determine the topological transformation coefficients from the position and shape of each primitive's bounding rectangle in the target character, as obtained in step 3.
The transformation must satisfy the requirements of a topological transformation. Affine transformation is used below as the topological transformation method, although the invention is not limited to it. Let W denote an image and x a point in that image. The affine transformation is defined as:
$$AW + t = \begin{pmatrix} a_A & b_A \\ c_A & d_A \end{pmatrix} W + t = \begin{pmatrix} a_A & b_A \\ c_A & d_A \end{pmatrix} \begin{pmatrix} W_x \\ W_y \end{pmatrix} + \begin{pmatrix} t_x \\ t_y \end{pmatrix}$$
The terms are:
- the matrix A is the linear transformation;
- the main-diagonal elements $a_A$ and $d_A$ are the scale factors of the source image in the x and y directions;
- the off-diagonal elements $b_A$ and $c_A$ are the rotation factors in the x and y directions;
- the components $t_x$ and $t_y$ of the vector t are the translations in the x and y directions of the mapped space.

Chinese characters are square characters, and squareness is one of the basic features of their images. The off-diagonal elements of the linear transformation matrix A should therefore be zero. The diagonal elements $a_A$ and $d_A$ then describe the size and shape of the primitive as mapped into the character.
The image of each primitive is known, and its placement in the target character is obtained in step 3. Four non-collinear points in the two images are therefore sufficient to solve the four unknowns of the transformation equation, which yields the topological transformation coefficients.
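Because b_A = c_A = 0, the transformation reduces to an independent scaling and translation along each axis. Matching the top-left and bottom-right corners of a primitive's standard boundary rectangle R_0 = (x_0, y_0, w_0, h_0) with its boundary rectangle R = (x, y, w, h) in the target character already fixes the four unknowns. A short derivation follows (the choice of these two corners is ours, for illustration):

$$
\begin{aligned}
x &= a_A x_0 + t_x, & x + w &= a_A (x_0 + w_0) + t_x &\Rightarrow\quad a_A &= \frac{w}{w_0},\quad t_x = x - x_0 \frac{w}{w_0},\\
y &= d_A y_0 + t_y, & y + h &= d_A (y_0 + h_0) + t_y &\Rightarrow\quad d_A &= \frac{h}{h_0},\quad t_y = y - y_0 \frac{h}{h_0}.
\end{aligned}
$$

These are the quantities x_s, x_o, y_s, y_o of the coefficient formula stated in the claims, and when R = R_0 they reduce to (0, 1, 0, 1).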
Step 5: using the topological transformation method and the coefficients obtained in step 4, transform each primitive into the target character, completing the transformation of every level structure and every primitive in the target character and thereby achieving intelligent character generation.
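Steps 4 and 5 can be sketched in Python as follows, assuming the axis-aligned form (b_A = c_A = 0) and the coefficient order (x_o, x_s, y_o, y_s) used in this document. The names `coefficients` and `transform`, and the representation of a primitive as a list of outline points in normalized coordinates, are illustrative assumptions; the document does not say how primitive glyphs are stored.

```python
from typing import List, NamedTuple, Tuple

Point = Tuple[float, float]


class Rect(NamedTuple):
    """Boundary rectangle: top-left corner (x, y), width w, height h."""
    x: float
    y: float
    w: float
    h: float


class Coeffs(NamedTuple):
    """Affine coefficients in the order (xo, xs, yo, ys)."""
    xo: float  # translation along x
    xs: float  # scale factor along x (width)
    yo: float  # translation along y
    ys: float  # scale factor along y (height)


def coefficients(std: Rect, target: Rect) -> Coeffs:
    """Step 4: coefficients mapping the standard rectangle onto the target one."""
    xs = target.w / std.w
    ys = target.h / std.h
    return Coeffs(target.x - std.x * xs, xs, target.y - std.y * ys, ys)


def transform(points: List[Point], c: Coeffs) -> List[Point]:
    """Step 5: apply x' = xs*x + xo, y' = ys*y + yo to every outline point."""
    return [(c.xs * px + c.xo, c.ys * py + c.yo) for px, py in points]


if __name__ == "__main__":
    std = Rect(0.1, 0.2, 0.5, 0.4)     # hypothetical standard rectangle
    target = Rect(0.3, 0.3, 0.2, 0.2)  # hypothetical rectangle in the target
    c = coefficients(std, target)
    # The corners of the standard rectangle land on the corners of the target.
    print(transform([(0.1, 0.2), (0.6, 0.6)], c))
```

Mapping a rectangle onto itself gives (0, 1, 0, 1). As a further check, if the standard rectangle of 9d0 in the worked example is assumed to be (0, 0.031, 0.938, 0.938) (inferred from step (2), not stated in the text), its rectangle (0.294, 0.297, 0.424, 0.427) in the example character yields approximately (0.294, 0.452, 0.283, 0.455), close to the coefficients listed there.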
Because solving the primitive transformations of a multi-level Chinese character can be decomposed into solving a series of first-level-structure characters, the solution process for first-level-structure characters is described first.
Among the six major structural classes of Chinese characters, the enclosing and embedded structures are very similar in how characters are composed, as are the row and column structures, so during solving they are merged into an enclosing-embedded class and a row-column class. All first-level-structure characters can therefore be divided into four types: monolithic (single-body), enclosing-embedded, overlapping, and row-column. The solution method for each of these four types is described below.
For a monolithic (single-body) character, the tree structure is shown in Figure 4. Because every single-body character is itself a primitive, its affine transformation coefficients are always (0, 1, 0, 1): the boundary rectangle of the primitive in a monolithic character is exactly the primitive's standard boundary rectangle, so the coefficients obtained by formula (1) are (0, 1, 0, 1).
For enclosing-embedded characters, the common feature is that the first primitive is always a frame. This frame may enclose a primitive or substructure, or may have one or more primitives or substructures embedded in it; such a frame-type primitive is called the enclosure body, and the primitives or substructures enclosed by it or embedded in it are called embedded bodies. The tree structure of an enclosing-embedded character is shown in Figure 5. The maximum possible boundary rectangle of each primitive in an enclosing-embedded character is determined directly by the enclosure body. For each primitive that can act as an enclosure body, the maximum boundary rectangles of the body itself and of the parts it encloses are stored in the knowledge base of the expert system; during solving, the knowledge corresponding to the enclosure body is retrieved from the knowledge base and applied to the enclosure body and the enclosed parts, which yields the maximum boundary rectangle of each part. For the enclosure body, its maximum boundary rectangle is its actual boundary rectangle in the character. For an embedded body, its aspect ratio is first adjusted by formula (4), and the adjusted rectangle is then centered within its maximum boundary rectangle in the X and Y directions simultaneously, giving its actual boundary rectangle in the character. Formula (1) is then used to calculate the transformation coefficients of each part.
For overlapping characters, the structure itself determines the maximum boundary rectangle of each part; the tree structure of an overlapping character is shown in Figure 6. The maximum boundary rectangle information of each part is stored in the knowledge base of the expert system. During solving, the corresponding information is read from the knowledge base according to the structure and applied to each part of the character; the aspect ratio of each part is then adjusted by formula (4), and each part is centered within its maximum boundary rectangle in the X and Y directions simultaneously, giving its boundary rectangle in the character. Finally, formula (1) is used to calculate the transformation coefficients of each part.
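The final fitting step (placing a part inside its maximum boundary rectangle and centering it in both X and Y) might be sketched as below. Formula (4) is not reproduced in this section, so the sketch simply takes the part's desired width-to-height ratio as an input (hypothetically, the ratio of its standard boundary rectangle) and scales the part to the largest size that fits. The function name `fit_centered` is hypothetical.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Boundary rectangle: top-left corner (x, y), width w, height h."""
    x: float
    y: float
    w: float
    h: float


def fit_centered(aspect: float, box: Rect) -> Rect:
    """Largest rectangle with width/height == aspect that fits inside `box`,
    centered in both the X and Y directions.

    `aspect` stands in for the ratio the document obtains from formula (4).
    """
    if aspect <= 0:
        raise ValueError("aspect ratio must be positive")
    w, h = box.w, box.w / aspect   # first try using the full width
    if h > box.h:                  # too tall: use the full height instead
        w, h = box.h * aspect, box.h
    return Rect(box.x + (box.w - w) / 2, box.y + (box.h - h) / 2, w, h)
```

For a maximum rectangle (0, 0, 1, 0.5), a square part (aspect 1) becomes (0.25, 0, 0.5, 0.5), while a wide part with aspect 4 fills the full width as (0, 0.125, 1, 0.25).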
For row-column characters, the boundary rectangle of each part depends not only on the order in which the parts appear in the structure but also on their sizes: a larger part usually occupies a larger share of the character. Part size is measured by the standard boundary rectangle. For a primitive, its standard boundary rectangle is used; for a substructure, whose boundary rectangle must already have been derived (guaranteed by the depth-first traversal order), the derived boundary rectangle is used. During derivation, each part's share of the total is first determined from the standard boundary rectangles, and its maximum possible boundary rectangle is then determined from the order in which it appears in the structure. For a row structure, formula (3) is used to determine the aspect ratio of each part, and each part is then centered within its maximum boundary rectangle in the Y direction and then in the X direction; for a column structure, formula (2) is used to determine the aspect ratio, and each part is then centered within its maximum boundary rectangle in the X direction and then in the Y direction. This gives the boundary rectangle of each part in the character, after which formula (1) is used to calculate the transformation coefficients of each part. The tree structure of a row-column character is shown in Figure 7.
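The first stage of this derivation, dividing the available rectangle into consecutive maximum boundary rectangles whose sizes are proportional to the parts' standard boundary rectangles, might look like the sketch below. It rests on assumptions: the function name is hypothetical, parts are laid out in the order they appear in the code, and the size measure passed in (for example, standard width for side-by-side parts or standard height for stacked parts) is our choice. The subsequent aspect-ratio adjustment by formulas (2) and (3) is not included.

```python
from typing import List, NamedTuple


class Rect(NamedTuple):
    """Boundary rectangle: top-left corner (x, y), width w, height h."""
    x: float
    y: float
    w: float
    h: float


def split_proportional(box: Rect, sizes: List[float], side_by_side: bool) -> List[Rect]:
    """Split `box` into consecutive slots, one per part, in order of appearance.

    Each slot's share of the box is proportional to the part's size measure.
    side_by_side=True splits along X (left to right); False splits along Y
    (top to bottom).
    """
    total = sum(sizes)
    if total <= 0:
        raise ValueError("sizes must sum to a positive value")
    slots, offset = [], 0.0
    for size in sizes:
        share = size / total
        if side_by_side:
            slots.append(Rect(box.x + offset * box.w, box.y, share * box.w, box.h))
        else:
            slots.append(Rect(box.x, box.y + offset * box.h, box.w, share * box.h))
        offset += share
    return slots
```

In the worked example, the rectangles listed for 341 (height 0.59) and 2l0 (top at y = 0.59, height 0.41) together span the full height of their parent, which is consistent with this kind of proportional stacking.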
The process of solving the primitive transformation coefficients is described in detail below, taking the Chinese character [character image not reproduced] as an example:
1. The code of the character [character image not reproduced] is M, 8g0, O, 9d0, J, 341, 2l0; from this code its structure tree can be obtained, as shown in Figure 8.
The structure tree shows that this character has three structure levels: the first-level structure is M, the second-level structure is O, and the third-level structure is J.
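Reading the code as a prefix expression reproduces this tree. In the sketch below, which is an illustrative assumption rather than the patent's own decoder, every token listed in a structure table is treated as a structure code that combines a fixed number of parts (two for M, O and J, as in this example), and any other token is a primitive code. The table `ARITY` and the names `parse` and `levels` are hypothetical; the arity of codes such as the left-center-right structure is not given here.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    code: str
    children: List["Node"] = field(default_factory=list)


# Hypothetical table: number of parts combined by each structure code.
ARITY: Dict[str, int] = {"M": 2, "O": 2, "J": 2}


def parse(tokens: List[str]) -> Node:
    """Build a structure tree from a prefix-ordered list of codes."""
    pos = 0

    def walk() -> Node:
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("code ends before all parts are given")
        node = Node(tokens[pos])
        pos += 1
        for _ in range(ARITY.get(node.code, 0)):  # primitives have no parts
            node.children.append(walk())
        return node

    root = walk()
    if pos != len(tokens):
        raise ValueError("extra tokens after a complete character")
    return root


def levels(node: Node) -> int:
    """Number of nested structure levels at and below `node`."""
    if not node.children:
        return 0
    return 1 + max(levels(child) for child in node.children)


if __name__ == "__main__":
    tree = parse("M, 8g0, O, 9d0, J, 341, 2l0".split(", "))
    print(levels(tree), [child.code for child in tree.children])
```

Run on the example code, this prints 3 and ['8g0', 'O'], matching the three structure levels and the first-level split into 8g0 and the O substructure.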
2. From the structure tree and the characteristics of each primitive, the boundary-rectangle mosaic of this character can be determined, as shown in Figure 9.

3. Based on the structure tree and the boundary-rectangle mosaic, the character [character image not reproduced] is decomposed into the solution of a series of first-level-structure characters. The decomposition starts from the lowest end of the structure tree, that is, from the innermost part of the boundary-rectangle mosaic, and comprises the following 3 steps:
(1) The third-level structure is separated out and solved as a character in its own right, as shown in Figure 10.
This character is encoded as J, 341, 2l0. Using the solution method for row-column characters described above, the boundary rectangles of the two primitives in the character [character image not reproduced] are:
341:(0.031,0,0.94,0.59)
2l0:(0.281,0.59,0.44,0.41)
(2) The second-level structure is separated out and solved as a character. Denote the character obtained in (1) by "#1" and treat it as a primitive, as shown in Figure 11; this character is encoded as O, 9d0, #1. Using the solution method for enclosing-embedded characters described above, the boundary rectangles of the two primitives in the character 唐 ("Tang") are:
9d0:(0,0.031,0.938,0.938)
#1:(0.26,0.251,0.66,0.72)
(3) Denote the character obtained in (2) by "#2" and bring it into the first-level structure as a primitive, as shown in Figure 12; the character is now encoded as M, 8g0, #2. Using the solution method for enclosing-embedded characters described above, the boundary rectangles of the two primitives in the example character are:
8g0:(0.094,0.063,0.813,0.875)
#2:(0.294,0.283,0.453,0.455)
4. Since the example character contains "#2" and "#2" contains "#1", the absolute boundary rectangle of each primitive in the character can be obtained in turn by formula (11):
8g0:(0.094,0.063,0.813,0.875)
9d0:(0.294,0.297,0.424,0.427)
341:(0.421,0.397,0.279,0.193)
2l0:(0.495,0.589,0.131,0.134)
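The nesting in step 4 can be reproduced by applying formula (11) repeatedly, climbing from each primitive up to the whole character. In this sketch the dictionaries `PARENT` and `LOCAL` simply transcribe the relative rectangles listed in steps (1) to (3); these names and the helper `absolute` are hypothetical. The results agree with the absolute rectangles above to within about 0.002, which is consistent with rounding of the intermediate values.

```python
from typing import Dict, NamedTuple, Optional


class Rect(NamedTuple):
    """Boundary rectangle: top-left corner (x, y), width w, height h."""
    x: float
    y: float
    w: float
    h: float


def compose(inner: Rect, outer: Rect) -> Rect:
    """Formula (11): `inner` is relative to a part P and `outer` is P relative
    to P's parent; the result is `inner` relative to P's parent."""
    return Rect(outer.x + inner.x * outer.w,
                outer.y + inner.y * outer.h,
                inner.w * outer.w,
                inner.h * outer.h)


# Rectangles from steps (1)-(3), each relative to its parent
# (None means the whole character).
PARENT: Dict[str, Optional[str]] = {
    "8g0": None, "#2": None,
    "9d0": "#2", "#1": "#2",
    "341": "#1", "2l0": "#1",
}
LOCAL: Dict[str, Rect] = {
    "8g0": Rect(0.094, 0.063, 0.813, 0.875),
    "#2": Rect(0.294, 0.283, 0.453, 0.455),
    "9d0": Rect(0.0, 0.031, 0.938, 0.938),
    "#1": Rect(0.26, 0.251, 0.66, 0.72),
    "341": Rect(0.031, 0.0, 0.94, 0.59),
    "2l0": Rect(0.281, 0.59, 0.44, 0.41),
}


def absolute(name: str) -> Rect:
    """Boundary rectangle of `name` relative to the whole character."""
    rect, parent = LOCAL[name], PARENT[name]
    while parent is not None:
        rect = compose(rect, LOCAL[parent])
        parent = PARENT[parent]
    return rect


if __name__ == "__main__":
    for name in ("8g0", "9d0", "341", "2l0"):
        print(name, tuple(round(v, 3) for v in absolute(name)))
```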
The affine transformation coefficients (x_o, x_s, y_o, y_s) of each primitive in the character, calculated by formula (1), are:
8g0:(0,1,0,1)
9d0:(0.294,0.453,0.283,0.455)
341:(0.411,0.298,0.387,0.32)
2l0:(0.41,0.298,0.501,0.32)
5. Each primitive is transformed with the above affine transformation coefficients, and the results are combined to obtain the glyph of the character, as shown in Figure 13.
By applying these five steps, further Chinese characters can be derived; results for several other characters are shown in Figure 14.
The embodiment described above is a preferred embodiment of the present invention, but embodiments of the invention are not limited to it. Any other change, modification, substitution, combination or simplification made without departing from the spirit and principle of the invention shall be an equivalent replacement and is included within the scope of protection of the invention.

Claims (1)

1. A method for intelligently generating Chinese characters without a character library, characterized in that a Chinese character is generated automatically from its code, the process comprising the following steps:
S1, obtaining the structure tree of the Chinese character from the input Chinese character code;
S2, constructing, according to the structure tree, a mosaic composed of the boundary rectangles of each level structure and each primitive of the Chinese character;
S3, determining the position and shape of the boundary rectangle of each level structure of the Chinese character, and determining the position and shape of the boundary rectangle of each primitive in the target character;
S4, determining the topological transformation coefficients from the position and shape of the boundary rectangle of each primitive in the target character obtained in step S3;
S5, using the topological transformation method and the topological transformation coefficients obtained in step S4, transforming each primitive into the target character, thereby completing the transformation of each level structure and each primitive in the target character;
the boundary rectangle of each primitive in the target character described in step S3 is the normalized boundary rectangle
$$R = \left(\frac{x}{W},\ \frac{y}{H},\ \frac{w}{W},\ \frac{h}{H}\right)$$
where W and H are respectively the width and height of the Chinese character; a rectangular coordinate system is set up with the upper-left corner of the character as origin, the rightward direction as the x axis and the downward direction as the y axis; x and y are the coordinates of the upper-left vertex of the primitive, and w and h are the width and height of the primitive;
the position and shape of the boundary rectangle of each primitive in the target character described in step S3 are determined as follows: first, the maximum possible boundary rectangle of the primitive in the Chinese character is determined; then, taking the maximum possible boundary rectangle as the reference, the size and aspect ratio of the primitive are adjusted so that the primitive lies at the center of the maximum possible boundary rectangle;
the topological transformation method is the affine transformation method, and the affine transformation is defined as:
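$$AW + t = \begin{bmatrix} a_A & b_A \\ c_A & d_A \end{bmatrix} W + t = \begin{bmatrix} a_A & b_A \\ c_A & d_A \end{bmatrix} \begin{bmatrix} W_x \\ W_y \end{bmatrix} + \begin{bmatrix} t_x \\ t_y \end{bmatrix}$$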
where matrix A is the linear transformation matrix; the main-diagonal elements a_A and d_A represent the scale factors of the source image in the x and y directions, the off-diagonal elements b_A and c_A represent the rotation factors in the x and y directions, and the components t_x and t_y of vector t represent the translations in the x and y directions of the mapping space;
the Chinese character code described in step S1 consists of primitive codes and structure codes: each primitive is assigned to a primitive key value formed by a combination of two keys, and the corresponding primitive key value is taken as the primitive code; the structure code is obtained from the basic structure of the Chinese character; the basic structures of Chinese characters comprise: the one-piece structure; the upper-left enclosing, lower-left enclosing, upper-right enclosing, upper three-side enclosing, lower three-side enclosing, left three-side enclosing and full enclosing structures; the frame-embedded and mutually embedded structures; the 品-shaped structure and the double-stacked structure; the left-right, left-center-right and multi-row structures; and the top-bottom, top-center-bottom and multi-tier structures;
when a primitive is transformed from its standard boundary rectangle R_0 = (x_0, y_0, w_0, h_0) to its boundary rectangle R = (x, y, w, h) in the target character described in step S3, the topological transformation coefficients are:
$$x_o = x - x_0 \cdot \frac{w}{w_0},\quad x_s = \frac{w}{w_0},\quad y_o = y - y_0 \cdot \frac{h}{h_0},\quad y_s = \frac{h}{h_0}$$
where x_o is the translation along the x axis, x_s is the width scaling, y_o is the translation along the y axis, and y_s is the height scaling.
CN201010263032.2A 2010-08-25 2010-08-25 Method for intelligently generating Chinese character without character library Expired - Fee Related CN101930299B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201010263032.2A CN101930299B (en) 2010-08-25 2010-08-25 Method for intelligently generating Chinese character without character library

Publications (2)

Publication Number Publication Date
CN101930299A 2010-12-29
CN101930299B 2014-04-02

Family

ID=43369509

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102141884B (en) * 2010-12-31 2014-01-01 珠海金山办公软件有限公司 Drawing device and method
CN103186511B (en) * 2011-12-31 2017-03-08 北京大学 Chinese characters word-formation method and apparatus, the method for construction fontlib
CN107220224A (en) * 2017-05-18 2017-09-29 吉首大学 A kind of literary generation method of square seedling derived based on intelligence
CN107610200B (en) * 2017-10-10 2020-11-03 南京师范大学 Character library rapid generation method based on characteristic template
CN110276051B (en) * 2018-03-14 2020-12-04 北大方正集团有限公司 Method and device for splitting font part

Citations (1)

Publication number Priority date Publication date Assignee Title
CN101551711A (en) * 2009-05-21 2009-10-07 华南理工大学 Chinese character coding input method based on structure and primitive

Non-Patent Citations (2)

Title
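皮佑国. Research on intelligent character generation without a character library (无字库智能造字研究). Proceedings of the 2009 China Intelligent Automation Conference (2009中国智能自动化会议论文集), 2009, pp. 1752-1756. *
谌杨帆. Research on a recognition method for fully enclosed Chinese character structures without a character library (无字库条件下汉字全包围结构识别方法研究). Techniques of Automation and Applications (自动化技术与应用), 2009, Vol. 28, No. 2, full text. *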

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20140402

Termination date: 20190825