Summary of the invention
The object of the present invention is to overcome the shortcomings and deficiencies of the prior art by providing a method for intelligently generating Chinese characters without a character library. The method solves a problem left open by "A Chinese character coding input method based on structure and primitives" (CN101551711A): how to determine, from an input Chinese character code, the position and size of the bounding rectangle of each component of the character, so that the primitives specified by the code can be topologically transformed to create the character. Chinese characters form a large character set that is still developing, and new characters will continue to appear. The invention infers the primitive transformation knowledge for a new character from the primitive transformation knowledge of existing characters. It therefore needs only a primitive library, not a character library, to generate a Chinese character automatically from an input character code, achieving intelligent character creation in the true sense.
The invention achieves the above object through the following technical solution: a method for intelligently generating Chinese characters without a character library, in which a Chinese character is generated automatically from its code through the following steps:
S1: obtain the structure tree of the Chinese character from the input character code;
S2: from the structure tree, construct the layout composed of the bounding rectangles of each level of structure and of each primitive of the character;
S3: determine the position and shape of the bounding rectangle of each level of structure of the character, and the position and shape of the bounding rectangle of each primitive in the target character;
S4: from the position and shape of the bounding rectangle of each primitive in the target character obtained in step S3, determine the topological transformation coefficients;
S5: using the topological transformation method and the coefficients obtained in step S4, transform the primitives into the target character, completing the transformation of each level of structure and each primitive in the target character.
In step S3, the bounding rectangle of each primitive in the target character is the normalized bounding rectangle
$$R' = \left(\frac{x}{W},\ \frac{y}{H},\ \frac{w}{W},\ \frac{h}{H}\right)$$
where W and H are the width and height of the Chinese character. A rectangular coordinate system is set up with the upper-left corner of the character as the origin, the x axis pointing right and the y axis pointing down; x and y are the coordinates of the upper-left vertex of the primitive, and w and h are the width and height of the primitive.
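As an illustration of the normalization, a minimal Python sketch follows. The function name `normalize` and the 64 x 64 pixel example are assumptions made for this sketch, not part of the method as claimed.

```python
from typing import Tuple

Rect = Tuple[float, float, float, float]  # (x, y, w, h)


def normalize(rect: Rect, char_width: float, char_height: float) -> Rect:
    """Scale a bounding rectangle measured in the character's own units
    into the unit square, so that it no longer depends on font size."""
    if char_width <= 0 or char_height <= 0:
        raise ValueError("character width and height must be positive")
    x, y, w, h = rect
    return (x / char_width, y / char_height, w / char_width, h / char_height)


# Hypothetical example: a primitive filling the right half of a 64x64 image.
right_half = normalize((32, 0, 32, 64), 64, 64)
```

Because every component is divided by the character's own width or height, the same primitive yields the same normalized rectangle at any font size.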
In step S3, the position and shape of the bounding rectangle of each primitive in the target character are determined as follows: first determine the maximum possible bounding rectangle of the primitive in the character; then, taking that maximum possible bounding rectangle as the reference, adjust the size and aspect ratio of the primitive so that the primitive sits at the center of the maximum possible bounding rectangle.
The topological transformation method in step S5 is an affine transformation, defined as:
$$T(\mathbf{x}) = A\mathbf{x} + \mathbf{t}, \qquad A = \begin{pmatrix} a_A & b_A \\ c_A & d_A \end{pmatrix}, \qquad \mathbf{t} = \begin{pmatrix} t_x \\ t_y \end{pmatrix}$$
where the matrix $A$ represents the linear transformation; the elements $a_A$ and $d_A$ on the main diagonal are the scaling of the source image in the x and y directions; the elements $b_A$ and $c_A$ on the anti-diagonal are the rotation factors in the x and y directions; and the components $t_x$ and $t_y$ of the vector $\mathbf{t}$ are the translations in the x and y directions of the mapped space.
The Chinese character code in step S1 consists of primitive codes and structure codes. Each primitive is assigned to a primitive key value formed by an ordered combination of two keys, and that key value is taken as the primitive code; structure codes are derived from the basic structures of Chinese characters. The basic structures comprise: the single-body structure; the upper-left enclosure, lower-left enclosure, upper-right enclosure, upper three-sided enclosure, lower three-sided enclosure, left three-sided enclosure and full enclosure structures; the frame-embedded and mutually embedded structures; the "品"-shaped structure and the double-overlap structure; the left-right, left-middle-right and multi-column structures; and the up-down, up-middle-down and multi-row structures.
In the method, when the standard bounding rectangle $R_0 = (x_0, y_0, w_0, h_0)$ of a primitive is transformed to the bounding rectangle $R = (x, y, w, h)$ of that primitive in the target character described in step S3, the topological transformation coefficients are:
$$\mathit{xs} = \frac{w}{w_0}, \qquad \mathit{xo} = x - x_0 \cdot \mathit{xs}, \qquad \mathit{ys} = \frac{h}{h_0}, \qquad \mathit{yo} = y - y_0 \cdot \mathit{ys}$$
where xo is the translation along the x axis, xs is the width compression, yo is the translation along the y axis, and ys is the height compression.
Through analysis of the structural features of Chinese characters, the invention decomposes the task of solving the primitive mapping into five steps: the structure tree of the character is obtained by inference from the input character code; by traversing the structure tree, a character with a multi-level structure is decomposed into a series of single-level-structure characters; by analyzing the bounding-rectangle layout of the character's components (that is, its primitives), the size and position of each component's bounding rectangle are determined; the transformation coefficients of each primitive are obtained from the transformation of the primitive's standard bounding rectangle to its bounding rectangle in the target character; and the character is obtained by applying the topological transformation to the primitives, thereby achieving intelligent character creation. Compared with the prior art, the invention has the following advantages and effects:
1. The structure tree of the character is obtained by inference from the input character code, and traversing the tree yields the size and position of the bounding rectangle of each component. From the primitive size, the size of the target bounding rectangle and existing knowledge of primitive topological transformations, the transformation knowledge for the primitives to be transformed is inferred, and the topological transformation is applied to create the character. The method does not depend on a Chinese character font library: once a limited primitive library is established, a nearly unlimited number of Chinese characters can be produced.
2. Characters are "made" using pictographs, indicative characters and their symbols as primitives (basic elements), which highlights the ideographic nature of Chinese characters and conveys the culture of the script.
3. The essence of the character-creation process is "writing" a character rather than "selecting" one; it embodies writing with electronic tools and electronic media as the instrument and medium, and conveys the culture of the script.
4. The topological transformation is carried out by the computer in the background, while manual input of a character relies entirely on ordinary literacy knowledge, so the method connects seamlessly with literacy.
5. There are only 1085 primitives and only 18 structures, so a long-term stable standard can be established: as Chinese characters develop, the structures, primitives and coding standard remain unchanged. The invention therefore has good stability and adapts well to the development and change of Chinese characters.
Embodiment
The basis of the invention is as follows. Building on the 17 basic structures of Chinese characters defined in "A Chinese character coding input method based on structure and primitives" (CN101551711A), the basic structures are extended to 18: the single-body structure; 7 enclosure structures (upper-left enclosure, lower-left enclosure, upper-right enclosure, upper three-sided enclosure, lower three-sided enclosure, left three-sided enclosure and full enclosure); 2 embedded structures (frame-embedded and mutually embedded); 2 overlapping structures ("品"-shaped and double-overlap); 3 horizontal-row structures (left-right, left-middle-right and multi-column); and 3 vertical-column structures (up-down, up-middle-down and multi-row). The structure of a Chinese character can be built hierarchically from these basic structures: a character can have one or more levels of structure, and each level comprises one or more primitives and is uniquely determined by the character code. The character code consists of structure codes and primitive codes; each primitive is assigned to a primitive key value formed by an ordered combination of two keys, and that key value is taken as the primitive code.
The input of the invention is a Chinese character code, and the output is the Chinese character corresponding to that code. The implementation steps are:
Step 1: obtain the structure tree of the Chinese character from the input character code.
From the character code information described above, the number of structure levels of the character, the structure at each level and the primitives each structure contains can be fully determined, so the structure tree of the character can be drawn. For example, the tree structure of the character 菇 ("mushroom"), in symbolic form, is shown in Figure 1. A depth-first, root-first (pre-order) traversal of the structure tree yields the character code; conversely, the structure tree can be obtained from the character code.
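The two directions of this correspondence can be sketched in Python as follows. The sketch assumes that every structure code has a fixed, known number of components; the `ARITY` table and the helper names `parse_code`, `to_code` and `levels` are hypothetical, and the arity of 2 given to M, O and J is only inferred from the worked example in this embodiment (code M, 8g0, O, 9d0, J, 341, 2l0), not stated by the coding standard.

```python
from typing import List, Tuple, Union

# Hypothetical arity table: number of components under each structure code.
ARITY = {"M": 2, "O": 2, "J": 2}

Tree = Union[str, Tuple[str, list]]


def parse_code(tokens: List[str]) -> Tree:
    """Rebuild a structure tree from a pre-order (root-first) token list."""
    it = iter(tokens)

    def node() -> Tree:
        tok = next(it, None)
        if tok is None:
            raise ValueError("code ended before the tree was complete")
        if tok in ARITY:  # structure code: read its components recursively
            return (tok, [node() for _ in range(ARITY[tok])])
        return tok  # primitive code: a leaf of the tree

    tree = node()
    if next(it, None) is not None:
        raise ValueError("unused tokens after a complete tree")
    return tree


def to_code(tree: Tree) -> List[str]:
    """Pre-order traversal: the structure code first, then each component."""
    if isinstance(tree, str):
        return [tree]
    head, children = tree
    out = [head]
    for child in children:
        out.extend(to_code(child))
    return out


def levels(tree: Tree) -> int:
    """Number of structure levels (a bare primitive has none)."""
    if isinstance(tree, str):
        return 0
    return 1 + max(levels(child) for child in tree[1])


tokens = ["M", "8g0", "O", "9d0", "J", "341", "2l0"]
tree = parse_code(tokens)
```

Parsing and then re-encoding the token list returns the original code, which is the round trip the paragraph above describes.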
Step 2: from the structure tree of the Chinese character, construct the layout composed of the bounding rectangles of each level of structure and of each primitive of the character.
Chinese characters are ideographic, and a character and its primitives can be assembled into a glyph by means of their bounding rectangles. For example, the bounding-rectangle layout of the character 菇 ("mushroom") is shown in Figure 2.
Step 3: determine the position and shape of the bounding rectangle of each level of structure of the character, and the position and shape of the bounding rectangle of each primitive in the target character.
Let the width and height of the Chinese character be W and H. A rectangular coordinate system is set up with the upper-left corner of the character as the origin, the x axis pointing right and the y axis pointing down. Because every primitive is cut out of a Chinese character, this coordinate system also applies to primitives. The bounding rectangle of a primitive is defined as a four-tuple:
R=(x,y,w,h) x∈[0,W],y∈[0,H],w∈[0,W-x],h∈[0,H-y]
where x and y are the coordinates of the upper-left vertex of the primitive's bounding rectangle, and w and h are the width and height of the bounding rectangle. The bounding rectangle of a primitive is shown in Figure 3.
Within the same character, the bounding-rectangle values of a primitive change with the font size of the character, but the topological structure of the primitive within the character does not. For unified analysis, the bounding rectangle of the primitive must therefore be normalized. Taking Figure 3 as an example, normalizing the bounding rectangle of the primitive gives:
$$R' = \left(\frac{x}{W},\ \frac{y}{H},\ \frac{w}{W},\ \frac{h}{H}\right)$$
which is called the normalized bounding rectangle of the primitive. Unless otherwise stated, every bounding rectangle mentioned below is a normalized bounding rectangle.
The topological transformation coefficients of a primitive are obtained as follows. Suppose the standard bounding rectangle of a primitive is $R_0 = (x_0, y_0, w_0, h_0)$ and, after transformation, its bounding rectangle is $R = (x, y, w, h)$. The topological transformation coefficients from $R_0$ to $R$ are then:
$$\mathit{xs} = \frac{w}{w_0}, \qquad \mathit{xo} = x - x_0 \cdot \mathit{xs}, \qquad \mathit{ys} = \frac{h}{h_0}, \qquad \mathit{yo} = y - y_0 \cdot \mathit{ys} \tag{1}$$
where xo is the translation along the x axis, xs is the width compression, yo is the translation along the y axis, and ys is the height compression. The standard bounding rectangle of a primitive is obtained by scanning the standard picture of the primitive; the standard pictures are screened and prepared manually in advance.
Thus, once the standard bounding rectangle of a primitive and its bounding rectangle in a given character are known, the topological transformation coefficients of that primitive in that character can be determined. Because the standard bounding rectangle of a primitive is available, the problem of finding the topological transformation coefficients reduces to finding the bounding rectangle of the primitive in the character.
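The calculation can be sketched in Python as below. The sketch assumes that formula (1) has the scale-and-offset form xs = w/w0, ys = h/h0, xo = x - x0*xs, yo = y - y0*ys, a reading chosen because it yields the identity coefficients (0, 1, 0, 1) when R equals R0; the function names `transform_coefficients` and `apply_coefficients` are hypothetical.

```python
from typing import Tuple

Rect = Tuple[float, float, float, float]    # (x, y, w, h)
Coeffs = Tuple[float, float, float, float]  # (xo, xs, yo, ys)


def transform_coefficients(r0: Rect, r: Rect) -> Coeffs:
    """Coefficients that map the standard rectangle r0 onto rectangle r.

    Assumed form: a per-axis scale plus an offset, with no rotation.
    """
    x0, y0, w0, h0 = r0
    x, y, w, h = r
    if w0 <= 0 or h0 <= 0:
        raise ValueError("standard rectangle must have positive size")
    xs = w / w0
    ys = h / h0
    return (x - x0 * xs, xs, y - y0 * ys, ys)


def apply_coefficients(coeffs: Coeffs, point: Tuple[float, float]) -> Tuple[float, float]:
    """Map one point of the primitive's standard image into the target character."""
    xo, xs, yo, ys = coeffs
    px, py = point
    return (xs * px + xo, ys * py + yo)


# A primitive already sitting in its standard rectangle needs no change.
identity = transform_coefficients((0.1, 0.2, 0.5, 0.6), (0.1, 0.2, 0.5, 0.6))

# Hypothetical target: squeeze the full unit square into a smaller rectangle.
coeffs = transform_coefficients((0.0, 0.0, 1.0, 1.0), (0.25, 0.5, 0.5, 0.25))
bottom_right = apply_coefficients(coeffs, (1.0, 1.0))
```

Applying the coefficients to the corners of R0 should reproduce the corners of R, which gives a quick consistency check on any stored rectangle pair.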
The bounding rectangle of a primitive in a character can be obtained by pattern matching the primitive image against the character image, which gives the transformation data of that primitive in the character. These data are collated and generalized into primitive transformation knowledge, and a knowledge base and an inference engine are built. In use, the bounding rectangle of a primitive in a character is derived from the structure and primitive information in the input character code, and formula (1) then gives the primitive's topological transformation coefficients.
The invention determines the position and shape (that is, the position and dimensions) of the bounding rectangle of a primitive in the target character in two steps: first, determine the maximum possible bounding rectangle of the primitive in the character; then, taking that maximum possible bounding rectangle as the reference, adjust the size and aspect ratio of the primitive and move it to a suitable position, namely the center, of the maximum possible bounding rectangle.
The aspect ratio of a primitive in a character is an important parameter reflecting the shape of the primitive in the character; it is the ratio of the primitive's height to its width. Let the bounding rectangle of the primitive be $R = (x, y, w, h)$ and let its aspect ratio be adjusted to $r$, giving the adjusted bounding rectangle $R'$. There are three methods of adjustment:
Keep the width unchanged: $$R' = (x,\ y,\ w,\ r\,w) \tag{2}$$
Keep the height unchanged: $$R' = \left(x,\ y,\ \frac{h}{r},\ h\right) \tag{3}$$
Reduce the larger dimension: $$R' = \begin{cases} (x,\ y,\ w,\ r\,w), & r\,w \le h \\ \left(x,\ y,\ \dfrac{h}{r},\ h\right), & r\,w > h \end{cases} \tag{4}$$
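The three adjustments can be sketched in Python as follows. The sketch takes r to be height divided by width, as the text defines it, and reads "reduce the larger dimension" as choosing whichever of the two adjustments does not enlarge the rectangle; that reading, and the function names, are assumptions.

```python
from typing import Tuple

Rect = Tuple[float, float, float, float]  # (x, y, w, h)


def keep_width(rect: Rect, r: float) -> Rect:
    """Set the height so that height / width == r, leaving the width alone."""
    x, y, w, _ = rect
    return (x, y, w, w * r)


def keep_height(rect: Rect, r: float) -> Rect:
    """Set the width so that height / width == r, leaving the height alone."""
    if r <= 0:
        raise ValueError("aspect ratio must be positive")
    x, y, _, h = rect
    return (x, y, h / r, h)


def reduce_larger(rect: Rect, r: float) -> Rect:
    """Reach ratio r by shrinking only, never growing, the rectangle."""
    _, _, w, h = rect
    if w * r <= h:          # too tall for ratio r: cut the height
        return keep_width(rect, r)
    return keep_height(rect, r)  # too wide for ratio r: cut the width


tall = (0.0, 0.0, 0.5, 1.0)
wide = (0.0, 0.0, 1.0, 0.5)
```

With this reading, `reduce_larger` always returns a rectangle that fits inside the original one, which matters when the original is a maximum possible bounding rectangle.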
The centering operation moves one bounding rectangle to the middle of another bounding rectangle along a given coordinate axis. Let the two bounding rectangles be $R_1 = (x_1, y_1, w_1, h_1)$ and $R_2 = (x_2, y_2, w_2, h_2)$. Centering $R_1$ in $R_2$ along the X axis gives the new bounding rectangle $R$:
$$R = \left(x_2 + \frac{w_2 - w_1}{2},\ y_1,\ w_1,\ h_1\right) \tag{5}$$
Centering $R_1$ in $R_2$ along the Y axis gives the new bounding rectangle $R$:
$$R = \left(x_1,\ y_2 + \frac{h_2 - h_1}{2},\ w_1,\ h_1\right) \tag{6}$$
Centering $R_1$ in $R_2$ along both the X and Y axes at once is equivalent to first centering $R_1$ in $R_2$ along the X axis and then centering the result in $R_2$ along the Y axis.
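A short Python sketch of the centering operation follows. It assumes that centering leaves equal margins on both sides along the chosen axis and does not change the size of the rectangle being moved; the function names are hypothetical.

```python
from typing import Tuple

Rect = Tuple[float, float, float, float]  # (x, y, w, h)


def center_x(r1: Rect, r2: Rect) -> Rect:
    """Move r1 horizontally so it sits in the middle of r2; y and size unchanged."""
    _, y1, w1, h1 = r1
    x2, _, w2, _ = r2
    return (x2 + (w2 - w1) / 2, y1, w1, h1)


def center_y(r1: Rect, r2: Rect) -> Rect:
    """Move r1 vertically so it sits in the middle of r2; x and size unchanged."""
    x1, _, w1, h1 = r1
    _, y2, _, h2 = r2
    return (x1, y2 + (h2 - h1) / 2, w1, h1)


def center_xy(r1: Rect, r2: Rect) -> Rect:
    """Center along X first, then along Y, as the text describes."""
    return center_y(center_x(r1, r2), r2)


# Hypothetical values: a half-width, quarter-height part in the unit square.
centered = center_xy((0.0, 0.0, 0.5, 0.25), (0.0, 0.0, 1.0, 1.0))
```

The two axes are independent, so applying `center_y` before `center_x` gives the same result.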
The tiling operation stretches or compresses one bounding rectangle along a given coordinate axis so that it fills another bounding rectangle along that axis.
Let the two bounding rectangles be $R_1 = (x_1, y_1, w_1, h_1)$ and $R_2 = (x_2, y_2, w_2, h_2)$. Tiling $R_1$ into $R_2$ along the X axis gives the new bounding rectangle:
$$R = (x_2,\ y_1,\ w_2,\ h_1) \tag{7}$$
Tiling $R_1$ into $R_2$ along the Y axis gives the new bounding rectangle:
$$R = (x_1,\ y_2,\ w_1,\ h_2) \tag{8}$$
Tiling $R_1$ into $R_2$ along both the X and Y axes at once gives the new bounding rectangle:
$$R = R_2 \tag{9}$$
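Formulas (7) to (9) translate directly into code. The Python sketch below is only an illustration; the function names `tile_x`, `tile_y` and `tile_xy` are hypothetical.

```python
from typing import Tuple

Rect = Tuple[float, float, float, float]  # (x, y, w, h)


def tile_x(r1: Rect, r2: Rect) -> Rect:
    """Formula (7): take r2's horizontal extent, keep r1's vertical extent."""
    _, y1, _, h1 = r1
    x2, _, w2, _ = r2
    return (x2, y1, w2, h1)


def tile_y(r1: Rect, r2: Rect) -> Rect:
    """Formula (8): take r2's vertical extent, keep r1's horizontal extent."""
    x1, _, w1, _ = r1
    _, y2, _, h2 = r2
    return (x1, y2, w1, h2)


def tile_xy(r1: Rect, r2: Rect) -> Rect:
    """Formula (9): tiling along both axes simply yields r2."""
    return tile_y(tile_x(r1, r2), r2)


part = (0.1, 0.2, 0.3, 0.4)   # hypothetical rectangle to be tiled
slot = (0.0, 0.5, 1.0, 0.5)   # hypothetical rectangle to be filled
```

Unlike centering, tiling changes the size of the rectangle along the chosen axis, so it also changes the aspect ratio of the part.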
In the intelligent character-creation system, Chinese characters are divided by structure into single-level-structure characters and multi-level-structure characters. For a multi-level-structure character, the sub-structure at each level can be treated as a new character, so the task of solving the topological transformation coefficients of each primitive in a multi-level-structure character is decomposed into solving a series of single-level-structure characters. Take 胡 as an example: it has a two-level structure, the first level being left-right and the second level being up-down. With the method above, solving 胡 is decomposed into solving 古, composed of 十 and 口, and solving 胡, composed of 古 and 月. However, the bounding rectangles of 十 and 口 determined while solving 古 are relative to 古 rather than to 胡, so the bounding rectangles of 十 and 口 must be converted to be relative to 胡.
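For two bounding rectangles $R_1 = (x_1, y_1, w_1, h_1)$ and $R_2 = (x_2, y_2, w_2, h_2)$: if $x_1 \le x_2$, $y_1 \le y_2$, $x_1 + w_1 \ge x_2 + w_2$ and $y_1 + h_1 \ge y_2 + h_2$, then $R_1$ is said to contain $R_2$, or $R_2$ to be contained in $R_1$. If $R_2$ is contained in $R_1$, the bounding rectangle of $R_2$ relative to $R_1$ is:
$$R_{21} = \left(\frac{x_2 - x_1}{w_1},\ \frac{y_2 - y_1}{h_1},\ \frac{w_2}{w_1},\ \frac{h_2}{h_1}\right) \tag{10}$$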
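If bounding rectangle $R_1$ contains $R_2$ and $R_2$ contains $R_3$, and the bounding rectangle of $R_2$ relative to $R_1$ is $R_{21} = (x_{21}, y_{21}, w_{21}, h_{21})$ while the bounding rectangle of $R_3$ relative to $R_2$ is $R_{32} = (x_{32}, y_{32}, w_{32}, h_{32})$, then the four parameters of the bounding rectangle of $R_3$ relative to $R_1$ are:
$$x_{31} = x_{21} + x_{32}\,w_{21}, \qquad y_{31} = y_{21} + y_{32}\,h_{21}, \qquad w_{31} = w_{32}\,w_{21}, \qquad h_{31} = h_{32}\,h_{21} \tag{11}$$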
In this way, once the bounding rectangles of 十 and 口 relative to 古 and the bounding rectangle of 古 relative to 胡 are known, the bounding rectangles of 十 and 口 relative to 胡 can be calculated from formula (11).
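The relative-rectangle conversion and its composition can be sketched in Python as follows. The sketch assumes that a relative rectangle expresses offsets and sizes as fractions of the containing rectangle, so that composing is a scale followed by a shift. The function names are hypothetical, and the numbers used for 古 and 口 are invented for illustration only.

```python
from typing import Tuple

Rect = Tuple[float, float, float, float]  # (x, y, w, h)


def contains(outer: Rect, inner: Rect) -> bool:
    """True when inner lies entirely within outer."""
    xo, yo, wo, ho = outer
    xi, yi, wi, hi = inner
    return xo <= xi and yo <= yi and xo + wo >= xi + wi and yo + ho >= yi + hi


def relative(inner: Rect, outer: Rect) -> Rect:
    """Express inner in the unit coordinate system of outer."""
    if not contains(outer, inner):
        raise ValueError("inner rectangle is not contained in outer")
    xo, yo, wo, ho = outer
    xi, yi, wi, hi = inner
    return ((xi - xo) / wo, (yi - yo) / ho, wi / wo, hi / ho)


def compose(inner_rel: Rect, outer_rel: Rect) -> Rect:
    """Given inner relative to middle and middle relative to outer,
    return inner relative to outer."""
    xi, yi, wi, hi = inner_rel
    xo, yo, wo, ho = outer_rel
    return (xo + xi * wo, yo + yi * ho, wi * wo, hi * ho)


# Invented values: 古 occupies the left half of 胡, and 口 occupies
# the lower-middle part of 古.
gu_in_hu = (0.0, 0.0, 0.5, 1.0)
kou_in_gu = (0.25, 0.5, 0.5, 0.5)
kou_in_hu = compose(kou_in_gu, gu_in_hu)
```

`relative` and `compose` undo each other: taking the composed rectangle relative to 古 returns the rectangle that was started from.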
Step 4: from the position and shape of the bounding rectangle of each primitive in the target character obtained in step 3, determine the topological transformation coefficients.
The transformation method must satisfy the requirements of a topological transformation. An affine transformation is used below as the topological transformation method, but the methods that the invention may adopt include, without limitation, the affine transformation. Let W denote an image and $\mathbf{x}$ a point in that image. The affine transformation is defined as:
$$T(\mathbf{x}) = A\mathbf{x} + \mathbf{t}, \qquad A = \begin{pmatrix} a_A & b_A \\ c_A & d_A \end{pmatrix}, \qquad \mathbf{t} = \begin{pmatrix} t_x \\ t_y \end{pmatrix}$$
where the matrix $A$ represents the linear transformation; the elements $a_A$ and $d_A$ on the main diagonal are the scaling of the source image in the x and y directions; the elements $b_A$ and $c_A$ on the anti-diagonal are the rotation factors in the x and y directions; and the components $t_x$ and $t_y$ of the vector $\mathbf{t}$ are the translations in the x and y directions of the mapped space. Because Chinese characters are square-block characters, and squareness is one of the essential features of a Chinese character image, the anti-diagonal elements of $A$ should be zero, and the main-diagonal elements $a_A$ and $d_A$ reflect the size and shape with which a primitive is mapped into the character.
The image of the primitive is known, and its image in the target character can be obtained from step 3. Using four non-collinear points in the two images, the four unknowns in the transformation equation can then be solved, which gives the topological transformation coefficients.
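One way to solve for the four unknowns is sketched below in Python. It assumes the anti-diagonal elements are zero, so each axis reduces to a one-dimensional fit of a scale and a translation, and it uses the four corners of the two bounding rectangles as the corresponding points. The choice of corners and the least-squares fit are assumptions of this sketch; the text does not say which four points are used or how the system is solved.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (x, y, w, h)


def fit_axis(src: Sequence[float], dst: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of dst = scale * src + shift along one axis."""
    n = len(src)
    mean_s = sum(src) / n
    mean_d = sum(dst) / n
    var = sum((s - mean_s) ** 2 for s in src)
    if var == 0:
        raise ValueError("source points must differ along this axis")
    scale = sum((s - mean_s) * (d - mean_d) for s, d in zip(src, dst)) / var
    return scale, mean_d - scale * mean_s


def solve_axis_aligned_affine(src: List[Point], dst: List[Point]) -> Tuple[float, float, float, float]:
    """Return (a, d, tx, ty) for x' = a*x + tx and y' = d*y + ty."""
    a, tx = fit_axis([p[0] for p in src], [q[0] for q in dst])
    d, ty = fit_axis([p[1] for p in src], [q[1] for q in dst])
    return a, d, tx, ty


def corners(rect: Rect) -> List[Point]:
    x, y, w, h = rect
    return [(x, y), (x + w, y), (x, y + h), (x + w, y + h)]


# Hypothetical rectangles: the unit square mapped to a smaller rectangle.
a, d, tx, ty = solve_axis_aligned_affine(
    corners((0.0, 0.0, 1.0, 1.0)), corners((0.25, 0.5, 0.5, 0.25))
)
```

With exact corner correspondences the fit reproduces the scale and translation exactly; with noisy points from pattern matching it returns the best fit in the least-squares sense.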
Step 5: using the topological transformation method and the topological transformation coefficients obtained in step 4, transform the primitives into the target character, completing the transformation of each level of structure and each primitive in the target character and thereby achieving intelligent character creation.
Because the task of solving the transformation knowledge of the primitives in a multi-level-structure character can be decomposed into solving a series of single-level-structure characters, the process of solving a single-level-structure character is described first.
Among the six major classes of Chinese character structure, the enclosure and embedded structures are very similar in form, as are the horizontal-row and vertical-column structures, so for solving they are merged into an enclosure-embedded class and a row-column class. All single-level-structure characters can thus be divided into four types: single-body, enclosure-embedded, overlapping and row-column. The solving method for each of these four types is described below.
For single-body characters, the tree structure is shown in Figure 4. Because every single-body character is itself a primitive, the bounding rectangle of the primitive in a single-body character is exactly the standard bounding rectangle of that primitive, and the affine transformation coefficients obtained from formula (1) are therefore (0, 1, 0, 1) for every single-body character.
For enclosure-embedded characters, the common feature is that the first primitive is always a frame. The frame can enclose one primitive or sub-structure, or have one or several primitives or sub-structures embedded in it. This frame-type primitive is called the enclosing body; the primitives or sub-structures enclosed by, or embedded in, the enclosing body are called embedded bodies. The tree structure of an enclosure-embedded character is shown in Figure 5. The maximum possible bounding rectangle of each primitive in an enclosure-embedded character is determined directly by the enclosing body. For a primitive that acts as an enclosing body, the maximum bounding rectangles of the primitive itself and of the parts it encloses are stored in the knowledge base of the expert system; when solving, the knowledge for the enclosing body is retrieved from the knowledge base and applied to the enclosing body and the parts it encloses, giving the maximum bounding rectangle of each part. For the enclosing body, its maximum bounding rectangle is its actual bounding rectangle in the character. For an embedded body, its aspect ratio is first adjusted by formula (4), and the adjusted bounding rectangle is then centered in its maximum bounding rectangle in both the X and Y directions, giving its actual bounding rectangle in the character. Formula (1) then gives the transformation coefficients of each part.
For overlapping characters, the structure itself determines the maximum bounding rectangle of each part; the tree structure of an overlapping character is shown in Figure 6. The maximum bounding rectangles of the parts of an overlapping character are stored in the knowledge base of the expert system. When solving, the information for the structure is read from the knowledge base and applied to each part of the character; the aspect ratio of each part is adjusted by formula (4); each part is then centered in its maximum bounding rectangle in both the X and Y directions, giving the bounding rectangle of each part in the character; finally, formula (1) gives the transformation coefficients of each part.
For row-column characters, the bounding rectangle of each part depends not only on the order in which the parts appear in the structure but also on the size of each part: a larger part usually takes a larger share of a row-column character. The size of a part is measured by its standard bounding rectangle. For a primitive, its own standard bounding rectangle is used; for a sub-structure, whose bounding rectangle has necessarily already been derived (as guaranteed by the depth-first traversal), the derived bounding rectangle is used. In the derivation, the share of each part in the overall structure is determined first from its standard bounding rectangle, and its maximum possible bounding rectangle is then determined from the order in which it appears in the structure. For a horizontal-row structure, formula (3) determines the aspect ratio of each part, and each part is then centered in its maximum bounding rectangle in the Y direction and in the X direction. For a vertical-column structure, formula (2) determines the aspect ratio of each part, and each part is then centered in its maximum bounding rectangle in the X direction and in the Y direction. This gives the bounding rectangle of each part in the character, and formula (1) then gives the transformation coefficients of each part. The tree structure of a row-column character is shown in Figure 7.
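The allocation of maximum possible bounding rectangles in a horizontal-row structure might look like the Python sketch below. It assumes, purely for illustration, that each part's share of the character width is directly proportional to the width of its standard bounding rectangle and that every part may use the full height. The text only says that larger parts take a larger share, and the actual proportions come from the knowledge base, so `allocate_row` and its proportional rule are hypothetical.

```python
from typing import List, Sequence, Tuple

Rect = Tuple[float, float, float, float]  # (x, y, w, h)


def allocate_row(standard_widths: Sequence[float]) -> List[Rect]:
    """Split the unit square into left-to-right slots, one per part.

    Assumption: slot width is proportional to the part's standard width.
    """
    total = sum(standard_widths)
    if total <= 0:
        raise ValueError("standard widths must sum to a positive value")
    slots: List[Rect] = []
    x = 0.0
    for w0 in standard_widths:
        share = w0 / total
        slots.append((x, 0.0, share, 1.0))  # full height, proportional width
        x += share
    return slots


# Hypothetical left-middle-right character whose middle part is widest.
slots = allocate_row([0.5, 1.0, 0.5])
```

A vertical-column structure would be handled the same way with the roles of width and height exchanged; each part would then be adjusted in aspect ratio and centered within its slot as described above.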
The process of solving the primitive transformation coefficients is described in detail below, using one Chinese character as an example:
1. The code of the character is: M, 8g0, O, 9d0, J, 341, 2l0. Its structure tree, obtained from the code, is shown in Figure 8.
The structure tree shows that this character has a three-level structure: the first-level structure is M, the second-level structure is O and the third-level structure is J.
2. From the structure tree and the characteristics of each primitive, the bounding-rectangle layout of the character can be determined, as shown in Figure 9.
3. Based on the structure tree and the bounding-rectangle layout, solving the character is decomposed into solving a series of single-level-structure characters. The decomposition starts from the lowest end of the structure tree, that is, the innermost part of the bounding-rectangle layout, and comprises the following 3 steps:
(1) The third-level structure is separated out and solved as a character in its own right, as shown in Figure 10.
The code of this character is J, 341, 2l0. Using the solving method for row-column characters described above, the bounding rectangles of the two primitives in this character are:
341:(0.031,0,0.94,0.59)
2l0:(0.281,0.59,0.44,0.41)
(2) The second-level structure is separated out and solved as a character in its own right. The character obtained in (1) is denoted "#1" and treated as a primitive, as shown in Figure 11; the code of this character is O, 9d0, #1. Using the solving method for enclosure-embedded characters described above, the bounding rectangles of the two primitives in the character "Tang" are:
9d0:(0,0.031,0.938,0.938)
#1:(0.26,0.251,0.66,0.72)
(3) The character obtained in (2) is denoted "#2" and substituted into the first-level structure as a primitive, as shown in Figure 12; the code of the character is now M, 8g0, #2. Using the solving method for enclosure-embedded characters described above, the bounding rectangles of the two primitives in the character are:
8g0:(0.094,0.063,0.813,0.875)
#2:(0.294,0.283,0.453,0.455)
4. Because the character contains "#2" and "#2" contains "#1", the absolute bounding rectangle of each primitive in the character can be obtained in turn from formula (11):
8g0:(0.094,0.063,0.813,0.875)
9d0:(0.294,0.297,0.424,0.427)
341:(0.421,0.397,0.279,0.193)
2l0:(0.495,0.589,0.131,0.134)
The affine transformation coefficients of each primitive of the character, calculated from formula (1), are:
8g0:(0,1,0,1)
9d0:(0.294,0.453,0.283,0.455)
341:(0.411,0.298,0.387,0.32)
2l0:(0.41,0.298,0.501,0.32)
5. Each primitive is transformed with the above affine transformation coefficients, and the transformed results are combined to give the glyph of the character, as shown in Figure 13.
Using these 5 steps, other Chinese characters can be derived in the same way; some derivation results are shown in Figure 14.
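The absolute bounding rectangles listed in item 4 of the example can be re-derived from the relative ones with a few lines of Python. The sketch below assumes the composition rule x = x_outer + x_inner * w_outer (and likewise for y, w and h), takes the relative rectangles from items (1) to (3) of the example, and compares the results with the listed absolute values within a rounding tolerance; the dictionary layout and function names are hypothetical.

```python
from typing import Dict, Tuple

Rect = Tuple[float, float, float, float]  # (x, y, w, h)


def compose(inner: Rect, outer: Rect) -> Rect:
    """Convert a rectangle given relative to `outer` into outer's own frame."""
    xi, yi, wi, hi = inner
    xo, yo, wo, ho = outer
    return (xo + xi * wo, yo + yi * ho, wi * wo, hi * ho)


# name -> (rectangle relative to its parent, parent name), from the example.
RELATIVE: Dict[str, Tuple[Rect, str]] = {
    "8g0": ((0.094, 0.063, 0.813, 0.875), "whole"),
    "#2": ((0.294, 0.283, 0.453, 0.455), "whole"),
    "9d0": ((0.0, 0.031, 0.938, 0.938), "#2"),
    "#1": ((0.26, 0.251, 0.66, 0.72), "#2"),
    "341": ((0.031, 0.0, 0.94, 0.59), "#1"),
    "2l0": ((0.281, 0.59, 0.44, 0.41), "#1"),
}


def absolute(name: str) -> Rect:
    """Walk up the containment chain until the whole character is reached."""
    if name == "whole":
        return (0.0, 0.0, 1.0, 1.0)
    rect, parent = RELATIVE[name]
    return compose(rect, absolute(parent))


# Absolute rectangles as listed in item 4 of the example.
LISTED: Dict[str, Rect] = {
    "8g0": (0.094, 0.063, 0.813, 0.875),
    "9d0": (0.294, 0.297, 0.424, 0.427),
    "341": (0.421, 0.397, 0.279, 0.193),
    "2l0": (0.495, 0.589, 0.131, 0.134),
}

# Largest difference between recomputed and listed values.
max_error = max(
    abs(got - want)
    for name, listed in LISTED.items()
    for got, want in zip(absolute(name), listed)
)
```

Under these assumptions the recomputed rectangles agree with the listed ones to within a few thousandths, which is consistent with the values having been rounded to three decimal places at each level.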
The embodiment described above is a preferred embodiment of the invention, but embodiments of the invention are not limited to it. Any other change, modification, substitution, combination or simplification made without departing from the spirit and principle of the invention is an equivalent replacement and falls within the scope of protection of the invention.