CN1089735A - Whole words (Chinese character) code - Google Patents


Info

Publication number
CN1089735A
Authority
CN
China
Prior art keywords
parts
group word
code
word
sign indicating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN 93100866
Other languages
Chinese (zh)
Other versions
CN1091529C (en)
Inventor
陈劲松
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN93100866A
Publication of CN1089735A
Application granted
Publication of CN1091529C
Anticipated expiration
Expired - Fee Related

Landscapes

  • Document Processing Apparatus (AREA)
  • Machine Translation (AREA)

Abstract

The present invention relates to character encoding in computer text processing. It describes characters in the form "group word structure + group word parts" and thereby sets out a generation rule for characters. With it, Chinese characters of arbitrary shape can be produced, represented, and input. It raises the overall input speed of Chinese characters, and it applies not only to Chinese characters but also to the scripts of other countries and nationalities. It is a complete input method for characters, a compression method for character font libraries, a computer internal code scheme for the script of a single nation, and also a unified internal code standard scheme for multinational scripts.

Description

Whole words (Chinese character) code
The "whole words (Chinese character) code" of the present invention relates to character encoding in computer text processing. Specifically, it concerns how to express characters with numerical values or abstract symbols, and the concrete implementation of the invention in areas such as computer input of characters, compression and generation of character fonts, and a general-purpose computer code for characters.
The current state of computer input of characters, of computer internal codes, and of font library compression is described below in turn.
From the viewpoint of the present invention, input codes for characters can be broadly divided, by the information the code contains, into two kinds: identification input (incomplete input) and complete input. Identification input encodes a fixed character set (for Chinese characters, generally GB2312-80). Its purpose is to distinguish each character from the others in the set by the simplest possible rule. The code of each character need only differ from the codes of the other characters in the set (a certain duplicate-code rate, that is, a repetition rate of codes, is tolerated), and it need not describe the details of the character's shape. A complete input code contains all the characteristic parameters of the character's shape, so the system can generate the glyph directly from the code. The way English words are input on existing computers is a complete input method.
Identification input has the advantage of high input speed, but it has several serious shortcomings that cannot be solved or are extremely hard to solve. First, only characters in the fixed set can be input. Characters outside the set cannot be input, given a glyph, or managed uniformly, which causes confusion. Second, the code does not contain all the characteristic parameters of the glyph, so the glyph cannot be generated from the code. Third, because entering an identification code is not the full process of writing, a person who habitually uses it in place of a pen easily forgets the concrete shape of characters. This is especially harmful to primary and secondary school education.
Complete input solves all of the above problems well. Its input speed is slower, but this can be remedied by simplified input, whose speed is no lower than that of identification input.
English words have a single structure, and their components, the letters, are few and fixed, so complete input is easy to realize. Chinese characters have complex structures and numerous components, so complete input is extremely difficult to realize. As a result, research on Chinese character input has long been confined to identification input, and Chinese character input remains far less convenient than English input.
Likewise, from the viewpoint of the present invention, the internal character codes of computers can be divided, by the information they carry, into two classes: identification internal codes and complete internal codes. An identification internal code is only a code name for a block of graphics, whereas a complete internal code carries the characteristic parameters of the character. The two classes have the same respective advantages and disadvantages listed above.
The American standard character code ASCII is a complete internal code. In effect it gives a standard definition to the group word parts of English (the letters). With it as the computer internal code, any English word formed from an arbitrary combination of letters can be represented, including misspelled words. China's standard Chinese character set GB2312-80 is an identification internal code: it gives a standard definition to whole characters. Because the total number of Chinese characters is large and not fixed, such a character set can record only the characters with relatively high frequency of use (English would meet the same problem if its words were recorded whole), so many characters cannot be represented by this standard.
Current international standards seek to merge these two types of standard. One is a table of components and the other a table of whole characters; one is a complete internal code and the other an identification internal code. Merging them is highly unreasonable.
Current font libraries, such as Chinese character libraries, have never achieved satisfactory compression of font data, because no good generation rule for characters has been found. High-resolution dot-matrix fonts in particular are time-consuming to restore and to smooth.
The object of the present invention is to open up a new field of character encoding research and realize complete encoding, so as to solve the problems of current encoding described above, and to design a scheme in which the input code and the computer internal code have the same structure, for ease of computer processing.
The concrete structure of the "whole words (Chinese character) code" of the present invention is to express every character in the form "group word structure + group word parts".
The basic unit of written text is called a word (it is delimited by shape, not by meaning). The basic glyph units that make up a word are called parts, and the rule by which parts form a word is called the group word structure.
By glyph form, the scripts of the world's nationalities fall broadly into two classes:
In the first class the group word structure is single and the group word parts are few, and words are separated by spaces. Examples are English, French, and German. This class is usually called alphabetic writing; its words are called words and its parts letters.
In the second class the group word structure is quite complex and the group word parts are numerous. Parts are combined tightly, and the size and width of the glyph are usually relatively fixed. This class is generally called ideographic (pictographic) writing. Examples are Chinese and the Chinese characters used in Japanese and Korean.
In fact, both alphabetic and ideographic writing implicitly contain the form "group word structure + group word parts". Alphabetic writing usually has only one structure, which is why the structure is overlooked. The space between words in alphabetic writing is in effect a variant form of structure code. Although the structures of ideographic writing are complex, in no script does every character have its own unique structure; there are always regularities to be found. These regularities are the general rules of word formation. They can be recorded as group word structures and used to combine parts into words.
Structures that occur rarely, and structures that cannot be generalized, can also be recorded: a word with such a structure is treated as a basic group word part and is expressed as "single character structure + this group word part".
The shape written in one continuous movement of the pen is called a stroke.
On the basis of how tightly strokes are combined, two principles govern the division and determination of group word parts: keep the total number of parts as small as possible, and keep the number of parts per word as small as possible.
The group word structure describes the relative positions of the group word parts within the glyph, and the group word parts describe the shapes of the basic glyph elements. Each group word part also carries parameters such as its own shape and characteristic attributes. From these parameters, combined with the group word structure, the system can determine the size, width, and position of each part within the character and eliminate the visible boundaries between parts, so that the glyph is assembled correctly and attractively by intelligent methods.
Compared with existing codes, the "whole words (Chinese character) code" of the present invention has the advantage of breaking through the identification structure of current codes and realizing complete encoding, which solves the problems that identification codes cannot solve. Expressing glyphs in the form "group word structure + group word parts" separates the structural attributes of a glyph from its shape attributes, which helps the system process and generate glyphs. The code expresses the shape features of every character concisely, clearly, and completely; it is a font generation rule capable of producing glyphs; and it is suitable as a standard for character codes.
The whole words (Chinese character) code is suitable for any occasion where characters must be expressed with numerical values or abstract symbols.
The concrete implementation of the whole words (Chinese character) code in computer input, internal codes, and font library compression is described below.
The basic unit of a Chinese character glyph is the stroke, such as horizontal, vertical, left-falling, right-falling, turning, and rising strokes. The number of stroke shapes is very limited.
The smallest recognizable unit made up of strokes is the part. It can carry a certain meaning, and its strokes are generally tightly combined. Examples are the parts meaning sun, moon, person, mouth, gold, wood, water, and earth. Although parts are numerous, their number is also limited. Take as an example the "Dictionary of Chinese Character Information" (hereinafter "Chinese"), compiled by the Chinese character encoding group of Shanghai Jiao Tong University and the Shanghai Chinese alphabetic writing research group and published by Science Press in December 1988. The book covers 11254 Chinese characters in total, yet by its own statistics only 694 parts are used. The total number of Chinese character parts has one property: as the total number of characters grows and more rare characters are included, the number of parts in use grows very little. Most rare characters are new combinations of existing parts.
The rule by which parts are combined into a Chinese character is called the part structure, for example single character, top-bottom structure, left-right structure, semi-enclosed structure, and fully enclosed structure. The number of structures is also limited.
The input method of this glyph code is therefore defined as follows. The first code is the structure code of the character, which defines its group word structure. The codes that follow define each part of the character in writing order and are called component codes. The concrete shapes of Chinese character structures are shown in Fig. 1. (Parts that cannot be formed with the parallel and containing structures in the figure are all classed by the present invention as single characters.)
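The "structure code first, then component codes in writing order" layout can be pictured with a small data model. This is only a sketch: the class name, the field names, and the sample key strings are assumptions made for illustration and do not come from the patent's figures.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class WholeWordCode:
    """One character as 'structure code + component codes' (hypothetical model)."""
    structure: str          # structure code: selects the layout of the parts
    components: List[str]   # component codes, in writing order

    def keys(self) -> str:
        # The structure code always comes first, then each component code in order.
        return self.structure + "".join(self.components)


# Hypothetical key strings: "K" for a two-part layout, "SM" and "MA" for two parts.
code = WholeWordCode(structure="K", components=["SM", "MA"])
print(code.keys())  # KSMMA
```

Keeping the structure separate from the ordered list of parts mirrors the split between structural attributes and shape attributes that the description relies on.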
As Fig. 1 shows, although the group word structures of Chinese characters are limited in number, they are still quite varied and complex. The present invention therefore simplifies them as follows, drawing on the processing capability of the computer system.
A containing structure consists of a containing body and a contained body. For every part that can serve as a containing body, the system holds a parameter description. It defines the conditions under which the part can serve as a containing body and, once the part forms a containing structure, the relative positions and the respective shapes and sizes of the containing body and the contained body within that structure. The user therefore only has to tell the system which part is the containing body and which is the contained body, and the system can generate the corresponding glyph of the containing structure.
Therefore, first, the semi-enclosed structures of every opening direction (see Fig. 2) can all be represented by the fully enclosed structure.
Secondly, consider the parallel structure. Let "∥" denote the parallel relation, and write the layouts as follows:
  • Figure 931008662_IMG2 is expressed as 1 ∥ 2;
  • [figure missing] is expressed as 1 ∥ 2 ∥ 3;
  • Figure 931008662_IMG3 is expressed as 1 ∥ (2 ∥ 3);
  • Figure 931008662_IMG4 is expressed as 1 ∥ (2 ∥ 3) ∥ 4);
  • Figure 931008662_IMG5 is expressed as 1 ∥ (2 ∥ 3) ∥ 4.
It can be seen that nesting relations are implied within the parallel structure. Using this, the containing structure can be reduced to the parallel structure.
Simplification principle: place the containing body and the contained body in parallel positions, arranged in the customary writing order. For example, Figure 931008662_IMG6 can be reduced to Figure 931008662_IMG7, or [figure missing] can be reduced to Figure 931008662_IMG9, and further to [figure missing]. The character "burbulent" has the structure Figure 931008662_IMG11, which should be reduced to Figure 931008662_IMG12, and its component codes should be input in the order 1-"Rui", 2-"*", 3-"Qian". If the order of 2 and 3 is reversed, the glyph produced after system processing will be Figure 931008662_IMG13 (when the structure code is [figure missing]) or [figure missing] (when the structure code is Figure 931008662_IMG16).
In this way the containing structure can be represented by the parallel structure. There are 30 parallel structures with four or fewer parts; 25 of them are chosen and defined on the keyboard, see Fig. 3 (the triangular three-part structure can be represented by Figure 931008662_IMG17). Structures not included in these 25 can be built by applying the 25 structures two or more times, which is called re-defining. For example, Figure 931008662_IMG18 can first be defined as [figure missing], and then Figure 931008662_IMG20 is used to define 2; Figure 931008662_IMG21 can be defined as Figure 931008662_IMG22, and then 4 is defined; [figure missing] can be defined as [figure missing], then 1 is defined with Figure 931008662_IMG25 and 2 with Figure 931008662_IMG26.
The statistics in Table 1 show that a Chinese character rarely consists of more than 5 parts, so re-defining will not be needed often. Note also that the re-defining structure and the re-defined structure stand in a contained and containing relation. For example, first select Figure 931008662_IMG27, whose part relation is 1 ∥ 2, then re-define 2 with Figure 931008662_IMG28. The resulting shape is Figure 931008662_IMG29, and the part relation is now 1 ∥ (2 ∥ 3). This is different from directly using Figure 931008662_IMG30 (1 ∥ 2 ∥ 3)!
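The difference between a re-defined layout and a flat one can be made concrete by treating a layout as a tree. The sketch below is an illustration under assumed names (`Layout`, `redefine`, `show`); the patent does not specify a data representation.

```python
from typing import Tuple, Union

# A layout is either a slot number or a tuple of sub-layouts placed in parallel.
Layout = Union[int, Tuple["Layout", ...]]


def redefine(layout: Layout, slot: int, sub: Layout) -> Layout:
    """Replace one slot of a layout with a nested sub-layout."""
    if isinstance(layout, int):
        return sub if layout == slot else layout
    return tuple(redefine(child, slot, sub) for child in layout)


def show(layout: Layout, top: bool = True) -> str:
    """Render a layout with the '∥' notation, bracketing nested groups."""
    if isinstance(layout, int):
        return str(layout)
    text = " ∥ ".join(show(child, top=False) for child in layout)
    return text if top else f"({text})"


two = (1, 2)                        # 1 ∥ 2
nested = redefine(two, 2, (2, 3))   # slot 2 re-defined by a two-part layout
flat = (1, 2, 3)                    # a three-part layout used directly

print(show(nested))  # 1 ∥ (2 ∥ 3)
print(show(flat))    # 1 ∥ 2 ∥ 3
```

The two results hold the same three slots but are different trees, which is why the text warns that re-defining slot 2 is not the same as choosing a three-part structure outright.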
Thus, after the holotype code is input, the system first checks whether any of the input parts can serve as a containing body. If none can, the group word structure is judged to be a parallel structure. If one can, the system retrieves the containing parameters of that part and judges whether the conditions for it to act as a containing body hold. If they hold, the character is handled as a containing structure; otherwise it is still handled as a parallel structure.
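The check order described above can be sketched as a short function. The table `CONTAINERS`, the part names, and the condition used here are hypothetical stand-ins for the containing parameters that the system would actually store.

```python
from typing import Callable, Dict, List

# Hypothetical containing parameters: for each part that can act as a containing
# body, a predicate saying whether its condition holds for the parts that were input.
CONTAINERS: Dict[str, Callable[[List[str]], bool]] = {
    "enclosure": lambda parts: len(parts) >= 2,
}


def decide_structure(parts: List[str]) -> str:
    """Return 'containing' or 'parallel', following the check order in the text."""
    for part in parts:
        condition = CONTAINERS.get(part)
        # A part counts only if it can contain AND its condition is satisfied.
        if condition is not None and condition(parts):
            return "containing"
    return "parallel"


print(decide_structure(["stone", "horse"]))     # parallel
print(decide_structure(["enclosure", "jade"]))  # containing
print(decide_structure(["enclosure"]))          # parallel
```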
In addition, the "*" key is assigned as point positioning for part input. With it, a part can be placed at any position in the character and at any size, so a character of arbitrary shape and arbitrary structure can be defined directly by code. The width and height of the character are divided into a number of equal standard positions. The coordinate of a part in the character is expressed by a standard position number, and the width the part occupies in the character by the number of standard positions it spans. The general form is "structure code '*' key + component code of part 1 + coordinate parameters of part 1 in the character + width and height parameters of part 1 in the character + component code of part 2 + coordinate parameters of part 2 in the character + width and height parameters of part 2 in the character + ...". (The concrete method is omitted.)
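Since the concrete method is omitted in the text, the following is only one possible reading of the general form. The field order (x, y, width, height), the top-left origin, and the grid size are all assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PlacedPart:
    code: str  # component code
    x: int     # column of the top-left corner, in standard positions (assumed origin)
    y: int     # row of the top-left corner, in standard positions
    w: int     # width, as the number of standard positions spanned
    h: int     # height, as the number of standard positions spanned


def encode_point_positioned(parts: List[PlacedPart]) -> List[object]:
    """Flatten to '*' followed by (code, x, y, w, h) for each part, in order."""
    out: List[object] = ["*"]
    for p in parts:
        out += [p.code, p.x, p.y, p.w, p.h]
    return out


def fits(parts: List[PlacedPart], grid: int) -> bool:
    """Check that every part stays inside a grid-by-grid character cell."""
    return all(
        p.x >= 0 and p.y >= 0 and p.x + p.w <= grid and p.y + p.h <= grid
        for p in parts
    )


layout = [PlacedPart("SM", 0, 0, 2, 4), PlacedPart("MA", 2, 0, 2, 4)]
print(encode_point_positioned(layout))
print(fits(layout, grid=4))
```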
After the structure is input, the component codes are input. Component codes can be compiled more flexibly; their task is to determine a Chinese character part with the fewest keystrokes. The kinds and names of the parts here all follow Table 11 (pages 999-1009) of "Chinese", 694 parts in total. Because the stroke shapes of the parts are very similar and their structures are tightly combined, it is hard to determine a part by decomposing its stroke shapes. The present invention therefore suggests determining parts by "basic strokes + pinyin".
The basic strokes of parts are of 5 kinds: horizontal, vertical, left-falling, right-falling, and turning. Combinations of two strokes give 25 kinds, for 30 in total, which fit exactly onto 30 keys in the letter-key area (see Fig. 4, where 1, 2, 3, 4, 5 denote horizontal, vertical, left-falling, right-falling, and turning respectively). A part is then input as follows. The first key is the first two strokes of the part (if the part has fewer than two strokes, the single basic stroke is taken). The second key is the last two strokes of the part (the combination of the third and fourth strokes may also be taken; the user is free to choose). The third key is the initial of the pinyin of the part's name (each initial corresponds to its English key; ZH, CH, SH correspond to the A, U, I keys respectively). In general, the duplicate-code rate when a part is input with three keys is extremely low. With two keys the duplicate-code rate is slightly higher but input is faster. Whether to use two keys or three is chosen by the user according to actual conditions.
If part input produces duplicate codes, they are handled as follows. The parts sharing a code are ranked by frequency of use as the first code (code 0), code 1, code 2, ..., code 10, and the corresponding selection keys are: space bar → code 0, digit 1 → code 1, digit 2 → code 2, ..., digit 9 → code 9, digit 0 → code 10. If no selection key (space bar or digit key) is pressed and the next content is input directly, the system automatically confirms the first code. This ensures that nearly 600 of the more frequent Chinese character parts can be input with two keys.
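The three-key component code and the duplicate-code selection keys can be sketched as follows. The key numbering in `stroke_key` is a hypothetical layout (the real assignment is in Fig. 4, which is not reproduced here), and the stroke sequences in the example are made up rather than taken from any real part.

```python
from typing import Sequence


def stroke_key(strokes: Sequence[int]) -> int:
    """Map one or two basic strokes (1..5) to a key index 0..29 (hypothetical layout)."""
    if not 1 <= len(strokes) <= 2 or any(not 1 <= s <= 5 for s in strokes):
        raise ValueError("expected one or two strokes numbered 1..5")
    if len(strokes) == 1:
        return strokes[0] - 1                           # 5 single-stroke keys
    return 5 + (strokes[0] - 1) * 5 + (strokes[1] - 1)  # 25 two-stroke keys


def initial_key(initial: str) -> str:
    """Pinyin initial to key; ZH, CH, SH use A, U, I as stated in the text."""
    return {"zh": "A", "ch": "U", "sh": "I"}.get(initial.lower(), initial.upper())


def component_code(strokes: Sequence[int], initial: str) -> tuple:
    """First two strokes, last two strokes, then the pinyin initial of the part name."""
    return (stroke_key(strokes[:2]), stroke_key(strokes[-2:]), initial_key(initial))


def pick(candidates: Sequence[str], key: str = "") -> str:
    """Resolve duplicates: no key or space picks the first; 1..9 and then 0 pick the rest."""
    index = {"": 0, " ": 0, "0": 10}.get(key)
    if index is None:
        index = int(key)
    return candidates[index]


print(component_code([2, 5, 1], "sh"))  # (14, 25, 'I')
print(pick(["x", "y", "z"], "2"))       # z
```

The count works out as the text says: 5 single-stroke keys plus 5 × 5 two-stroke keys gives 30 distinct key indices.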
Examples of the complete input process:
  • "code": select and input the structure code Figure 931008662_IMG31 (press the K key), then input the component code of the part "stone", and finally the component code of "horse".
  • "king": (J key) + the component code of "king".
  • "journey": Figure 931008662_IMG32 (S key) + the component code of "standing grain" + the component code of "king".
  • "keep away": Figure 931008662_IMG33 (Z key) + digit 3 + Figure 931008662_IMG34 (L key) + "corpse" + "mouth" + "standing" + "ten" + "Chuo".
  • "win": Figure 931008662_IMG35 + digit key 3 + Figure 931008662_IMG36 + "dying" + "mouth" + "moon" + "shellfish" + "all".
Attached table: Distribution of the number of parts per Chinese character
Figure 931008662_IMG37
The statistics of the attached table show that the dynamic (frequency-weighted) average number of parts per Chinese character is 2. In general the structure code of each character takes one key and each part takes two keys. No space bar is needed to separate characters during input, because the system separates them automatically according to the structure code. Each character therefore needs 1 + 2 × 2 = 5 keys on average, and more than 110 characters can be input per minute. The fastest single-character input speed at present is more than 150 characters per minute. To compare the two speeds, suppose that 3 characters outside the national standard (GB) set occur in every 10000 characters and must be created. Inputting 10000 characters with this code takes 10000/110 ≈ 91 minutes. Inputting them with the other code takes 10000/150 ≈ 67 minutes, plus 30 minutes to create the 3 characters (10 minutes each, covering the whole process of creating the character, storing the font data, converting the font library, and so on), 67 + 30 = 97 minutes in total. The two speeds differ little. The attached table also shows that characters with more than six parts have a dynamic occurrence rate of 0.057%, while GB level 1 and level 2 characters rarely have six or more parts. That is, actual statistics show that the occurrence rate of non-GB characters is much larger than the 0.03% assumed above. The actual overall input speed of this code is therefore greater than that of other fast codes. This input method also spares the typist much of the trouble of character creation, and it lets a beginner input rare characters without understanding the principle of the Chinese character region-position code.
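The speed comparison is plain arithmetic and can be checked directly. The figures (110 and 150 characters per minute, 3 created characters at 10 minutes each, 10000 characters in total) are the ones given in the text; the function and variable names are only illustrative.

```python
def minutes_to_type(chars: int, chars_per_minute: float) -> float:
    """Time needed to type a text at a constant speed."""
    return chars / chars_per_minute


keys_per_char = 1 + 2 * 2  # one structure key + two parts at two keys each

total = 10_000
this_code = minutes_to_type(total, 110)
# The faster identification code still has to create 3 characters at 10 minutes each.
other_code = minutes_to_type(total, 150) + 3 * 10

print(keys_per_char, round(this_code), round(other_code))  # 5 91 97
```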
For phonetic scripts and mixed phonetic-ideographic scripts, input can still take the form "structure code + component code", with the letters as parts. For phonetic scripts, because the structure code defines the number of letters in the whole word, the space that separates words no longer needs to be input.
When a font library and a code lookup table already exist, this code can be input as a holotype brevity code: instead of inputting every code element of the holotype code one by one, only some of its code elements, chosen by a certain rule, are input, which raises the input rate. That is, the simplified form of the complete input method is used for identification input. Suppose the holotype code of a character is J B11 B12 B21 B22 B31 B32, where J is the structure code and B is a component code whose first subscript is the sequence number of the part that the component code defines. The holotype brevity code can then take the form J B11 B12 B21 B31 or B11 B21 B31. As long as the code length of the brevity code is kept within 4 keys, the concrete input form can be freely chosen and defined by the user. For example: input the full component code of the first part (two keys) + the first key of the component code of the second part + the first key of the component code of the last part (B11 B12 B21 B31); or input the first key of the component code of each part in turn (B11 B21 B31); and so on.
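The two brevity forms named above can be derived mechanically from the full code. In this sketch each key is one character, the component codes are hypothetical two-key strings, and extending the rule to any number of parts is an assumption (the text only shows the three-part case).

```python
from typing import List


def full_code(structure: str, parts: List[str]) -> str:
    """J B11 B12 B21 B22 B31 B32: structure code plus every key of every part."""
    return structure + "".join(parts)


def brevity_with_structure(structure: str, parts: List[str]) -> str:
    """J B11 B12 B21 B31: whole first part, then only the first key of each later part."""
    return structure + parts[0] + "".join(p[0] for p in parts[1:])


def brevity_first_keys(parts: List[str]) -> str:
    """B11 B21 B31: the first key of every part, structure code omitted."""
    return "".join(p[0] for p in parts)


parts = ["ab", "cd", "ef"]  # hypothetical two-key component codes
print(full_code("J", parts))               # Jabcdef
print(brevity_with_structure("J", parts))  # Jabce
print(brevity_first_keys(parts))           # ace
```

A brevity code is no longer unique for every possible shape, which is why the text requires an existing font library and lookup table to resolve it.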
The Chinese character input speed of the holotype brevity code is similar to that of the Wubi (five-stroke) method. When a character that is not in the font library is met, however, it can be input directly with the holotype code. Identification input and complete input are thus combined, and the overall input rate in practical use across all trades and professions will far exceed that of other codes.
In Chinese character input, users accustomed to other codes can use this code as a backup code dedicated to inputting non-GB characters, which relieves the burden of character creation.
In addition, the present invention can be used for compressed storage of font libraries. If the system uses this coding principle directly, the whole-character font library can be dispensed with: only a part library describing the glyphs of the parts needs to be built, and characters of arbitrary shape can be input with codes. The total number of parts is far smaller than the total number of whole characters, and parts have fewer strokes than whole characters.
This method can therefore greatly reduce the font data storage of a writing system. It is estimated that implementing it on an APPLE machine, so that simplified, traditional, and variant Chinese characters and common wrongly written characters can all be input with codes, needs 8K of core memory and only about 16K of memory in total. This removes the trouble that small-capacity microcomputers have in calling a Chinese character library from a disk drive, and it lets low-end, small-capacity microcomputers with only a tape drive as peripheral use a Chinese character system (without a Chinese character card).
When memory capacity is larger, a conversion table between the GB2312-80 region-position code and this code can also be built, and detail modifications can be added to the concrete glyph of each character to make the glyph more attractive. This forms a compressed Chinese character library with a standard interface. Other codes can use this library directly through the GB region-position code to generate glyphs or obtain font data.
It is constructed as follows:
One. Number all the strokes that may occur in a given Chinese character font uniformly and build a stroke shape table. The concrete shape of each stroke is recorded in detail as vector or dot-matrix data (for example long horizontal, short horizontal, long vertical, short vertical, vertical left-falling, vertical hook, slanting hook, lying hook, ...).
Two. Number all parts uniformly and build a component shape table. The shape of each part is generated from the strokes recorded above: first record the stroke number of each stroke in the part, then record the start and end coordinates, or the standard position numbers, of each stroke within the part. Where necessary, make detail corrections, such as which stroke should be longer or which should be thinner.
Three. Number the Chinese character structures that occur in the GB set uniformly and build a Chinese character structure table, then define each GB character with this code in turn: first record the structure number of the character, then record, in writing order, the code numbers of the parts that make it up. This completes the rough frame of a character. Finally add detail corrections, such as which part should be larger or which two parts should be combined more tightly.
Four. Collect all correction parameters, correction types, and correction properties and build a correction parameter table, in which each sequence number corresponds to one correction parameter. For example, correction parameter No. 1 is defined as "the first and second parts are combined very tightly when forming the word", and correction parameter No. 2 as "the last part is one standard position larger than its corresponding proportion", and so on.
The whole compression process is shown in Fig. 5.
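The four tables and the per-character records can be pictured as plain lookup structures. The sketch below is an assumed layout for illustration only: the class names, the coordinate convention, and the sample entries are hypothetical, and real stroke shapes would be vector or dot-matrix data rather than text labels.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Point = Tuple[int, int]


@dataclass
class PartShape:
    # One (stroke number, start, end) entry for each stroke of the part.
    strokes: List[Tuple[int, Point, Point]]


@dataclass
class CharRecord:
    structure: int                                         # number in the structure table
    parts: List[int]                                       # part numbers, in writing order
    corrections: List[int] = field(default_factory=list)   # correction parameter numbers


@dataclass
class FontLibrary:
    stroke_shapes: Dict[int, str]        # step one: stroke shape table
    part_shapes: Dict[int, PartShape]    # step two: component shape table
    structures: Dict[int, str]           # step three: structure table
    corrections: Dict[int, str]          # step four: correction parameter table
    chars: Dict[str, CharRecord]

    def expand(self, key: str) -> List[int]:
        """Return the stroke numbers needed to draw one character, part by part."""
        record = self.chars[key]
        return [s for p in record.parts for s, _, _ in self.part_shapes[p].strokes]


lib = FontLibrary(
    stroke_shapes={1: "long horizontal", 2: "long vertical"},
    part_shapes={1: PartShape([(1, (0, 0), (4, 0)), (2, (2, 0), (2, 4))])},
    structures={1: "left-right"},
    corrections={1: "first and second parts combined very tightly"},
    chars={"demo": CharRecord(structure=1, parts=[1, 1], corrections=[1])},
)
print(lib.expand("demo"))  # [1, 2, 1, 2]
```

Only the stroke table holds shape data; every other level stores small numbers, which is where the saving in storage comes from.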
Because the present invention has found a character generation rule that captures the shapes of characters, font data can be compressed effectively.
Such records greatly reduce the capacity of the font library. Apart from the first step, which records the dot-matrix or vector shapes of strokes, the descriptions of parts and of whole characters are all reduced to records of stroke code numbers, part code numbers, group word structure code numbers, correction parameter code numbers, and some coordinate data or standard position numbers. Each added part or whole character therefore needs only a few code numbers and coordinate data, and changing a font sometimes needs only a change to the shapes of the basic strokes. The higher the dot-matrix resolution and the more fonts there are, the greater the compression ratio, which can reach 1:10 to 1:100. Likewise, phonetic scripts and mixed phonetic-ideographic scripts can be compressed by splitting whole words into "structure + parts" and splitting parts into strokes.
The Chinese character internal code in general use on computers today, the national standard region-position code, is an identification internal code. Applying the present invention to the computer internal code achieves an important breakthrough in internal code function and realizes a complete internal code. Concretely, the internal code of each Chinese character has two sections. The first records the sequence number of the character's structure type in the Chinese character structure table. The second then records, one by one in writing order, the sequence numbers in the component shape table of the parts that make up the character. The internal codes of different characters are not of equal length; the length is determined by the number of parts that the structure defines.
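Because the structure number fixes how many part numbers follow, a stream of such variable-length codes can be split without any separator. The sketch below assumes a hypothetical table `PART_COUNT` from structure number to part count; the actual structure table is not given in the text.

```python
from typing import Dict, List, Tuple

# Hypothetical structure table: structure number -> number of parts it defines.
PART_COUNT: Dict[int, int] = {1: 1, 2: 2, 3: 3}


def decode(stream: List[int]) -> List[Tuple[int, List[int]]]:
    """Split a flat stream of numbers into (structure, parts) records."""
    records = []
    i = 0
    while i < len(stream):
        structure = stream[i]
        n = PART_COUNT[structure]           # the structure decides the record length
        parts = stream[i + 1 : i + 1 + n]
        if len(parts) != n:
            raise ValueError("stream ends inside a character")
        records.append((structure, parts))
        i += 1 + n
    return records


print(decode([2, 10, 11, 1, 7, 3, 4, 5, 6]))
# [(2, [10, 11]), (1, [7]), (3, [4, 5, 6])]
```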
So just, can be standard with complete ISN, formulate a fully interior code plan that each literal of the world is general.
Concrete method: collect all the word-forming components of the scripts of all nationalities into one component shape table; summarize the word-forming structures in which these components may occur into one structure table; and establish two special structures, "redefinition" and "point location", to define word-forming structures that the system has not already defined. For structures and components that are used very rarely, the whole character they describe can be entered in the component shape table as a single component. Text segments that carry a definite meaning can also be entered in the component shape table in the form of components; such characters and segments can then be retrieved from the component library as single characters. The internal code of each character is thus defined as follows: it has two parts, of which the first records the sequence number of the character's word-forming structure in the structure table, and the second records, one by one in writing order, the sequence numbers of its word-forming components in the component table. For example, in the internal code of an English word, the first part defines the structure as an equal, left-to-right parallel arrangement together with its number of components, and the second part records each component (letter) in turn; this saves the storage otherwise spent on the space that separates words. The internal code for Chinese can follow the Chinese character internal code described above.
Further, this internal code can take two concrete forms. In the first form, the first and second parts of the internal code are relatively independent, that is, the structure code and the component code are unrelated: each structure or component has exactly one sequence number, and each sequence number corresponds to exactly one structure or component. However, because the total number of word-forming components across the scripts of all nationalities is quite large, the numeric range needed to record the internal code of each character is also large. In the second form, the second part of the internal code is determined in combination with the first part. The component numbers are divided into several zones, and each component is given a position number within its own zone, so that each component is identified by "zone number + position number". The second part of the internal code records only the position number of each component within its zone; the zone to which the component belongs is inferred from the structure number recorded in the first part. This works because each word-forming structure applies only to a certain subset of components, and the system normally groups components that share such a property into the same zone. In other words, each structure defines word-forming components in exactly one zone, so every structure fixes the zone number of the components it defines. This second form can reduce the bit length of the whole internal code to some extent. The bit length of an internal code is an important parameter for evaluating a standard.
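The second form can be sketched as a lookup from structure number to zone, so that only position numbers need to be stored. The mapping below is entirely hypothetical; the patent text does not list which structure belongs to which zone.

```python
# Hypothetical mapping: each structure number fixes exactly one zone.
# Here structures 5 and 6 are assumed to share zone 0, and structure 257
# is assumed to use zone 1.
ZONE_OF_STRUCTURE = {5: 0, 6: 0, 257: 1}

def resolve_components(structure_no, position_nos):
    """Turn stored position numbers into full (zone, position) identifiers.

    The zone is never stored with the component; it is inferred from the
    structure number that heads the character code.
    """
    zone = ZONE_OF_STRUCTURE[structure_no]
    return [(zone, position) for position in position_nos]
```

With this table, the same stored position number 3 denotes a different component under structure 5 than under structure 257, which is how the stored component field can be shorter than a globally unique number.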
The scripts of the world's nationalities have at least 300 word-forming structures in total and at least 1000 word-forming components. With the first form, storing each structure number therefore needs 9 binary digits (512 values), and storing each component number needs 14 bits (16384 values). With the second form, storing each structure number needs 9 bits (512 values), and storing each component number needs 10 bits (1024 values).
Storing the code directly as "structure number + component number" makes processing harder and lowers system efficiency, because the structure number and the component number have different bit lengths.
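The bit widths quoted above follow from rounding the number of entries up to the next power of two. A small sketch of that arithmetic, with the helper names chosen here for illustration:

```python
def bits_needed(count):
    """Smallest number of bits that can give `count` entries distinct values."""
    return max(1, (count - 1).bit_length())

def code_length_bits(structure_bits, component_bits, component_count):
    """Total bits for one character: one structure field plus its component fields."""
    return structure_bits + component_bits * component_count

# 300 structures do not fit in 8 bits (256 values) but fit in 9 (512 values).
structure_bits = bits_needed(300)

# Length of a two-component character under each form, using the
# component field widths stated in the text (14 bits and 10 bits).
form1_bits = code_length_bits(structure_bits, 14, 2)
form2_bits = code_length_bits(structure_bits, 10, 2)
```

For a two-component character this gives 37 bits in the first form and 29 bits in the second, which illustrates why the zone-based form shortens the code.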
The following method can be used instead. The word-forming structures and the word-forming components are placed in one set: the upper part of the set holds the structures and the lower part holds the components, and every structure or component has its own unique sequence number in the set. The set contains a boundary number. A sequence number greater than the boundary number is a structure number; a sequence number less than the boundary number is a component number. For example, with the first form of the internal code above, each structure number or component number can be stored in 15 bits (32768 values); a sequence number greater than 28672 (binary 1110000,00000000) is judged to be a structure number, and one less than 28672 is a component number. With the second form, each structure number or component number can be stored in 11 bits (2048 values); a sequence number greater than 1536 (binary 110,00000000) is judged to be a structure number, and one less than 1536 is a component number. Compared with the method above, this takes somewhat more memory, but the structure numbers and component numbers have equal bit lengths and are easy to tell apart, which improves the processing efficiency of the system.
As the figures above show, whichever concrete form is adopted, this internal code saves memory compared with the double-byte international internal code standard currently being formulated.
Thus, for a character of any country, once its internal code is obtained we can learn which country's script it belongs to and which structure and components form it, and we can generate its glyph. Conversely, for a character that already exists, this internal code can serve as the unique code that identifies it.
In addition, because this internal code and the input code have the same structure, no lookup table between input codes and internal codes is needed, and no separate input coding scheme has to be designed. This significantly reduces system overhead, which is particularly important for multilingual input.
A concrete embodiment is shown in Fig. 6. The Chinese component sequence numbers in the internal code follow the order given in Table 15 (the dynamic frequency ordering of Chinese word-forming components), and the Chinese structure sequence numbers follow the structures in Fig. 1, numbered from top to bottom and from left to right. The sequence numbers of English components (letters) are the same as in ASCII. Chinese structure numbers start from 1 and English structure numbers start from 257; Chinese component numbers start from 1 and English component numbers start from 1025.
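The boundary-number test can be sketched as follows, using the two boundary values given in the text. The function names and the sample stream are hypothetical, and because the text does not say what a value exactly equal to the boundary means, this sketch rejects it.

```python
BOUNDARY_FORM1 = 0b111000000000000  # 28672, for 15-bit units
BOUNDARY_FORM2 = 0b11000000000      # 1536, for 11-bit units

def classify(unit, boundary):
    """Return 'structure' or 'component' for one fixed-width unit."""
    if unit > boundary:
        return "structure"
    if unit < boundary:
        return "component"
    raise ValueError("a unit equal to the boundary number is not defined")

def split_characters(units, boundary):
    """Group a stream of units into (structure, components) records.

    Every structure unit opens a new character; the component units
    that follow it belong to that character.
    """
    characters = []
    for unit in units:
        if classify(unit, boundary) == "structure":
            characters.append((unit, []))
        elif characters:
            characters[-1][1].append(unit)
        else:
            raise ValueError("stream starts with a component unit")
    return characters
```

Because all units have the same width, a reader can step through the stream at a fixed stride and needs only one comparison per unit to find character boundaries.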
Description of drawings:
Fig. 1: Decomposition of Chinese character structures. Code names in the figure: 1: single character; 2: parallel structure; 3: containing structure; 4: "pin"-shaped (品) structure; 5: left-right structure; 6: up-down structure; 7: full-enclosure structure; 8: semi-enclosure structure.
Fig. 2: Concrete shapes of the semi-enclosure structure.
Fig. 3: Keyboard layout of the Chinese character structure codes.
Fig. 4: Correspondence between stroke combinations and the keyboard. In the figure, 1, 2, 3, 4, 5 correspond respectively to the horizontal, vertical, left-falling, right-falling, and turning strokes.
Fig. 5: Diagram of the font library compression method. Code names in the figure: 1: detailed record of the concrete shape of each stroke; 2: record of the stroke number and its coordinates within the component; 3: record of the structure code and the component codes; 4: correction parameters; A: stroke; B: component; C: character.
Fig. 6: Chart of an embodiment of the whole-word code.

Claims (12)

1. A whole-word (Chinese character) code, characterized in that it uses the structure "word-forming structure of the character + word-forming components of the character" to express completely the structure and shape features of any phonetic or ideographic character.
2. The whole-word code of claim 1, characterized in that phonetic and ideographic characters that cannot, or can only with difficulty, be expressed as "word-forming structure + word-forming components" are each treated as a whole as one word-forming component.
3. The whole-word code of claim 1, characterized in that a special "point location" structure is established, with which a word-forming component can be defined at any size and at any position within the character.
4. The whole-word code of claim 1, characterized in that a special "redefinition" structure is established, with which the structures already defined by the system can be used for nested definition.
5. The whole-word code of claim 1, wherein, when it is used for computer character input, the input may be a full-code input whose code structure is "structure code + component code", or a short-code input formed by selecting some of the code elements of the full-code input.
6. The whole-word code of claim 1, wherein, when it is used to compress a character font library, the compression is carried out by splitting whole characters into "structure + components" and splitting components into strokes.
7. The whole-word code of claim 1, wherein, when it is used as the computer internal code for the scripts of one or more nationalities, it takes the structure "structure number + component number", either in a mode in which structure numbers and component numbers are unrelated, or in a mode in which the component number is divided into a zone number and a position number, the position number being recorded and the zone number being determined from the structure number.
8. The Chinese character whole-word code according to claim 5, wherein its structure code is characterized in that the "containing structure" is converted into and represented as a "parallel structure".
9. The whole-word code according to claim 5, used for entering Chinese characters from a computer keyboard, wherein the component code is structured as follows: the first code is the combination of the first and second strokes of the component, the second code is the combination of its third and fourth strokes or of its last two strokes, and the last code is the initial consonant of the component's name.
10. The whole-word code according to claim 7, used for the storage structure of a general internal code for scripts, wherein the structure numbers and component numbers may either be numbered in two separate sets, or belong to one set and be numbered in separate segments.
11. The whole-word code according to claim 1, wherein the word-forming structure records the relative positions of the word-forming components, and the word-forming components record the shapes of the glyph elements.
12. The whole-word code according to claim 1, wherein, when it is used to generate character glyphs, the size, width, and position of each word-forming component within the character are determined by the word-forming structure in combination with the shape and characteristic attributes of the component.
CN93100866A 1993-01-12 1993-01-12 Whole words (Chinese character) code Expired - Fee Related CN1091529C (en)

Publications (2)

Publication Number Publication Date
CN1089735A true CN1089735A (en) 1994-07-20
CN1091529C CN1091529C (en) 2002-09-25

Family

ID=4983272

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN85102473B (en) * 1985-04-01 1987-11-25 山东电子研究所 Chinese character information processing technique with sequential word-root approach

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102193647A (en) * 2010-03-20 2011-09-21 赵现隆 Position Chinese character input method for shape codes and touch screen
CN102193647B (en) * 2010-03-20 2015-06-10 赵现隆 Position Chinese character input method for shape codes and touch screen
CN105677718A (en) * 2015-12-29 2016-06-15 北京汉王数字科技有限公司 Character retrieval method and apparatus
CN105677718B (en) * 2015-12-29 2019-04-09 北京汉王数字科技有限公司 Character search method and device
CN105912139A (en) * 2016-01-11 2016-08-31 金云中 Corresponding recognition method for coding Chinese characters by using modular strokes
CN107241100A (en) * 2016-03-29 2017-10-10 北大方正集团有限公司 Character library component compresses method and device
CN106649764A (en) * 2016-12-27 2017-05-10 北京汉王数字科技有限公司 Character retrieval method and character retrieval device
CN106649764B (en) * 2016-12-27 2020-04-17 北京汉王数字科技有限公司 Character search method and character search device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C19 Lapse of patent right due to non-payment of the annual fee
CF01 Termination of patent right due to non-payment of annual fee