CA1172335A - Means for encoding ideographic characters - Google Patents

Means for encoding ideographic characters

Info

Publication number
CA1172335A
CA1172335A CA000347137A CA347137A CA1172335A CA 1172335 A CA1172335 A CA 1172335A CA 000347137 A CA000347137 A CA 000347137A CA 347137 A CA347137 A CA 347137A CA 1172335 A CA1172335 A CA 1172335A
Authority
CA
Canada
Prior art keywords
keys
accordance
code
key
keyboard
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired
Application number
CA000347137A
Other languages
French (fr)
Inventor
Daniel L. Leung
Lai-Wo S. Leung
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
LEUNG LAI WO S
Original Assignee
LEUNG LAI WO S
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by LEUNG LAI WO S filed Critical LEUNG LAI WO S
Priority to CA000347137A priority Critical patent/CA1172335A/en
Priority to GB8106816A priority patent/GB2071018B/en
Priority to JP3137881A priority patent/JPS56140436A/en
Application granted granted Critical
Publication of CA1172335A publication Critical patent/CA1172335A/en
Priority to HK40086A priority patent/HK40086A/en
Expired legal-status Critical Current

Classifications

    • BPERFORMING OPERATIONS; TRANSPORTING
    • B41PRINTING; LINING MACHINES; TYPEWRITERS; STAMPS
    • B41JTYPEWRITERS; SELECTIVE PRINTING MECHANISMS, i.e. MECHANISMS PRINTING OTHERWISE THAN FROM A FORME; CORRECTION OF TYPOGRAPHICAL ERRORS
    • B41J3/00Typewriters or selective printing or marking mechanisms characterised by the purpose for which they are constructed
    • B41J3/01Typewriters or selective printing or marking mechanisms characterised by the purpose for which they are constructed for special character, e.g. for Chinese characters or barcodes

Landscapes

  • Document Processing Apparatus (AREA)
  • Input From Keyboards Or The Like (AREA)

Abstract

File 1358 P/2 CA

APPARATUS FOR ENCODING IDEOGRAPHIC CHARACTERS
ABSTRACT OF THE DISCLOSURE

A word processing system for Chinese type characters includes a keyboard with a generally standard key arrangement for encoding the characters in accordance with their basic stroke type and sequence. Up to eight basic stroke types may be employed, although a five stroke system is preferred. Recurrent code sequences of two, three, four and five strokes are identified. An "end of character" code may be generated with the space bar. Preferred sequences are assigned key positions so as to provide an ergonometrically efficient keyboard. Average typing speeds using the keyboard are comparable on a character/word basis to those for English.

Description

MEANS FOR ENCODING IDEOGRAPHIC CHARACTERS
FIELD OF INVENTION
This invention relates to improvements in method and apparatus for information processing. It particularly relates to improvements in such method and apparatus applicable for use in connection with Chinese type characters currently used in Chinese, Japanese and Korean script, which are commonly referred to as ideographs.
BACKGROUND OF THE INVENTION
The Chinese language is reported to comprise about 30,000 characters. Some 8,000 are listed in a commonly used Chinese-English dictionary, these being sufficient for modern Chinese prose. A vocabulary of about 3,000 characters accounts for 95% of the characters in every day use. Telegraph code books are limited to about 9,600 characters.
The 30,000 characters that comprise the writing system of the Chinese language are a heterogeneous set, and were created at different stages of the development of the language. The pronunciation, in general, has been assigned arbitrarily to the characters, and the strokes from which the characters are composed have no syntactic meaning in themselves. The characters are not amenable to classification in a well structured system. The traditional Chinese dictionary arrangement method is in accordance with radicals and strokes comprising the character.
The system has many deficiencies. There are 214 different radicals listed in the Kang Hsi dictionary, and it is sometimes difficult to determine the radical group to which a character is related, especially when it is not a phonetic compound or a complex ideograph. Looking up a character is tedious and involves some six steps. Also, there is considerable degeneracy, with up to 30 characters having the same radical-stroke number characteristic.
A second system that is sometimes used is one wherein the four corners of the character are assigned a number in accordance with the stroke types and configuration of the strokes in that corner. The rules are relatively complicated, and mis-coding frequently occurs. Degeneracy is also a problem; for example, in the 2000-2999 section of the four corner code table in the Xinhua Zidian (New Chinese Dictionary) 1971, there are 1599 characters defined by 885 codes.
The Pinyin method of classification was introduced in Beijing (Peking) in the 1950's. The method involves a standardized phonetic system of representation using the Latin letters and tone indicators. The assignment of phonetic values necessitates a knowledge of the official dialect (Mandarin), and also subtle differences in the sound and tone must be discerned in many characters. Pinyin spelling of characters involves considerable degeneracy.
Other systems of classification are also known, serving for different purposes. The telegraph code lists some 9,600 characters numerically, thus avoiding degeneracy entirely. However, the list is accessed by operator search, or memory in the case of commonly used characters, hence the system is slow and requires considerable training. More recently, Caldwell, US Patent 2,950,800, proposed a system based upon the type of stroke from which the characters are constructed, and the sequence thereof. Some 21 "basic" strokes were identified. Some degeneracy was observed, but this was relatively small in comparison to the more traditional systems. Moreover, the method did not necessitate a fine knowledge of the Chinese written language or a particular dialectic manner of pronunciation, hence it could be open for widespread use.
Once a Chinese character is converted into a code signal, such signal may be employed in an information processing system such as communication, printing, translation and machine control. Thus, Caldwell described an electro-mechanical keyboard device for inputting the code elements into an accumulator. The concatenated code elements in the accumulator were then converted into X-Y coordinates so as to select and control the position of a film matrix upon which the preformed characters were stored, whereby the selected coded character could be optically printed. More recently micro-processor developments would readily permit the construction of electronic analogues embodying Caldwell's system, such as shown by Shashoua et al, US Patent 3,325,786. Still more recently in accordance with well known procedures, writing instructions converted from code signals may be for CRT, LED or "liquid crystal" display, or for printing such as impact printing, matrix wire printing, hot point printing or jet printing, for example. Also, whilst such instructions may relate to writing pre-formed characters, they may relate to instructions for synthesising such characters. A simple synthesis was proposed by Li, US Patent 3,950,734 wherein a "prefix" and "suffix" were combined to form a character. More complex systems of synthesis in accordance with the stroke type and spatial configuration of the strokes are also known, for example as in the electronic system designed by Wakamatsu, US Patent 4,144,405, or in the various mechanical systems that have heretofore been proposed.
It is important to note here that the kinds of strokes for synthesis of the character for writing purposes are not well-defined. Most strokes are not known by name to the average Chinese writer, and the classification of such strokes into types is quite arbitrary. Whilst Caldwell defined and employed 21 such basic writing stroke types for encoding purposes, it has been recognized heretofore that a small number of stroke types would suffice for this purpose. A summary of different stroke types for coding systems which have heretofore been proposed is given by Stallings, "Pattern Recognition", Pergamon Press, Vol 8 pp 87-98 (1976). Cheung and Chan, in "Computer-aided instruction in Chinese characters" Proc. 1st Int. Symposium on Computers and Chinese Input/Output Systems, 599-616 (1973) identify some 31 different stroke types. Liu, in "Real Time Chinese Hand Writing Recognition Machine" MIT Cambridge, E.E. Thesis, 1966 identifies 19 stroke types. Yoshida and Eden, in "Handwritten Chinese Character Recognition by an Analysis-by-Synthesis Method" Proc. 1st Int. Conference on Pattern Recognition, 197-204 (1973) identify 7 stroke types, and Groner et al, "On-line computer classification of handprinted Chinese characters as a translation aid" IEEE Trans. Elect. Comput. 16, pp 856-860 (1967) propose 5 types. The 7 stroke types and the 5 stroke types coding methods are referred to in greater detail subsequently herein.
A keyboard for encoding characters in accordance with stroke type and sequence may permit touch typing of the characters. Using his definition of 21 stroke types, Caldwell designed a keyboard with 21 "stroke keys", each assigned to one stroke type. However it was at once apparent that the speed attainable with such design would be, character for word, low in comparison to the average typing speed in English language on a Qwerty keyboard, the average strokes per character being about 10, and the average number of keystrikes per English word being about 5.
Caldwell reduced the number of keystrokes per character by two expedients. The first was termed "minimum spelling", whereby the length of the code word (that is, the sequence of code elements corresponding to the stroke types) for a character was truncated so as to just distinguish the character from other characters comprising the vocabulary list, whilst avoiding redundancy. For example, when an operator keyed in the code word BGD EGV BDP BDP BGE OE, the keyboard would lock after the seventh key had been hit, as the further information was not required to distinguish the character from the remaining characters comprising the vocabulary list. In an expanded vocabulary list containing the code word sDG EGV BDP BDP BGE GF, which differs from the above example in the last code element only, all of the code elements are required to avoid degeneracy. It is apparent that the applicability of "minimum spelling" in reducing the length of code words is very much dependent upon vocabulary size.
The second expedient was the addition of "entity keys", which keys generate a signal corresponding to a sequence of strokes as opposed to one stroke. Some 20 different "entities" were described, each representing several strokes in specific spatial arrangement, often that of a radical or having other syntactical significance. From his relatively small vocabulary of 2,333 characters, Caldwell reported a reduction of the median value of 10.2 strokes per character to 6.7 using the "minimum spelling" method. When using a small sample drawn from the aforementioned vocabulary, Caldwell estimated the average number of keystrokes necessary to enter a character making full use of the "entity keys" was 4.7, which coincides quite closely to the average word length in the English language. However, after a suitable period of training, the typing speed on such a keyboard was reported by Caldwell to be only 14 characters per minute. Such typing speed is, of course, much less than is considered average for typing English words.
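The "minimum spelling" idea described above can be illustrated with a minimal sketch, assuming each code word is a plain string and that truncation means keeping the shortest prefix shared by no other word in the vocabulary. The function name and the toy code words are hypothetical and are not Caldwell's actual codes.

```python
def minimum_spelling(vocabulary):
    """Map each code word to the shortest prefix that no other word shares."""
    result = {}
    for word in vocabulary:
        others = [w for w in vocabulary if w != word]
        for length in range(1, len(word) + 1):
            prefix = word[:length]
            if not any(other.startswith(prefix) for other in others):
                result[word] = prefix
                break
        else:
            # Every prefix is shared with another word, so the full word is needed.
            result[word] = word
    return result


# Hypothetical toy vocabulary (letters stand for code elements).
small = ["ABCD", "ABEF", "XYZ"]
large = small + ["ABCG"]

print(minimum_spelling(small))
print(minimum_spelling(large))
```

Adding "ABCG" to the toy vocabulary forces "ABCD" to be spelled in full, which mirrors the point that the saving from minimum spelling shrinks as the vocabulary grows.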
We consider that stroke coding systems for Chinese type characters which employ highly discriminating "basic" strokes have inherent disadvantages which tend to limit the attainment of good typing speeds. For example, certain strokes have a close resemblance to other strokes; this may be conducive to error in coding, and considerable effort must be expended on the part of the typist to distinguish between the types. Further, whilst there is no theoretical limit to the number of word encoding keys which may be located on a keyboard, there would appear to be a practical limit beyond which touch typing becomes increasingly difficult. As a first approximation it is not believed to be desirable to exceed the 26 letter keys of a Qwerty keyboard. Thus, whilst Caldwell identified some 20 different "entities", only 6 were assigned a key position, together with the 21 "basic" stroke keys. This restriction on the number of keys severely limits the applicability of the entity keys, since the percentage of characters of an expanded vocabulary list which may be encoded using the assigned "entity" keys is necessarily limited. Still further, in accordance with Information Theory, an optimal coding system should have a set of code elements each of which is used an approximately equal number of times when coding an average text.
A 21 basic stroke code system is far from optimal since, as stated by Caldwell, "90% of all Chinese writing is accounted for by only 9 of the 21 basic strokes". (op. cit.) The shortest uniform length binary signals that could be assigned to each of these stroke types would be 5 bits, and would be highly redundant. Hence, Caldwell employed Huffman's method of constructing minimum redundancy codes of non-uniform lengths (D.A. Huffman, "A method for the construction of Minimum-Redundancy Codes", Proc. I.R.E., 40, pp. 1098-1101, 1952). Such non-uniform length signals for code elements are used in serial transmission of information, and pose no problems for large computers with large accumulators. However in smaller information processing systems where the accumulators commonly have 8 to 16 bits, additional circuits and components are required before such code signals can be processed.

It is an object of the present invention to provide an efficient system of coding to facilitate the inputting of information representing Chinese type characters into an information processing system.
It is an object of our invention to provide an improved information processing system for writing Chinese type characters.
It is a further object of our invention to provide improved keyboard apparatus for use in the above system.
SUMMARY OF INVENTIVE ASPECTS
In accordance with one embodiment of the invention, an apparatus for encoding Chinese type characters in accordance with their basic stroke type and sequence comprises a keyboard having not more than eight "monographic" keys and a plurality of "digraphic" keys. A "monographic" key is defined here to be representative of a single basic stroke type. A "digraphic" key is defined to be representative of a sequence of two basic strokes, which may be identical or otherwise. Means is provided responsive to the actuation of each "monographic" key for generating a code signal representative of the basic stroke associated therewith, which means is responsive to the actuation of each "digraphic" key for generating a code signal representative of the sequential basic strokes associated therewith.

In a preferred aspect of the invention, the number of basic keys representing basic stroke types is limited to five, and the stroke types are classified by simple geometric properties, as is described herein.
In accordance with another aspect of the invention, the keyboard includes in addition to the monographic and digraphic keys a plurality of keys having a graphicity of more than two, e.g. three, four or five. In a preferred form, the selection of the polygraphic keys (which expression includes digraphic) for inclusion on the keyboard is made primarily in accordance with their stroke saving function, as defined herein. In accordance with a still further aspect of the invention, the selected keys are assigned positions on the keyboard so as to provide a keyboard of good ergonometric efficiency.
The above mentioned and other features and objects of our invention and the manner of obtaining them will become more apparent and the invention itself will be best understood by reference to the following description of an embodiment of the invention taken together with the accompanying drawing wherein:
BRIEF DESCRIPTION OF THE DRAWINGS
Fig. 1 - shows the stroke types of a first prior art code arrangement;
Fig. 2 - shows the stroke types of a second prior art code arrangement;
Fig. 3 - shows the stroke types which we preferably employ herein for character coding purposes;
Fig. 4 - shows certain strokes which are treated exceptionally for character coding purposes;
Fig. 5a, 5b and 5c - show examples of Chinese characters encoded in accordance with the code system illustrated in Fig. 3;
Fig. 6 - shows a keyboard arrangement in accordance with this invention, and
Fig. 7 - shows in schematic form a word processing system embodying the present invention.
DESCRIPTION OF A PREFERRED EMBODIMENT
Referring to Fig. 1, there is shown therein the five basic stroke types proposed by Groner et al, loc. cit. for the purpose of encoding Chinese type characters. In Fig. 2 there is shown the seven elements proposed by Yoshida et al, loc. cit. for this purpose. It is to be remarked that the above authors employed a tablet inputting means necessitating pattern recognition techniques by the computer to identify the stroke types. The above basic strokes are not necessarily suitable for use in connection with the present invention, but they are illustrative of codes which may be represented by three characteristic bits.
The particular classification of basic stroke types that we identify for coding purposes and which seems well suited to our invention is shown in Fig. 3. These basic stroke types are as follows:

TYPE   STROKE(S)
1      horizontal
2      vertical, optionally with left hook
3      left or right obliques and curves
4      dot
5      angulated strokes sustaining an acute or right angle

Three strokes are illustrated in Fig. 4 that are treated as an exception. These strokes are conventionally considered as one-stroke executions; however, in accordance with our preferred method of coding, these are considered as being composed of two basic strokes of the types indicated.
Examples of characters encoded in accordance with our basic stroke types are shown in Fig. 5:
Fig. 5a: (to) call; code sequence 25152
Fig. 5b: bird; code sequence 354251
Fig. 5c: (to) bolt; code sequence 4251
The above described five basic stroke types are defined in such manner as to facilitate easy recognition, and also such that the frequencies of occurrence for each of the five stroke types are approximately equal so as to optimize the efficiency of the coding system and allow for the efficient use of uniform length signals for representation of code elements.
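As a small illustration of the five-type code, the sketch below expands a code sequence into the stroke descriptions listed above. It assumes code sequences are written as digit strings; the dictionary and function names are hypothetical, and the three sample sequences are those given for Fig. 5.

```python
# Basic stroke types, numbered as in the classification described above.
STROKE_TYPES = {
    "1": "horizontal",
    "2": "vertical, optionally with left hook",
    "3": "left or right oblique or curve",
    "4": "dot",
    "5": "angulated stroke sustaining an acute or right angle",
}


def describe(code_sequence):
    """Return the stroke description for each element of a code sequence."""
    unknown = [c for c in code_sequence if c not in STROKE_TYPES]
    if unknown:
        raise ValueError(f"not a basic stroke type: {unknown}")
    return [STROKE_TYPES[c] for c in code_sequence]


for name, code in [("Fig. 5a", "25152"), ("Fig. 5b", "354251"), ("Fig. 5c", "4251")]:
    print(name, code, describe(code))
```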
An analysis of Chinese type characters coded in accordance with the code elements defined in Fig. 3 indicates that certain code sequences of two or more elements are recurrent. The samples that we employed for analysis were excerpted from recently published materials of diverse contents from books, newspapers and magazines. Separate samples of 5,250 characters were analysed as described below, and we found that all the frequencies pertinent to our design are stabilized at this sample size; that is, these samples of about 5,000 characters are statistically representative of modern Chinese prose. Four such samples totalling 21,000 characters were analysed in detail.
A value we refer to as the Stroke Saving Function (SSF) in accordance with the following definition was calculated for each code sequence of significance that was identified in the above analysis:
SSF = f x (n - 1)
where f is the frequency of occurrence of the given sequence and n is the number of code elements in the sequence. In calculating the SSF value, any character with distinct left and right parts is considered to comprise two disconnected code sequences rather than one continuous code sequence. For example, the character illustrated in Fig. 5a having a code sequence 25152 is treated for the purpose of calculating the SSF as comprising separate sequences 251 and 52. This is the natural manner that an operator would tend to treat such character, and is analogous to the preference to spell English words in syllables. The hundred or so most commonly occurring code sequences identified in the above analysis were investigated and preliminary SSF values calculated therefor. The sequence having the highest SSF value was then "selected" and identified by a specific designation whereby in subsequent calculation shorter code sequences comprised in the "selected" sequence would not be encountered, and whereby in longer code sequences which include the selected sequence, the selected sequence would be considered as comprising a single code element.
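The SSF definition can be sketched in a few lines. This is a minimal illustration, assuming code words are digit strings, that f counts every occurrence of the sequence inside every code word of the sample, and that the split into left and right parts is ignored. The sample is the hypothetical 40-character sample used in the illustration that follows; the function names are assumptions.

```python
def count_occurrences(sequence, code_word):
    """Count occurrences of `sequence` as a contiguous run inside `code_word`."""
    n = len(sequence)
    return sum(
        1
        for i in range(len(code_word) - n + 1)
        if code_word[i:i + n] == sequence
    )


def ssf(sequence, sample):
    """Stroke Saving Function: frequency f times (n - 1)."""
    f = sum(
        count_occurrences(sequence, word) * times
        for word, times in sample.items()
    )
    return f * (len(sequence) - 1)


# Hypothetical sample: code word -> number of times the character occurs.
sample = {"251": 20, "2511": 10, "4125": 10}

for seq in ["25", "51", "251", "2511", "4125"]:
    print(seq, ssf(seq, sample))
```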
To illustrate the concept, assume that in a hypothetical sample of 40 characters, the character with code word "251" occurs 20 times, the character with code word "2511" occurs 10 times, and the character with code word "4125" occurs 10 times.
Preliminary SSF values are determined as follows:

CODE SEQ.   FREQUENCY OF OCCURRENCE        S.S.F. VALUE
"25"        20 from code word "251"        (20+10+10)x(2-1)=40
            10 from code word "2511"
            10 from code word "4125"
"51"        20 from code word "251"        (20+10)x(2-1)=30
            10 from code word "2511"
"251"       20 from code word "251"        (20+10)x(3-1)=60
            10 from code word "2511"
"2511"      10 from code word "2511"       (10)x(4-1)=30
"4125"      10 from code word "4125"       (10)x(4-1)=30

From the above, it is seen that the greatest SSF value is 60, being that attaching to the code sequence "251". If it now be assumed that this sequence is "selected" and represented by the symbol "*", the sequences of the above example may be identified as "25", "51", "*", "*1", "4125".
The SSF values attaching to the newly defined sequences are determined as follows:

CODE SEQ.                  FREQUENCY OF OCCURRENCE        S.S.F. VALUE
"25"                       0 from code word "*"           (10)x(2-1)=10
                           0 from code word "*1"
                           10 from code word "4125"
"51"                       0 from code word "*"           (0)x(2-1)=0
                           0 from code word "*1"
"*" (previously "251")     20 from code word "*"          (20)x(1-1)=0
"*1" (previously "2511")   10 from code word "*1"         (10)x(2-1)=10
"4125"                     10 from code word "4125"       (10)x(4-1)=30

In accordance with the newly calculated SSF values the code sequence "4125" would be next "selected" as having the highest SSF value, and the process of calculation repeated for the remaining sequences. It should be understood that in the above example the frequencies and the SSF values calculated therefrom are illustrative only of the concept, and that they do not bear any quantitative significance. In practice the SSF value of the code sequence "4125" is relatively low, and the sequence is not observed amongst the code sequences having the top 30 SSF values. It may also be noted that in practice the SSF value of "selected" sequences would not be recalculated in the manner shown, since a sequence, once selected, is defined for recalculation purposes as comprising a single code element (n=1) for which the SSF value has no significance.
As will be appreciated from the above, the determination of the Stroke-saving Function values is a non-linear process heavily dependent on which code sequences have already been selected, and for test purposes the number of sequences investigated is desirably greater than the number to be selected.
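The dependence of each SSF value on earlier selections can be sketched as a greedy loop. This is only one possible reading of the procedure, under stated assumptions: code words are tokenised into tuples, every contiguous run of two to five tokens is a candidate, ties are broken by sort order, and a selected sequence is replaced by a single bracketed token so that it counts as one code element afterwards. All names are hypothetical.

```python
def ssf(seq, sample):
    """Frequency of `seq` in the tokenised sample times (len(seq) - 1)."""
    n = len(seq)
    f = sum(
        times * sum(1 for i in range(len(word) - n + 1) if word[i:i + n] == seq)
        for word, times in sample.items()
    )
    return f * (n - 1)


def candidates(sample, max_len=5):
    """All contiguous token runs of length 2..max_len found in the sample."""
    return {
        word[i:i + n]
        for word in sample
        for n in range(2, max_len + 1)
        for i in range(len(word) - n + 1)
    }


def replace_selected(word, selected, symbol):
    """Rewrite `word` so each occurrence of `selected` becomes one token."""
    out, i, n = [], 0, len(selected)
    while i < len(word):
        if word[i:i + n] == selected:
            out.append(symbol)
            i += n
        else:
            out.append(word[i])
            i += 1
    return tuple(out)


def select_sequences(sample, how_many):
    """Greedily pick the sequence with the highest SSF, then re-score."""
    sample = {tuple(word): times for word, times in sample.items()}
    chosen = []
    for _ in range(how_many):
        pool = sorted(candidates(sample))
        if not pool:
            break
        best = max(pool, key=lambda s: ssf(s, sample))
        value = ssf(best, sample)
        if value == 0:
            break
        chosen.append(("".join(best), value))
        symbol = "<" + "".join(best) + ">"
        rewritten = {}
        for word, times in sample.items():
            new_word = replace_selected(word, best, symbol)
            rewritten[new_word] = rewritten.get(new_word, 0) + times
        sample = rewritten
    return chosen


# Hypothetical sample from the illustration above.
print(select_sequences({"251": 20, "2511": 10, "4125": 10}, 2))
```

On the hypothetical sample this picks "251" first and "4125" second, in line with the worked illustration; on real text the order would depend on the measured frequencies.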
A list of 28 code sequences in generally descending order of SSF values as determined in accordance with the foregoing principles is given in Table 1 below.
TABLE 1: Code sequences having highest SSF values
It should be noted that there are relatively small differences only in the SSF values associated with the last several sequences listed, and it is again stressed that the values associated with the sequences will be dependent upon the "selected" sequences, hence some change in the order, particularly towards the lower end, may be found.
The polygraphic sequences of code elements such as are given in Table 1 are not intended to define the spatial arrangement of the strokes, and the significance thereof may vary from character to character. This is illustrated by the codes of the characters defined in Figs. 5a, 5b and 5c where in each instance the trigraphic code sequence of 251 occurs. In Fig. 5a this sequence represents the radical "mouth". In Fig. 5b this same sequence represents an arrangement of strokes having no syntactical significance. In Fig. 5c the same sequence represents a still further arrangement of strokes differing from those of Fig. 5a and 5b and again having no syntactical significance. In this respect, then, the sequences do not correspond to the "entities" of strokes defined by Caldwell, such "entities", it being recalled, being representative of a defined and specific spatial arrangement of strokes normally having a syntactical significance. Thus whilst Caldwell proposed an entity key representative of the radical "mouth" and such key would be of utility in encoding the character of Fig. 5a, such utility would not extend to encoding the strokes comprising the characters of Figs. 5b and 5c.
Whilst the limitation of the number of basic stroke types used for code purposes has considerable significance in assisting stroke type identification as earlier discussed, such limitation has further significance in relation to an ergonometric-efficient keyboard. Given a standard Qwerty keyboard with 44 keys, let it be assumed that 10 such keys are assigned to input numerals and 4 to input special instructions and punctuations; there will then remain 30 keys for character and machine function coding purposes. If 21 such keys are required for encoding basic strokes, there will be available only 9 keys to which polygraphic code sequences may be assigned. In a 21 basic stroke coding system there is a total of 441 digraphic sequences (21²) that are theoretically possible, hence it will be apparent that only a relatively small proportion of the digraphic sequences could be assigned a key position. Also, in such system the stroke saving functions of the various code sequences are generally low due to the comparatively low value of f, the frequency with which a sequence recurs.
In comparison, in our preferred 5 basic stroke system, there may be up to 25 keys available for assignment to polygraphic sequences, whereby all of the 25 possible digraphic code sequences might be assigned key positions. However we prefer a more efficient keyboard where those polygraphic sequences generally having the highest stroke-saving function values are assigned to the available keys. Twenty eight of these sequences are listed in Table 1.
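The key-budget arithmetic in the two preceding paragraphs can be checked with a short sketch. The function and parameter names are hypothetical; the numbers are those stated in the text.

```python
def key_budget(total_keys, numeral_keys, special_keys, basic_strokes):
    """Return (coding keys, keys left for polygraphic sequences, possible digraphs)."""
    coding_keys = total_keys - numeral_keys - special_keys
    polygraphic_keys = coding_keys - basic_strokes
    possible_digraphs = basic_strokes ** 2
    return coding_keys, polygraphic_keys, possible_digraphs


# 21 basic strokes: only 9 keys remain for 441 possible digraphic sequences.
print(key_budget(44, 10, 4, 21))
# 5 basic strokes: 25 keys remain, enough for all 25 possible digraphic sequences.
print(key_budget(44, 10, 4, 5))
```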
Referring to Fig. 6, a "standard" keyboard is identified therein by the numeral 10. As used herein, "standard" refers to a key arrangement wherein there are provided three horizontally arranged ranks of keys identified as upper rank 12, middle rank 14 and lower rank 16, wherein all character coding keys are located. Generally there are some ten keys in each rank. In a Qwerty key arrangement, the twenty six letters of the Latin alphabet are assigned standard positions in these three ranks, the four remaining keys being assigned punctuation functions. Other keys to the left of the left hand keys and to the right of the right hand keys may also be present; these keys are generally assigned machine operation, punctuation or special symbol functions. These additional keys are not normally used for character coding purposes.
Still further keys may be present and are here shown as a rank 18 superior to upper rank 12, and are used for inputting the numerals 0 - 9 and/or symbols. This rank of keys is shown without specific designation appearing thereon so as to avoid any confusion in the ensuing description. Such numerical value inputting keys may commonly be formed as a separate array in a computer input keyboard. There is no fixed limit to the number of keys on a "standard" keyboard, but the maximum number is usually about 50.
For touch typing of words (which expression here includes Chinese type characters) the word writing keys are to be considered as preferably consisting of a left hand and a right hand sphere of operation, the keys being divided accordingly by an imaginary line 19. "Home" finger positions are located on middle rank 14 and comprise the four keys of each hand commencing one key removed from line 19. In the Qwerty keyboard assignment such eight home keys are identified as "A,S,D,F" and "J,K,L,;". Our standard Chinese character coding keyboard includes monographic keys for entering basic strokes, and polygraphic keys for entering sequences of basic strokes, selection of the sequences for inclusion on the keyboard being determined from the ranking of their SSF values as earlier discussed. The assignment of the exact location for each of the aforementioned keys, both monographic and polygraphic, is determined by studies of the "mono-strike" and "dual-strike" frequencies of these keys. The "mono-strike" frequency of a key, whether a monographic or a polygraphic key, is the frequency of that key being hit in coding Chinese characters from a properly selected sample reflecting the average Chinese prose. The "dual-strike" frequency is the frequency of occurrence of a sequence of two keys, whether monographic or polygraphic, in a similar sample as described.
To achieve maximum ergonomic efficiency in the keyboard design, the work loads of both hands are distributed approximately equally, that is, the sum total of the "mono-strike" frequencies of all the keys for the left hand is about the same as the right. Also, the work load, i.e. the sum of "mono-strike" frequencies, for each finger is distributed directly proportionally to the strength and tapping speed of that finger. Furthermore, the keys with the highest "mono-strike" frequency, which in our studies include all five basic keys, are assigned to the most accessible keys, namely the home keys. Lastly, pairs of keys that have high "dual-strike" frequencies are arranged so that the two keys of each pair are assigned to opposite hands, such that the operation of successive keys by alternate hands be maximized.
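Two of the layout criteria above, balanced hand loads and alternation of hands on frequent key pairs, lend themselves to a simple score. The sketch below is only an illustration: the layout, the "mono-strike" counts and the "dual-strike" counts are invented for the example and do not come from the studies described, and the function names are hypothetical.

```python
def hand_loads(layout, mono_strike):
    """Sum the mono-strike frequencies of the keys assigned to each hand."""
    loads = {"L": 0, "R": 0}
    for key, freq in mono_strike.items():
        loads[layout[key]] += freq
    return loads


def alternation_rate(layout, dual_strike):
    """Fraction of dual-strike occurrences typed with alternate hands."""
    total = sum(dual_strike.values())
    alternating = sum(
        freq for (first, second), freq in dual_strike.items()
        if layout[first] != layout[second]
    )
    return alternating / total if total else 0.0


# Invented data: key label -> hand, key label -> count, key pair -> count.
layout = {"1": "L", "2": "R", "3": "L", "4": "R", "5": "L", "25": "R"}
mono = {"1": 20, "2": 15, "3": 15, "4": 10, "5": 15, "25": 25}
dual = {("1", "2"): 30, ("25", "1"): 40, ("3", "5"): 10, ("2", "4"): 20}

print(hand_loads(layout, mono))
print(alternation_rate(layout, dual))
```

A layout search could compare candidate assignments on these two numbers; finger strength and home-key placement would need further terms that are not modelled here.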
We prefer to identify the keys in accordance with a code or code sequence corresponding to the basic stroke or sequence of basic strokes assigned to the keys, arabic numerals being preferred for this purpose in view of their easy recognition. It will be appreciated that the single digit number used to identify a basic stroke is arbitrary.
The various code elements assigned to the character coding keys of keyboard 10 in general accordance with the above principles may be seen in Fig. 6. It may be observed from a comparison of this Figure with Table 1 that the sequences "25" and "123" have been assigned keyboard positions in preference to others of nominally superior SSF value; in practice it was found that such keys were preferred by an operator to other possible keys of approximately equal SSF value. Also in keyboard 10 an "end of code" designation is assigned to the space bar of the keyboard.
Keyboard 10, used in a word processing system for the typing of Chinese type characters to be described, was subjected to a performance test by an operator. A text limited to 350 characters was selected, such characters being of varying degrees of complexity such as would be found in a text of wider scope. After several sessions totalling only 20 hours of practice, the operator attained a speed of 50 characters per minute. This is almost 4 times faster than the only reported speed for the operation of a Chinese keyboard device of 14 characters/minute. Inputting of the coding required an average of 3.76 coding key-strikes/character plus 1 strike for the space bar, to denote the end of the character code; hence the keying speed compares very favourably to typing an English language text after an extended training period. It was found that the frequency of use of the polygraphic keys by the trained operator was quite close to that of the optimal result computed.
The average number of strokes/character in the selected text was about 7.29, hence each key strike represented about 1.94 strokes of a Chinese character.
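The figures quoted for the performance test can be checked with a short calculation. The sketch below is illustrative only; the variable names are not part of the specification, and the strikes-per-minute figure is derived here by assuming one space bar strike per character, as described above.

```python
# Figures quoted in the performance test described above.
strokes_per_char = 7.29         # average strokes per character in the test text
coding_strikes_per_char = 3.76  # average coding key strikes per character
chars_per_minute = 50           # speed attained after 20 hours of practice
reported_chars_per_minute = 14  # only previously reported speed

# Strokes represented by each coding key strike.
strokes_per_strike = strokes_per_char / coding_strikes_per_char

# Total key strikes per minute, counting one space bar strike per character.
strikes_per_minute = chars_per_minute * (coding_strikes_per_char + 1)

# Speed relative to the previously reported device.
speed_ratio = chars_per_minute / reported_chars_per_minute

print(round(strokes_per_strike, 2))
print(round(strikes_per_minute))
print(round(speed_ratio, 2))
```

The stroke ratio reproduces the 1.94 strokes per key strike stated above, and the speed ratio of roughly 3.6 is what the text rounds to "almost 4 times".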
An exemplary Chinese character word processing system including keyboard 10 is indicated schematically in Fig. 7. The system further comprises a converter 20 which generates a bit pattern corresponding to the code or code sequences represented by a key struck on keyboard 10. Such bit pattern is preferably generated in simple binary code.
The bit pattern generated by the actuation of a monographic key can be assigned to 3 significant bits in our preferred arrangement wherein there are 5 stroke types plus one "stop" code to designate the end of a code word, and there will be 2 bit patterns left undefined in the 8 possible bit patterns.
Alternately, the code elements can be concatenated in an accumulator, and three such code elements can be assigned to one byte, taking up 216 (i.e. 6³) of the 256 possible bit patterns, leaving 40 bit patterns for alphabets, numerals, or other coded instructions. Other alternatives are possible, such as starting each Chinese code word at the beginning of a byte, and assigning 3 code elements to a byte, thus using up only 180 of the 256 possible bit patterns, leaving 76 for other codes.
Such arrangements can be varied and are generally known in the art.
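By way of illustration only, the packing of three code elements into one byte may be sketched as a base-6 encoding. The numbering of the code elements (0 for the stop code, 1 to 5 for the stroke types) and the function names are assumptions made for this sketch and are not prescribed by the specification. The count of 180 is reproduced here on the further assumption that a code word starting at a byte boundary never begins with the stop code.

```python
STOP = 0      # assumed numbering: 0 = stop code, 1..5 = basic stroke types
ELEMENTS = 6  # 5 stroke types plus the stop code


def pack3(a, b, c):
    """Pack three code elements (each 0..5) into one byte value 0..215."""
    for e in (a, b, c):
        if not 0 <= e < ELEMENTS:
            raise ValueError(f"code element out of range: {e}")
    return (a * ELEMENTS + b) * ELEMENTS + c


def unpack3(value):
    """Recover the three code elements from a packed byte value."""
    if not 0 <= value < ELEMENTS ** 3:
        raise ValueError(f"not a packed code value: {value}")
    rest, c = divmod(value, ELEMENTS)
    a, b = divmod(rest, ELEMENTS)
    return a, b, c


used = ELEMENTS ** 3   # bit patterns taken by three free code elements
spare = 256 - used     # patterns left for alphabets, numerals, instructions

# Assumption: when each code word starts a byte, the first element is a stroke.
used_aligned = sum(1 for v in range(used) if unpack3(v)[0] != STOP)
spare_aligned = 256 - used_aligned

print(used, spare, used_aligned, spare_aligned)
```

Under these assumptions the counts agree with the figures given above: 216 and 40 for free concatenation, 180 and 76 for byte-aligned code words.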
To take full advantage of the increasingly popular and economical 8-bit microprocessors and integrated memory circuits, uniform length code signals of not longer than 8 bits are required. All the above listed possible binary code signal patterns are of uniform lengths. In a coding system of 6 code elements (5 stroke types plus a stop code), the average number of bits per code element is 2.25, i.e. (6 bit patterns ÷ 8 possible bit patterns) × 3 bits. In the prior art system of 21 strokes, the average number of bits per code element is 3.4375, i.e. (22 bit patterns ÷ 32 possible bit patterns) × 5 bits. Thus, our invention provides a 1.53 (3.4375 ÷ 2.25) times improvement in memory space requirement and processing speed when uniform length code signals are used. This significant improvement in efficiency is achieved by a well designed stroke type classification which extracts the most distinguishing properties of the strokes in Chinese characters. Also, this efficiency is achieved by a relatively uniform distribution of stroke type frequencies, which is considered a more optimal code system according to information theory.
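The comparison of the two coding systems may be reproduced as follows. This is a sketch of the figure of merit used above, namely the fraction of bit patterns in use multiplied by the uniform code length; the function name is illustrative, and treating the prior art count of 22 as 21 strokes plus one stop code is an assumption.

```python
def bits_per_element(n_patterns):
    """(bit patterns used / possible bit patterns) x uniform code length in bits."""
    width = (n_patterns - 1).bit_length()  # shortest uniform length that fits
    return n_patterns / (2 ** width) * width


ours = bits_per_element(6)    # 5 stroke types plus a stop code, 3 bits
prior = bits_per_element(22)  # 21 strokes plus an assumed stop code, 5 bits
ratio = prior / ours

print(ours, prior, round(ratio, 2))
```

The values obtained are 2.25 and 3.4375, giving the ratio of about 1.53 stated above.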
The generated bit patterns are routed via bus 22 to an input buffer 24 which provides temporary storage for editing purposes, and in the case of several keyboards sharing the same output information generator 30, provides storage until the latter is available.
The keyboard 10 also permits keyboard input of control instructions via the central control unit 28 to the various processing units of the word processing system, via control lines 26 for functions such as deletion or addition of certain bit patterns during editing and correction, or control of information flow to and from various units.
The binary machine words representing the code words for Chinese characters are converted in the output information generator 30 into the appropriate forms of information. The matching of the binary code word to the output information may be by one of the many well documented algorithms, such as the "hashing" method, for example. The output information will also vary according to the purpose of the information processing system, and may comprise for example X-Y coordinates of the location of a character stored in tangible form on film, disc or tape, or may be writing instructions for producing hardcopies, such as instructions to matrix printers and ink jet printers, for example. In the case of a keyboard controlled movable type printing system, the output information can be the control signals to select the type of a particular Chinese character; or in the case of telecommunication, the output information can be the corresponding signals in minimum redundancy codes for telegraphic transmission. Combinations of the various information types may of course be utilized in the same Chinese information processing system.
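As a non-limiting illustration of the "hashing" method, the matching of a code word to its output information may be sketched with a hash table, here a Python dictionary. The code words, character labels and X-Y coordinates below are hypothetical placeholders and do not correspond to real characters; the function names are likewise assumptions made for this sketch.

```python
# Hypothetical table: code word (digit string of stroke codes) -> candidates.
# All keys and entries are placeholders, not real character codes.
OUTPUT_TABLE = {
    "1234": [{"label": "CHAR_A", "xy": (3, 17)}],
    "2511": [
        {"label": "CHAR_B", "xy": (8, 2)},
        {"label": "CHAR_C", "xy": (8, 3)},  # two characters with identical code
    ],
}


def lookup(code_word):
    """Return every candidate output entry for a completed code word."""
    return OUTPUT_TABLE.get(code_word, [])


def decode_keystrokes(keys):
    """Concatenate key codes (e.g. "25", "1") until the space bar ends the code."""
    code = ""
    for key in keys:
        if key == " ":  # space bar generates the end of code signal
            break
        code += key
    return lookup(code)


candidates = decode_keystrokes(["25", "1", "1", " "])
print([c["label"] for c in candidates])
```

Returning a list of candidates rather than a single entry leaves room for the operator to select between characters having identical codes.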

The information from the output information generator is received in an output buffer 34, which stores the information temporarily, permitting suitable material to be viewed on a video monitor 36, or light emitting diode array, or liquid crystal display, for example. This is desirable in permitting the operator to examine the outputted material before any permanently printed copy is made, or the information transmitted, so as to allow text editing, correction and selection between characters having identical codes. Output buffer 34 further permits time sharing of storage means, and production of a hardcopy by a printer 37. Storage means 38 is connected to both input buffer 24 and output buffer 34 whereby information in either code word form or in output information form that had been earlier generated in accordance with the foregoing may be stored and later examined and/or printed or transmitted.

Claims

THE EMBODIMENTS OF THE INVENTION IN WHICH AN EXCLUSIVE PROPERTY OR PRIVILEGE IS CLAIMED ARE DEFINED AS FOLLOWS:
1. Apparatus for encoding Chinese type characters in accordance with their basic stroke type and sequence comprising a keyboard having a plurality of up to eight inclusive of monographic keys, and a plurality of digraphic keys, key responsive means responsive to the actuation of each said monographic key for generating a code signal representative of a basic stroke and to the actuation of each said digraphic key for generating a code signal representative of a sequence of said basic strokes of two.
2. Apparatus in accordance with Claim 1, wherein the number of monographic keys is five.
3. Apparatus in accordance with Claim 1 or 2, wherein the number of digraphic keys is about fifteen.
4. Apparatus in accordance with Claim 1, further including a plurality of trigraphic keys, and wherein said key responsive means is responsive to the actuation of each said trigraphic key for generating a code signal representative of a sequence of said basic strokes of three.
5. Apparatus in accordance with Claim 1, 2 or 4, wherein the number of said trigraphic keys is about five.
6. Apparatus in accordance with Claim 1, 2 or 4, wherein the stroke sequence assigned to each said digraphic and trigraphic keys is selected from two and three stroke sequences generally having the highest stroke-saving function values.
7. Apparatus in accordance with Claim 1, 2 or 4, further including at least one tetragraphic key and at least one pentagraphic key, and wherein said key responsive means respectively generates a code signal representative of a sequence of said basic strokes of four and five.
8. Apparatus as defined in Claim 1, 2 or 4, wherein said keys are assigned to positions locating in three horizontal ranks, each rank consisting of about 10 said keys.
9. Apparatus as defined in Claim 1, 2 or 4, wherein said keys are assigned to positions locating in three horizontal ranks, and wherein said monographic keys locate in home finger positions.
10. Apparatus as defined in Claim 1, 2 or 4, wherein each said character encoding key has an indicium thereon in the form of an arabic numeral, each said monographic key having a single digit numeral indicative of the basic stroke type represented by said key, and each said key having a graphicity of greater than one having a sequence of single digit numerals in accordance with the sequence of basic strokes represented by that key.
11. Apparatus as defined in Claim 1, 2 or 4, including a space bar, and wherein said key responsive means is responsive to the actuation of said space bar to generate an end of code signal.
12. A keyboard for encoding Chinese type characters in accordance with five basic strokes and the sequence thereof comprising three horizontal ranks, each rank comprising about ten keys wherein all character coding keys locate, wherein five said keys locating in home finger positions are monographic and representative of a basic stroke, and about twenty three said keys are polygraphic, representative of a sequence of basic strokes of two or more, at least a portion of said polygraphic keys being digraphic, means responsive to the actuation of said keys to generate a code signal representative of the stroke or sequence of strokes represented thereby.
13. A keyboard apparatus in accordance with Claim 12, wherein the polygraphic code sequence represented by said polygraphic keys are those code sequences selected from all possible code sequences generally having the highest stroke saving function values.
14. A keyboard in accordance with Claim 12, wherein said polygraphic keys include about fifteen digraphic keys, about five trigraphic keys, at least one tetragraphic key and at least one pentagraphic key.
15. A keyboard apparatus in accordance with Claim 12, 13 or 14, wherein the keys are assigned to positions on the keyboard as determined generally in accordance with their monostrike frequency and dual strike frequency.
16. A keyboard apparatus in accordance with Claim 12, 13 or 14, wherein said monographic keys are assigned home finger positions.
17. A keyboard apparatus in accordance with Claim 12, 13 or 14, including a space bar representative of an "end of character".
18. A keyboard apparatus in accordance with Claim 12, wherein the character coding keys are arranged substantially as illustrated in Figure 6, or the mirror image thereof.
19. Apparatus as defined in Claim 1, 2 or 4 wherein said code signal is a binary code signal.
20. Apparatus as defined in Claim 1, 2 or 4 further including output information means responsive to the input of said code signals, writing means responsive to said output information means.
21. Apparatus as defined in Claims 12, 13 or 14, wherein said code signal is a binary code signal.
22. Apparatus as defined in Claim 12, 13 or 14, further including output information means responsive to said code signals, writing means responsive to said output information means.
23. Apparatus as defined in Claim 1, 2 or 4, further including output information means including telegraphic signal means responsive to the input of said code signals.
24. Apparatus as defined in Claim 1, 2 or 4, further including output information means responsive to the input of said code signals wherein said output information means generates a minimum redundancy code signal.
25. Apparatus as defined in Claim 12, 13 or 14, further including output information means including telegraphic signal means responsive to the input of said code signals.
26. Apparatus as defined in Claim 12, 13 or 14, further including output information means responsive to the input of said code signal wherein said output information means generates a minimum redundancy code signal.
27. In a method for writing ideographic characters using a keyboard wherein the characters are encoded in accordance with a system of up to 8 basic stroke elements and the sequence thereof, the improvement wherein not less than about 25 percent of all possible digraphic sequences are inputtable from said keyboard using digraphic keys.
28. A method in accordance with Claim 27, wherein said system comprises 5 basic stroke elements and wherein not less than about 50 percent of all possible digraphic sequences are inputtable using digraphic keys.
29. A method in accordance with Claim 27 or 28, wherein at least a portion of all possible trigraphic sequences are inputtable using trigraphic keys.
30. A method in accordance with Claim 27 or 28, wherein said basic stroke elements are represented on said keys with single digit Arabic numerals, and wherein polygraphic sequences of said basic stroke elements are represented on said keys with corresponding sequences of said single digit Arabic numerals.
31. A method in accordance with Claim 27 or 28, including the step of outputting said encoded characters in minimum redundancy code for signal transmission.
CA000347137A 1980-03-06 1980-03-06 Means for encoding ideographic characters Expired CA1172335A (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
CA000347137A CA1172335A (en) 1980-03-06 1980-03-06 Means for encoding ideographic characters
GB8106816A GB2071018B (en) 1980-03-06 1981-03-04 Method and apparatus for information processing
JP3137881A JPS56140436A (en) 1980-03-06 1981-03-06 Kanji encoding device
HK40086A HK40086A (en) 1980-03-06 1986-05-29 Means for encoding ideographic characters

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CA000347137A CA1172335A (en) 1980-03-06 1980-03-06 Means for encoding ideographic characters

Publications (1)

Publication Number Publication Date
CA1172335A true CA1172335A (en) 1984-08-07

Family

ID=4116423

Family Applications (1)

Application Number Title Priority Date Filing Date
CA000347137A Expired CA1172335A (en) 1980-03-06 1980-03-06 Means for encoding ideographic characters

Country Status (4)

Country Link
JP (1) JPS56140436A (en)
CA (1) CA1172335A (en)
GB (1) GB2071018B (en)
HK (1) HK40086A (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2128787A (en) * 1982-10-20 1984-05-02 Ejgil Otto Kaj Griese A method of printing or otherwise mechanically producing composite characters, such as Chinese characters
GB2161004A (en) * 1984-04-12 1986-01-02 Li Jin Kai System for encoding characters
GB8427281D0 (en) * 1984-10-29 1984-12-05 Wong K F Materialistic system
US5137383A (en) * 1985-12-26 1992-08-11 Wong Kam Fu Chinese and Roman alphabet keyboard arrangement
US5109352A (en) * 1988-08-09 1992-04-28 Dell Robert B O System for encoding a collection of ideographic characters
US5212769A (en) * 1989-02-23 1993-05-18 Pontech, Inc. Method and apparatus for encoding and decoding chinese characters
CN1144354A (en) * 1995-04-25 1997-03-05 齐兰发展股份有限公司 Enhanced character transcription system

Also Published As

Publication number Publication date
HK40086A (en) 1986-06-06
GB2071018A (en) 1981-09-16
GB2071018B (en) 1984-02-08
JPS56140436A (en) 1981-11-02

Legal Events

Date Code Title Description
MKEX Expiry