Background technology
In the 1980s, the inventor proposed an easy-to-learn stroke-based character coding input scheme, the "5-stroke input method". This scheme assigns the digital codes 1, 2, 3, 4 and 5, in that order, to the five basic strokes of Chinese characters: horizontal (一), vertical (丨), left-falling (丿), right-falling (乀) and turning (乙). These codes correspond to the five numeric keys 1, 2, 3, 4 and 5 of the numeric keypad. Following the standard writing order of a Chinese character, only its 1st, 2nd, 3rd, 4th and last strokes are taken to encode and input the character.
Among the various known Chinese character input techniques, this scheme was once the easiest to learn of the shape-based coding input methods.
The outstanding advantage of this scheme is that it is easy to learn: anyone who can write Chinese characters can generally learn to encode within 5 minutes. However, this ease of learning comes at the cost of many duplicate codes and low efficiency. With the 5-stroke input method, encoding shows that only 464 of the 3755 level-1 Chinese characters have no duplicate code, and only 466 of the 6763 level-1 and level-2 Chinese characters. The duplicate-code problem of the prior art is thus very serious, and the excess of duplicate codes has so far kept the technique from being widely used.
For a long time, users have hoped for substantive progress in this technique: can the duplicate-code rate be reduced substantially, and input efficiency greatly improved, while the ease of learning is preserved? Now that digital devices are increasingly widespread, this has become a problem in coding input research that urgently needs solving.
Summary of the invention
The object of the invention is to propose a digital coding input scheme for Chinese characters that is more practical than the "5-stroke input method". It retains the ease of learning of the known technique while significantly lowering the duplicate-code rate and improving input efficiency, making it a scheme of good practical value.
The digital-keyboard Chinese character coding and input method of the present invention uses a 7-key, 5-code digital encoding of Chinese characters. Characters are given variable-length digital codes according to their shape, and character information is entered with numeric keys on a computer or any other device that has numeric keys. The 7 numeric keys 1, 2, 3, 4, 5, 6 and 0 represent, respectively, the 5 basic strokes of Chinese characters (一, 丨, 丿, 乀, 乙), the component "mouth" (口) and the end key. Simplified and traditional Chinese characters are encoded and entered into the computer according to the code-taking rule "first 4 plus last 1; if fewer than 5 codes, add the 0 key to end".
The numeric keys used in the present invention are assigned as follows: 1 represents the horizontal stroke and the rising stroke; 2 represents the vertical stroke and the vertical stroke with left hook; 3 represents the left-falling stroke; 4 represents the right-falling stroke and the dot; 5 represents all bent and hooked strokes; 6 represents "mouth"; and 0 serves as the end mark, or end key, when a code has fewer than 5 codes.
The method of encoding Chinese characters for input is as follows: at most 5 codes are taken for each Chinese character. Following the standard stroke order of the character, the codes of its 1st, 2nd, 3rd, 4th and last strokes, or the code of "mouth", are taken in order. When fewer than 5 codes are taken for a single character, one 0 may be appended as the end mark.
For the code-taking of "mouth", the present invention stipulates that a "mouth" written continuously in stroke order is encoded as 6. A shape that looks like "mouth" in outline or etymology, but whose final horizontal stroke is not written in continuous succession with its first and second strokes, is not replaced by code 6; its codes are still taken stroke by stroke in stroke order.
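The single-character rule above can be sketched in a few lines of Python. This is only an illustrative sketch: the unit names, the function name `encode_character` and the choice to count a continuously written "mouth" as one unit when selecting the "first 4 plus last 1" are assumptions, not wording from the specification.

```python
# Key assignment as described in the text; "0" is the end mark.
KEY_FOR_UNIT = {
    "horizontal": "1",     # also the rising stroke
    "vertical": "2",       # also the vertical stroke with left hook
    "left-falling": "3",
    "right-falling": "4",  # also the dot
    "turning": "5",        # all bent and hooked strokes
    "mouth": "6",          # a continuously written 口, counted as one unit (assumption)
}
MAX_CODE_LENGTH = 5


def encode_character(units):
    """Return the key sequence for one character.

    `units` lists the character's strokes in standard writing order, with
    every continuously written 口 already collapsed into the unit "mouth".
    How that collapsing is detected is outside this sketch.
    """
    codes = [KEY_FOR_UNIT[unit] for unit in units]
    if len(codes) >= MAX_CODE_LENGTH:
        # First four codes plus the code of the final unit.
        return "".join(codes[:4] + codes[-1:])
    # Fewer than five codes: close the code with the end mark 0.
    return "".join(codes) + "0"


# 木 is written horizontal, vertical, left-falling, right-falling.
print(encode_character(["horizontal", "vertical", "left-falling", "right-falling"]))
# 口 on its own is a single "mouth" unit.
print(encode_character(["mouth"]))
```

Under these assumptions the first call yields 12340 and the second yields 60; a character with six or more units contributes only its first four codes and its last one.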
The present invention can also take codes for words. Word code-taking may use either of two modes: with a lead symbol or without one. Without a lead symbol, when the codes taken fall short of the prescribed code length, the tail code of the Chinese character is repeated until the length is reached. With a lead symbol, one of the three digits 7, 8 and 9, or any other key symbol that does not conflict with the coding, is entered first as the word lead symbol; the word code compiled according to the word code-taking rule is then entered, which may consist of the first two strokes or the first stroke of each Chinese character in the word, including "mouth"; finally the same lead symbol is entered as the end mark of the word code, and the word is committed to the screen.
Word input in the present invention includes a fixed-length coding input method: after the lead symbol is entered, the first two codes of the 1st character, the first two codes of the 2nd character and the 1st code of the last character are entered, for a total of 5 codes, the same as the maximum code length of a single character.
The present invention may set the * key as the word lead symbol, and may set the + and - keys, or other keys, as page-turning keys.
The present invention gives the reset key multiple functions, such as ending a code, committing the first candidate to the screen, and inputting a half-width space.
Embodiment
The numeric keypad used by the method may arrange the six digits 1, 2, 3, 4, 5 and 6 from top to bottom as shown in Fig. 1, or from bottom to top as shown in Fig. 2, in two rows in each case; the 0 key may in both cases be placed in the lowest position. The three key positions in the remaining row of the nine key positions on the numeric keypad may be assigned other functions.
The method of applying the present invention to encode Chinese characters for input is as follows: following the nationally prescribed writing order, at most 5 codes are taken for each Chinese character. Following the standard stroke order of the character, the codes of its 1st, 2nd, 3rd, 4th and last strokes, or the code of "mouth", are taken in order. When fewer than 5 codes are taken for a single character, one "0" may be appended as the end mark indicating that the character's code ends there.
When the present invention is used to encode Chinese characters, the code-taking of "mouth" is stipulated as follows: a "mouth" written continuously in stroke order is encoded as 6. A shape that looks like "mouth" in outline or etymology, but whose final horizontal stroke is not written in continuous succession with its first and second strokes, is not replaced by code 6; its codes are still taken stroke by stroke in stroke order.
The invention is characterized in that codes can be taken for words. Word code-taking may use either of two modes: with a lead symbol or without one. Without a lead symbol, when the codes taken fall short of the prescribed code length, the tail code of the character that is short of codes is repeated until the length is reached. With a lead symbol, one of the three digits 7, 8 and 9, or any other key symbol that does not conflict with the coding, is entered first as the word lead symbol; the word code compiled according to the word code-taking rule is then entered, which may consist of the first two strokes or the first stroke of each Chinese character in the word, including "mouth"; finally the same lead symbol is entered as the end mark of the word code, and the word is committed to the screen.
The present invention can also compile input codes of equal length for word input. In the fixed-length coding input method for words, after the lead symbol is entered, or without a lead symbol, the first two codes of the 1st character, the first two codes of the 2nd character and the 1st code of the last character are entered, for a total of 5 codes, the same as the maximum code length of a single character.
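The fixed-length word code can be illustrated with a short Python sketch. It is a sketch under stated assumptions: the function names are hypothetical, the "*" lead symbol is only one of the permitted choices, a character with fewer than two codes is padded by repeating its tail code (borrowed from the padding rule for words without a lead symbol), and two-character words are rejected because the text does not say how the "last character" rule applies to them.

```python
def first_codes(character_code, count):
    """Take the first `count` symbols of a single-character code.

    The trailing end mark 0 is ignored. If the character has fewer symbols
    than requested, its tail code is repeated (an assumption, see lead-in).
    """
    symbols = character_code.rstrip("0")
    if not symbols:
        raise ValueError("empty character code")
    return (symbols + symbols[-1] * count)[:count]


def encode_word_fixed(character_codes, lead="*"):
    """Build the 5-code word code: 2 codes + 2 codes + 1 code.

    `character_codes` holds the single-character code of every character
    in the word, in order. Pass lead="" for the mode without a lead symbol.
    """
    if len(character_codes) < 3:
        raise ValueError("this sketch only covers words of three or more characters")
    body = (
        first_codes(character_codes[0], 2)
        + first_codes(character_codes[1], 2)
        + first_codes(character_codes[-1], 1)
    )
    return lead + body


# Hypothetical three-character word whose characters are coded 12340, 60 and 34125.
print(encode_word_fixed(["12340", "60", "34125"]))
print(encode_word_fixed(["12340", "60", "34125"], lead=""))
```

With these inputs the body is always five codes long, matching the maximum code length of a single character; only the optional lead symbol changes between the two modes.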
In terms of code elements, the present invention differs from the prior art in that a 6 key has been added to represent "mouth". This gives the invention a substantive difference from the prior art and produces a clear improvement, as the following 3 points show.
1. The code space is enlarged. The code space of the prior art is $5^5 = 3125$. According to the empirical formula for the duplicate-code rate, when the 6763 Chinese characters are encoded with the prior art, the duplicate-code rate is:
[formula not reproduced in this text]
The code space of the present invention is $6^5 = 7776$, and the corresponding duplicate-code rate is:
[formula not reproduced in this text]
Clearly, with the ease of learning almost unchanged, the duplicate-code rate is in theory reduced by a factor of 1.5.
Data from the inventor's actual encoding experiments confirm the above theoretical calculation.
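The two code-space figures can be checked with a few lines of Python. The sketch only reproduces the raw arithmetic (number of symbols raised to the code length, and the average number of characters per available code); it does not attempt to reproduce the empirical duplicate-code-rate formula, and the function name is hypothetical.

```python
CODE_LENGTH = 5         # maximum code length of a single character
CHARACTER_COUNT = 6763  # level-1 and level-2 Chinese characters


def code_space(symbol_count, length=CODE_LENGTH):
    """Number of distinct full-length codes for a given number of code symbols."""
    return symbol_count ** length


prior_art = code_space(5)  # five stroke keys
invention = code_space(6)  # five stroke keys plus the "mouth" key

print(prior_art, invention)                           # 3125 7776
print(round(invention / prior_art, 2))                # 2.49
print(round(CHARACTER_COUNT / prior_art, 2))          # 2.16
print(round(CHARACTER_COUNT / invention, 2))          # 0.87
```

On this crude measure the prior art has more than two characters competing for each full-length code on average, while the enlarged space has fewer than one, which is the sense in which the codes become sparser.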
Item | Level-1 characters (3755) | Level-1 and level-2 characters (6763) |
Characters without duplicate codes, encoded with the present invention | 710 | 760 |
Characters without duplicate codes, encoded with the prior-art C5 stroke coding method | 464 | 466 |
Ratio by which the present invention increases the characters without duplicate codes | 1.53 times, a net increase of 53% | 1.63 times, a net increase of 63% |
2. The 1000 most frequently used Chinese characters, whose cumulative frequency of occurrence is 90%, were encoded and the results compared with the prior art; the comparison shows that the present invention has outstanding substantive features. Of the 1000 characters, 474 have no duplicate code under the present invention against 377 under the prior art, that is, 1.26 times as many, a net increase of 26%.
Duplicate-code level | Prior art "C5 stroke coding": groups | Prior art "C5 stroke coding": characters | Present invention "7-key, 5-code coding": groups | Present invention "7-key, 5-code coding": characters |
No duplicate code | 0 | 377 | 0 | 474 |
2-character duplicate codes | 99 | 198 | 108 | 216 |
3-character duplicate codes | 55 | 165 | 43 | 129 |
4-character duplicate codes | 15 | 60 | 10 | 40 |
5-character duplicate codes | 10 | 50 | 7 | 35 |
6-character duplicate codes | 4 | 24 | 5 | 30 |
7-character duplicate codes | 5 | 35 | 1 | 7 |
8-character duplicate codes | 1 | 8 | 1 | 8 |
9-character duplicate codes | 1 | 9 | | |
10-character duplicate codes | 1 | 10 | 1 | 10 |
11-character duplicate codes | 2 | 22 | 2 | 22 |
12-character duplicate codes | 1 | 12 | 1 | 12 |
13-character duplicate codes | 1 | 13 | | |
17-character duplicate codes | 1 | 17 | 1 | 17 |
Total | 196 | 1000 | 180 | 1000 |
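A table of this kind can be derived mechanically from any mapping of characters to codes. The Python sketch below shows one way to do so; the function name and the sample mapping are hypothetical and are not the inventor's data. Note that it reports characters without duplicate codes as groups of size 1, whereas the table lists them with a group count of 0.

```python
from collections import Counter


def duplicate_code_table(code_of):
    """Summarise duplicate codes.

    `code_of` maps each character to its code. The result maps a group size
    (how many characters share one code) to a pair:
    (number of such groups, number of characters involved).
    """
    characters_per_code = Counter(code_of.values())
    groups_per_size = Counter(characters_per_code.values())
    return {
        size: (groups, size * groups)
        for size, groups in sorted(groups_per_size.items())
    }


# Hypothetical sample: A and B share a code, D, E and F share another.
sample = {"A": "12340", "B": "12340", "C": "60", "D": "5120", "E": "5120", "F": "5120"}
for size, (groups, characters) in duplicate_code_table(sample).items():
    print(size, groups, characters)
```

The characters-involved column always sums to the size of the character set, which is the check that both columns of the table satisfy (1000 in each case).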
3. The results of encoding the 506 most commonly used Chinese characters, whose cumulative frequency of occurrence is 78%, show the outstanding substantive features of the present invention over the prior art even more clearly. Of the 506 characters, 204 have no duplicate code under the present invention, against only 79 under the prior art. Among the most commonly used characters, the number without duplicate codes is therefore 2.6 times that of the prior art, a net increase of 160% (see the table below). Because these 506 characters are the most frequently used, with a cumulative usage frequency as high as 78%, a net increase of 160% in characters without duplicate codes significantly reduces the likelihood that an operator has to select among duplicate-code characters during input, and thus significantly raises working efficiency. In addition, the table below shows that for 2-character duplicate codes the present invention also makes considerable progress over the prior art, and that duplicate-code groups of ten or more characters are clearly fewer under the present invention.
Duplicate-code level | Prior art "C5 stroke coding": groups | Prior art "C5 stroke coding": characters | Present invention "7-key, 5-code coding": groups | Present invention "7-key, 5-code coding": characters |
No duplicate code | 0 | 79 | 0 | 204 |
2-character duplicate codes | 26 | 52 | 49 | 98 |
3-character duplicate codes | 19 | 57 | 15 | 45 |
4-character duplicate codes | 9 | 36 | 8 | 32 |
5-character duplicate codes | 4 | 20 | 2 | 10 |
6-character duplicate codes | 2 | 12 | 3 | 18 |
7-character duplicate codes | 3 | 21 | 5 | 35 |
8-character duplicate codes | 1 | 8 | 1 | 8 |
9-character duplicate codes | | | 1 | 9 |
11-character duplicate codes | 2 | 22 | | |
12-character duplicate codes | 1 | 12 | | |
13-character duplicate codes | 1 | 13 | 1 | 13 |
14-character duplicate codes | 1 | 14 | | |
15-character duplicate codes | 2 | 30 | | |
17-character duplicate codes | 2 | 34 | | |
19-character duplicate codes | 1 | 19 | | |
34-character duplicate codes | | | 1 | 34 |
37-character duplicate codes | 1 | 37 | | |
40-character duplicate codes | 1 | 40 | | |
Total | 76 | 506 | 86 | 506 |
These three points show that, although the present invention adds only one key and one key element, "mouth", compared with the prior art, it produces a very significant technical improvement in both theory and practice, and its practical value can be expected to rise markedly.
It produces such a marked improvement first because a key was added and, more importantly, because the invention, based on the results of theoretical research, selects the code element "mouth" as the key element and coding component rather than any other component.
Why the present invention prefers "mouth", a code element or component other than the single strokes, for coding must be fully explained here by the inventor.
Even if "adding a key" to enlarge the code space, so that for a fixed total number of characters the codes become more "sparse" and the duplicate-code rate falls, is something an experienced researcher in the field could "easily" think of and achieve, choosing "which component" to use as the code symbol is not something a person skilled in the art could determine without reflection. The specially chosen component must meet the following conditions:
1. Among all Chinese character components, this special component should have the highest or second-highest frequency of occurrence, because only then can the newly added code be fully used, occur often, and disperse codes that were duplicate codes under the prior art.
2. The selected component should occur commonly at every position in the structure of Chinese characters. Only such a component, once given a new code, can fully exert its dispersing effect at every coding position.
To this end, on the basis of a database of the root codes of the Five-stroke Method, the inventor carried out extensive and laborious data statistics and analysis, and finally compiled the frequency statistics table shown below for components other than the five single strokes. The table shows, for the 6763 national-standard (GB) level-1 and level-2 Chinese characters after decomposition into radicals, the number of occurrences of the 10 most common components at each coding position and their cumulative totals.
Frequency statistics of high-frequency components in the 6763 level-1 and level-2 Chinese characters of GB:
No. | Component | 1st code position | 2nd code position | 3rd code position | 4th code position | Cumulative count |
1 | Mouth (口) | 397 | 227 | 345 | 224 | 1193 |
2 | Lv | 371 | 115 | 41 | 26 | 553 |
3 | Person (人) | 61 | 186 | 164 | 108 | 519 |
4 | Sun (日) | 99 | 162 | 167 | 86 | 514 |
5 | Wood (木) | 261 | 84 | 82 | 74 | 501 |
6 | Earth (土) | 155 | 122 | 112 | 92 | 481 |
7 | Rui | 366 | 31 | 1 | 0 | 398 |
8 | Ren | 242 | 91 | 54 | 0 | 387 |
9 | Moon (月) | 134 | 59 | 106 | 47 | 346 |
10 | Rolling | 280 | 31 | 8 | 0 | 319 |
As the table above shows, among all components of Chinese characters, "mouth" has the highest frequency of occurrence and, moreover, occurs often at every coding position. It is easy to see from the table that if the most useful and most active component, the one of highest individual value, is to be selected from all components of Chinese characters, the statistics make "mouth" the first choice.
For this reason, the present invention selects "mouth" as the newly added code element of the "7-key, 5-code" coding method.
Of course, anyone may replace "mouth" with another component, such as 人 (person), 土 (earth) or 木 (wood). However, the inventor's statistics show convincingly that, apart from "mouth", no other such design achieves the practical effect of placing "mouth" on the 6 key.
Actual coding statistics and the performance of the present invention both show that the basis for selecting "mouth" is sound and that the calculations are correct; for the same number of keys and the same code length, the present invention is the best technical scheme.
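The two selection conditions can be applied to the table programmatically. The Python sketch below copies the counts from the table and ranks the components by cumulative count (condition 1) and by their weakest coding position (condition 2). Using the minimum over the four positions as the measure for condition 2 is an assumption made for illustration, and the names are hypothetical.

```python
# Occurrences at code positions 1 to 4, copied from the table above.
POSITION_COUNTS = {
    "Mouth": (397, 227, 345, 224),
    "Lv": (371, 115, 41, 26),
    "Person": (61, 186, 164, 108),
    "Sun": (99, 162, 167, 86),
    "Wood": (261, 84, 82, 74),
    "Earth": (155, 122, 112, 92),
    "Rui": (366, 31, 1, 0),
    "Ren": (242, 91, 54, 0),
    "Moon": (134, 59, 106, 47),
    "Rolling": (280, 31, 8, 0),
}


def rank_by_total(counts):
    """Condition 1: order components by cumulative count, highest first."""
    return sorted(counts, key=lambda name: sum(counts[name]), reverse=True)


def rank_by_weakest_position(counts):
    """Condition 2 (assumed measure): order by the smallest per-position count."""
    return sorted(counts, key=lambda name: min(counts[name]), reverse=True)


print(rank_by_total(POSITION_COUNTS)[0], sum(POSITION_COUNTS["Mouth"]))
print(rank_by_weakest_position(POSITION_COUNTS)[0], min(POSITION_COUNTS["Mouth"]))
```

On both measures "Mouth" comes first: its cumulative count is 1193, and even at its weakest position it occurs 224 times, more than any other component manages at its own weakest position.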
When the present invention is implemented on communication devices such as mobile phones, the * key may also be set as the word lead symbol, and the + and - keys as page-turning keys. "/" plus a numeric key of the keypad is set as the duplicate-code selection key. The reset key is given the functions of ending a code, committing the first candidate to the screen, and inputting a half-width space.
The present invention can be used to encode both the 6763 Chinese characters of GB and the 27533 Chinese characters of GB, and can also be applied to encoding the simplified, traditional, Hong Kong, Japanese and Korean Chinese characters of larger character sets, in the software systems and hardware systems of the relevant devices. When applied in a hardware environment, the coded data and retrieval program of the present invention can also be burned into a chip to form a general-purpose chip.
Because both the theoretical research and the actual design of the present invention are significantly better than the prior art, being easy to learn while also making outstanding substantive progress, the present invention can be expected to be widely used in popular software and hardware products of all kinds, such as PCs, mobile phones, cash registers, PDAs, electronic dictionaries, set-top boxes, popular digital Chinese-language computing products and network technology products.