CN1343953A - In-line handwritten Chinese character recognition method and handwriting input method - Google Patents


Publication number
CN1343953A
Authority
CN
China
Prior art keywords
stroke
chinese character
numeral
code
word
Prior art date
Legal status
Pending
Application number
CN99111467.1A
Other languages
Chinese (zh)
Inventor
王颂平
Current Assignee
Individual
Original Assignee
Individual
Priority date
Filing date
Publication date
Application filed by Individual
Priority to CN99111467.1A
Publication of CN1343953A
Legal status: Pending (current)


Abstract

A novel on-line recognition method and input method for handwritten Chinese characters are disclosed. The strokes of the Chinese characters being input are converted into digits, and a built-in digit code table is searched using these digits. Its advantages are a recognition rate of 100%, a 25-40% reduction in the number of strokes written, and memory savings, since no complex software is required.

Description

A novel on-line handwritten Chinese character recognition method and handwriting input method
Chinese handwriting input is an input method built on Chinese character recognition: the computer automatically recognizes the Chinese characters a person writes on paper or another medium. Its great advantage is that input is convenient and natural and the user does not need to learn anything new; it also makes it easy to enter confidential data that should not pass through many hands. Handwriting input is therefore an indispensable character input device in office automation, and it is of practical significance for extending the use of computers across all sectors of the national economy.
Chinese character recognition may be on-line or off-line. Because Chinese characters are themselves complex, machine recognition is difficult. Handwriting input uses on-line handwritten Chinese character recognition, which is the simplest type of Chinese character recognition.
Abroad, character recognition machines were put into practical use as early as the 1950s. By the 1970s the technology was quite mature: printed-text input reached thousands of characters per second, with a rejection rate below 1 in 10,000 and a misclassification rate below 1 in 100,000. Many research projects on Chinese character recognition have also been carried out abroad.
As early as 1977, Japan completed a device for recognizing printed Chinese characters as part of its "image information processing system". It recognized 100 characters per second with an accuracy of 99.9%. In 1984, Japan developed a multi-font printed Chinese character recognition device that recognized 2,300 characters with an accuracy of 99.88% at more than 100 characters per second, representing the state of the art in printed Chinese character recognition at the time.
China began research on Chinese character recognition in the 1970s, at that time mainly for character recognition in postal letter sorting and for recognizing the English letters, digits and symbols used in computer input. Later, universities and research institutes began studying the recognition of printed and handwritten Chinese characters and achieved some results. Of these, progress in on-line handwritten Chinese character recognition, i.e. what is usually called the handwriting pad, has attracted the most attention. More than ten years ago, the military first developed a pressure-sensitive tablet on which on-line handwritten Chinese character recognition could be performed on a PC with an ordinary pen or ballpoint pen, with a recognition rate of 98%. Afterwards, Shanghai, a research institute of the Ministry of Electronics Industry and several universities developed various handwriting input devices that achieved on-line recognition of handwritten Chinese characters on generic tablets and PCs.
In recent years, research and development of on-line Chinese character recognition devices at home and abroad has been very active, and various handwriting pads have emerged one after another to compete for the market. With the development of science, technology and office automation, on-line Chinese character recognition devices have become practical.
Compared with earlier models, current handwriting pads are cheaper, more stable and reliable in use, and have higher recognition rates. But considerable problems remain: first, input speed; second, recognition rate; third, price.
Many researchers consider it impossible to make handwriting pads faster. In their view, every advantage has its drawback: handwriting is more convenient and natural than keyboard input, so how can it also be expected to be fast? Consequently, most experts and manufacturers focus on improving the recognition rate and reducing cost, seeking new markets through the performance-to-price ratio: either producing low-priced, low-end products that place stricter restrictions on how users write, or developing expensive products that can recognize all kinds of unrestricted free handwriting. Experts favor the latter: they hope handwriting pads can handle the scripts and speeds people use when taking notes and recognize cursive handwritten Chinese characters. But this would greatly raise the demands on machine recognition, and cost would rise accordingly.
In short, given the present situation, the handwriting pad market is currently aimed at users who input small amounts of text and do not demand high input speed. This also shows the limitations and problems of handwriting pads: making input fast, accurate and inexpensive at the same time is a problem on-line Chinese character recognition has yet to solve, and a difficult problem that experts at home and abroad are working to overcome.
The purpose of the present invention is to overcome the above defects of the prior art by providing, for on-line Chinese character recognition, a novel, very simple and labor-saving Chinese character recognition method and input method. This method solves the input-speed problem that handwriting pads were believed unable to solve, achieving the combined benefits of speed, accuracy, low cost and low demands on the user.
The free-handwriting recognition problem raised by experts in on-line Chinese character recognition ultimately also aims to improve the input speed of handwriting pads, and the starting point of that approach is to strengthen the performance of the computer software. Even if this approach is feasible, development and production costs would rise greatly, and what recognition rate it could reach remains an open question. The present invention solves the speed and recognition problems of handwriting pads from an entirely different angle, while also greatly reducing production costs.
To achieve the above purpose, the on-line handwritten Chinese character recognition method and handwriting input method of the present invention first divide the single strokes of Chinese characters into several classes. Note that, depending on the classification criterion, the strokes may be divided into five, six, eight or ten classes, but these schemes are all essentially the same.
The first embodiment of the present invention, which divides single strokes into six major classes, is described below with reference to the drawings.
Fig. 1 is a schematic diagram showing, according to the present invention, the single strokes used for Chinese character input divided into six classes and mapped to six digits.
Fig. 2 is a flow chart of conventional on-line handwritten Chinese character recognition.
Fig. 3 is a flow chart of the present invention.
Fig. 4 is a schematic diagram of the operation interface of the present invention and the associated internal code table.
Referring to Fig. 1, once the single strokes are mapped to the six Arabic numerals 1, 2, 3, 4, 5 and 6, all Chinese characters can be encoded. The method uses these digits as stroke codes and encodes each character in stroke order. Let k be the stroke count of the character with the most strokes. All digit strings formed from the digits of strokes 1 to k can then be divided into two major classes: digit strings that correspond to at least one Chinese character are called valid codes, and those that correspond to no Chinese character are called invalid codes.
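A minimal Python sketch of this encoding step follows. It assumes hypothetical names for the six stroke classes and a handful of illustrative stroke orders (not taken from the patent's code table). It encodes each character in stroke order, groups characters that share a code, and splits all digit strings of length 1 to k into valid codes (some character has exactly that code) and invalid codes.

```python
from itertools import product

# Hypothetical names for the six stroke classes of Fig. 1 (six-class scheme).
STROKE_DIGIT = {
    "heng": "1", "ti": "1",   # horizontal, rising
    "shu": "2",               # vertical
    "pie": "3",               # left-falling
    "na": "4", "dian": "4",   # right-falling, dot
    "cw_fold": "5",           # any stroke whose last fold is clockwise
    "ccw_fold": "6",          # any stroke whose last fold is counter-clockwise
}

# Illustrative stroke orders only; a real table would cover the whole character set.
STROKE_ORDERS = {
    "十": ["heng", "shu"],
    "土": ["heng", "shu", "heng"],
    "工": ["heng", "shu", "heng"],
    "大": ["heng", "pie", "na"],
    "人": ["pie", "na"],
    "口": ["shu", "cw_fold", "heng"],
    "日": ["shu", "cw_fold", "heng", "heng"],
}


def encode(strokes):
    """Turn a stroke sequence into its digit string, in writing order."""
    return "".join(STROKE_DIGIT[s] for s in strokes)


def build_code_table(stroke_orders):
    """Map each digit string to the characters that share it (duplicate codes)."""
    table = {}
    for char, strokes in stroke_orders.items():
        table.setdefault(encode(strokes), []).append(char)
    return table


def partition_codes(table, digits="123456"):
    """Split every digit string of length 1..k into valid and invalid codes."""
    k = max(len(code) for code in table)  # stroke count of the longest character
    valid, invalid = [], []
    for length in range(1, k + 1):
        for combo in product(digits, repeat=length):
            code = "".join(combo)
            (valid if code in table else invalid).append(code)
    return valid, invalid
```

With this toy data, `build_code_table(STROKE_ORDERS)["121"]` is `["土", "工"]`, a duplicate code. Here k = 4, so there are 6 + 36 + 216 + 1,296 = 1,554 digit strings, of which only 6 are valid.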
Fig. 2 is a flow chart of on-line handwritten Chinese character recognition and shows its basic process. Step 1: write a stroke. Step 2: recognize the stroke. From the coordinate points output by the tablet, basic strokes are recognized from the direction and length of the written stroke; turning points are then detected from changes in the basic stroke direction, and compound strokes are recognized from the change of direction before and after each turning point. At the same time, the best match between two direction sequences is found by stretching and shrinking the length of a direction sequence, and recognition accuracy is improved by iterative computation. Step 3: recognize the whole character, i.e. identify the unknown character from its strokes. Character features are formed from the recognized strokes and compared with each standard character feature in the dictionary. If they match, the process proceeds to step 4, discrimination and output. If not, i.e. the dictionary has no such feature, the distance between the input character's features and each standard character feature in the dictionary is computed, and the character is recognized by the minimum-distance principle. When several candidate characters are displayed, the character the user selects manually can be fed back into the sample set; this later step is how the machine learns and expands its dictionary.
Although on-line handwritten Chinese character recognition is the simplest kind of Chinese character recognition, its operation is still clearly complex: after recognizing basic strokes, the machine must recognize compound strokes, and after recognizing compound strokes, it must recognize the whole character. Every character has many samples in the dictionary, and the sample set keeps growing.
All of this not only takes up more of the machine's memory but also greatly increases software complexity, and it exacts another price as well: it slows the machine down, because complex computation inevitably takes time. Imagine what would happen under these conditions if the machine also had to learn the cursive handwriting of many different people!
This also explains why handwriting pad prices have never come down and why their quality has never improved.
The core of the present invention is to identify this problem and to solve it, as follows:
First, let digit 1 represent the horizontal stroke (横) and the rising stroke (提) [stroke glyphs: Figure A9911146700041]; digit 2 the vertical stroke (竖); digit 3 the left-falling stroke (撇); digit 4 the right-falling stroke (捺) and the dot (点); digit 5 all clockwise fold strokes, such as the horizontal-left-falling stroke [further glyphs: Figure A9911146700043]; and digit 6 all counter-clockwise fold strokes, such as ∠ and 乙. A stroke with several folds is classified by its last fold, which decides whether it is a clockwise or a counter-clockwise fold. This gives us a code table that encodes Chinese characters with digits, but the table is only a blank starting point and must be processed further.
The code length of each character is then determined by its frequency of use. This gives the first principle for trimming the code table, code shortening: the code length of a character is inversely related to its frequency of use, so the more frequent the character, the shorter its code. Code lengths start at 1 and increase as frequency decreases.
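One possible greedy reading of this first principle is sketched below in Python: characters are processed from most to least frequent, and each takes the shortest prefix of its full stroke code that no more frequent character has already claimed. Because every assigned code is a prefix of the stroke order, the user reaches it simply by writing the first strokes normally. The function name, the toy codes and the frequency ranking are hypothetical.

```python
def assign_short_codes(full_codes, chars_by_frequency):
    """Give each character the shortest still-unused prefix of its stroke code.

    full_codes: character -> full digit string in stroke order.
    chars_by_frequency: characters sorted from most to least frequent.
    """
    used = set()
    assigned = {}
    for char in chars_by_frequency:
        full = full_codes[char]
        for length in range(1, len(full) + 1):
            if full[:length] not in used:
                assigned[char] = full[:length]
                used.add(full[:length])
                break
        else:
            # Every prefix, including the full code, is taken: a true duplicate code.
            assigned[char] = full
    return assigned


# Hypothetical full codes (six-class scheme) and frequency ranking.
FULL_CODES = {"人": "34", "大": "134", "十": "12", "工": "121",
              "土": "121", "口": "251", "日": "2511"}
FREQUENCY_ORDER = ["人", "大", "十", "工", "土", "口", "日"]
```

Here 大 gets "1", 十 gets "12" and 日 gets "25", cutting the total code length from 20 to 13 digits. 土 still shares "121" with 工, because with one character per code there is no shorter free prefix left, which is exactly the limitation the next paragraph addresses.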
But this method alone saves only a limited number of strokes, so the present invention proposes a view and a practice diametrically opposed to the traditional ones: introducing duplicate codes deliberately, purposefully and systematically. This is one of the distinctive design features of the present invention.
Everyone who has worked on Chinese character encoding, whether for keyboard input or handwriting input, has been troubled by duplicate codes. Eliminating them was treated as a key challenge, and the duplicate-code rate of an encoding was used as a criterion for judging a scheme. It is exactly this mechanical, rigid thinking that led people into a dilemma. If we change our thinking and, instead of rigidly eliminating duplicate codes, exploit them skillfully, the problem becomes easy to solve.
For handwriting pads, rather than letting a pile of unrelated duplicate-code characters appear and clutter the view whenever the user writes carelessly, it is better to arrange them systematically as candidates. Following this dialectical reasoning yields a further benefit: it saves written strokes and attention, which is exactly what handwriting pads urgently need.
How to deliberately set up and organize duplicate codes for handwriting pads is therefore a question that deserves careful attention.
For a code table, let n be the number of Chinese characters (or digits or letters) corresponding to one code. In the past, people wanted n always to equal 1 and regarded this as the optimal state, but it is not. For handwriting pads, it is very meaningful to choose a reasonable range for n within 1 ≤ n ≤ 10; we may choose n = 6 or n = 7, and this is the basis for shortening codes once more. This is the second principle for trimming the code table. That is, by deliberately introducing duplicate codes, and again following the high-frequency-first principle, we re-trim and rearrange the Chinese characters corresponding to the valid codes so that, wherever possible, relatively short valid codes each correspond to several Chinese characters.
When the maximum value of n is set larger, codes become shorter and more characters share duplicate codes; conversely, the smaller the maximum of n, the longer the codes and the fewer the duplicate-code characters. If the maximum of n is 1, the strokes saved are very limited; if it exceeds 10, visual search becomes noticeably slower and the user tires easily. The two goals pull against each other, so we treat n = 10 as the upper limit for choosing among duplicate-code characters. Setting n thus means finding the best balance between reducing duplicate codes and saving strokes.
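The second principle can be illustrated by extending the same kind of greedy sketch so that each code holds up to n candidates, ordered by frequency. The greedy strategy, the function names and the data are illustrative assumptions, not the patent's actual trimming procedure. `strokes_to_write` sums the code lengths over all characters (selection strokes are not counted), which makes the trade-off visible: a larger n gives shorter codes but longer candidate rows.

```python
def build_candidate_table(full_codes, chars_by_frequency, n):
    """Greedy sketch: each character takes the shortest prefix of its stroke
    code whose candidate row still holds fewer than n characters."""
    if not 1 <= n <= 10:
        raise ValueError("n is kept between 1 and 10")
    table = {}  # code -> candidate row, most frequent character first
    for char in chars_by_frequency:
        full = full_codes[char]
        code = next(
            (full[:length] for length in range(1, len(full) + 1)
             if len(table.get(full[:length], [])) < n),
            full,  # no free slot on any prefix: share the full code anyway
        )
        table.setdefault(code, []).append(char)
    return table


def strokes_to_write(table):
    """Total strokes needed to reach every character once (selection not counted)."""
    return sum(len(code) * len(row) for code, row in table.items())


# Hypothetical full codes (six-class scheme) and frequency ranking.
FULL_CODES = {"人": "34", "大": "134", "十": "12", "工": "121",
              "土": "121", "口": "251", "日": "2511"}
FREQUENCY_ORDER = ["人", "大", "十", "工", "土", "口", "日"]
```

With n = 1 the seven toy characters need 13 strokes in total; with n = 3, 大, 十 and 工 share the one-stroke code "1", 口 and 日 share "2", and the total drops to 8.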
Embodiment 1: with 1 ≤ n ≤ 7, i.e. each valid code corresponds to at most 7 duplicate-code characters, the scheme is fairly ideal. A common character appears after only a few strokes, before it has been fully written, which effectively controls code length and improves writing speed without tiring the eyes.
A code table processed in this way is fairly ideal. Next, the code table is placed in the machine's memory, where it becomes the recognition dictionary of the handwriting input device. Each written stroke is converted into its digit immediately after recognition, and a new valid code is formed as strokes are added. Unlike the traditional approach, the present invention does not take the route of whole-character recognition; it directly searches the code table just described.
Refer to Fig. 3, the flow chart of the present invention. After a stroke is input, the process enters step 2, stroke recognition. The traditional approach defines simple strokes, then many compound strokes, and then recognizes the whole character. The present invention shows its distinctive feature at step 2: the machine only needs to recognize, one by one, the six classes of most basic strokes and a small number of stroke combinations, and the sample set used for comparison is likewise just the stroke-to-digit correspondence table plus a limited set of stroke combinations. Using the stroke-to-digit table, the digit corresponding to each input stroke is identified promptly and the stroke is converted into that digit.
The process then enters step 3: searching the code table. In the traditional approach, step 3 is whole-character recognition, whose complexity we saw in Fig. 2. The present invention omits this procedure and makes recognition extremely simple, because the machine searches the code table automatically, and looking up digits with digits is the easiest thing for a computer to do. This is where the present invention differs most in design from traditional handwriting pads.
Regarding the code table: Embodiment 1 uses six digits, and its code table is generated from the stroke-to-digit correspondence table of Fig. 1. Embodiment 2 uses five digits. In both schemes, the code table is formed by arranging the 6,763 characters of the GB 2312 standard in order of character frequency.
Step 4 is discrimination. In our design, every additional stroke, starting from the first, displays a different set of Chinese characters, and selecting a character both outputs it and switches to input of the next character.
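Steps 2 to 4 then reduce to appending digits and looking them up. The following is a minimal sketch, assuming a stroke recognizer already delivers one digit per stroke and a code table that maps each code to its frequency-ordered candidate row. The class and method names are hypothetical.

```python
class HandwritingSession:
    """Toy model of steps 2-4 of Fig. 3: stroke digits in, candidate rows out."""

    def __init__(self, code_table):
        self.code_table = code_table  # code -> candidates, most frequent first
        self.code = ""                # digits written so far for this character
        self.output = []              # characters sent to the editing area

    def add_stroke(self, digit):
        """Step 3: append one recognized stroke digit and look up the code table."""
        self.code += str(digit)
        return self.code_table.get(self.code, [])  # empty list: no candidates yet

    def select(self, index):
        """Step 4: output the chosen candidate and start the next character."""
        char = self.code_table[self.code][index]
        self.output.append(char)
        self.code = ""
        return char
```

With a toy table `{"1": ["大", "十", "工"], "12": ["土"], "2": ["口", "日"]}`, writing one vertical stroke (digit 2) shows 口 and 日, and `select(1)` outputs 日 after a single stroke.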
Refer to Fig. 4:
When the first stroke, a clockwise fold, is input, seven high-frequency candidate characters appear at once: "add also little to people's". If one of them is selected with the pen, it is placed in the editing area; if none is selected, the user continues with the next stroke and new candidates appear, until the character is found. Take the character "人" as an example: it would originally take 7 strokes, and careless writing could also bring up many unrelated duplicate-code characters; now, counting the selection, only 2 strokes are written. No ambiguity arises, and the user finds input both less tiring and faster.
The benefit of this approach is that it greatly reduces the difficulty of developing handwriting pad software. In step 2 of Fig. 3, stroke recognition only needs to recognize the five or six most basic classes of strokes (ten classes at most) plus a small number of stroke combinations; each stroke is converted into a digit immediately after recognition, the code table is then searched by digits, and the output character is determined. Compared with traditional processing, this is a great improvement and simplification. Since prior-art handwriting input devices already distinguish single strokes accurately, the recognition accuracy of this method is guaranteed to reach 100% without further testing.
It should also be noted that although the flow chart of the present invention in Fig. 3 also contains "learning" and "sample set" boxes, their meaning is almost entirely different from those in Fig. 2. Learning in Fig. 2 means automatically building, enriching and modifying the dictionary from the pattern representations extracted from many unknown samples, thereby continually improving the system's recognition rate. Every character recognized after machine learning enters the dictionary as a sample, adding one more sample for comparison when unknown characters are matched. Thus, as more characters are written, the sample set keeps expanding and the dictionary database keeps growing, so enough software and hardware space must be reserved at design time, or machine performance will suffer. In this sense, the recognition rate of traditional on-line handwritten Chinese character recognition depends on the amount of reserved space, which is also why experts predict that improving performance will increase cost.
In contrast, the "learning" and "sample set" of Fig. 3 do not revolve around whole-character recognition at all; they only recognize and compare the simplest strokes. So even if samples of strokes and stroke combinations are added, their number is extremely limited and can be neglected at design time. Cost goes down, while machine performance is unaffected.
For a standard sample set, the machine's learning function is in effect a form of intelligent fault tolerance. The traditional approach keeps expanding the sample set, i.e. keeps increasing fault tolerance, at great cost and with uncertain results. The approach of the present invention not only uses valid codes to output the corresponding Chinese characters but also uses invalid codes for error correction, as grounds for excluding readings when a stroke or character is unclear. When a stroke cannot be classified with certainty, for example whether it is a left-falling or a horizontal stroke, or a vertical stroke or a rising hook, invalid codes help decide by negation. Using invalid codes to rule out stroke combinations that do not exist is very economical in both software and hardware design.
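Error correction by negation can be sketched under one interpretation: a reading of an ambiguous stroke is kept only if the digits written so far, extended by that reading, still begin at least one character's code; every other reading could only lead to invalid codes. The code set and function names below are illustrative assumptions.

```python
def prefix_set(codes):
    """All non-empty prefixes of the characters' full stroke codes."""
    return {code[:i] for code in codes for i in range(1, len(code) + 1)}


def plausible_readings(written, alternatives, prefixes):
    """Drop readings of an ambiguous stroke that can only lead to invalid codes.

    written: digits already recognized for the current character.
    alternatives: candidate digits for the unclear stroke, e.g. "31" for
        'left-falling or horizontal?'.
    prefixes: output of prefix_set() for the whole code table.
    """
    return [d for d in alternatives if written + d in prefixes]


# Hypothetical full codes (six-class scheme).
PREFIXES = prefix_set({"12", "121", "134", "34", "251", "2511"})
```

After "25", an unclear stroke that is either horizontal (1) or left-falling (3) resolves to `["1"]`, because no code in this toy set begins with "253".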
Logically, fault tolerance enlarges the extension of a concept; when the extension grows, the intension must shrink, so the nature of things becomes vaguer. Error correction, conversely, narrows the extension; when the extension shrinks, the intension grows, and the nature of things becomes clearer. Fault tolerance, as the name suggests, means allowing greater leeway. Accommodating vague, uncertain input has a price: in the machine's configuration, space is sacrificed; in its performance, time is sacrificed. Error correction with invalid codes, by contrast, costs nothing at all.
Error correction with digit code tables should become a new concept in the field of Chinese character recognition. Its effect is quite evident in handwriting input recognition, and it also deserves attention in off-line Chinese character recognition.
Finally, it should be stressed that, for a special code table designed for on-line handwritten Chinese character recognition, the division into code lengths must be determined by character frequency, and the ordering of the duplicate-code characters in each row must also follow the high-frequency-first principle. The aim is to give both machine and user a "dictionary" that is as convenient and complete as possible. With it, the user only needs to write a few strokes effortlessly for the desired character to appear, and the machine is spared complex computation and lengthy retrieval, so speed improves greatly.
In the prior art, a handwriting input device easily reaches a 100% recognition rate for simple strokes; errors usually come from whole-character recognition. The present invention does not follow that beaten track, trailing behind others around a detour that should never have been taken, namely whole-character recognition. Instead it makes clever use of a digitized code table whose accuracy can reach 100%, and by comparing and retrieving digits against digits it turns something originally complex into something quick and easy. The resulting benefits are a higher recognition rate, faster operation and fewer written strokes, giving the user input that is both fast and accurate.
The following is an extract of a code table encoded with the six digits according to the present invention:
5 add also little to people's
51 511 彐 5111 5112 5113 5114 5115 5116 512 513 5131 51311 51312 5132 51324 51325 513251 5133 5134 51342 51343 5135 515 5151 5152 5153 5154 5155 5156 516 5161 5165 521 5211 5212 5213 52132 52134 52136 5214 5216 522 5221 5225 523 5231 5232 5233 5234 5235 524 5241 525 53 532 5325 534 535 54 541 5412 5415 542 543 544 545 5454 5455 54553 55 551 5511 55112 5512 5513 55132 55135 55136 5514 5515 5516 552 553 554 555 56
Statistics show that for the same 337 characters, a conventional handwriting pad requires 3,102 written strokes in total, an average of 9.2 strokes per character, whereas the present invention requires only 1,185 strokes, an average of 3.5 per character, omitting 61.8% of all strokes. If each candidate selection is counted as one written stroke, the strokes omitted still amount to 51% of the total. Writing the above characters with the present invention is therefore considerably faster, and the recognition rate can reach 100%. Abnormal strokes that may occur can be handled by adding fault-tolerant codes to the code table.
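The percentages can be checked directly from the stroke totals. The only assumption is how selection is counted: adding one stroke per character (337 extra strokes) reproduces the 51% figure.

```python
CHARS = 337               # characters in the test sample
CONVENTIONAL = 3102       # strokes with a conventional handwriting pad
WITH_CODE_TABLE = 1185    # strokes with the digit code table


def saving(baseline, new):
    """Fraction of baseline strokes that no longer need to be written."""
    return 1 - new / baseline


avg_conventional = CONVENTIONAL / CHARS                               # about 9.2
avg_code_table = WITH_CODE_TABLE / CHARS                              # about 3.5
saved = saving(CONVENTIONAL, WITH_CODE_TABLE)                         # about 0.618
saved_with_selection = saving(CONVENTIONAL, WITH_CODE_TABLE + CHARS)  # about 0.509
```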
In summary, the method described above effectively solves the various problems of earlier handwriting pads. The design of the present invention breaks out of the confines of existing handwriting input devices and opens a new path. It has at least four distinctive features. First, it does not recognize whole characters; it replaces the dictionary or sample set with a code table that encodes Chinese characters as digits, completely changing the formerly complex nature of handwriting input recognition. Second, it replaces the complex procedure of comparing unknown characters against samples with a simple digit-by-digit search. Third, it overturns the traditional notion that Chinese character recognition must be built on matching. Fourth, it breaks the old belief that handwriting input cannot be made faster.
Chinese character recognition is a comprehensive technology. It requires specialist knowledge of pattern recognition, image processing, formal languages, fuzzy mathematics and combinatorics, and it also draws on general knowledge of linguistics, philology, statistics, psychology, biology and logic. Experts versed in computation are the main force in tackling the key problems, but their methods tend to favor the former kind of knowledge and neglect the latter. For an interdisciplinary frontier field such as Chinese character recognition, the latter kind of knowledge really should not be missing.
Second embodiment of the present invention: it is generally the same as the first, differing only in the initial correspondence between digits and single strokes. The second scheme divides single strokes into five classes.
The following table, according to the present invention, divides the single strokes of Chinese character input into five classes and maps them to five digits:
Table: five classes of single strokes for Chinese character input and their digits
[Table image: Figure A9911146700091]
The code table generated from it differs slightly from that of Embodiment 1; corresponding changes must be made in the relevant parts of the software, but the overall framework is unchanged. Comparing Embodiments 1 and 2: scheme 2 sets a looser standard for machine recognition, but more strokes must be written and input is slower. If the number of stroke classes stays the same and only the correspondence between digits and strokes is changed, for example letting 1 represent the left-falling stroke and 3 the horizontal stroke, nothing is gained. But if the total number of stroke classes changes, the situation is different: fewer classes increase the number of handwritten strokes, while more classes reduce it.
The following table, according to the present invention, divides the single strokes of Chinese character input into ten classes and maps them to ten digits:
Table: ten classes of single strokes for Chinese character input and their digits
[Table image: Figure A9911146700092]
Note: a stroke with several folds is classified by its last fold, which decides whether it is a clockwise or a counter-clockwise fold. * means: "clockwise fold" here denotes the set of all clockwise fold strokes minus the strokes assigned to digits 7, 8, 9 and 0.
The code table generated from this classification is also a good, practicable scheme. Compared with Embodiment 1, it adds the four digits 7, 8, 9 and 0, which take over four clockwise fold strokes originally grouped under digit 5. This improves the balance of the valid-code distribution by 20%, so fewer strokes need to be input and input speed improves. Correspondingly, however, this scheme places higher demands on the software's recognition capability: for example, it must tell apart two different fold strokes such as the horizontal-left-falling stroke and a similar fold stroke, a distinction that can be ignored in Embodiment 1. Therefore, which stroke classification criterion we choose for an embodiment must also take into account software performance and hardware cost, to find the best combination.

Claims (9)

1. A novel on-line handwritten Chinese character recognition method and handwriting input method, characterized in that: the single strokes of Chinese characters are divided into several broad classes according to a classification criterion, and each class corresponds to an Arabic numeral. These numerals, written in stroke order, serve as the stroke code used to encode Chinese characters. Let k be the stroke count of the character with the most strokes. All digit strings of length 1 (in some cases 0) to k, formed from the digits corresponding to the strokes, are divided into two classes. Digit strings that have a corresponding Chinese character are called valid codes, and those with no corresponding Chinese character are called invalid codes. After this code table has been trimmed, it is placed in the on-line handwritten Chinese character recognition software.
2. The method according to claim 1, characterized in that the digits represent the following stroke classes:
- 1: the horizontal stroke (横) and the rising stroke (提);
- 2: the vertical stroke (竖);
- 3: the left-falling stroke (撇);
- 4: the right-falling stroke (捺) and the dot (点);
- 5: all clockwise turning strokes, such as ㇇;
- 6: all counterclockwise turning strokes, such as ∠ and 乙.
For a stroke with multiple turns, the final turn determines whether it is a clockwise or a counterclockwise turning stroke. As above, clockwise turning strokes are represented by 5 and counterclockwise turning strokes by 6.
3. The method according to claim 1, characterized in that the digits represent the following strokes:
- 7: the horizontal-turn stroke (横折);
- 8: the horizontal-left-falling stroke ㇇ (横撇);
- 9: the easement hook;
- 0: the turning hook (折钩).
On this premise, the clockwise turning strokes of claim 2 are all clockwise turning strokes minus the strokes corresponding to 7, 8, 9 and 0.
4. The method according to claim 1, characterized in that the first principle for trimming the code table by shortening code lengths is to set each code's length according to the usage frequency of the Chinese character. Code length is inversely related to frequency, so the more frequently a character is used, the shorter its code. Codes start at a length of one digit and grow longer as usage frequency decreases.
5. The method according to claim 1, characterized in that the second principle for trimming the code table by shortening code lengths is to deliberately introduce repeated codes. The high-frequency-first principle is applied again to re-edit and arrange the Chinese characters corresponding to the valid codes in the code table. Where possible, relatively short valid codes are made to correspond to as many Chinese characters as possible. If a valid code corresponds to at most n Chinese characters, then 1 ≤ n ≤ 10. The maximum value of n is chosen reasonably within this range and serves as the basis for further shortening the code lengths.
6. A novel on-line handwritten Chinese character recognition method and handwriting input method, characterized in that the code table formed according to the preceding claims is stored in the machine's memory as the recognition data dictionary of the handwriting input device. Strokes and stroke combinations entered on the handwriting tablet are recognized and then converted directly into the corresponding digits according to the rules. A new valid code is formed as each additional stroke is written.
7. The method according to claim 6, characterized in that the valid code obtained from the written strokes is used directly to search the code table and output Chinese characters, without recognizing the character as a whole.
8. The method according to claim 6, characterized in that, from the first stroke onward, each additional stroke causes different Chinese characters to be displayed. The characters that share a repeated code are offered for selection as output, or the user switches to continue writing.
9. The method according to claims 1, 5 and 6, characterized in that, when the digits obtained from the written strokes are used to search the code table, n is defined as the non-negative number of Chinese characters corresponding to the code. When n ≥ 1, the code is valid and Chinese characters can be displayed or output. When n = 0, the code is invalid, and an invalid code can serve as an exclusion criterion for identifying unrecognized strokes or unknown Chinese characters.
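The claims state principles for building and searching the code table but no concrete procedure. The Python sketch below is one possible reading under stated assumptions, not the applicant's implementation. The function names, the toy entries and the trimming rule are all hypothetical. The assumed rule processes characters in descending frequency and gives each one the shortest prefix of its full code whose bucket holds fewer than n characters. Lookup follows claims 7 to 9: the digits written so far are looked up directly, and n = 0 marks an invalid code.

```python
from collections import defaultdict


def build_code_table(entries, n_max=10):
    """Assign each character the shortest prefix of its full stroke code
    whose bucket still holds fewer than n_max characters (assumed rule).

    entries: iterable of (character, full_digit_code, frequency).
    Processing from most to least frequent gives frequent characters the
    shorter codes (Claim 4); n_max caps the repeated codes (Claim 5).
    """
    if not 1 <= n_max <= 10:
        raise ValueError("Claim 5 requires 1 <= n <= 10")
    table = defaultdict(list)
    for char, code, _freq in sorted(entries, key=lambda e: -e[2]):
        for k in range(1, len(code) + 1):
            if len(table[code[:k]]) < n_max:
                table[code[:k]].append(char)
                break
        else:
            # Every prefix (including the full code) is full: keep the full
            # code anyway, letting this bucket exceed n_max.
            table[code].append(char)
    # Drop the empty buckets that defaultdict created while probing.
    return {code: chars for code, chars in table.items() if chars}


def lookup(table, digits):
    """Claim 9: n = number of characters for the code.
    n >= 1 -> valid code, candidates can be shown; n == 0 -> invalid code."""
    chars = table.get(digits, [])
    return len(chars) >= 1, chars


def candidates_per_stroke(table, digits):
    """Claim 8: look the code up again after every newly written stroke."""
    return [lookup(table, digits[:i]) for i in range(1, len(digits) + 1)]


# Hypothetical characters, full codes and frequencies, for illustration only.
ENTRIES = [("A", "12", 100), ("B", "123", 90), ("C", "124", 50), ("D", "3", 40)]
table = build_code_table(ENTRIES, n_max=1)
print(table)  # {'1': ['A'], '12': ['B'], '124': ['C'], '3': ['D']}
print(candidates_per_stroke(table, "125"))
# [(True, ['A']), (True, ['B']), (False, [])]
```

Under this assumed rule, a frequent character trimmed to a short code is found by stopping at that prefix. Writing its full code may instead land on another character or on an invalid code.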
CN99111467.1A 1999-08-16 1999-08-16 In-line handwritten Chinese character recognition method and handwriting input method Pending CN1343953A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN99111467.1A CN1343953A (en) 1999-08-16 1999-08-16 In-line handwritten Chinese character recognition method and handwriting input method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN99111467.1A CN1343953A (en) 1999-08-16 1999-08-16 In-line handwritten Chinese character recognition method and handwriting input method

Publications (1)

Publication Number Publication Date
CN1343953A 2002-04-10

Family

ID=5275110

Family Applications (1)

Application Number Title Priority Date Filing Date
CN99111467.1A Pending CN1343953A (en) 1999-08-16 1999-08-16 In-line handwritten Chinese character recognition method and handwriting input method

Country Status (1)

Country Link
CN (1) CN1343953A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101763185B (en) * 2008-12-23 2014-10-01 财团法人交大思源基金会 Virtual input system and method thereof

Similar Documents

Publication Publication Date Title
CN1167030C (en) Handwriteen character recognition using multi-resolution models
CN100533470C (en) A method and apparatus for decoding handwritten characters
EP1564675B1 (en) Apparatus and method for searching for digital ink query
CN1035904C (en) Estimation of baseline, line spacing and character height for handwriting recognition
Hussain et al. A comprehensive survey of handwritten document benchmarks: structure, usage and evaluation
Dongre et al. Development of comprehensive devnagari numeral and character database for offline handwritten character recognition
CN1149508C (en) On-line character recognition system
CN1025764C (en) Characters recognition method and system
Kim et al. Word segmentation of printed text lines based on gap clustering and special symbol detection
CN1343953A (en) In-line handwritten Chinese character recognition method and handwriting input method
CN100501656C (en) Tone and shape combination method for inputting Chinese character into electronic apparatus
Bhaskarabhatla et al. Experiences in Collection of Handwriting Data for Online Handwriting Recognition in Indic Scripts.
CN1116335A (en) Chinese character screen-writing input system
Bataineh A Printed PAW Image Database of Arabic Language for Document Analysis and Recognition.
CN106650716A (en) Identification method and device for computer font
Tappert An adaptive system for handwriting recognition
CN1016747B (en) Off-line Handwritten Chinese Recognition system and recognition methods thereof
CN1082732A (en) Chinese characters in computer is input and recognition methods dynamically
CN85105023A (en) Chinese-character stroke searching coding method and disposal route thereof
CN1096110A (en) Handwriting Chinese character input arrangement for microcomputer
Ibrayim et al. A Dynamic Programming Method for Segmentation of Online Cursive Uyghur Handwritten Words into Basic Recognizable Units.
CN1180336C (en) Code-less Chinese charcter input method in computer and Chinese charcter keyboard
Nakkach et al. CHAKEL-DB: Online Database for Handwriting Diacritic Arabic Character.
Suen et al. Farsi script recognition: a survey
JP3015137B2 (en) Handwritten character recognition device

Legal Events

Date Code Title Description
C57 Notification of unclear or unknown address
DD01 Delivery of document by public notice

Addressee: Wang Songping

Document name: Correction notice

C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication