CN87108006A - The way of automatic partition of input in Chinese - Google Patents
- Publication number
- CN87108006A
- Authority
- CN
- China
- Prior art keywords
- chinese
- syllable
- input
- string
- separation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Lifetime
Landscapes
- Document Processing Apparatus (AREA)
- Machine Translation (AREA)
Abstract
The invention relates to inputting Chinese characters from a keyboard by pinyin. Certain syllables of very high occurrence frequency are treated as automatic separation syllables in order to improve input efficiency.
The operator keys in pinyin letters continuously from the keyboard. Whenever one of these automatic separation syllables or a punctuation mark occurs, or whenever the operator presses the manual separation designation key, the computer automatically converts the string of characters keyed in up to that point into a Chinese character string.
Description
The invention relates to the automatic separation of input of written Chinese (hereinafter simply "Chinese"), and in particular to separating a continuous series of Chinese input on a Chinese input device.
An information processing system that handles Chinese must have a Chinese input device. In general, such a device accepts input encoded according to the shape of the Chinese character, its pronunciation, or a combination of the two.
There are two phonetic scripts for representing Chinese pronunciation: the pinyin alphabet established by the Chinese government, and the phonetic symbols that were in use before pinyin was established. Pinyin is now the mainstream in China, and the older phonetic symbols are used only in some regions such as Taiwan.
When Chinese is input from such a phonetic notation, one Chinese syllable (that is, one Chinese character) is generally keyed in with one initial (consonant) key and one final key. With this method pinyin can be keyed in by repeating a simple operation, but a means of dividing the pinyin string into words or meaningful phrases is essential. That is, the Chinese input device converts pinyin into Chinese characters in units delimited by the separators in the pinyin string.
Conventionally, the full stop and the comma (called punctuation marks in Chinese) have been used as these separators. As separators for conversion, however, the full stop and comma alone are not sufficient. With punctuation only, the interval between separators is often long. When the pinyin string is converted to a Chinese character string (for example by whole-phrase conversion), a word-segmentation error in the front part of the span between two separators propagates to the parts that follow and causes further segmentation errors. The conversion processing time also becomes longer, so efficient Chinese input cannot be expected. This is the shortcoming of the conventional approach.
To remedy this shortcoming, a separation key can be provided so that the operator can mark a division at any syllable other than a punctuation mark. With this method, however, the operator must constantly be conscious of operating the separation key, which disrupts the flow of pinyin keying and lowers efficiency.
The object of the invention is to overcome the above shortcomings and provide an automatic separation method for Chinese input. Syllables that very frequently delimit Chinese phrases are specially designated as automatic separation syllables. The pinyin string delimited by such a syllable, by a punctuation mark, or by a manual separation designation keyed in at a place other than a punctuation mark according to the continuity of the Chinese text, is then used as the unit of conversion into a Chinese character string.
According to the invention, the automatic separation method is used on a Chinese input device that converts a keyed-in Chinese phonetic character string into a Chinese character string. It has detection means and conversion means. The detection means detects the phonetic character string of a predetermined specific syllable as an automatic separation syllable. The conversion means converts all of the phonetic character strings keyed in up to that automatic separation syllable into a Chinese character string together. Thus, as the phonetic string is keyed in sequentially, it is converted successively into Chinese character strings each time an automatic separation syllable or a punctuation mark is encountered, or each time the manual separation designation key is pressed.
With the automatic separation method of the invention, syllables that very frequently delimit Chinese phrases serve as automatic separation syllables, and each time such a syllable is keyed in, the string preceding it is converted as one unit. The operator therefore need not be conscious of what constitutes a word or phrase and can key in smoothly according to the pinyin, so keying efficiency is greatly improved.
Fig. 1 is a block diagram of one embodiment of the automatic separation method for Chinese input according to the invention; Fig. 2 is a flowchart of the Chinese input processing steps carried out with this method.
The automatic separation method proposed by the invention is described below with reference to the drawings.
Fig. 1 is a block diagram of one embodiment of the invention. It comprises an input part 1 with the function of keying in pinyin; a separation detecting part 2 that checks the pinyin string and detects the separation syllables in it; a buffer part 3 that temporarily stores the pinyin string; a conversion part 4 that converts the pinyin string into a Chinese character string; a conversion dictionary part 5 holding a pinyin-to-Chinese correspondence table; and a display part 6 that displays the keyed-in pinyin or the converted Chinese. Together these parts realize the automatic separation of Chinese input.
In response to the operator's actions, input part 1 sends out input data 100 containing the pinyin string.
Separation detecting part 2 receives input data 100 and outputs pinyin string 101. When input data 100 contains the pinyin string of a predetermined specific syllable, a punctuation mark, or a manual separation indicator, it also outputs control signal 102.
Buffer part 3 receives pinyin string 101, stores it temporarily, and outputs it as syllable signal 103.
In accordance with conversion control signal 102, conversion part 4 receives syllable signal 103, exchanges retrieval signal 104 with conversion dictionary part 5 (which holds the pinyin-to-Chinese correspondence table), converts the pinyin into Chinese characters, and outputs display signal 105.
Display part 6 receives display signal 105 and displays it so that the operator can check and confirm it.
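The cooperation of the separation detecting part and the buffer part can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function name `segment`, the token representation, and the choice to keep a punctuation mark inside the unit it closes are assumptions. The set of automatic separation syllables contains only "De" and "He", the two examples named in the discussion of Table 3, because the full list of Table 1 is not reproduced in this text.

```python
from typing import Iterable, List

# Assumption: only the two syllables named in the text; Table 1's full list is not available here.
AUTO_SEPARATION_SYLLABLES = {"De", "He"}
PUNCTUATION = {",", "."}
MANUAL_SEPARATOR = "V"  # manual separation indicator, written "V" in Table 3


def segment(tokens: Iterable[str]) -> List[List[str]]:
    """Split keyed-in tokens into conversion units (hypothetical helper)."""
    units: List[List[str]] = []
    buffer: List[str] = []  # plays the role of the buffer part
    for token in tokens:
        if token == MANUAL_SEPARATOR:
            # The manual key only closes the current unit; it is not text itself.
            if buffer:
                units.append(buffer)
                buffer = []
        elif token in PUNCTUATION or token in AUTO_SEPARATION_SYLLABLES:
            # The separating token belongs to the unit it closes.
            buffer.append(token)
            units.append(buffer)
            buffer = []
        else:
            buffer.append(token)
    if buffer:
        units.append(buffer)  # flush whatever remains at end of input
    return units
```

For example, the hypothetical key sequence `["Wo", "De", "Shu", "V", "Hen", "Hao", "."]` would be split into three units, each of which would then be handed to the conversion part.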
Table 1 gives examples of the predetermined specific syllables for which separation detecting part 2 outputs control signal 102. When a syllable shown in the table is keyed in as pinyin, the pinyin string is separated there as a unit of conversion. Although each of these syllables is itself a monosyllabic word, it also occurs very frequently as the final syllable of polysyllabic words. Moreover, because these syllables are used so frequently, they are well able to divide a Chinese character string finely at the ends of words or phrases. Such syllables are called automatic separation syllables.
Fig. 2 is a flowchart of the Chinese input processing steps carried out with the above automatic separation method. In the figure, process 11 is the operation of keying in pinyin, and process 12 displays the keyed-in pinyin for confirmation.
Process 18 converts the keyed-in pinyin string into a Chinese character string and displays it. Process 19 judges whether the Chinese input has finished; if it has not, control returns to process 11.
Table 2 gives sample calculations of the separation rate, separation pitch, and evaluation coefficient of the automatic separation syllables. The separation rate is the proportion of occurrences in which the syllable divides the Chinese character string at a meaningful separation point. The calculation covers all vocabulary (excluding proper nouns) appearing in the Chinese language textbooks for the ten years from primary school through senior high school in China. Only one-character and two-character words were counted; words of three or more characters occur so rarely that they were ignored. The separation pitch is the mean interval at which the syllable occurs, expressed in number of characters. The evaluation coefficient is the ratio of the one-character words consisting of the syllable, plus the two-character words ending in it, to the total number of syllables. The table lists the separation syllables in descending order of evaluation coefficient.
By using such syllables as automatic separation syllables, a Chinese sequence can be divided automatically at meaningful points.
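The three statistics can be illustrated with a small sketch. The formulas below are one possible reading of the definitions in the text, not the exact procedure used to produce Table 2; the function name `syllable_stats`, the representation of a word as a tuple of pinyin syllables, and the toy word list are all assumptions.

```python
from typing import Dict, List, Tuple

Word = Tuple[str, ...]  # a word written as a tuple of pinyin syllables


def syllable_stats(words: List[Word], syllable: str) -> Dict[str, float]:
    """Compute illustrative statistics for one candidate separation syllable."""
    # Only one- and two-character words are counted, as in the text.
    words = [w for w in words if 1 <= len(w) <= 2]
    total_syllables = sum(len(w) for w in words)
    occurrences = sum(w.count(syllable) for w in words)
    # One-character words equal to the syllable plus two-character words ending in it.
    word_final = sum(1 for w in words if w[-1] == syllable)
    return {
        # Share of occurrences that fall on a word boundary (assumed reading).
        "separation_rate": word_final / occurrences if occurrences else 0.0,
        # Mean spacing between occurrences, in characters (assumed reading).
        "separation_pitch": total_syllables / occurrences if occurrences else float("inf"),
        # Word-final occurrences relative to all syllables.
        "evaluation_coefficient": word_final / total_syllables if total_syllables else 0.0,
    }


# Toy data for illustration only; not taken from the textbook corpus.
SAMPLE_WORDS: List[Word] = [("Wo",), ("De",), ("Shu",), ("Zhen", "De"), ("De", "Dao")]
```

With the toy list above, "De" occurs three times among seven syllables and ends a word twice, so a syllable that usually sits at the end of a word scores a high separation rate while a rare syllable scores a large separation pitch.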
Table 3(a)
Table 3(b)
Tables 3(a) and 3(b) show an example of Chinese input using the above automatic separation method. Column (A) is the keyed-in pinyin string (initials in capital letters, finals in lowercase). The symbol "V" in this column is the separation indicator keyed in at pauses other than punctuation marks, following the flow of reading the Chinese aloud.
Column (B) shows all the separations extracted from column (A). " " is an automatic separation triggered by keying in a punctuation mark; "De|", "He|" and the like are automatic separations triggered by keying in one of the automatic separation syllables; "‖" is a manual separation keyed in at a pause in reading aloud.
Column (C) is the result of converting the pinyin string of column (A) into a Chinese character string, taking each segment of column (B) as a unit. The conversion is whole-phrase conversion performed by repeatedly applying the so-called longest-match method. The "/" in this column marks the word divisions made automatically during whole-phrase conversion. For homophones, the word with the highest frequency in the occurrence statistics is chosen.
Column (D) gives the kanji corresponding to the above Chinese character string.
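Repeated longest-match conversion with frequency-based homophone selection can be sketched as below. This is a simplified illustration under stated assumptions: the lexicon structure, the function name `convert_unit`, the `max_len` limit, the pass-through of unknown syllables, and the frequency figures are all hypothetical and are not taken from the patent's conversion dictionary.

```python
from typing import Dict, List, Tuple

# Maps a pinyin syllable sequence to candidate words with made-up frequencies.
Lexicon = Dict[Tuple[str, ...], List[Tuple[str, int]]]

LEXICON: Lexicon = {
    ("Zhong", "Guo"): [("中国", 100)],
    ("Zhong",): [("中", 50), ("种", 20)],
    ("Ren",): [("人", 80), ("仁", 5)],
}


def convert_unit(syllables: List[str], lexicon: Lexicon, max_len: int = 4) -> List[str]:
    """Convert one conversion unit by repeatedly taking the longest dictionary match."""
    words: List[str] = []
    i = 0
    while i < len(syllables):
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(max_len, len(syllables) - i), 0, -1):
            candidates = lexicon.get(tuple(syllables[i:i + n]))
            if candidates:
                # Among homophones, pick the most frequent word.
                words.append(max(candidates, key=lambda c: c[1])[0])
                i += n
                break
        else:
            # No dictionary entry: pass the syllable through unchanged.
            words.append(syllables[i])
            i += 1
    return words
```

Joining the result of `convert_unit(["Zhong", "Guo", "Ren"], LEXICON)` with "/" gives the kind of word-divided string shown in column (C). Because each unit is short, an early mismatch can only affect the few syllables that remain in that unit.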
In the Chinese text of Tables 3(a) and 3(b), a total of 59 syllables (Chinese characters) are divided into 19 segments by 6 punctuation marks, 9 automatic separation syllables, and the manual separation designations made at the pauses mentioned above. Each segment contains 3 syllables on average (maximum 6, minimum 1). The use of automatic separation syllables thus considerably shortens the segments subject to whole-phrase conversion.
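As a quick check of the average segment length, using only the totals stated in the text:

```latex
\bar{n} = \frac{59\ \text{syllables}}{19\ \text{segments}} \approx 3.1
```

This is consistent with the stated average of about 3 syllables per segment.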
The example sentence is part of an article by Dr. Qian Xuesen in the Guangming Daily (published 14 August 1986). The accuracy of conversion from pinyin to Chinese reached 100%. However, as with Japanese input, the handling of homophones in Chinese input is not perfect, and the separation ability of the automatic separation syllables is not 100% either. The conversion accuracy for practical business documents, the usual target of office automation, is typically about 95%, and for literary works about 85%. A correction process must therefore also be provided for the input Chinese, for example one that can correct it syllable by syllable.
Moreover, in China there is as yet no clear conclusion on questions such as "What is a word?" and "Where should a sentence be divided to split it into individual words?" With the automatic separation method of the invention, the separation of words does not depend on the operator's judgment but is unified according to basic definitions within the system, which removes this obstacle.
Claims (1)
- An automatic separation method for Chinese input, for use on a Chinese input device on which a phonetic character string of Chinese is keyed in and converted into a Chinese character string, characterized in that it has: detection means for detecting the phonetic character string of a predetermined specific syllable as an automatic separation syllable; and conversion means for converting the phonetic character strings keyed in and displayed up to said automatic separation syllable into a Chinese character string as a whole; wherein, as said phonetic character string is keyed in sequentially, the phonetic character string is converted successively into a Chinese character string whenever said automatic separation syllable or a punctuation mark appears in it, or whenever the manual separation designation key is pressed.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP282731/86 | 1986-11-26 | ||
JP61282731A JPS63136163A (en) | 1986-11-26 | 1986-11-26 | Automatic punctuation system for input of chinese sentence |
Publications (2)
Publication Number | Publication Date |
---|---|
CN87108006A true CN87108006A (en) | 1988-06-08 |
CN1006252B CN1006252B (en) | 1989-12-27 |
Family
ID=17656308
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN87108006.0A Expired CN1006252B (en) | 1986-11-26 | 1987-11-26 | The way of automatic partition for chinese input |
Country Status (2)
Country | Link |
---|---|
JP (1) | JPS63136163A (en) |
CN (1) | CN1006252B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2011143827A1 (en) * | 2010-05-21 | 2011-11-24 | Google Inc. | Input method editor |
CN102566775A (en) * | 2010-12-31 | 2012-07-11 | 上海量明科技发展有限公司 | Input method and system for generating character interval |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS61190657A (en) * | 1985-02-20 | 1986-08-25 | Hitachi Ltd | Recognizing system for japanese language character string |
JPS62226268A (en) * | 1986-03-27 | 1987-10-05 | Nec Corp | Input system for chinese sentence |
-
1986
- 1986-11-26 JP JP61282731A patent/JPS63136163A/en active Pending
-
1987
- 1987-11-26 CN CN87108006.0A patent/CN1006252B/en not_active Expired
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2011143827A1 (en) * | 2010-05-21 | 2011-11-24 | Google Inc. | Input method editor |
CN103026318A (en) * | 2010-05-21 | 2013-04-03 | 谷歌公司 | Input method editor |
CN103026318B (en) * | 2010-05-21 | 2016-08-17 | 谷歌公司 | Input method editor |
US9552125B2 (en) | 2010-05-21 | 2017-01-24 | Google Inc. | Input method editor |
CN102566775A (en) * | 2010-12-31 | 2012-07-11 | 上海量明科技发展有限公司 | Input method and system for generating character interval |
Also Published As
Publication number | Publication date |
---|---|
CN1006252B (en) | 1989-12-27 |
JPS63136163A (en) | 1988-06-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN1191514C (en) | System and method for processing chinese language text | |
EP0262938B1 (en) | Language translation system | |
KR101279676B1 (en) | Method and apparatus for creating a language model and kana-kanji conversion | |
JPH03224055A (en) | Method and device for input of translation text | |
KR20230009564A (en) | Learning data correction method and apparatus thereof using ensemble score | |
JPH0352058A (en) | Document processor for voice input | |
CN87108006A (en) | The way of automatic partition of input in Chinese | |
EP0271664A3 (en) | A morphological/phonetic method for ranking word similarities | |
JP2611904B2 (en) | Character recognition device | |
KR0123238B1 (en) | Morphemes analysis system | |
CN103297709A (en) | Device for adding Chinese subtitles to Chinese audio video data | |
JPS6120887B2 (en) | ||
KR100355453B1 (en) | User Interface method using Hand-written character recognition and Speech Recognition Synchronous | |
KR890002582B1 (en) | Process and apparatus involving pattern recognition | |
JPS62134698A (en) | Voice input system for multiple word | |
CN1011167B (en) | Sound-input computer-aided chinese typewriter | |
Wagner et al. | Isolated-word recognition of the complete vocabulary of spoken Chinese | |
Nylander | Statistics and phonotactical rules in finding OCR errors | |
JPS6022227A (en) | European text processor | |
JPH0916575A (en) | Pronunciation dictionary device | |
JPS62262099A (en) | Pronunciation dictionary updating apparatus | |
CN1043490C (en) | Muti-word exchanging apparatus and Chinese character exchanging apparatus | |
JPH0362260A (en) | Detecting/correcting device for katakana word error | |
JPH0682364B2 (en) | Japanese sentence processing method | |
CN115310458A (en) | Name translation method, system, equipment and computer readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C06 | Publication | ||
PB01 | Publication | ||
C13 | Decision | ||
GR02 | Examined patent application | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
C15 | Extension of patent right duration from 15 to 20 years for appl. with date before 31.12.1992 and still valid on 11.12.2001 (patent law change 1993) | ||
OR01 | Other related matters | ||
C17 | Cessation of patent right | ||
CX01 | Expiry of patent term |