CN87108006A - The way of automatic partition of input in Chinese - Google Patents

Info

Publication number
CN87108006A
Authority
CN
China
Prior art keywords
chinese
syllable
input
string
separation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
CN87108006.0A
Other languages
Chinese (zh)
Other versions
CN1006252B (en)
Inventor
伊藤英俊
楠井健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NEC Corp
Original Assignee
NEC Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NEC Corp
Publication of CN87108006A
Publication of CN1006252B
Expired

Landscapes

  • Document Processing Apparatus (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to entering Chinese characters from a keyboard by phonetic spelling (pinyin). It improves input efficiency by treating certain syllables with a very high occurrence frequency as automatic separation syllables.
The operator keys in phonetic letters continuously. Whenever an automatic separation syllable or a punctuation mark appears, or whenever the operator presses the manual separation designation key, the computer automatically converts the character string keyed in up to that point into a Chinese character string.

Description

The invention relates to automatic segmentation of input in the Chinese written language (hereinafter "Chinese"), and in particular to segmenting a continuous run of Chinese on a Chinese input device.
An information processing system that handles Chinese must have a Chinese input device. Such devices generally accept input encoded according to the shape of a Chinese character, its pronunciation, or a combination of the two.
Two phonetic scripts are used to represent Chinese pronunciation. One is the phonetic alphabet (pinyin) established by the Chinese government; the other is the Chinese phonetic script that was in use before the phonetic alphabet was established. The phonetic alphabet is now the mainstream in China, and the older phonetic script is used only in some regions such as Taiwan.
When Chinese is entered by such phonetic notation, one Chinese syllable (that is, one Chinese character) is generally keyed in by pressing one initial (consonant) key and one final key. With this method, pinyin can be keyed in by repeating a simple operation, but a means of dividing the phonetic string into words or meaningful phrases is essential. That is, the Chinese input device converts phonetic letters into Chinese characters one unit at a time, where each unit is delimited by separators in the phonetic string.
Previously, the full stop and the comma (called punctuation marks in Chinese) were used as these separators. As conversion separators, however, the full stop and comma alone are not enough. With punctuation alone, the distance between separators is often long. When the phonetic string between two separators is converted into a Chinese character string (for example by whole-phrase conversion), a word-segmentation error in the earlier part propagates to the later part, so segmentation errors increase further. Conversion processing time also becomes longer, so efficient Chinese input cannot be expected. This is the shortcoming of that approach.
To remedy this shortcoming, a separation key can be provided so that the operator can mark a segment boundary at any syllable, not only at punctuation. With this method, however, the operator must constantly keep the separation key in mind, which disrupts the flow of pinyin typing and causes inefficiency.
The object of the invention is to overcome the shortcomings described above by providing an automatic segmentation method for Chinese input. Syllables that very frequently end a Chinese word or phrase are specially designated as automatic separation syllables. The phonetic string delimited by such a syllable, by a punctuation mark, or by a manual separation designation keyed in at a place other than punctuation according to how the Chinese text connects, is then used as the unit of conversion into a Chinese character string.
The automatic segmentation method according to the invention is used on a Chinese input device that converts a keyed-in Chinese phonetic character string into a Chinese character string. It has detection means and conversion means. The detection means detects, as an automatic separation syllable, the phonetic string of a predetermined specific syllable. The conversion means converts all the phonetic characters keyed in up to that automatic separation syllable into a Chinese character string together. Thus, as the phonetic string is keyed in, it is converted into Chinese character strings successively whenever an automatic separation syllable or a punctuation mark is encountered, or whenever the manual separation designation key is pressed.
With this method, syllables that very frequently end a Chinese word or phrase serve as automatic separation syllables, and each time one is keyed in, the string keyed in up to that point is converted as one unit. The operator therefore need not be conscious of which words or phrases are being typed and can key in pinyin smoothly, which greatly improves typing efficiency.
Fig. 1 is a block diagram of an embodiment of the automatic segmentation method for Chinese input according to the invention. Fig. 2 is a flowchart of the processing steps of Chinese input performed with this method.
The automatic segmentation method proposed by the invention is described below with reference to the drawings.
Fig. 1 is a block diagram of one embodiment of the invention. It comprises: an input unit 1 for keying in pinyin; a separation detection unit 2 that examines the phonetic string and detects the separation syllables in it; a buffer unit 3 that temporarily stores the phonetic string; a conversion unit 4 that converts the phonetic string into a Chinese character string; a conversion dictionary unit 5 holding a table that maps pinyin to Chinese; and a display unit 6 that displays the keyed-in pinyin or the converted Chinese. Together these units realize automatic segmentation of Chinese input.
The input unit 1 sends out input data 100, which contains the phonetic string, in response to the operator's actions.
The separation detection unit 2 receives the input data 100 and outputs the phonetic string 101. When the input data 100 contains the phonetic string of a predetermined specific syllable, a punctuation mark, or a manual separation designation, it also outputs a control signal 102.
The buffer unit 3 receives the phonetic string 101, stores it temporarily, and outputs it as a syllable signal 103.
The conversion unit 4, under control of the control signal 102, receives the syllable signal 103, exchanges retrieval signals 104 with the conversion dictionary unit 5 that holds the pinyin-to-Chinese table, converts the pinyin into Chinese characters, and outputs a display signal 105.
The display unit 6 receives the display signal 105 and displays it so that the operator can check it visually.
Figure 87108006_IMG2
Table 1 shows examples of the predetermined specific syllables that the separation detection unit detects in order to output the control signal 102. When one of the syllables in this table is keyed in, the phonetic string up to it is separated off as a unit of conversion. Although each of these syllables is itself a monosyllabic word, it also occurs very frequently as the final syllable of polysyllabic words. Moreover, because these syllables are used so often, they have a strong ability to divide a Chinese character string finely at the ends of words or phrases. Such syllables are called automatic separation syllables.
Fig. 2 is a flowchart of the processing steps of Chinese input using the automatic segmentation method. In the figure, process 11 is the keying-in of pinyin, and process 12 displays the keyed-in pinyin for confirmation.
Processes 13, 14 and 15 determine, respectively, whether the input data is a punctuation mark, an automatic separation syllable, or a code produced by the separation key. If it is none of these, control returns to process 11 and keying-in continues. If it is a punctuation mark, an automatic separation syllable, or a code produced by the separation key, control passes to process 16.
Process 16 converts the separated phonetic string into a Chinese character string, referring to the conversion dictionary file 17 that maps pinyin to Chinese characters.
Process 18 replaces the keyed-in phonetic string with the Chinese character string and displays it. Process 19 determines whether Chinese input has finished; if not, control returns to process 11.
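The flow of Fig. 2 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented implementation: the separator set is hypothetical (Table 1 is an image and is not reproduced in this text; only "De" and "He" are named in the description), "V" stands in for the manual separation key, and `convert` is a placeholder for process 16.

```python
from typing import Callable, Iterable, List

# Hypothetical values: Table 1 is not available in this text.
AUTO_SEPARATION_SYLLABLES = {"De", "He"}
PUNCTUATION = {",", "."}
MANUAL_SEPARATOR = "V"  # stands in for the manual separation key


def is_separator(token: str) -> bool:
    """Processes 13-15: punctuation, automatic separation syllable, or key."""
    return (
        token in PUNCTUATION
        or token in AUTO_SEPARATION_SYLLABLES
        or token == MANUAL_SEPARATOR
    )


def run_input_loop(
    tokens: Iterable[str],
    convert: Callable[[List[str]], str],
) -> List[str]:
    """Collect keyed-in tokens and convert one unit at each separator."""
    buffer: List[str] = []   # plays the role of the buffer unit
    output: List[str] = []   # converted units, in order
    for token in tokens:                 # process 11: key in one token
        if token != MANUAL_SEPARATOR:    # the key itself is not text
            buffer.append(token)
        if is_separator(token):          # processes 13-15
            if buffer:
                output.append(convert(buffer))  # processes 16 and 18
            buffer = []
    if buffer:
        # Assumption: leftover input is converted when input ends.
        output.append(convert(buffer))
    return output


units = run_input_loop(
    ["Wo", "De", "Shu", "V", "Hao", "."],
    lambda unit: " ".join(unit),  # placeholder conversion
)
print(units)  # ['Wo De', 'Shu', 'Hao .']
```

Here the separating syllable and the punctuation mark are kept at the end of their unit, which matches the "De|" notation used in the worked example, while the manual key only closes the unit.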
Figure 87108006_IMG3
Table 2 gives sample calculations of the separation rate, separation pitch, and evaluation coefficient of the automatic separation syllables. The separation rate is the proportion of cases in which the syllable divides the Chinese character string at a meaningful boundary. It was calculated over all the vocabulary (excluding proper nouns) appearing in ten years of Chinese-language textbooks used in China from primary school through senior high school. Only one-character and two-character words were counted; words of three or more characters occur so rarely that they were ignored. The separation pitch is the mean interval at which the syllable occurs in these statistics, expressed in characters. The evaluation coefficient is the ratio of the number of one-character words consisting of the syllable, plus the two-character words ending in it, to the total number of syllables. The table lists the separation syllables in order of evaluation coefficient.
Using such syllables as automatic separation syllables makes it possible to divide a Chinese text automatically at meaningful boundaries.
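One way to read the three measures of Table 2 is sketched below. The formulas are an interpretation of the translated description, not the patent's stated arithmetic: the separation rate is taken as word-final occurrences over all occurrences, the separation pitch as total syllables over occurrences, and the evaluation coefficient as word-final occurrences over total syllables. The function name and the sample word list are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

Word = Tuple[str, ...]  # one word as a tuple of pinyin syllables


def separator_statistics(words: Sequence[Word], syllable: str) -> Dict[str, float]:
    """Estimate separation rate, pitch and evaluation coefficient."""
    # Following the description, only one- and two-syllable words count.
    counted = [w for w in words if 1 <= len(w) <= 2]
    total_syllables = sum(len(w) for w in counted)
    occurrences = sum(w.count(syllable) for w in counted)
    word_final = sum(1 for w in counted if w[-1] == syllable)
    if occurrences == 0:
        return {
            "separation_rate": 0.0,
            "separation_pitch": math.inf,
            "evaluation_coefficient": 0.0,
        }
    return {
        # Share of occurrences that end a word (assumed definition).
        "separation_rate": word_final / occurrences,
        # Mean number of syllables per occurrence (assumed definition).
        "separation_pitch": total_syllables / occurrences,
        # Word-final occurrences relative to all syllables.
        "evaluation_coefficient": word_final / total_syllables,
    }


sample = [
    ("Wo",), ("De",), ("Shu",),
    ("Mu", "De"), ("De", "Guo"), ("Xue", "Xiao"),
    ("Tu", "Shu", "Guan"),  # ignored: more than two syllables
]
print(separator_statistics(sample, "De"))
```

A syllable with a high evaluation coefficient under this reading both occurs often and usually ends a word, which is the property the description asks of an automatic separation syllable.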
Table 3(a)
Figure 87108006_IMG4
Table 3(b)
Figure 87108006_IMG5
Tables 3(a) and 3(b) show an example of Chinese input using the automatic segmentation method. Column (A) is the keyed-in phonetic string (initials in capital letters, finals in lowercase). The symbol "V" in that column is the separation designation keyed in at pauses other than punctuation marks, following the flow of the Chinese when read aloud.
Column (B) shows all the separations extracted from column (A). The symbol shown in Figure 87108006_IMG6 is the automatic separation produced by keying in a punctuation mark; "De|", "He|" and the like are automatic separations produced by keying in an automatic separation syllable; and "‖" is a manual separation keyed in at a pause in reading aloud.
Column (C) is the result of converting the phonetic string of column (A) into a Chinese character string, taking each segment of column (B) as a unit. The conversion applies the so-called longest-match method repeatedly to perform whole-phrase conversion. The "/" marks in this column are the word boundaries produced automatically during whole-phrase conversion. For homophones, the word with the highest occurrence frequency in the word-frequency statistics is chosen.
Column (D) is the kanji equivalent of the Chinese character string above.
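The conversion described for column (C), repeated longest match with the most frequent word chosen among homophones, might look roughly like the sketch below. The dictionary entries, the frequencies, and the name `convert_unit` are all made up for illustration; the actual contents of the conversion dictionary are not given in this text.

```python
from typing import Dict, List, Sequence, Tuple

Entry = Tuple[str, int]  # (Chinese word, occurrence frequency)

# Hypothetical toy dictionary: pinyin syllables -> homophone candidates.
DICTIONARY: Dict[Tuple[str, ...], List[Entry]] = {
    ("Zhong", "Guo"): [("中国", 100)],
    ("Ren", "Min"): [("人民", 80)],
    ("Zhong",): [("中", 50), ("种", 20)],
    ("Guo",): [("国", 40), ("过", 60)],
    ("Ren",): [("人", 90), ("仁", 5)],
    ("Min",): [("民", 30)],
}


def convert_unit(
    syllables: Sequence[str],
    dictionary: Dict[Tuple[str, ...], List[Entry]] = DICTIONARY,
) -> str:
    """Convert one separated unit, joining the words found with "/"."""
    max_len = max(len(key) for key in dictionary)
    words: List[str] = []
    i = 0
    while i < len(syllables):
        # Try the longest candidate first, then shorter ones.
        for length in range(min(max_len, len(syllables) - i), 0, -1):
            candidates = dictionary.get(tuple(syllables[i:i + length]))
            if candidates:
                # Among homophones, take the most frequent word.
                words.append(max(candidates, key=lambda c: c[1])[0])
                i += length
                break
        else:
            # Unknown syllable: keep the pinyin unchanged (an assumption).
            words.append(syllables[i])
            i += 1
    return "/".join(words)


print(convert_unit(["Zhong", "Guo", "Ren", "Min"]))  # 中国/人民
print(convert_unit(["Guo", "Ren"]))                  # 过/人
```

Because each unit is short, a wrong greedy match can only affect the words inside that unit, which is the error-containment effect the description attributes to frequent separators.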
In the Chinese text of Tables 3(a) and 3(b), a total of 59 syllables (Chinese characters) is divided into 19 segments by 6 punctuation marks, 9 automatic separation syllables, and the manual separation designations made at the pauses described above. Each segment contains about 3 syllables on average (59/19 ≈ 3.1; maximum 6, minimum 1). Overall, the use of automatic separation syllables markedly shortens the segments subject to whole-phrase conversion.
The example sentence above is part of an article by Dr. Qian Xuesen in the Guangming Daily (published August 14, 1986). The accuracy of its conversion from pinyin to Chinese was 100%. As with Japanese input, however, the handling of homophones in Chinese input is not perfect, and the separating ability of the automatic separation syllables is not one hundred percent either. Conversion accuracy is usually about 95% for practical business documents, which are the target of office automation, and about 85% for literary works. Correction processing for the entered Chinese must therefore also be provided; for example, the entered Chinese can be corrected syllable by syllable.
Moreover, questions such as "What is a word?" and "Where in a sentence should it be divided to split it into individual words?" still have no clear answer in China. With the automatic segmentation method according to the invention, the division into words does not depend on the operator's judgment but is made uniformly according to basic definitions within the system, which removes this obstacle.

Claims (1)

  1. An automatic segmentation method for Chinese input, characterized in that:
    On a Chinese input device that accepts a keyed-in Chinese phonetic character string and converts it into a Chinese character string, the method has: detection means for detecting, as an automatic separation syllable, the phonetic string corresponding to a predetermined specific syllable; and conversion means for converting, as a whole, the phonetic character string keyed in and displayed up to that automatic separation syllable into a Chinese character string; whereby, as the phonetic string is keyed in, it is successively converted into Chinese character strings whenever it contains an automatic separation syllable or a punctuation mark, or whenever the manual separation designation key is pressed.
CN87108006.0A 1986-11-26 1987-11-26 The way of automatic partition for chinese input Expired CN1006252B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP282731/86 1986-11-26
JP61282731A JPS63136163A (en) 1986-11-26 1986-11-26 Automatic punctuation system for input of chinese sentence

Publications (2)

Publication Number Publication Date
CN87108006A (en) 1988-06-08
CN1006252B (en) 1989-12-27

Family

ID=17656308

Family Applications (1)

Application Number Title Priority Date Filing Date
CN87108006.0A Expired CN1006252B (en) 1986-11-26 1987-11-26 The way of automatic partition for chinese input

Country Status (2)

Country Link
JP (1) JPS63136163A (en)
CN (1) CN1006252B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011143827A1 (en) * 2010-05-21 2011-11-24 Google Inc. Input method editor
CN102566775A (en) * 2010-12-31 2012-07-11 上海量明科技发展有限公司 Input method and system for generating character interval

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS61190657A (en) * 1985-02-20 1986-08-25 Hitachi Ltd Recognizing system for japanese language character string
JPS62226268A (en) * 1986-03-27 1987-10-05 Nec Corp Input system for chinese sentence

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011143827A1 (en) * 2010-05-21 2011-11-24 Google Inc. Input method editor
CN103026318A (en) * 2010-05-21 2013-04-03 谷歌公司 Input method editor
CN103026318B (en) * 2010-05-21 2016-08-17 谷歌公司 Input method editor
US9552125B2 (en) 2010-05-21 2017-01-24 Google Inc. Input method editor
CN102566775A (en) * 2010-12-31 2012-07-11 上海量明科技发展有限公司 Input method and system for generating character interval

Also Published As

Publication number Publication date
CN1006252B (en) 1989-12-27
JPS63136163A (en) 1988-06-08

Legal Events

Date Code Title Description
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C06 Publication
PB01 Publication
C13 Decision
GR02 Examined patent application
C14 Grant of patent or utility model
GR01 Patent grant
C15 Extension of patent right duration from 15 to 20 years for appl. with date before 31.12.1992 and still valid on 11.12.2001 (patent law change 1993)
OR01 Other related matters
C17 Cessation of patent right
CX01 Expiry of patent term