CN1208901A - Method for automatically analyzing and processing Chinese characters which having more than one sound - Google Patents

Info

Publication number
CN1208901A
CN1208901A CN97116046A CN97116046A CN1208901A CN 1208901 A CN1208901 A CN 1208901A CN 97116046 A CN97116046 A CN 97116046A CN 97116046 A CN97116046 A CN 97116046A CN 1208901 A CN1208901 A CN 1208901A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN97116046A
Other languages
Chinese (zh)
Other versions
CN1105979C (en)
Inventor
张景嵩
钱力强
杨徽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inventec Corp
Original Assignee
Inventec Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inventec Corp
Priority to CN97116046A
Publication of CN1208901A
Application granted
Publication of CN1105979C
Anticipated expiration
Expired - Fee Related

Abstract

A method for automatically analyzing and processing Chinese polyphonic characters (characters with more than one pronunciation). A CPU segments the character string of an input Chinese sentence into words according to a Chinese lexicon stored in memory, then checks each segmented word in turn against a Chinese polyphone dictionary. A word marked as a single-character word is taken as the center: the character before it in the sentence is searched for and marked as the "front character"; failing that, the character after it is searched for and marked as the "back character". The single-character word is then combined with the front character or the back character into a "temporary word", and the pronunciation of the polyphonic character is looked up in the polyphone dictionary.

Description

Method for the automatic analysis and processing of Chinese polyphonic characters
The present invention relates to a method for the automatic analysis and processing of Chinese polyphonic characters ("polyphones"), and in particular to a method in which the central processing unit (CPU) of a computer identifies, analyzes and processes polyphony in Chinese (that is, what is generally called the Han language).
Many Chinese characters have more than one pronunciation. Depending on the surrounding context, such a character differs not only in pronunciation but also in meaning: read one way it represents one meaning, and read another way it represents a different one. A character that keeps the same written form while having several pronunciations, each carrying its own meaning, is what the present invention calls a "polyphone".
Take the character 行 as an example. Read as "xíng" (shown as image Figure A9711604600031 in the original), it means "to walk". Read as "háng", it means a "firm" or a "trade". Read as "hàng", it describes trees standing in a line, as in the colloquial expression for "rows of trees"; in classical Chinese, "hàng" also denotes a staunch manner, as in the line from The Analects of Confucius: "子路，行行如也" (said of Zilu). 行 is therefore a "polyphone" in Chinese.
Polyphony, in which a single character has several pronunciations, is a characteristic feature of the Chinese language. Throughout history, polyphones have been used widely in ordinary writing, in songs and in speech, and they remain indispensable. Chinese speakers form the largest language community in the world, and applying Chinese to automatic machines such as computers is becoming an increasingly important field in information science and technology. In the future development of computers, electronic dictionaries, electronic notebooks and similar automatic devices, giving them the ability to identify, analyze and process the linguistic phenomena of Chinese automatically is therefore an important problem.
The object of the present invention is to provide a method for the automatic analysis and processing of Chinese polyphones, specifically a method that automatically identifies, analyzes and processes polyphony in the Chinese language. With this method, an automatic machine such as a computer can recognize the polyphones in written Chinese and determine the correct pronunciation and meaning of each polyphone in its context.
To achieve this object, the method stores Chinese words in digital form in the memory of a computer to build a Chinese lexicon. The CPU then segments the Chinese character string entered into the computer (hereinafter "the sentence") into words according to this lexicon, and takes the first word after segmentation as the "current word". The current word is then examined. If it is not a "single-character word", it is looked up in a previously built Chinese polyphone dictionary, and the polyphone pronunciation found there is attached to the word. If the current word is marked as a "single-character word", it is taken as the center: the character immediately before it in the sentence is searched for and marked as the "front character"; if there is none, the character immediately after it is searched for and marked as the "back character". The single-character word is combined with the front character or the back character into a "temporary word", which is looked up in the polyphone dictionary, and the pronunciation found is attached to the word. The check is repeated, word by word, until every word segmented from the sentence has been examined.
Relying on the high-speed analysis and processing power of the computer, the method can be applied in fields such as speech-related artificial intelligence recognition, computerized speech and Chinese speech synthesis. There it identifies, analyzes and processes the possible Chinese pronunciations accurately, and determines in real time the correct pronunciation and meaning of each character in a Chinese sentence.
Fig. 1 is a schematic diagram of the system architecture of the present invention;
Fig. 2 is a detailed flow chart of the method for the automatic analysis and processing of Chinese polyphones according to the present invention.
The method is now described in detail with reference to the drawings.
With reference to Fig. 1, the method of the present invention is carried out mainly by a system consisting of the CPU 1 of a computer, a memory 2, and a Chinese lexicon 3 and a Chinese polyphone dictionary 4 built in the memory 2.
With reference to Fig. 2, the method for the automatic analysis and processing of Chinese polyphones comprises the following steps:
First, Chinese words are stored in digital form in the memory 2 of the computer to build a Chinese lexicon 3 (an example lexicon is shown in Table one). The CPU 1 segments the Chinese character string entered into the computer (hereinafter "the sentence") into words according to the lexicon 3 stored in the memory 2 (steps 10 and 11 in Fig. 2). For an example input sentence meaning "the ginseng doll takes part in the match", segmentation yields the words "ginseng" (人参), "doll" (娃娃), "take part" (参加) and "match", and these words are stored in the memory 2 of the computer;
Table one
Numerical code | Chinese word
... | ...
... | Brain drain
... | The clamors of the people bubble up
... | Human nature
... | ...
... | Population
... | Human feelings
... | Ginseng
... | ...
... | Doll
... | Baby
... | ...
... | Uneven
... | ...
... | Visit
... | Participate in
... | Referring to
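The patent does not say which segmentation algorithm the CPU uses. As a minimal sketch, assuming forward maximum matching (a common dictionary-based technique, swapped in here for illustration), the segmentation step could look like this. The function name `segment`, the toy `LEXICON` and the Chinese sample sentence are assumptions reconstructed from the translated glosses, not text from the patent.

```python
from typing import List, Set


def segment(sentence: str, lexicon: Set[str]) -> List[str]:
    """Forward maximum matching: at each position take the longest lexicon entry."""
    longest = max((len(word) for word in lexicon), default=1)
    words: List[str] = []
    pos = 0
    while pos < len(sentence):
        # Try the longest candidate first and shrink until the lexicon matches.
        for size in range(min(longest, len(sentence) - pos), 0, -1):
            candidate = sentence[pos:pos + size]
            if size == 1 or candidate in lexicon:
                # A character with no longer match becomes a single-character word.
                words.append(candidate)
                pos += size
                break
    return words


# Hypothetical toy lexicon; the real lexicon 3 is far larger.
LEXICON = {"人参", "娃娃", "参加", "比赛", "参差", "参观"}

print(segment("人参娃娃参加比赛", LEXICON))
# ['人参', '娃娃', '参加', '比赛']
```

Any character that matches no multi-character entry falls out as a single-character word, which is the case the later steps of the method handle separately.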
Next, the CPU 1 reads from the memory 2 the first word stored after segmentation of the sentence and takes it as the "current word" (step 12 in Fig. 2);
The current word is examined. If it is marked as a "single-character word" (that is, the word consists of only one character), the CPU 1 proceeds to the next step (step 13 in the figure); otherwise it looks the current word up directly (steps 13 and 17 in the figure);
Taking the single-character word as the center, the CPU 1 searches the input sentence for the Chinese character immediately before it. If one exists, it is marked as the "front character" and stored in memory. Otherwise the CPU 1 searches the input sentence for the Chinese character immediately after the single-character word; if one exists, it is marked as the "back character" and stored in the memory 2 (steps 14 and 15 in the figure). If neither exists, processing moves on to the next word (step 20 in the figure);
The CPU 1 then combines the stored front character or back character with the current single-character word, in their original order in the sentence, into a "temporary word" and treats this temporary word as the "current word". It searches the Chinese polyphone dictionary for the pronunciation of the polyphone in the temporary word and, if one is found, stores that pronunciation in the memory 2 (steps 16, 17, 18 and 19 in the figure). Taking the character 行 and the character for "cutting" as examples, Tables two and three illustrate the structure of this Chinese polyphone dictionary;
Table two
[Figure A9711604600061: table image, not reproduced in this text]
Table three
[not reproduced in this text]
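The front-character and back-character selection described in the steps above can be sketched as follows. This is an illustration under stated assumptions rather than the patent's implementation: the function name `temporary_word`, the role labels "front" and "back", and the pre-segmented sample inputs are all hypothetical.

```python
from typing import List, Optional, Tuple


def temporary_word(words: List[str], i: int) -> Optional[Tuple[str, str]]:
    """Build the temporary word for the single-character word words[i].

    Returns (temporary word, role of the single-character word inside it),
    or None when the sentence has no neighbouring character at all.
    """
    current = words[i]
    if len(current) != 1:
        raise ValueError("only single-character words need a temporary word")
    if i > 0:
        # A front character exists: the last character of the preceding word.
        # The single-character word then sits at the back of the temporary word.
        return words[i - 1][-1] + current, "back"
    if i + 1 < len(words):
        # No front character, so fall back to the first character that follows.
        return current + words[i + 1][0], "front"
    return None


print(temporary_word(["海", "参"], 1))  # ('海参', 'back')
print(temporary_word(["参", "与"], 0))  # ('参与', 'front')
print(temporary_word(["参"], 0))        # None
```

Note that the back character is only consulted when the single-character word opens the sentence, because the description checks for a front character first.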
According to this Chinese polyphone dictionary, each word segmented from the sentence is checked in turn. If a word is not a "single-character word", it is looked up in the polyphone dictionary (as shown in Table four), and the pronunciation of each polyphone in the word that is found in the dictionary is recorded;
Table four: Chinese polyphone dictionary
Numerical code | Character | Role | Pronunciation | Adjacent characters
... | ... | ... | ... | ...
... | 人 (person) | | rén |
... | 参 | as front character | cān | 加 (add), 与 (with)
... | | as front character | cēn | 差 (poor), 错 (wrong)
... | | as back character | shēn | 人 (people), 洋 (ocean), 旗 (flag), 海 (sea)
... | ... | ... | ... | ...
... | 娃 | as front character | wá | 娃 (baby)
... | 娃 | as back character | wa (neutral tone) | 娃 (baby)
... | ... | ... | ... | ...
... | 加 (add) | | jiā |
... | ... | ... | ... | ...
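A lookup against a dictionary shaped like Table four might be sketched as below. The data layout (`POLYPHONES` mapping a character to role, pinyin and adjacent characters) and the function name `readings` are assumptions made for illustration, and the Chinese characters are reconstructed from the translated glosses; only the polyphonic entries of the table are included.

```python
from typing import Dict, List, Optional, Set, Tuple

# character -> list of (role in the word, pinyin, adjacent characters that select it)
POLYPHONES: Dict[str, List[Tuple[str, str, Set[str]]]] = {
    "参": [
        ("front", "cān", {"加", "与"}),
        ("front", "cēn", {"差", "错"}),
        ("back", "shēn", {"人", "洋", "旗", "海"}),
    ],
    "娃": [
        ("front", "wá", {"娃"}),
        ("back", "wa", {"娃"}),  # neutral tone
    ],
}


def readings(word: str) -> List[Tuple[str, str]]:
    """Return (character, pinyin) for every polyphone in the word that resolves."""
    found: List[Tuple[str, str]] = []
    for i, ch in enumerate(word):
        nxt: Optional[str] = word[i + 1] if i + 1 < len(word) else None
        prev: Optional[str] = word[i - 1] if i > 0 else None
        for role, pinyin, neighbours in POLYPHONES.get(ch, []):
            # A front-role entry is selected by the following character,
            # a back-role entry by the preceding one.
            neighbour = nxt if role == "front" else prev
            if neighbour in neighbours:
                found.append((ch, pinyin))
                break
    return found


for w in ["人参", "娃娃", "参加", "比赛"]:
    print(w, readings(w))
# 人参 [('参', 'shēn')]
# 娃娃 [('娃', 'wá'), ('娃', 'wa')]
# 参加 [('参', 'cān')]
# 比赛 []
```

Keying each entry on the adjacent character keeps the lookup to one pass over the word, and the same function works unchanged on a two-character temporary word.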
The CPU 1 then reads from the memory 2 the next word stored after segmentation of the sentence and takes it as the "current word". If every word segmented from the sentence has been checked in turn, the next step is carried out; otherwise the method returns to the judgment of whether the current word is a "single-character word" (steps 21 and 13 in the figure). In this way, automatic analysis and processing of the polyphony of each word in the example sentence yields, in order, the results shown in Table five.
Table five
Single-character word? | Current word | Polyphone pronunciation
No | ginseng (人参) | 参: shēn
No | doll (娃娃) | 娃: wá, 娃: wa
No | take part (参加) | 参: cān
No | match | none
The CPU 1 then ends the analysis and processing of polyphones for the sentence entered into the computer (step 22 in the figure).
The present invention is a method, based on a Chinese lexicon and a Chinese polyphone dictionary, in which the CPU of a computer identifies, analyzes and processes polyphony in Chinese, and it is an effective method for identifying polyphony in Chinese. Conventional language processors perform no polyphony analysis: they simply pick one of the possible pronunciations of a polyphone at random. Because every polyphone has two or more pronunciations, the pronunciation accuracy achieved by this random selection can hardly exceed 50%. With the method of the present invention, by contrast, the accuracy in determining polyphone pronunciation can readily exceed 90%.
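One way to read the 50% figure is sketched below. It assumes, which the patent does not state explicitly, that the conventional processor picks uniformly among the k readings of a polyphone and that exactly one reading is correct in context.

```latex
P(\text{correct}) = \frac{1}{k} \le \frac{1}{2}, \qquad k \ge 2 .
```

Equality holds only for characters with exactly two readings; under the same assumption, a character with three readings would be read correctly one time in three.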
The above is only a preferred embodiment of the present invention, and the scope of the invention is not limited to it. Any obvious improvement or equivalent variation made by a person skilled in the art on the basis of the disclosure falls within the scope of protection of the present invention.

Claims (2)

1. A method for the automatic analysis and processing of Chinese polyphones, comprising the following steps:
(1) a CPU segments an input Chinese character string into words according to a Chinese lexicon stored in memory and stores the segmentation result in memory;
(2) the first word after segmentation is read from memory as the "current word";
(3) the current word is examined; if it is marked as a "single-character word", step (4) is carried out, and if it is not, step (6) is carried out;
(4) taking the single-character word as the center, the input sentence is searched for the Chinese character immediately before it; if that character exists, it is marked as the "front character" and stored in memory; if it does not exist, the input sentence is searched for the Chinese character immediately after the single-character word; if that character exists, it is marked as the "back character" and stored in memory; if it does not exist either, step (7) is carried out;
(5) the front character or back character stored in memory is combined with the single-character word, in their order in the original sentence, into a "temporary word", which is treated as the "current word";
(6) the Chinese polyphone dictionary stored in memory in advance is searched for the pronunciation of the polyphone in the current word; if it is found, the pronunciation is stored in memory; if it is not found, the method proceeds directly to the next step;
(7) the next word stored after segmentation of the sentence is read from memory as the "current word"; if every word segmented from the sentence has been checked in turn, step (8) is carried out; otherwise the method returns to step (3);
(8) the analysis and processing of polyphones for the input sentence ends.
2. The method for the automatic analysis and processing of Chinese polyphones according to claim 1, wherein the CPU, taking the single-character word as the center, searches the input sentence for a Chinese character located before or after the current single-character word and, if neither exists, ends the analysis and processing of polyphones for the input character string.
CN97116046A 1997-08-15 1997-08-15 Method for automatically analyzing and processing Chinese characters which having more than one sound Expired - Fee Related CN1105979C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN97116046A CN1105979C (en) 1997-08-15 1997-08-15 Method for automatically analyzing and processing Chinese characters which having more than one sound

Publications (2)

Publication Number Publication Date
CN1208901A true CN1208901A (en) 1999-02-24
CN1105979C CN1105979C (en) 2003-04-16

Family

ID=5173637

Family Applications (1)

Application Number Title Priority Date Filing Date
CN97116046A Expired - Fee Related CN1105979C (en) 1997-08-15 1997-08-15 Method for automatically analyzing and processing Chinese characters which having more than one sound

Country Status (1)

Country Link
CN (1) CN1105979C (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101033977B (en) * 2007-04-18 2010-09-22 江苏华科导航科技有限公司 Voice navigation method of navigator
CN102567296A (en) * 2011-01-04 2012-07-11 中国移动通信有限公司 Chinese character information processing method and Chinese character information processing device
CN104599670A (en) * 2015-01-30 2015-05-06 成都星炫科技有限公司 Voice recognition method of touch and talk pen
CN110245071A (en) * 2019-05-07 2019-09-17 北京金山安全软件有限公司 Input method testing method and device, electronic equipment and storage medium
CN112309385A (en) * 2019-08-30 2021-02-02 北京字节跳动网络技术有限公司 Voice recognition method, device, electronic equipment and medium

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100371987C (en) * 2004-05-13 2008-02-27 深圳市移动核软件有限公司 Method for pronouncing Chinese characters automatically, and method for making handset read aloud short message
CN101324884B (en) * 2008-07-29 2010-06-02 无敌科技(西安)有限公司 Method of polyphone pronunciation

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101033977B (en) * 2007-04-18 2010-09-22 江苏华科导航科技有限公司 Voice navigation method of navigator
CN102567296A (en) * 2011-01-04 2012-07-11 中国移动通信有限公司 Chinese character information processing method and Chinese character information processing device
CN102567296B (en) * 2011-01-04 2016-03-30 中国移动通信有限公司 A kind of disposal route of Chinese character information and the treating apparatus of Chinese character information
CN104599670A (en) * 2015-01-30 2015-05-06 成都星炫科技有限公司 Voice recognition method of touch and talk pen
CN110245071A (en) * 2019-05-07 2019-09-17 北京金山安全软件有限公司 Input method testing method and device, electronic equipment and storage medium
CN110245071B (en) * 2019-05-07 2023-03-14 北京金山安全软件有限公司 Input method testing method and device, electronic equipment and storage medium
CN112309385A (en) * 2019-08-30 2021-02-02 北京字节跳动网络技术有限公司 Voice recognition method, device, electronic equipment and medium

Also Published As

Publication number Publication date
CN1105979C (en) 2003-04-16

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20030416

Termination date: 20100815