JPS60239865A - Retrieving device of dictionary - Google Patents

Retrieving device of dictionary

Info

Publication number
JPS60239865A
JPS60239865A JP59095931A JP9593184A JPS60239865A JP S60239865 A JPS60239865 A JP S60239865A JP 59095931 A JP59095931 A JP 59095931A JP 9593184 A JP9593184 A JP 9593184A JP S60239865 A JPS60239865 A JP S60239865A
Authority
JP
Japan
Prior art keywords
dictionary
character string
long vowel
converted
long
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP59095931A
Other languages
Japanese (ja)
Inventor
Akiko Nakajima
中嶋 章子
Hideyuki Takagi
英行 高木
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Panasonic Holdings Corp
Original Assignee
Matsushita Electric Industrial Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Matsushita Electric Industrial Co Ltd
Priority to JP59095931A
Publication of JPS60239865A

Landscapes

  • Machine Translation (AREA)
  • Document Processing Apparatus (AREA)

Abstract

PURPOSE: To process long sounds in Japanese speech information conveniently and simply by converting each long vowel symbol in a character string into characters in priority order before searching a dictionary. CONSTITUTION: When a character string obtained from speech input is fed to a character string conversion part 1, it is converted according to the following rules. Characters other than long vowel symbols are not converted. A long vowel symbol following an "a-dan" kana (a kana whose vowel is a) is converted into "a". A long vowel symbol following an "i-dan" kana is converted into "i". A long vowel symbol following a "u-dan" kana is converted into "u". A long vowel symbol following an "e-dan" kana is converted into "i" as the 1st candidate and "e" as the 2nd candidate. A long vowel symbol following an "o-dan" kana is converted into "u" as the 1st candidate and "o" as the 2nd candidate. The resulting candidate character strings are sent to a dictionary matching part 3, which looks them up in a dictionary 2 and outputs the correct candidate. Consequently, long sounds in Japanese speech information can be processed conveniently and simply.

Description

DETAILED DESCRIPTION OF THE INVENTION
Field of Industrial Application
The present invention relates to a dictionary search device in a kana-kanji conversion device, and more particularly to a dictionary search device that performs appropriate conversion processing on input character strings containing long vowel symbols.

Conventional Configuration and Its Problems
In recent years, text creation devices such as Japanese word processors have become widespread. With this type of device, the text is generally entered in kana characters and converted word by word into a character sequence containing kanji to produce Japanese text. This kana-kanji conversion is performed by searching a word dictionary for the pre-registered kanji-containing character string that corresponds to each kana character string. In other words, a kanji-containing character string is supplied for each kana character string. Therefore, if a kana character string is entered incorrectly, either no kanji conversion takes place or an incorrect converted character string is output.

Recently, a method has been proposed for inputting and processing Japanese information by inputting speech together with mode information that specifies the character type corresponding to the speech. However, inputting language by voice raises many problems.

In particular, when a character string containing a long vowel symbol is input, the choice of kana character to which the long vowel symbol is converted before searching the word dictionary greatly affects the search time. For example, if the input speech contains the character string 「えー」, searching the word dictionary with the kana string in which the long vowel symbol is converted to 「え」 often fails because no such kana string is in the dictionary, and the search had to be repeated with a different kana string in which the symbol is converted to 「い」.

Object of the Invention
An object of the present invention is to provide a dictionary search device that can process long sounds in Japanese speech information conveniently and simply.

Structure of the Invention
The dictionary search device of the present invention comprises: a character string conversion unit that takes as input a character string containing long vowel symbols, converts a long vowel symbol following an a-dan kana to 「あ」, one following an i-dan kana to 「い」, one following a u-dan kana to 「う」, one following an e-dan kana to 「い」 or 「え」, and one following an o-dan kana to 「う」 or 「お」, and outputs the result, and that outputs the character string unchanged when it contains no long vowel symbol; a dictionary; and a dictionary matching unit that searches the dictionary for the character string obtained from the character string conversion unit. This makes it possible to convert a character string containing long vowel symbols appropriately into its written-language form and to improve the efficiency of kana-kanji conversion.

Description of the Embodiment
The figure shows the configuration of a dictionary search device in an embodiment of the present invention. 1 is a character string conversion unit that receives a character string and, if it contains a long vowel symbol, converts it into character strings according to the rules defined by the present invention; 2 is a dictionary in which written Japanese words are registered; 3 is a dictionary matching unit that searches the dictionary 2 for the character strings obtained from the character string conversion unit 1.

The operation of the dictionary search device of this embodiment, configured as described above, is explained below. For the purposes of explanation, the long vowel symbol is written as 「長」.

When 「高校生」 (kōkōsei, "high school student") is input by voice and recognized correctly, the character string 「こ長こ長せ長」 is obtained. When this character string is input to the character string conversion unit 1, it is converted according to the following rules.

(1) Characters other than the long vowel symbol are not converted.
(2) A long vowel symbol following an a-dan kana is converted to 「あ」.
(3) A long vowel symbol following an i-dan kana is converted to 「い」.
(4) A long vowel symbol following a u-dan kana is converted to 「う」.
(5) A long vowel symbol following an e-dan kana is converted to 「い」 as the first candidate and 「え」 as the second candidate.
(6) A long vowel symbol following an o-dan kana is converted to 「う」 as the first candidate and 「お」 as the second candidate.

Accordingly, for the input character string 「こ長こ長せ長」, the following candidates are generated and sent to the dictionary matching unit 3:
1st candidate: 「こうこうせい」
2nd candidate: 「こおこうせい」
3rd candidate: 「こうこおせい」
4th candidate: 「こうこうせえ」
5th candidate: 「こおこおせい」
6th candidate: 「こうこおせえ」
7th candidate: 「こおこうせえ」
8th candidate: 「こおこおせえ」

Because written Japanese words are registered in the dictionary 2, the dictionary matching unit 3 takes the first candidate as the correct one and outputs the corresponding dictionary content (for example, a kanji code string).
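The conversion rules can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the long vowel symbol is taken to be 「ー」 (the text above writes it as 「長」), the kana tables cover hiragana only, and the names `DAN`, `REPLACEMENTS`, `vowel_group` and `candidates` are hypothetical. Candidates are ordered by how many second-choice replacements they use; the order among candidates with the same count is an assumption and may differ from the listing above.

```python
from itertools import product

LONG_MARK = "ー"  # assumed long vowel symbol; the source text writes it as 長

# Hiragana grouped by vowel (dan). ん and っ are deliberately left out.
DAN = {
    "あ": "あかさたなはまやらわがざだばぱぁゃ",
    "い": "いきしちにひみりぎじぢびぴぃ",
    "う": "うくすつぬふむゆるぐずづぶぷぅゅ",
    "え": "えけせてねへめれげぜでべぺぇ",
    "お": "おこそとのほもよろをごぞどぼぽぉょ",
}

# Rules (2)-(6): replacement kana in priority order for each group.
REPLACEMENTS = {
    "あ": ["あ"],
    "い": ["い"],
    "う": ["う"],
    "え": ["い", "え"],
    "お": ["う", "お"],
}


def vowel_group(ch):
    """Return the vowel group of a kana, or None if it is not in the tables."""
    for vowel, members in DAN.items():
        if ch and ch in members:
            return vowel
    return None


def candidates(text):
    """Expand every long vowel symbol and return candidate strings, best first."""
    slots = []
    prev = ""
    for ch in text:
        group = vowel_group(prev) if ch == LONG_MARK else None
        # Rule (1): anything that is not a convertible long mark stays as is.
        slots.append(REPLACEMENTS[group] if group else [ch])
        prev = ch
    # Each pick is a tuple of indices; its sum counts second-choice replacements.
    picks = sorted(product(*(range(len(s)) for s in slots)), key=sum)
    return ["".join(s[i] for s, i in zip(slots, pick)) for pick in picks]


if __name__ == "__main__":
    for rank, cand in enumerate(candidates("こーこーせー"), start=1):
        print(rank, cand)
```

For 「こーこーせー」 the sketch yields eight candidates, beginning with 「こうこうせい」 and ending with 「こおこおせえ」.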

In the above example there are eight possible combinations of long vowel replacements, so the order in which they are ranked greatly changes the dictionary search time. In the dictionary search device of the present invention, the long vowel symbol conversion rules described above minimize the search time. This rests on the uneven distribution of spellings among existing Japanese words, as shown below.

First, words containing vowel sequences (連母音) were extracted from 30,000 Japanese words. Here a vowel sequence means two or more consecutive vowels within one word or at the junction of words. It is not limited to a succession of independent vowel syllables: a vowel contained in the preceding syllable followed immediately by a vowel syllable also counts as a vowel sequence.

For example, 「愛(アイ)」, 「甥(オイ)」 and 「魚(ウオ)」 are vowel sequences, and words such as 「貝(カイ)」 and 「酸い(スイ)」 also contain vowel sequences. When a vowel sequence occurs, the vowels are sometimes pronounced separately, and sometimes the later vowel binds closely to the preceding one so that the pair is pronounced as if it were a single vowel, like a long vowel. The latter are specifically called "diphthongs" (重母音).

Examining the words that contain vowel sequences among the 30,000 Japanese words mentioned above gave the following data.

■ About 1,400 words contain an e-dan kana + 「い」 string that can be pronounced as a long 「え長」, whereas only 4 words contain an e-dan kana + 「え」 string that can be pronounced as a long 「え長」.

■ About 3,960 words contain an o-dan kana + 「う」 string that can be pronounced as a long 「お長」, whereas about 200 words contain an o-dan kana + 「お」 string that can be pronounced as a long 「お長」.

Natural Japanese exhibits the biases listed above, and the present invention builds on them to give rules (5) and (6) their candidate ordering.
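The effect of these counts on lookup cost can be checked with a little arithmetic. The sketch below assumes, and this is not stated in the source, that each spelling is looked up in proportion to the number of words that use it. It computes the expected number of dictionary lookups per long vowel symbol when the more common spelling is tried first and when the order is reversed. The function name `expected_lookups` is hypothetical.

```python
def expected_lookups(first_count, second_count):
    """Expected lookups when the spelling with first_count words is tried first."""
    total = first_count + second_count
    # One lookup if the first spelling hits, two if a fallback is needed.
    return (first_count * 1 + second_count * 2) / total


# Approximate word counts quoted in the text.
E_DAN = {"い": 1400, "え": 4}
O_DAN = {"う": 3960, "お": 200}

for label, counts, (first, second) in [
    ("e-dan", E_DAN, ("い", "え")),
    ("o-dan", O_DAN, ("う", "お")),
]:
    patent_order = expected_lookups(counts[first], counts[second])
    reverse_order = expected_lookups(counts[second], counts[first])
    print(f"{label}: {patent_order:.3f} lookups vs {reverse_order:.3f} if reversed")
```

Under that assumption the ordering in rules (5) and (6) needs about 1.003 and 1.048 lookups per long vowel symbol, against about 1.997 and 1.952 for the reverse ordering.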

As described above, according to this embodiment, applying the conversion rules for long vowel symbols makes it possible to perform kana-kanji conversion effectively and correctly even on character strings containing long vowel symbols.

Effects of the Invention
As is clear from the above description, the present invention converts each long vowel symbol in a character string into characters in priority order before searching the dictionary. It therefore has the excellent effect of efficiently producing appropriate kana-kanji conversion output even for character strings that contain long vowel symbols.

BRIEF DESCRIPTION OF THE DRAWINGS
The figure is a schematic diagram showing the configuration of a dictionary search device according to an embodiment of the present invention.
1: character string conversion unit; 2: dictionary; 3: dictionary matching unit.
Agent: Patent attorney Toshio Nakao and one other

Claims (3)

(1) A dictionary search device comprising: a character string conversion unit that takes as input a character string containing long vowel symbols, converts a long vowel symbol following an a-dan kana to 「あ」, one following an i-dan kana to 「い」, one following a u-dan kana to 「う」, one following an e-dan kana to 「い」 or 「え」, and one following an o-dan kana to 「う」 or 「お」, and outputs the result, and that outputs the character string unchanged when it contains no long vowel symbol; a dictionary; and a dictionary matching unit that searches the dictionary for the character string obtained from the character string conversion unit.
(2) The dictionary search device according to claim 1, wherein, for a character string in which the character string conversion unit has converted a long vowel symbol following an e-dan kana to 「い」 or 「え」, the dictionary matching unit first searches the dictionary for the character string converted with 「い」, and searches the dictionary for the character string converted with 「え」 only if the former is not in the dictionary.
(3) The dictionary search device according to claim 1, wherein, for a character string in which the character string conversion unit has converted a long vowel symbol following an o-dan kana to 「う」 or 「お」, the dictionary matching unit first searches the dictionary for the character string converted with 「う」, and searches the dictionary for the character string converted with 「お」 only if the former is not in the dictionary.
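The fallback behaviour in claims (2) and (3) amounts to trying candidates in priority order and stopping at the first hit. The following is a minimal sketch under stated assumptions: the dictionary is modelled as a plain Python `dict` from kana readings to kanji strings, the entries are toy examples rather than the contents of the patent's dictionary, and the function name `lookup` is hypothetical.

```python
def lookup(candidates, dictionary):
    """Return the entry for the first candidate found, or None if none match."""
    for reading in candidates:
        if reading in dictionary:
            return dictionary[reading]
    return None


# Toy dictionary (illustrative entries only).
toy_dictionary = {"こうこうせい": "高校生", "おおきい": "大きい"}

# The first candidate is found, so the second is never searched.
print(lookup(["こうこうせい", "こおこうせい"], toy_dictionary))

# The 「う」 spelling is absent, so the search falls back to the 「お」 spelling.
print(lookup(["おうきい", "おおきい"], toy_dictionary))
```

The first call stops after a single lookup, and the second needs two, which is the case the priority ordering tries to keep rare.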
JP59095931A 1984-05-14 1984-05-14 Retrieving device of dictionary Pending JPS60239865A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP59095931A JPS60239865A (en) 1984-05-14 1984-05-14 Retrieving device of dictionary


Publications (1)

Publication Number Publication Date
JPS60239865A true JPS60239865A (en) 1985-11-28

Family

ID=14151013

Family Applications (1)

Application Number Title Priority Date Filing Date
JP59095931A Pending JPS60239865A (en) 1984-05-14 1984-05-14 Retrieving device of dictionary

Country Status (1)

Country Link
JP (1) JPS60239865A (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS5464446A (en) * 1977-10-31 1979-05-24 Fujitsu Ltd Information processing system for japanese word

