JPS60239865A

JPS60239865A - Retrieving device of dictionary

Info

Publication number: JPS60239865A
Application number: JP59095931A
Authority: JP
Inventors: Akiko Nakajima; 中嶋　章子; Hideyuki Takagi; 英行高木
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1984-05-14
Filing date: 1984-05-14
Publication date: 1985-11-28

Abstract

PURPOSE: To process long sounds in Japanese speech information conveniently and simply, by converting each long vowel symbol in a character string into kana characters in priority order before searching a dictionary. CONSTITUTION: When a character string obtained from speech input is passed to a character string conversion unit 1, it is converted according to the following rules. Characters other than long vowel symbols are not converted. A long vowel symbol following an a-row character is converted to "a". A long vowel symbol following an i-row character is converted to "i". A long vowel symbol following a u-row character is converted to "u". A long vowel symbol following an e-row character is converted to "i" as the first candidate and "e" as the second candidate. A long vowel symbol following an o-row character is converted to "u" as the first candidate and "o" as the second candidate. The resulting candidate strings are sent to a dictionary matching unit 3, which looks them up in a dictionary 2 and outputs the correct candidate. Long sounds in Japanese speech information can thus be processed conveniently and simply.

Description

DETAILED DESCRIPTION OF THE INVENTION

Field of the Invention

The present invention relates to a dictionary search device in a kana-kanji conversion device, and more particularly to a dictionary search device that performs appropriate conversion processing on input character strings containing long vowel symbols.

Conventional Configuration and Its Problems

In recent years, document creation devices such as Japanese word processors have come into widespread use. In this type of device, the text is generally entered as kana characters and converted word by word into a character sequence containing kanji. This kana-kanji conversion is performed by searching a word dictionary in which character strings containing kanji are registered in advance against their corresponding kana strings. In other words, each kana string yields a corresponding character string containing kanji. If a kana string is entered incorrectly, therefore, either no kanji conversion takes place or an incorrect converted string is output.

Recently, methods have been proposed for entering and processing Japanese by inputting speech together with mode information that specifies the character type corresponding to the speech. Entering language by voice, however, raises many problems.

In particular, when a character string containing a long vowel symbol is input, the choice of kana character to which that symbol is converted before searching the word dictionary has a large effect on search time. For example, if the input speech contains the string "え" (e) followed by a long vowel symbol, and the dictionary is searched with a kana string in which the symbol has been converted to "え" (e), that kana string is often not in the dictionary. The search then had to be repeated with a different kana string in which the symbol was converted to "い" (i).

Object of the Invention

An object of the present invention is to provide a dictionary search device that can process long sounds in Japanese speech information conveniently and simply.

Structure of the Invention

The dictionary search device of the present invention comprises a character string conversion unit, a dictionary, and a dictionary matching unit. The character string conversion unit takes as input a character string that may contain long vowel symbols. It converts a long vowel symbol following an a-row (あ段) character to "あ" (a), one following an i-row (い段) character to "い" (i), one following a u-row (う段) character to "う" (u), one following an e-row (え段) character to "い" (i) or "え" (e), and one following an o-row (お段) character to "う" (u) or "お" (o), and outputs the result. A string containing no long vowel symbol is output unchanged. The dictionary matching unit searches the dictionary for the character strings obtained from the character string conversion unit. In this way a character string containing long vowel symbols is converted appropriately into its written-language form, which improves the efficiency of kana-kanji conversion.

Description of the Embodiment

The figure shows the configuration of a dictionary search device according to an embodiment of the present invention. Reference numeral 1 denotes a character string conversion unit that receives a character string and, if it contains long vowel symbols, converts it into character strings according to the rules defined in the present invention. Reference numeral 2 denotes a dictionary in which written Japanese words are registered. Reference numeral 3 denotes a dictionary matching unit that searches the dictionary 2 for the character strings obtained from the character string conversion unit 1.

The operation of the dictionary search device of this embodiment, configured as described above, is explained below. For the purposes of explanation, the long vowel symbol is written as "長".

When the word 高校生 (kōkōsei, "high school student") is input by voice and recognized correctly, the string "こ長こ長せ長" is obtained. When this string is input to the character string conversion unit 1, it is converted according to the following rules.

(1) Characters other than long vowel symbols are not converted.
(2) A long vowel symbol following an a-row (ア段) character is converted to "あ" (a).
(3) A long vowel symbol following an i-row (イ段) character is converted to "い" (i).
(4) A long vowel symbol following a u-row (ウ段) character is converted to "う" (u).
(5) A long vowel symbol following an e-row (エ段) character is converted to "い" (i) as the first candidate and "え" (e) as the second candidate.
(6) A long vowel symbol following an o-row (オ段) character is converted to "う" (u) as the first candidate and "お" (o) as the second candidate.

Accordingly, for the input string "こ長こ長せ長", the following candidates are generated and sent to the dictionary matching unit 3:

1st candidate: "こうこうせい"
2nd candidate: "こおこうせい"
3rd candidate: "こうこおせい"
4th candidate: "こうこうせえ"
5th candidate: "こおこおせい"
6th candidate: "こうこおせえ"
7th candidate: "こおこうせえ"
8th candidate: "こおこおせえ"

Because written Japanese words are registered in the dictionary 2, the dictionary matching unit 3 takes the first candidate as the correct one and outputs the corresponding dictionary content (for example, a kanji code string).
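The conversion rules and candidate ranking described above can be sketched in Python as follows. This is only an illustrative sketch, not the patent's implementation: the names `ROWS`, `REPLACEMENTS`, and `candidates` are hypothetical, the long vowel symbol is assumed to be the character "ー", and candidates are ranked by how many second-choice replacements they use. The order among candidates with the same number of second choices is an assumption and may differ from the listing in the text.

```python
from itertools import product

LONG_MARK = "ー"  # assumed encoding of the long vowel symbol

# Hiragana grouped by vowel row (dan); small ya/yu/yo are included so that
# contracted sounds such as "きょ" are treated as o-row.
ROWS = {
    "a": "あかさたなはまやらわがざだばぱゃ",
    "i": "いきしちにひみりぎじぢびぴ",
    "u": "うくすつぬふむゆるぐずづぶぷゅ",
    "e": "えけせてねへめれげぜでべぺ",
    "o": "おこそとのほもよろをごぞどぼぽょ",
}
KANA_ROW = {ch: row for row, chars in ROWS.items() for ch in chars}

# Rules (2) to (6): replacement kana per row, first candidate first.
REPLACEMENTS = {
    "a": ["あ"],
    "i": ["い"],
    "u": ["う"],
    "e": ["い", "え"],
    "o": ["う", "お"],
}


def candidates(text, long_mark=LONG_MARK):
    """Return every converted string, most likely reading first."""
    slots = []
    prev = None
    for ch in text:
        row = KANA_ROW.get(prev)
        if ch == long_mark and row is not None:
            slots.append(REPLACEMENTS[row])
        else:
            slots.append([ch])  # rule (1): other characters are unchanged
        prev = ch
    # Pair each option with its rank (0 = first choice, 1 = second choice),
    # then order the combinations by how many second choices they contain.
    ranked = sorted(
        product(*[list(enumerate(options)) for options in slots]),
        key=lambda combo: sum(rank for rank, _ in combo),
    )
    return ["".join(ch for _, ch in combo) for combo in ranked]


if __name__ == "__main__":
    for number, candidate in enumerate(candidates("こーこーせー"), start=1):
        print(number, candidate)
```

Because `sorted` is stable, the all-first-choice string always comes first and the all-second-choice string always comes last, which matches the first and eighth candidates in the example.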

In the example above, eight candidates are possible as combinations of long vowel replacements, so the order in which they are ranked greatly affects dictionary search time. In the dictionary search device of the present invention, the long vowel symbol conversion rules described above minimize that search time. As shown below, the rules are grounded in a bias in which kana sequences actually occur in Japanese words.

First, words containing vowel sequences (連母音) were extracted from a set of 30,000 Japanese words. Here, a vowel sequence means two or more consecutive vowels within a single word or across joined words. It need not be a succession of independent vowel syllables: a vowel contained in the preceding syllable followed immediately by a vowel syllable also counts as a vowel sequence.

For example, 愛 (ai, "love"), 甥 (oi, "nephew"), and 魚 (uo, "fish") are vowel sequences, and words such as 貝 (kai, "shellfish") and 酸い (sui, "sour") also contain vowel sequences (a further example at this point is illegible in the source text). Where a vowel sequence occurs, in some cases each vowel is pronounced separately, and in others the second vowel attaches closely to the preceding one and is pronounced as if the two were a single vowel, like a long vowel. The latter are here called diphthongs (重母音).

Examining the words containing vowel sequences among the 30,000 Japanese words mentioned above yielded the following data.

① About 1,400 words contain an e-row character followed by "い" (i) that can be pronounced as a long "e". Only 4 words contain an e-row character followed by "え" (e) that can be pronounced as a long "e".

② About 3,960 words contain an o-row character followed by "う" (u) that can be pronounced as a long "o". About 200 words contain an o-row character followed by "お" (o) that can be pronounced as a long "o".
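The size of this bias can be checked with a short calculation. The sketch below uses only the approximate word counts reported above, reading the first figure as about 1,400; the function name `first_choice_share` is hypothetical, and the result says nothing about how often each word is used in practice.

```python
# Approximate word counts reported in the text (out of 30,000 words surveyed).
E_ROW_PLUS_I = 1400   # e-row + "い" pronounced as a long "e"
E_ROW_PLUS_E = 4      # e-row + "え" pronounced as a long "e"
O_ROW_PLUS_U = 3960   # o-row + "う" pronounced as a long "o"
O_ROW_PLUS_O = 200    # o-row + "お" pronounced as a long "o"


def first_choice_share(first_count, second_count):
    """Fraction of words for which the first-candidate spelling is the right one."""
    return first_count / (first_count + second_count)


e_share = first_choice_share(E_ROW_PLUS_I, E_ROW_PLUS_E)
o_share = first_choice_share(O_ROW_PLUS_U, O_ROW_PLUS_O)

print(f"long e written with 'i': {e_share:.1%}")  # 99.7%
print(f"long o written with 'u': {o_share:.1%}")  # 95.2%
```

On these counts, trying "い" before "え" and "う" before "お" means the first lookup is the correct spelling for the large majority of dictionary words.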

Natural Japanese exhibits biases such as the two listed above, and rules (5) and (6) of the present invention are designed on that basis.

As described above, according to this embodiment, applying the conversion rules for long vowel symbols allows even character strings containing long vowel symbols to be converted into kana-kanji text correctly and effectively.

Effects of the Invention

As is clear from the above description, the present invention converts the long vowel symbols in a character string into kana characters in priority order before searching the dictionary. This gives the advantage that appropriate kana-kanji conversion output can be obtained efficiently even for character strings containing long vowel symbols.

BRIEF DESCRIPTION OF THE DRAWINGS

The figure is a schematic diagram showing the configuration of a dictionary search device according to an embodiment of the present invention.

1: character string conversion unit
2: dictionary
3: dictionary matching unit

Agent: Patent attorney Toshio Nakao and one other

Claims


(1) A dictionary search device comprising: a character string conversion unit that takes as input a character string including long vowel symbols, converts a long vowel symbol following an a-row character to "a", a long vowel symbol following an i-row character to "i", a long vowel symbol following a u-row character to "u", a long vowel symbol following an e-row character to "i" or "e", and a long vowel symbol following an o-row character to "u" or "o", and outputs the result, and that outputs the character string unchanged when it contains no long vowel symbol; a dictionary; and a dictionary matching unit that searches the dictionary for the character string obtained from the character string conversion unit.

(2) The dictionary search device according to claim 1, wherein, for a character string in which the character string conversion unit has converted a long vowel symbol following an e-row character to "i" or "e", the dictionary matching unit first searches the dictionary for the character string converted to "i", and searches the dictionary for the character string converted to "e" only if the former is not in the dictionary.

(3) The dictionary search device according to claim 1, wherein, for a character string in which the character string conversion unit has converted a long vowel symbol following an o-row character to "u" or "o", the dictionary matching unit first searches the dictionary for the character string converted to "u", and searches the dictionary for the character string converted to "o" only if the former is not in the dictionary.
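The fallback search recited in claims (2) and (3) can be illustrated with a minimal Python sketch. It assumes the dictionary is an in-memory mapping from kana strings to kanji strings and that the candidate strings arrive already ordered by priority; the function name `search_in_priority_order` and the sample entries are hypothetical and are not taken from the patent.

```python
def search_in_priority_order(ordered_candidates, dictionary):
    """Look up candidates in order; return (entry, number_of_lookups).

    A lower-priority candidate is only tried when every higher-priority
    candidate is missing from the dictionary. Returns (None, n) on failure.
    """
    lookups = 0
    for candidate in ordered_candidates:
        lookups += 1
        entry = dictionary.get(candidate)
        if entry is not None:
            return entry, lookups
    return None, lookups


if __name__ == "__main__":
    # Hypothetical sample dictionary: kana reading -> kanji spelling.
    sample_dictionary = {"こうこうせい": "高校生", "おおきい": "大きい"}

    # First candidate ("う" after an o-row kana) is found immediately.
    print(search_in_priority_order(["こうこうせい", "こおこうせい"], sample_dictionary))

    # First candidate is missing, so the "お" variant is tried next.
    print(search_in_priority_order(["おうきい", "おおきい"], sample_dictionary))
```

Under this sketch the common case costs a single lookup, and the second-choice spelling is consulted only on a miss.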