JPS58123126A

JPS58123126A - Dictionary retrieving device

Info

Publication number: JPS58123126A
Application number: JP57004708A
Authority: JP
Inventors: Kimito Takeda; 武田　公人; Noriko Yamanaka; 紀子山中; Kazuo Yanai; 矢内　一生; Hiromi Saito; 裕美斎藤; Tsutomu Kawada; 河田　勉
Original assignee: Toshiba Corp; Tokyo Shibaura Electric Co Ltd
Current assignee: Toshiba Corp
Priority date: 1982-01-14
Filing date: 1982-01-14
Publication date: 1983-07-22

Abstract

PURPOSE: To efficiently convert a character string containing characters that cannot be distinguished by pronunciation into a kanji (Chinese character) string, by generating, when such a string is supplied, an auxiliary character string in which those characters are replaced by other characters with the same pronunciation.

CONSTITUTION: An input kana (Japanese syllabary) character string is held temporarily in a temporary storage device 1 and supplied to a dictionary look-up control unit 3 through a code conversion unit 2. The code conversion unit 2 has a conversion table unit 4 in which characters that cannot be distinguished by pronunciation are stored in correspondence with the other characters of the same pronunciation. When the input kana character string held in the storage device 1 contains such characters, the code conversion unit 2 replaces them with their counterparts. The auxiliary kana character string generated by this code conversion is supplied to the dictionary look-up control unit 3 in the same way as the input kana character string.

Description

[Detailed Description of the Invention]

Technical Field of the Invention

The present invention relates to a dictionary search device for a kana-kanji conversion device, and in particular to a dictionary search device that enables appropriate conversion of input character strings containing kana characters that cannot be distinguished by pronunciation.

Technical Background of the Invention

Text creation devices such as Japanese word processors have recently come into widespread use. In a device of this kind, the text is generally entered as kana characters and converted, word by word, into a character sequence containing kanji to produce Japanese text. This kana-kanji conversion is performed by searching a word dictionary, registered in advance, for the kanji-containing character string that corresponds to each kana character string. In other words, a kanji-containing character string is supplied for each kana character string. Consequently, if a kana character string is entered incorrectly, either no kanji conversion takes place or an incorrect converted character string is output.

Problems with the Background Art

Many Japanese words, however, contain characters that cannot be distinguished by pronunciation, such as 「ず」 and 「づ」 (both pronounced "zu") or 「じ」 and 「ぢ」 (both pronounced "ji"). For such words, incorrect kanji conversion results unless the kana character string is entered correctly. For example, to write 「築く」 ("to build"), the kana character string 「きずく」 must be entered; entering 「きづく」 instead yields the conversion output 「気付く」 ("to notice"). In such a case the text has to be corrected by re-entering the kana character string.

In recent years there has also been a move toward text editing by voice input. Such a device analyzes the phonemes of the spoken input, turns them into a string of phonetic characters, and then converts that string into kanji to produce text. Here too, the handling of characters that cannot be distinguished by pronunciation is a problem.

Object of the Invention

The present invention was made in view of these circumstances. Its object is to provide a highly practical dictionary search device that can readily find an appropriate kanji conversion output even when the input character string contains kana characters that cannot be distinguished by pronunciation.

Summary of the Invention

In the present invention, when a character string containing a character that cannot be distinguished by pronunciation is supplied, an auxiliary character string is generated in which that character is replaced by another character with the same pronunciation. The word dictionary is then searched with both the auxiliary character string and the input character string, so that the kanji-containing character string that is the correct conversion output is readily obtained.

Effects of the Invention

According to the present invention, even for words that sound the same when pronounced, appropriate kanji conversion can be performed simply by entering a character string that represents the pronunciation. Japanese text can therefore always be produced with accurate kanji conversion, both when the operator enters the wrong kana character and when the kana character string comes from voice input, and the benefit is very large. In addition, the device configuration and its control do not become significantly more complicated.

Embodiment of the Invention

An embodiment of the present invention is described below with reference to the drawing.

The figure is a schematic block diagram showing the main part of the device of the embodiment. The kana character string used for text creation is converted into character codes by an input device (not shown), preprocessed, for example by being divided into words, and then input. The input device may be a keyboard device such as a kana keyboard, or a speech input recognition device. The input kana character string entered in this way is held temporarily in the temporary storage device 1 and then supplied to the dictionary look-up control unit 3 through the code conversion unit 2. The code conversion unit 2 has a conversion table storage unit 4, in which characters that cannot be distinguished by pronunciation are stored in correspondence with the different characters that have the same pronunciation. When the input kana character string held in the storage device 1 contains such a character, the code conversion unit 2 performs code conversion by replacing that character with its counterpart.

The auxiliary kana character string generated by this code conversion is also supplied to the dictionary look-up control unit 3, in the same way as the input kana character string.

That is, the conversion table storage unit 4 stores, in mutual correspondence, the character codes of characters that cannot be clearly distinguished by pronunciation, such as 「じ」 and 「ぢ」 or 「ず」 and 「づ」. Thus, when the input kana character string 「きずく」 is supplied, for example, the auxiliary kana character string 「きづく」 is generated. Other examples given are 「おおじ」 and 「おりじ」 [sic], which are handled in the same way.
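As an illustration only, and not part of the patent text, the substitution performed with the conversion table can be sketched in Python. The names `CONVERSION_TABLE` and `make_auxiliary` are hypothetical, the table holds only the two pairs named above, and replacing every matching character in one pass is an assumption; the text does not say how strings with several such characters are handled.

```python
from typing import Optional

# Hypothetical conversion table: each kana maps to the other kana
# with the same pronunciation (only the pairs named in the text).
CONVERSION_TABLE = {
    "じ": "ぢ", "ぢ": "じ",
    "ず": "づ", "づ": "ず",
}


def make_auxiliary(kana: str) -> Optional[str]:
    """Return the auxiliary kana string, or None when the input
    contains no character listed in the conversion table."""
    if not any(ch in CONVERSION_TABLE for ch in kana):
        return None
    # Assumption: every listed character is replaced in a single pass.
    return "".join(CONVERSION_TABLE.get(ch, ch) for ch in kana)
```

Under these assumptions, `make_auxiliary("きずく")` gives `"きづく"`, matching the example in the text.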

The dictionary storage unit 5, which is the word dictionary, stores a plurality of entries, each associating a word consisting of a kanji-containing character string with the kana character string that indicates its pronunciation. This word information is registered in advance, with addresses assigned in order, for example from the 「あ」 entries through the 「ん」 entries. The dictionary look-up control unit 3 first compares the supplied input kana character string with the kana character strings in the dictionary one after another to detect a match, and then selectively reads out from the word dictionary the kanji-containing character string for which a match was detected.

If the input kana character string contains no character that cannot be distinguished by pronunciation, the selected kanji-containing character string is output as is, as the kanji conversion output data. If the input character string contains no such character and the corresponding word cannot be found in the word dictionary, the input kana character string itself is output unchanged. If, on the other hand, the input kana character string does contain a character that cannot be distinguished by pronunciation, the dictionary look-up control unit 3 first searches the word dictionary for the input kana character string and then searches it for the auxiliary kana character string created by character code conversion. The kanji-containing character strings found by these two searches are obtained as conversion candidates. The correct candidate is then selected, for example, by displaying the candidates and following the operator's instruction, by context analysis, or by referring to information on frequency of occurrence in the text.

With the device configured in this way, for the input kana character string 「きずく」, the auxiliary kana character string 「きづく」 is generated. Matching these strings against the word dictionary yields the conversion outputs 「築く」 and 「気付く」, and the appropriate one is selected and output. Likewise, for the input kana character string 「はなじ」, entered by mistake, the auxiliary kana character string 「はなぢ」 yields the conversion output 「鼻血」 ("nosebleed").

Similarly, for the input kana character string 「まじか」, the auxiliary kana character string 「まぢか」 yields the kanji conversion output 「間近」 ("close at hand").

In this way, even when a character string containing characters that cannot be distinguished by pronunciation is supplied, the device can effectively convert it into the correct kanji character string. This eliminates the trouble of text errors caused by the operator's key input mistakes and of the proofreading they require. The device can also cope adequately with text creation by voice input, so its effect is very large.

The device configuration is also very convenient in practice, since it only needs to recognize characters that share a pronunciation and generate the corrected (auxiliary) kana character string by referring to the conversion table. Dictionary search thus becomes both very simple and effective.

The present invention is not limited to the embodiment described above. For example, character strings that have the same pronunciation but different characters may be registered in the word dictionary in correspondence with one another, so that they are searched together for a given input character string. Alternatively, the control sequence may be arranged so that dictionary matching is first performed on the input character string, and character code conversion followed by a second dictionary match is performed only when no corresponding word exists or the conversion output is inappropriate. In short, the present invention can be practiced with various modifications without departing from its gist.
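The second variation, in which code conversion is attempted only after the first dictionary match fails, might look like the following sketch. The names `TABLE`, `WORDS` and `look_up_with_fallback` are hypothetical, and only the "no corresponding word" trigger is modelled; rejection of an inappropriate output by the operator is outside this sketch.

```python
from typing import List

TABLE = {"じ": "ぢ", "ぢ": "じ", "ず": "づ", "づ": "ず"}
WORDS = {
    "きずく": ["築く"],
    "きづく": ["気付く"],
    "はなぢ": ["鼻血"],
    "まぢか": ["間近"],
}


def look_up_with_fallback(kana: str) -> List[str]:
    """Match the input first; convert and retry only when nothing is found."""
    found = WORDS.get(kana, [])
    if found:
        return found
    auxiliary = "".join(TABLE.get(ch, ch) for ch in kana)
    if auxiliary != kana:
        return WORDS.get(auxiliary, [])
    return []
```

A consequence of this ordering is that `look_up_with_fallback("きづく")` returns only 「気付く」, so 「築く」 is never offered unless the first result is rejected, whereas the embodiment searches with both strings and offers both candidates.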

[Brief Description of the Drawing]

The figure is a schematic block diagram showing the main part of a device according to an embodiment of the present invention.

1: temporary storage device; 2: code conversion unit; 3: dictionary look-up control unit; 4: conversion table storage unit; 5: dictionary storage unit.

Claims

A dictionary search device comprising: a word dictionary that stores a plurality of kanji-containing words in correspondence with character strings representing their pronunciations; a storage device that temporarily stores an input kana character string; means for generating, when the input kana character string contains a kana character that cannot be distinguished by pronunciation, an auxiliary kana character string in which that kana character is replaced by another kana character with the same pronunciation; and a dictionary matching unit that searches the word dictionary for each of the input kana character string and the auxiliary kana character string to obtain the kanji-containing word indicated by the input kana character string.