JPS62256075A

JPS62256075A - Dictionary retrieving system

Info

Publication number: JPS62256075A
Application number: JP61097935A
Authority: JP
Inventors: Yoshinori Kitahara; 義典北原
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1986-04-30
Filing date: 1986-04-30
Publication date: 1987-11-07

Abstract

PURPOSE: To retrieve dictionary entries correctly even when operators write the same entry in different notations, by generating several notation-variant patterns of a kanji-kana mixed word and searching the dictionary with each pattern as a character string candidate. CONSTITUTION: The dictionary reading part 5 first takes the character code sequence candidate 書き取り (kakitori: kanji 書 "ka", kana き "ki", kanji 取 "to", kana り "ri") from the buffer memory 4 and searches the headings of the dictionary 6 for it. Since the dictionary 6 has no heading 書き取り, the second candidate 書き取 (kanji 書, kana き, kanji 取) is taken from the buffer memory 4 and searched for in the same way. Since the dictionary 6 has no heading 書き取 either, the third candidate 書取り (kanji 書 "kaki", kanji 取 "to", kana り "ri") is taken from the buffer memory 4 and searched for. Since the dictionary 6 contains the heading 書取り, the dictionary reading part 5 reads the information corresponding to that entry from the dictionary 6.

Description

[Detailed Description of the Invention] [Industrial Application Field] The present invention relates to a dictionary search method for character strings, for use in devices such as a speech synthesizer that converts kanji-kana mixed words and sentences into synthesized speech, or an information retrieval device that reads out information when an entry is input.

[Conventional technology]

In the conventional method, as described in Japanese Patent Application Laid-Open No. 60-24632, an input kana character string entry is converted to kana codes, an input kanji character string entry is converted to kanji codes and kana codes, and the dictionary is then searched.

However, dictionary entries include not only kana-only and kanji-only character strings but also strings that mix kanji and kana, and the conventional method gave no consideration to notational variation in such kanji-kana mixed strings.

[Problem that the invention seeks to solve]

The above conventional method gives no consideration to dictionary search for kanji-kana mixed character strings. For example, when the dictionary headword is 「書き取り」 (kakitori, "dictation") and a character string such as 「書取り」 or 「書取」 is input, the dictionary search fails. Conversely, if every conceivable notation is registered as a dictionary entry to cope with such variation, the capacity of the dictionary becomes enormous.
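As a rough gauge of that growth, assume (for illustration only, although it matches the kana-omission model this document goes on to use) that the notations of a headword differ only in which of its kana are written out. A headword containing k kana then has at most 2^k notations, where k and N are symbols introduced here:

```latex
N_{\text{notations}} \le 2^{k}, \qquad k = 2 \;\Rightarrow\; N_{\text{notations}} \le 4
```

For 「書き取り」, which contains the two kana 「き」 and 「り」, the bound is 4. It is an upper bound because omitting different kana can occasionally yield the same string.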

An object of the present invention is to provide a method in which a dictionary search succeeds for an input kanji-kana mixed character string with notational variation, without increasing the number of dictionary entries.

[Means for solving problems]

The above object is achieved by identifying whether each character code constituting a dictionary entry character string or an input entry character string is a kanji code or a kana code, keeping all kanji codes, removing one or more kana codes while preserving the order of the character code string so as to generate combinations of character codes, and performing the dictionary search with those combinations.
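The identification step can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes Unicode text and classifies by code-point range, whereas the original device would have worked on the character codes of its own era. The function name `classify` and the choice of ranges (basic CJK Unified Ideographs block, hiragana, katakana) are assumptions made here.

```python
def classify(ch: str) -> str:
    """Classify a single character as 'kanji', 'kana', or 'other'.

    Assumption: Unicode input. Only the basic CJK Unified Ideographs block
    is treated as kanji; extension blocks are ignored for brevity.
    """
    cp = ord(ch)
    if 0x4E00 <= cp <= 0x9FFF:                              # CJK Unified Ideographs
        return "kanji"
    if 0x3041 <= cp <= 0x309F or 0x30A0 <= cp <= 0x30FF:    # hiragana, katakana
        return "kana"
    return "other"


print([classify(ch) for ch in "書き取り"])
# → ['kanji', 'kana', 'kanji', 'kana']
```

A range test is used because it needs no lookup table; a real system would have to decide how to treat characters outside both classes.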

[Effect]

In the present invention, a plurality of character strings is generated by keeping the kanji in a kanji-kana mixed character string and removing kana. This works effectively because, when writing kanji-kana mixed words by hand or entering them by keyboard, people sometimes omit kana, for example writing 「行なう」 as 「行う」, or 「書き取り」 as 「書取り」 or 「書取」, but they do not omit kanji.

[Example]

Hereinafter, one embodiment of the present invention is described with reference to the drawings.

FIG. 1 is a block diagram of one embodiment of the present invention. The character code identification unit 1 is a processing block that identifies whether each character code constituting the input character code string is a kanji code or a kana code. The character string generation unit 2 is a processing block that reads character codes from the buffer memory 3, which holds the input character string, generates character string candidates for dictionary search, and stores them in the buffer memory 4. The dictionary reading unit 5 is a processing block that compares the character strings generated by the character string generation unit 2 and stored in the buffer memory 4 against the headwords of the dictionary 6 and, on a match, reads out the dictionary entry information.

Next, the operation of each processing block is explained with an example. Suppose that the dictionary 6 contains the headword 「書取り」 and that the input character code string is 「書き取り」.

In the character code identification unit 1, each constituent character code of the input string is identified as either a kanji code or a kana code, and a character type flag (1 for a kanji code, 0 for a kana code) is written to the buffer memory 3 together with the character code. For the input string 「書き取り」, therefore, the character type flags of 「書」 and 「取」 are 1 and those of 「き」 and 「り」 are 0, as shown in FIG. 2. If kanji codes and kana codes can easily be distinguished from specific bits in the code string or from the structure of the code string, the character type flag is unnecessary.

Next, of the character codes written in the buffer memory 3, the character string generation unit 2 always keeps those whose character type flag is 1 (「書」 and 「取」) and, while preserving the order of the character codes, removes zero or more of those whose character type flag is 0 (「き」 and 「り」) to create combinations of character code strings, which it stores in the buffer memory 4. In the 「書き取り」 example, the character code string candidates 「書き取り」, 「書き取」, 「書取り」, and 「書取」 are stored in the buffer memory 4, as shown in FIG. 3.
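The flagging and candidate generation steps above can be sketched as follows. This is a minimal illustration under stated assumptions: Unicode input, a code-point range test standing in for the character code identification unit, and every non-kanji character treated as kana. The names `is_kanji`, `tag`, and `generate_candidates` are hypothetical. The enumeration order (all kana kept first, then dropping from the rightmost kana) is chosen so that the output matches the order given for FIG. 3.

```python
from itertools import product


def is_kanji(ch: str) -> bool:
    # Assumption: basic CJK Unified Ideographs block only.
    return 0x4E00 <= ord(ch) <= 0x9FFF


def tag(text: str) -> list[tuple[str, int]]:
    """Pair each character with a character type flag: 1 = kanji, 0 = kana."""
    return [(ch, 1 if is_kanji(ch) else 0) for ch in text]


def generate_candidates(tagged: list[tuple[str, int]]) -> list[str]:
    """Keep every flag-1 character; drop zero or more flag-0 characters.

    The original character order is preserved in every candidate.
    """
    kana_positions = [i for i, (_, flag) in enumerate(tagged) if flag == 0]
    candidates: list[str] = []
    for keep in product([True, False], repeat=len(kana_positions)):
        dropped = {pos for pos, kept in zip(kana_positions, keep) if not kept}
        s = "".join(ch for i, (ch, _) in enumerate(tagged) if i not in dropped)
        if s not in candidates:      # skip duplicates from repeated kana
            candidates.append(s)
    return candidates


print(tag("書き取り"))
# → [('書', 1), ('き', 0), ('取', 1), ('り', 0)]
print(generate_candidates(tag("書き取り")))
# → ['書き取り', '書き取', '書取り', '書取']
```

The number of candidates doubles with each kana in the input, so a practical system might cap the number of kana it is willing to vary.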

The dictionary reading unit 5 first takes the character code string candidate 「書き取り」 from the buffer memory 4 and searches the headwords of the dictionary 6 for it. Since the dictionary 6 contains no headword 「書き取り」, the unit next takes the second candidate, 「書き取」, from the buffer memory 4 and searches the headwords of the dictionary 6 for it. Since the dictionary 6 contains no headword 「書き取」 either, the unit then takes the third candidate, 「書取り」, from the buffer memory 4 and searches for it. Since the character code string 「書取り」 exists among the entries of the dictionary 6, the dictionary reading unit 5 reads out from the dictionary 6 the information corresponding to the entry 「書取り」.
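The lookup order just described amounts to trying each candidate in turn and stopping at the first headword that matches. A minimal sketch, assuming the dictionary is an in-memory mapping from headword to entry information; the function name `read_first_match` and the sample entry text are hypothetical:

```python
def read_first_match(candidates, dictionary):
    """Return (headword, info) for the first candidate found, else None."""
    for candidate in candidates:
        info = dictionary.get(candidate)
        if info is not None:
            return candidate, info
    return None


# Candidates in the order stored in buffer memory 4 (FIG. 3).
candidates = ["書き取り", "書き取", "書取り", "書取"]
# Hypothetical dictionary containing only the headword 「書取り」.
dictionary = {"書取り": "entry information for kakitori"}

print(read_first_match(candidates, dictionary))
# → ('書取り', 'entry information for kakitori')
```

The first two candidates miss and the third hits, as in the walkthrough; the fourth candidate is never tried.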

The above is one embodiment of the present invention, in which search string candidates are generated from the input character code string and the dictionary entries are searched. Alternatively, the notational-variation patterns that can arise may be generated from the character code strings of the dictionary entries. That is, for an input character code string, dictionary entries beginning with the same kanji code as the first character of the input are first retrieved. Next, for the character code string of each such dictionary entry, each character code is identified as a kanji code or a kana code, and a plurality of character code string candidates is generated by keeping all kanji codes and removing zero or more kana codes while preserving the order of the character code string. Each of these candidates is then compared with the input character code string, and if the input matches any of them, the dictionary information corresponding to that dictionary entry is read out. The system can also be configured in this way.
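This alternative configuration can be sketched as follows. It is an illustration under assumptions, not the patent's implementation: Unicode input, non-kanji characters treated as kana, and a linear scan over the dictionary where a real system would presumably use an index keyed on the first kanji. The names `variants` and `read_by_entry_variants` and the sample entries are hypothetical.

```python
from itertools import product


def is_kanji(ch: str) -> bool:
    # Assumption: basic CJK Unified Ideographs block only.
    return 0x4E00 <= ord(ch) <= 0x9FFF


def variants(headword: str) -> set[str]:
    """All strings obtained by dropping zero or more kana from the headword."""
    kana_positions = [i for i, ch in enumerate(headword) if not is_kanji(ch)]
    result = set()
    for keep in product([True, False], repeat=len(kana_positions)):
        dropped = {pos for pos, kept in zip(kana_positions, keep) if not kept}
        result.add("".join(ch for i, ch in enumerate(headword) if i not in dropped))
    return result


def read_by_entry_variants(query: str, dictionary: dict[str, str]):
    """Match the input against variants of entries sharing its first character."""
    if not query:
        return None
    for headword, info in dictionary.items():
        if headword[0] != query[0]:      # pre-filter on the first character
            continue
        if query in variants(headword):
            return headword, info
    return None


dictionary = {"書き取り": "info A", "書く": "info B", "行なう": "info C"}
print(read_by_entry_variants("書取", dictionary))   # → ('書き取り', 'info A')
print(read_by_entry_variants("行う", dictionary))   # → ('行なう', 'info C')
```

Compared with generating candidates from the input, this direction lets an input with fewer kana match a headword with more, at the cost of expanding each candidate entry at search time.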

[Effect of the invention]

According to the present invention, a plurality of notational-variation patterns is generated for a kanji-kana mixed word and the dictionary is searched with them as character string candidates, so the dictionary search succeeds even when different people write an entry in different ways. Furthermore, since the variation patterns need not be registered in the dictionary, the dictionary capacity does not increase.

[Brief explanation of drawings]

FIG. 1 is a block diagram of one embodiment of the present invention. FIG. 2 shows an example of the contents of the buffer memory 3 in FIG. 1 when the character string 「書き取り」 is stored. FIG. 3 shows an example of the contents of the buffer memory 4 in FIG. 1 when the character string candidates generated from 「書き取り」 are stored.
1 ... character code identification unit, 2 ... character string generation unit, 3 ... buffer memory, 4 ... buffer memory, 5 ... dictionary reading unit, 6 ... dictionary, 7 ... character code, 8 ... character type flag.

Claims

[Claims]

1. A dictionary search method for a dictionary search device that searches a dictionary, by notation, for the entry corresponding to a kanji-kana mixed character code string and reads out the dictionary entry information, characterized by identifying whether each character code constituting a dictionary entry character code string or an input entry character code string is a kanji code or a kana code, generating dictionary entry strings or input entry strings by leaving all kanji codes in the character code string and removing all kana codes while maintaining the order of the character code string, and performing the dictionary search.