JPH0338765A

JPH0338765A - Method and device for processing character

Info

Publication number: JPH0338765A
Application number: JP1174459A
Authority: JP
Inventors: Yukie Kinugawa; 衣川　幸恵; Junichi Kubota; 淳市久保田
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1989-07-05
Filing date: 1989-07-05
Publication date: 1991-02-19

Abstract

PURPOSE: To detect variations in katakana notation inside compound words by dividing a katakana character string made up of several katakana words into the katakana words stored in a katakana notation dictionary and processing each word separately. CONSTITUTION: A katakana string extraction unit 12 extracts katakana character strings from the text, and a katakana compound word determination unit 14 determines whether each extracted string is a compound word. A katakana compound word division unit 15 divides a string determined to be a compound word into individual katakana words, based on the katakana word notations stored in a katakana notation dictionary 13. A katakana string transformation unit 16 then transforms each string by replacing or deleting partial character strings. A variation candidate detection unit 18 compares the transformation results and groups those that are identical. A notation variation determination unit 19 then compares the original notations within each group to detect notations that vary. In this way, notation variations of katakana words used inside compound words can be detected.

Description

DETAILED DESCRIPTION OF THE INVENTION

Field of Industrial Application: The present invention relates to a character processing device and method intended for document processing.

Prior Art: In Japanese text, katakana is used to write loanwords. However, the katakana spelling of loanwords is not standardized, and several spellings of the same loanword are often mixed within a single document. This is said to damage the consistency of the text and to hinder readability. To address this, a katakana notation variation detection device that automatically detects variations in katakana notation has recently been devised (Japanese Unexamined Patent Publication No. 62-290965). FIG. 2 is a block diagram of that conventional katakana notation variation detection device.

In the figure, 21 is a text storage unit, which stores the input text. An IC memory, a magnetic disk device, or the like is used as the text storage unit 21.

22 is a katakana string extraction unit, which extracts katakana strings from the text stored in the text storage unit 21. 23 is a katakana string storage unit, which stores the katakana strings extracted by the katakana string extraction unit 22 together with their positions in the text stored in the text storage unit 21. As with the text storage unit 21, an IC memory, a magnetic disk device, or the like is used as the katakana string storage unit 23. 24 is a katakana string transformation unit, which transforms the katakana strings stored in the katakana string storage unit 23 by deleting or replacing katakana characters or partial katakana strings. 25 is a transformation result temporary storage unit, which stores the results produced by the katakana string transformation unit 24 in association with the katakana strings stored in the katakana string storage unit 23. 26 is a transformation result comparison unit, which detects groups of katakana strings whose transformation results, as stored in the transformation result temporary storage unit 25, match. 27 is a katakana string comparison unit, which, for each group of katakana strings with matching transformation results, compares the pre-transformation strings stored in the katakana string storage unit 23 and thereby detects groups whose transformation results match but whose pre-transformation strings differ. 28 is a variation display unit, which displays the groups of katakana strings detected by the katakana string comparison unit 27. A CRT display, a liquid crystal display, or the like is used as the variation display unit 28. Display methods include showing the detected katakana strings in reverse video or in color. Other components exist, but they are omitted here because they are not needed for comparison with the present invention.

The conventional katakana notation variation detection device configured as above first extracts each continuous run of katakana characters from the text. It then rewrites parts that may vary against each other, such as 「フェ」 and 「ヘ」, into one common form, and compares the transformation results to detect variations. For example, when 「インタフェース」 and 「インタフェイス」 are mixed in a text, the device deletes or replaces katakana substrings using rules such as 「フェ→ヘ」, 「ヘイ→ヘー」, and 「ー→(delete)」, and transforms the two strings as follows: 「インタフェース」→「インタヘース」→「インタヘス」, and 「インタフェイス」→「インタヘイス」→「インタヘース」→「インタヘス」. As a result, the transformation results of 「インタフェース」 and 「インタフェイス」 match while the pre-transformation strings differ, so the device determines that the notation varies.
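The rule-based transformation described above can be sketched in a few lines of Python. This is only an illustration: the three rules are the ones quoted in the text, while the function names `transform` and `is_variation` and the choice to apply each rule to every occurrence in order are assumptions, not details given in the patent.

```python
# Rewrite rules quoted in the text, applied in order: (pattern, replacement).
RULES = [("フェ", "ヘ"), ("ヘイ", "ヘー"), ("ー", "")]


def transform(word, rules=RULES):
    """Apply each rule in turn, rewriting every occurrence of its pattern."""
    for pattern, replacement in rules:
        word = word.replace(pattern, replacement)
    return word


def is_variation(a, b):
    """Two notations vary if they differ but transform to the same result."""
    return a != b and transform(a) == transform(b)


print(transform("インタフェース"))
print(transform("インタフェイス"))
print(is_variation("インタフェース", "インタフェイス"))
```

Both spellings collapse to the same string, which is what lets the device flag them as a variation even though the originals differ.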

Problem to be Solved by the Invention: When the conventional katakana notation variation detection device extracts katakana strings, it takes each continuous run of katakana characters in the text and treats it as a single katakana word. Consequently, when a compound word such as 「マンマシンインタフェース」 and the word 「インタフェイス」 both appear in one text, and the notation of the 「インタフェース」 part inside the compound word varies from the standalone word, the device cannot detect this as a notation variation.

In view of this problem of the conventional device, an object of the present invention is to provide a character processing device and method that, when several katakana words are joined to form a single katakana string in the text, divide the string and process each katakana word individually, and can thereby detect notation variations between a katakana word used inside a compound word and the same word used alone.

Means for Solving the Problem: (1) A character processing device comprising: a text storage unit that stores text; a katakana string extraction unit that extracts katakana character strings from the text stored in the text storage unit; a katakana notation dictionary that stores at least one katakana word notation; a katakana compound word determination unit that determines whether a katakana character string is a compound of katakana words by comparing it with the katakana words stored in the katakana notation dictionary; a katakana compound word division unit that divides a katakana character string determined to be a compound word by the katakana compound word determination unit into individual words, using the katakana words stored in the katakana notation dictionary; a katakana string transformation unit that rewrites partial katakana strings within a katakana character string; a transformation result temporary storage unit that temporarily stores each transformation result produced by the katakana string transformation unit in association with the original katakana character string; a variation candidate detection unit that compares the transformation results held in the transformation result temporary storage unit with one another and detects those that match; and a notation variation determination unit that determines whether, among the variation candidates detected by the variation candidate detection unit, there are any whose original katakana notations differ.

(2) A character processing method that uses a katakana notation dictionary storing at least one katakana word notation, the method comprising: a katakana string extraction step of extracting katakana character strings from text; a katakana compound word determination step of determining whether a katakana character string is a compound of katakana words by comparing it with the katakana words stored in the katakana notation dictionary; a katakana compound word division step of dividing a katakana character string determined to be a compound word in the katakana compound word determination step into individual words, using the katakana words stored in the katakana notation dictionary; a katakana string transformation step of rewriting partial katakana strings within a katakana character string; a variation candidate detection step of comparing the transformation results with one another and detecting those that match; and a notation variation determination step of determining whether, among the variation candidates detected in the variation candidate detection step, there are any whose original katakana notations differ.

Operation: With the configuration described above, the katakana string extraction unit extracts katakana character strings from the text, and the katakana compound word determination unit determines whether each katakana character string is a compound word. The katakana compound word division unit divides a string determined to be a compound word into individual katakana words, using the katakana word notations stored in the katakana notation dictionary. Next, the katakana string transformation unit transforms each katakana string by replacing or deleting partial character strings. The variation candidate detection unit then compares the transformation results and groups those that are identical, and the notation variation determination unit compares the original notations within each group to detect those whose notation varies.

Embodiment: An embodiment of the present invention is described below with reference to the drawings.

FIG. 1 is a block diagram of a character processing device according to an embodiment of the present invention.

In FIG. 1, 11 is a text storage unit, which stores the input text. 12 is a katakana string extraction unit, which extracts katakana character strings one at a time, in order, from the text stored in the text storage unit 11.

13 is a katakana notation dictionary, which stores only katakana words that are not compound words and whose notation does not vary. 14 is a katakana compound word determination unit, which collates the katakana character string extracted by the katakana string extraction unit 12 against the katakana word notations stored in the katakana notation dictionary 13 and determines whether the extracted string is a compound word. 15 is a katakana compound word division unit, which collates a string that the katakana compound word determination unit 14 has determined to be a compound word against the katakana word notations stored in the katakana notation dictionary 13 and divides it into individual katakana words. 16 is a katakana string transformation unit, which rewrites partial katakana strings of the extracted or divided katakana words. 17 is a transformation result temporary storage unit, which temporarily stores every katakana string rewritten by the katakana string transformation unit 16, in association with the original katakana character string and in the order in which the strings were extracted by the katakana string extraction unit 12. 18 is a variation candidate detection unit, which, when the katakana string extraction unit 12 signals that extraction is complete, compares the transformation results held in the transformation result temporary storage unit 17 and groups those that match. 19 is a notation variation determination unit, which determines whether each group of variation candidates detected by the variation candidate detection unit 18 contains members whose original notations differ. 20 is a variation display unit, which displays the katakana character strings of the groups that the notation variation determination unit 19 has determined to contain differing notations. A CRT display, a liquid crystal display, or the like can serve as the variation display unit 20. Display methods include listing the groups of varying notations, or marking the varying parts within the text with reverse video or underlining so that they stand out from the rest.

FIG. 3 is a flow diagram illustrating a character processing method according to an embodiment of the present invention.

31 is the katakana string extraction step, which extracts a katakana character string from the text stored in the text storage unit 11. 32 is the katakana compound word determination step, which determines whether the katakana character string extracted in step 31 is a compound word. If it is a compound word, processing proceeds to step 33; otherwise it proceeds to step 34. 33 is the katakana compound word division step, which divides the compound word into individual words using the katakana word notations stored in the katakana notation dictionary 13. 34 is the katakana string transformation step, which rewrites, according to fixed rules, partial katakana strings of the katakana notation determined in step 32 not to be a compound word, or of the katakana strings produced by the division in step 33. The rewritten results are temporarily stored in the transformation result temporary storage unit 17. 35 determines whether all katakana character strings have been extracted from the text; if any remain, processing returns to step 31. Once all katakana strings have been extracted, processing proceeds to step 36. 36 is the variation candidate detection step, which compares the transformation results held in the transformation result temporary storage unit 17 with one another and groups those that are identical. 37 is the variation determination step, which compares the original notations within each group of katakana strings detected in step 36 and, if any differ, determines that the notation varies.
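Steps 31 and 35 (pulling katakana runs out of the text one at a time until none remain) might be sketched as a Python generator. The patent does not define which characters count as katakana or whether positions start at 0 or 1, so the character range in `KATAKANA_RUN`, the 1-based positions, and the sample sentence are all assumptions made for illustration.

```python
import re

# Assumed definition of a katakana run: full-width katakana plus the
# prolonged sound mark "ー". The patent does not specify the exact set.
KATAKANA_RUN = re.compile(r"[ァ-ヶー]+")


def extract_katakana(text):
    """Yield (string, position, length) for each katakana run, in text order.

    Position is counted from 1 at the start of the text (an assumption).
    """
    for match in KATAKANA_RUN.finditer(text):
        yield match.group(), match.start() + 1, len(match.group())


# Hypothetical sample sentence, not the text of FIG. 4.
for item in extract_katakana("このワインメーカは別のメーカーと違う"):
    print(item)
```

A generator mirrors the loop between steps 31 and 35: the caller processes one string at a time and naturally moves on to candidate detection when the iterator is exhausted.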

The operation of the character processing device and method of this embodiment, configured as described above, is explained concretely below using an example.

FIG. 4 is an example of text stored in the text storage unit 11. In this text, the notation varies between the 「メーカ」 in 「ワインメーカ」 and the standalone 「メーカー」.

FIG. 5 is an example of the katakana notations stored in the katakana notation dictionary 13.

FIG. 6 is an example of the transformation rules used when the katakana string transformation unit 16 performs transformation.

First, the katakana string extraction unit 12 extracts katakana character strings in order from the text stored in the text storage unit 11. The first string extracted is 「ワインメーカ」. The katakana compound word determination unit 14 then determines whether the string extracted by the katakana string extraction unit 12 contains a katakana word notation stored in the katakana notation dictionary 13. If it does, the katakana compound word division unit 15 divides the extracted string using the katakana word notations stored in the dictionary. In the case of 「ワインメーカ」, the string contains 「ワイン」, which is stored in the dictionary, so it is divided into two parts: 「ワイン」 and the remainder 「メーカ」. When the extracted string matches one of the katakana words stored in the dictionary in its entirety, or when none of its substrings matches a katakana word stored in the dictionary, the extracted string is determined not to be a compound word.
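The determination and division just described can be sketched as a single Python function. The three cases (whole-string match, no dictionary word contained, dictionary word contained) follow the text; the function name `split_compound`, the longest-entry-first search order, and the recursive handling of the remainder are assumptions, since the patent only shows a two-way split.

```python
def split_compound(word, dictionary):
    """Return the component words, or [word] if it is not judged a compound."""
    if word in dictionary:
        return [word]  # the whole string is a dictionary word: not a compound
    # Try longer entries first (assumed tie-breaking; the patent does not say).
    for entry in sorted(dictionary, key=lambda e: (-len(e), e)):
        if not entry:
            continue  # ignore empty entries defensively
        index = word.find(entry)
        if index == -1:
            continue
        before, after = word[:index], word[index + len(entry):]
        parts = split_compound(before, dictionary) if before else []
        parts.append(entry)
        if after:
            parts += split_compound(after, dictionary)
        return parts
    return [word]  # no dictionary word is contained: not a compound


dictionary = {"ワイン"}
print(split_compound("ワインメーカ", dictionary))
print(split_compound("メーカー", dictionary))
```

Note that the remainder 「メーカ」 does not need to be in the dictionary. This matches the embodiment, where the dictionary holds only words whose notation does not vary, so the varying word is recovered as whatever is left over.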

Next, the katakana string transformation unit 16 follows the transformation rules shown in FIG. 6 and deletes or replaces specific partial katakana strings in each katakana word that was determined not to be a compound word or that was produced by the katakana compound word division unit.

Applying the transformation in this example gives 「ワイン」→「ワイ」 and 「メーカ」→「メカ」. The transformation result temporary storage unit 17 then stores the result 「ワイ」 in association with "4", its starting position in the text, and "3", the length of the katakana character string, and stores 「メカ」 in association with "7" and "3". When the transformation of 「ワインメーカ」 is finished, the katakana string extraction unit 12 extracts the next katakana character string from the text stored in the text storage unit 11, and the katakana string transformation unit 16 transforms it in the same way. The second string extracted, 「メーカー」, is transformed into 「メカ」. After the same processing has been applied to all character strings in the text stored in the text storage unit 11, the transformation result temporary storage unit 17 holds the results shown below. The number in [ ] is the serial number of the katakana character string, and the numbers in ( ) are its position information: the character count from the start of the text and the length of the katakana character string.

[1] ワイ (4, 3)
[2] メカ (7, 3)
[3] メカ (15, 4)
[4] ワイ (26, 3)

From these, the variation candidate detection unit 18 detects the entries whose transformation results match. In this case, [1] and [4], and [2] and [3], are detected. The notation variation determination unit 19 then uses the position information to compare the original notations of each pair. Because the original notations of [2] and [3] differ, they are determined to be a notation variation. The variation display unit 20 displays the katakana character strings corresponding to [2] and [3] as varying in notation.
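The grouping and determination performed by units 18 and 19 on the four stored entries can be sketched as follows. As a simplification, each record here carries its original notation directly rather than recovering it from the position information; the original of entry [4] is inferred from its length and transformed result, and the names `Record` and `find_variations` are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Record(NamedTuple):
    serial: int       # number shown in [ ]
    original: str     # original notation (looked up via position in the patent)
    transformed: str  # result stored in the temporary storage unit
    position: int     # character count from the start of the text
    length: int       # length of the original katakana string


RECORDS = [
    Record(1, "ワイン", "ワイ", 4, 3),
    Record(2, "メーカ", "メカ", 7, 3),
    Record(3, "メーカー", "メカ", 15, 4),
    Record(4, "ワイン", "ワイ", 26, 3),  # original inferred, see lead-in
]


def find_variations(records):
    """Group records by transformed result; keep groups whose originals differ."""
    groups = defaultdict(list)
    for record in records:
        groups[record.transformed].append(record)
    variations = []
    for members in groups.values():
        if len({m.original for m in members}) > 1:
            variations.append([m.serial for m in members])
    return variations


print(find_variations(RECORDS))
```

Entries [1] and [4] form a candidate group but share one original notation, so only the group containing [2] and [3] is reported.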

As described above, this embodiment provides a katakana notation dictionary, a katakana compound word determination unit, and a katakana compound word division unit, and divides 「ワインメーカ」, in which two katakana words are joined to form a single katakana character string, into 「ワイン」 and 「メーカ」 so that each is processed as an individual katakana word. This makes it possible to detect a notation variation such as that between the 「メーカ」 contained in the compound word 「ワインメーカ」 and the 「メーカー」 used alone. In addition, limiting the katakana words stored in the katakana notation dictionary to non-compound katakana words whose notation does not vary keeps the dictionary small.

That is, if all katakana words were stored, and every possible notation were also stored for words whose notation varies, the amount of katakana notation to be stored would clearly become enormous.

In this embodiment, the katakana notation dictionary stores only katakana character strings whose notation does not vary. Alternatively, it may store all katakana words and, for those whose notation varies, every notation that can occur. Furthermore, the dictionary may store each headword paired with the result of applying to that headword the processing performed by the katakana string transformation unit, so that the transformation by the katakana string transformation unit can be omitted.

Effects of the Invention: In the present invention, when several katakana words are joined to form a single katakana character string, the string is divided at the katakana words stored in the katakana notation dictionary and each part is processed as an individual katakana word. This makes it possible to detect variations in katakana notation contained in compound words, and the practical effect is significant.

[Brief explanation of drawings]

FIG. 1 is a block diagram of a character processing device according to an embodiment of the present invention. FIG. 2 is a block diagram of a conventional character processing device. FIG. 3 is a flow diagram of a character processing method according to an embodiment of the present invention. FIG. 4 is an explanatory diagram of an example of text stored in the text storage unit. FIG. 5 is an explanatory diagram of examples of katakana notation stored in the katakana notation dictionary. FIG. 6 is an explanatory diagram of an example of the transformation rules used when the katakana string transformation unit performs transformation.

11 ... text storage unit, 12 ... katakana string extraction unit, 13 ... katakana notation dictionary, 14 ... katakana compound word determination unit, 15 ... katakana compound word division unit, 16 ... katakana string transformation unit, 17 ... transformation result temporary storage unit, 18 ... variation candidate detection unit, 19 ... notation variation determination unit, 20 ... variation display unit.

Claims

[Claims]

(1) A character processing device comprising: a text storage unit that stores text; a katakana string extraction unit that extracts katakana character strings from the text stored in the text storage unit; a katakana notation dictionary that stores at least one katakana word notation; a katakana compound word determination unit that determines whether a katakana character string is a compound of katakana words by comparing it with the katakana words stored in the katakana notation dictionary; a katakana compound word division unit that divides a katakana character string determined to be a compound word by the katakana compound word determination unit into individual words, using the katakana words stored in the katakana notation dictionary; a katakana string transformation unit that rewrites partial katakana strings within a katakana character string; a transformation result temporary storage unit that temporarily stores each transformation result produced by the katakana string transformation unit in association with the original katakana character string; a variation candidate detection unit that compares the transformation results held in the transformation result temporary storage unit with one another and detects those that match; and a notation variation determination unit that determines whether, among the variation candidates detected by the variation candidate detection unit, there are any whose original katakana notations differ.

(2) A character processing method that uses a katakana notation dictionary storing at least one katakana word notation, the method comprising: a katakana string extraction step of extracting katakana character strings from text; a katakana compound word determination step of determining whether a katakana character string is a compound of katakana words by comparing it with the katakana words stored in the katakana notation dictionary; a katakana compound word division step of dividing a katakana character string determined to be a compound word in the katakana compound word determination step into individual words, using the katakana words stored in the katakana notation dictionary; a katakana string transformation step of rewriting partial katakana strings within a katakana character string; a variation candidate detection step of comparing the transformation results with one another and detecting those that match; and a notation variation determination step of determining whether, among the variation candidates detected in the variation candidate detection step, there are any whose original katakana notations differ.