JPH04673A

JPH04673A - Method and device for registering compound word

Info

Publication number: JPH04673A
Application number: JP2100484A
Authority: JP
Inventors: Masasuke Tominaga; 冨永　雅介
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1990-04-18
Filing date: 1990-04-18
Publication date: 1992-01-06

Abstract

PURPOSE: To properly extract compound words from a text and register them in a dictionary by providing a compound word candidate extraction step, a compound word candidate presentation step, and an inclusion relation presentation step. CONSTITUTION: The device comprises a compound word candidate extracting means 6 that extracts compound word candidates based on the dictionary information of a word dictionary 5, a compound word candidate file 2 that stores the extracted candidates, and a list generating means 9 that generates a list from which the mutual inclusion relations among the candidates in the compound word candidate file 2 can be grasped. Word strings are extracted from a text 1 and deleted as necessary, so that meaningless word strings are removed and compound word candidates are obtained. In addition, an inclusion relation list, in a form that makes the inclusion relations between compound word candidates easy to grasp, is generated and displayed. Thus, compound words can be properly extracted from the text 1 and registered in a dictionary.

Description

DETAILED DESCRIPTION OF THE INVENTION [Field of Industrial Application] The present invention relates to a collocation registration method and device, and more particularly to a collocation registration method and device that extract collocations from a text and register them in a dictionary.

[Prior Art] The processing accuracy of natural language processing systems (for example, natural-language question answering systems and machine translation systems) depends heavily on the accuracy of their dictionaries, so the technical terms used in the target field must be registered in the dictionary without omission.

However, technical terms do not remain fixed (for example, technical terms in science and technology multiply with technological innovation), so terms must be added to the dictionary repeatedly while the system is in operation.

Because technical terms are often collocations, how efficiently collocations can be registered in the dictionary becomes important.

As a conventional technique for registering collocations, a technique that first mechanically extracts word strings from a text, then deletes the word strings with a low frequency of occurrence, and treats the remaining word strings as collocations is disclosed in "Automatic Extraction of Idiomatic Expressions from a Text Database," Information Processing Society of Japan, Proceedings of the 37th National Convention (second half of 1988), 7B-6, pp. 1032-1033.

In addition, a technique that first mechanically extracts word strings from a text, then deletes overlapping word strings based on their frequencies of occurrence, and treats the remaining word strings as collocations is disclosed in Japanese Patent Application Laid-Open No. 1-102679.

Furthermore, a technique that displays a KWIC (keyword in context) list for a collocation, has the user enter information about the collocation using that list as a guide, and updates the dictionary is disclosed in Japanese Patent Application Laid-Open No. 63-261467.

[Problems to be Solved by the Invention] Among the above conventional techniques, those that delete some of the word strings based on their frequency of occurrence and treat the rest as collocations have the problem that word strings unsuitable as collocations remain among those treated as collocations.

On the other hand, with the conventional technique that displays a KWIC list for collocations, the user can judge from the display whether a word string is suitable as a collocation. However, a sentence containing the word string is displayed once for every occurrence of that word string, so the list becomes enormous and the burden on the user grows.

Accordingly, an object of the present invention is to provide a collocation registration method and device that can properly extract collocations from a text and register them in a dictionary while reducing the burden on the user.

[Means for Solving the Problems] The present invention provides a collocation registration method comprising: a collocation candidate extraction step of extracting collocation candidates from a given text based on predetermined collocation candidate extraction rules; a collocation candidate presentation step of presenting the extracted collocation candidates to the user and having the user designate one of them; an inclusion relation presentation step of selecting other collocation candidates that are in an inclusion relation with the designated candidate and presenting these candidates to the user in a form that shows the inclusion relation; and a dictionary registration step of having the user enter predetermined information about the designated candidate and registering that candidate in a dictionary as a collocation together with the information.

The present invention also provides a collocation registration device suited to carrying out the above collocation registration method.

[Operation] In the collocation registration method of the present invention, the collocation candidate extraction step extracts word strings from the text and, as necessary, deletes word strings based on frequency of occurrence or on morphological/syntactic analysis, thereby removing as many meaningless word strings as possible to obtain collocation candidates. The inclusion relation presentation step generates and displays an inclusion relation list in a form that makes the inclusion relations between collocation candidates easy to grasp.

Because the inclusion relation list is displayed in a form that makes the inclusion relations between collocation candidates easy to grasp, the user can easily eliminate meaningless word strings. Collocations can therefore be properly extracted from the text and registered in the dictionary.

[Embodiments] Embodiments of the present invention are described in detail below with reference to the drawings. The present invention is not limited to these embodiments.

FIG. 1 is a block diagram of a collocation dictionary maintenance device 50 according to an embodiment of the present invention.

The collocation dictionary maintenance device 50 comprises: a text file 1 that stores English text; a text editing means 7 for editing the English text in the text file 1; a collocation candidate extraction rule file 3 that stores rules for extracting collocation candidates from the English text; a word dictionary 5 that stores word information; a collocation candidate extraction means 6 that extracts collocation candidates from the English text in the text file 1 based on the collocation candidate extraction rules in the collocation candidate extraction rule file 3 and the dictionary information in the word dictionary 5; a collocation candidate file 2 that stores the extracted collocation candidates; a list generation means 9 that generates a list in a form that shows the mutual inclusion relations among the collocation candidates in the collocation candidate file 2, and a KWIC list that shows the contexts in which a collocation candidate appears; a dictionary maintenance means 8 that creates dictionary information from information, such as translations, that the user sets for a collocation candidate; a collocation dictionary 4 that stores the created dictionary information; a display means 12 for presenting the collocation candidates, the lists, and the dictionary information to the user; an input means 11 with which the user enters data; and a control means 10 that controls the text editing means 7, the list generation means 9, and the dictionary maintenance means 8 and also controls data input and output.

FIG. 2 is a flowchart of the operation of the collocation dictionary maintenance device 50.

In step 101, the control means 10 asks the user to enter parameter values. As parameters, the user enters information designating the English text from which collocation candidates are to be extracted, the number of words in the collocation candidates to be extracted, the threshold for frequency of occurrence, information designating the processing method for extracting collocation candidates, and so on.

In step 102, the collocation candidate extraction means 6 extracts collocation candidates from the English text based on the parameters entered by the user and stores them in the collocation candidate file 2. FIG. 3 shows a flowchart of this collocation candidate extraction process.

That is, in step 301 of FIG. 3, the control means 10 sets the parameter values that the user entered in step 101 of FIG. 2 in the collocation candidate extraction means 6.

In step 302, the collocation candidate extraction means 6 extracts, from the designated English text, the word strings that meet the designated conditions, using the designated extraction method.

Extraction methods include (1) a method that extracts every word string satisfying the designated word-count condition and (2) a method that extracts only the word strings matching a predetermined extraction pattern.

The former is shown in FIG. 4(1) (for a designated word count of 2). The latter is shown in FIG. 4(2) (with noun phrases as the extraction pattern).
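As an illustration only, the two extraction methods of step 302 might be sketched in Python as follows. The function names, the part-of-speech tags ("ADJ", "NOUN", and so on), and the sample sentence are assumptions made for this sketch and do not appear in the specification; the noun-phrase pattern is a deliberately crude one (a run of adjectives and nouns that ends in a noun).

```python
def extract_ngrams(words, n):
    """Method (1): return every run of n consecutive words."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def extract_noun_phrases(tagged):
    """Method (2): return runs of ADJ/NOUN tokens that end in a noun.

    `tagged` is a list of (word, tag) pairs; the tag set is hypothetical.
    Only phrases of two or more words are kept.
    """
    phrases, run = [], []
    # A sentinel token flushes the last run at the end of the input.
    for word, tag in list(tagged) + [("", "END")]:
        if tag in ("ADJ", "NOUN"):
            run.append((word, tag))
            continue
        # Trim trailing adjectives so that the phrase ends in a noun.
        while run and run[-1][1] != "NOUN":
            run.pop()
        if len(run) >= 2:
            phrases.append(tuple(w for w, _ in run))
        run = []
    return phrases


# Hypothetical sample input.
words = "press the line feed key".split()
tagged = [("press", "VERB"), ("the", "DET"),
          ("line", "NOUN"), ("feed", "NOUN"), ("key", "NOUN")]
print(extract_ngrams(words, 2))
print(extract_noun_phrases(tagged))
```

Method (1) over-generates by design, which is why the later deletion steps are needed; method (2) trades recall for fewer meaningless strings.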

In FIG. 4(2), only one part of speech is assigned to each word, but multiple parts of speech may be assigned and all the noun phrases that can then be formed may be extracted. Also, in FIG. 4(2) the extraction pattern is fixed to noun phrases in view of the fact that technical terms are often noun phrases, but the user may instead designate the extraction pattern as one of the parameters.

In step 303, meaningless word strings are deleted from the extracted word strings. For example, FIG. 4(1) also contains extracted word strings that are unsuitable as collocation candidates (for example, ■■), but these can be removed by setting rules for deleting meaningless word strings in the collocation candidate extraction rule file 3 and applying those rules. For example, setting the rule "an article (that is, the word "a", "the", or "an") cannot be the rightmost word of a word string" removes ■ in FIG. 4(1). In addition, each word string is checked against the collocation dictionary 4 and deleted if it is already registered.
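The deletion in step 303 can be pictured as a list of predicate rules applied to each extracted word string. The following Python sketch assumes that representation; the names `ends_with_article` and `remove_meaningless`, and the use of a set for the registered collocations, are hypothetical and are not taken from the specification.

```python
ARTICLES = {"a", "an", "the"}


def ends_with_article(candidate):
    """Rule from the text: an article cannot be the rightmost word."""
    return candidate[-1].lower() in ARTICLES


def remove_meaningless(candidates, registered=frozenset(),
                       rules=(ends_with_article,)):
    """Drop word strings that violate a rule or are already registered.

    `candidates` is a list of word tuples; `registered` stands in for the
    contents of the collocation dictionary.
    """
    return [c for c in candidates
            if c not in registered and not any(rule(c) for rule in rules)]


# Hypothetical sample input.
candidates = [("press", "the"), ("the", "line"),
              ("line", "feed"), ("feed", "key")]
print(remove_meaningless(candidates, registered={("feed", "key")}))
```

Keeping the rules as separate functions mirrors the idea of a rule file: a further rule can be added without touching the filtering loop.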

In step 304, the collocation candidate extraction means 6 sorts the extracted word strings, deletes duplicate word strings, and obtains the frequency of occurrence of each word string. For word strings containing inflected forms, the frequency is calculated on the generalized form whenever the string can be generalized as a pattern.

For example, given the two word strings "structural ambiguity" and "structural ambiguities", the latter, a plural form, is generalized to the former, the singular form, and their frequencies are summed. When a word string of this kind is presented to the user, it is shown in a form that makes it clear which word has had its inflected form generalized. On the other hand, if the word string "vending machine" is extracted and generalizing the inflected form yields "vend machine", but no word string "vend machine" exists, the original "vending machine" is adopted as the word string unchanged.

After the frequencies of occurrence have been calculated, word strings whose frequency is below the designated threshold are deleted.
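The counting in step 304 can be sketched as follows, under the assumption that a word string is generalized only when its generalized form was itself extracted from the text, which is how the "vending machine" example reads. The `BASE_FORMS` table is a hypothetical stand-in for the inflection information in the word dictionary, and the function names are illustrative.

```python
from collections import Counter

# Hypothetical inflected-form -> base-form table.
BASE_FORMS = {"ambiguities": "ambiguity", "vending": "vend"}


def generalize(candidate):
    """Replace each inflected word by its base form, if one is known."""
    return tuple(BASE_FORMS.get(w, w) for w in candidate)


def count_candidates(strings, threshold):
    """Count word strings, merge inflected variants, apply the threshold."""
    raw = Counter(strings)
    merged = Counter()
    for cand, n in raw.items():
        base = generalize(cand)
        # Generalize only if the base form also occurs as a word string.
        target = base if base in raw else cand
        merged[target] += n
    return {c: n for c, n in merged.items() if n >= threshold}


# Hypothetical sample input.
strings = ([("structural", "ambiguity")] * 2
           + [("structural", "ambiguities")]
           + [("vending", "machine")] * 2
           + [("line", "feed")])
print(count_candidates(strings, threshold=2))
```

With a threshold of 2, "line feed" (one occurrence in this sample) is dropped, while the singular and plural forms of "structural ambiguity" are counted together.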

In step 305, the collocation candidate extraction means 6 writes the remaining word strings to the collocation candidate file 2 as collocation candidates, together with their frequencies of occurrence.

Through the above steps, collocation candidates are extracted from the English text and accumulated in the collocation candidate file 2.

In step 103, the control means 10 displays the collocation candidates and other information on the display means 12. FIG. 5 shows an example of the display screen, in which 1202 is the window that displays the collocation candidates and 1201 is a menu window for starting the various processes. In FIG. 5 the collocation candidates are displayed in descending order of frequency of occurrence in the English text, but the menu window 1201 may also offer other display orders (for example, alphabetical order).

In step 104, the user looks at the display window 1202 and checks whether it contains a collocation candidate of interest.

The scroll functions "↑" and "↓" of the menu window 1201 give access to collocation candidates that do not fit in the window 1202. If there is no collocation candidate of interest, the collocation extraction process ends. If there is one, the process proceeds to step 105.

In step 105, the user designates the collocation candidate of interest using the input means 11, such as a mouse.

In step 106, the user selects one of the functions "Inclusion relation", "KWIC", "Dictionary registration", and "Done" in the menu window 1201.

Selecting "Inclusion relation" leads to step 107, selecting "KWIC" leads to step 108, and selecting "Dictionary registration" leads to step 109. Selecting "Done" returns to step 104.

In step 107, the list generation means 9 retrieves from the collocation candidate file 2 the other collocation candidates that are in an inclusion relation with the candidate designated in step 105, generates an inclusion relation list in which the positions of the word string they share are aligned, and displays it on the display means 12. FIG. 6 shows an example of the display screen, in which 1203 is the window that displays the inclusion relation list. FIG. 6 shows the case where the collocation candidate "LINE FEED" was designated in step 105: other candidates containing "LINE FEED", such as "LINE FEED MODE", "LINE FEED KEY", "PRESSING LINE FEED", and "PRESSING LINE FEED KEY", are retrieved from the collocation candidate file 2 and displayed in the window 1203 with the shared word string "LINE FEED" aligned.
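A minimal sketch of how such an aligned list might be built is given below. It assumes that "inclusion relation" means that a candidate contains the designated candidate as a contiguous run of words, and it ignores frequencies; the function names are hypothetical, and the extra candidate "TAB KEY" is made up for the example.

```python
def find_sub(candidate, target):
    """Index at which `target` occurs as a contiguous run in `candidate`, or -1."""
    n = len(target)
    for i in range(len(candidate) - n + 1):
        if candidate[i:i + n] == target:
            return i
    return -1


def inclusion_list(candidates, target):
    """Format the candidates containing `target` so that `target` lines up."""
    rows = []
    for cand in candidates:
        i = find_sub(cand, target)
        if i >= 0:
            # Split into the words before the target and the rest.
            rows.append((" ".join(cand[:i]), " ".join(cand[i:])))
    width = max((len(left) for left, _ in rows), default=0)
    # Right-justify the left context so the shared word string is aligned.
    return [left.rjust(width) + " " + rest for left, rest in rows]


candidates = [
    ("LINE", "FEED"),
    ("LINE", "FEED", "MODE"),
    ("LINE", "FEED", "KEY"),
    ("PRESSING", "LINE", "FEED"),
    ("PRESSING", "LINE", "FEED", "KEY"),
    ("TAB", "KEY"),
]
for line in inclusion_list(candidates, ("LINE", "FEED")):
    print(line)
```

Right-justifying the left context places the shared word string in the same column on every line, which is the property the inclusion relation list relies on.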

By looking at the series of collocation candidates that are in an inclusion relation with the candidate of interest, the user can easily spot meaningless word strings. For example, the frequencies of occurrence make it clear that the collocation candidate "PRESSING LINE FEED" in FIG. 5 appears only as part of "PRESSING LINE FEED KEY", so it has no meaning as an independent collocation.

If the frequency of occurrence of each collocation candidate is displayed with the overlapping occurrences subtracted, meaningless word strings can be found even more easily.
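The specification does not say how the overlapping occurrences are subtracted. One possible reading, sketched below in Python, counts only those occurrences of a candidate that are not covered by an occurrence of a longer candidate. The function names and the sample token sequence are assumptions made for this sketch.

```python
def occurrences(tokens, cand):
    """Start indices at which the word tuple `cand` occurs in `tokens`."""
    n = len(cand)
    return [i for i in range(len(tokens) - n + 1)
            if tuple(tokens[i:i + n]) == cand]


def net_frequency(tokens, cand, candidates):
    """Occurrences of `cand` not contained in an occurrence of a longer candidate."""
    longer = [c for c in candidates if len(c) > len(cand)]
    spans = [(i, i + len(c)) for c in longer for i in occurrences(tokens, c)]
    free = 0
    for i in occurrences(tokens, cand):
        j = i + len(cand)
        if not any(s <= i and j <= e for s, e in spans):
            free += 1
    return free


# Hypothetical sample text.
tokens = ("PRESSING LINE FEED KEY ADVANCES THE PAPER . "
          "THE LINE FEED KEY REPEATS . "
          "PRESSING LINE FEED KEY AGAIN").split()
candidates = [("PRESSING", "LINE", "FEED"),
              ("PRESSING", "LINE", "FEED", "KEY"),
              ("LINE", "FEED", "KEY")]
for cand in candidates:
    print(" ".join(cand), net_frequency(tokens, cand, candidates))
```

Under this reading, a candidate that occurs only inside a longer candidate gets a net frequency of zero, which marks it as having no independent use.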

In step 108, the list generation means 9 edits the sentences containing the collocation candidate designated in step 105 into a KWIC list and displays it on the display means 12. FIG. 7 shows an example of the display screen, in which 1204 is the window that displays the KWIC list. A KWIC list arranges the sentences so that the positions of the collocation candidate of interest are aligned, which makes them easy to read and makes it easy to see in what contexts the candidate appears.
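A KWIC list of this kind can be produced with a few lines of Python. The sketch below uses plain, case-sensitive substring matching and a fixed context width, both of which are simplifying assumptions; the function name `kwic` and the sample sentences are likewise made up for the example.

```python
def kwic(sentences, phrase, width=30):
    """Return one line per occurrence of `phrase`, with the phrase aligned.

    Each line holds up to `width` characters of left context (right-justified),
    the phrase itself, and up to `width` characters of right context.
    """
    lines = []
    for s in sentences:
        start = s.find(phrase)
        while start != -1:
            left = s[:start][-width:]
            right = s[start + len(phrase):][:width]
            lines.append(f"{left:>{width}}{phrase}{right}")
            start = s.find(phrase, start + 1)
    return lines


# Hypothetical sample sentences.
sentences = [
    "Press the LINE FEED key to advance the paper.",
    "LINE FEED mode is on.",
]
for line in kwic(sentences, "LINE FEED", width=20):
    print(line)
```

A fuller implementation would match on word boundaries and on the generalized (base) forms, so that inflected variants of the candidate are also listed.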

In step 109, the dictionary maintenance means 8 asks the user to enter the dictionary information needed to register the collocation candidate designated in step 105 in the collocation dictionary 4 and, once the user has entered it, registers the candidate in the collocation dictionary 4. The window 1205 shown in FIG. 8 is the window that asks the user for the dictionary information; the designated collocation candidate "LINE FEED" is displayed in the technical term field. The user fills in the headword, part of speech, meaning code, and translation fields. For example, the user enters "FEED" in the headword field, "N" (noun) in the part of speech field, "B" (action) in the meaning code field, and the Japanese translation "改行" (line feed) in the translation field.

The dictionary maintenance means 8 determines the central word (head) from the syntactic and semantic pattern of the collocation candidate of interest, sets that central word in the headword field as a provisional value, estimates the part of speech, meaning code, and translation from the dictionary information of the central word, and automatically sets them in their fields as provisional values. In many cases, therefore, the user only needs to confirm the automatically set values, which reduces the input effort.

Methods of determining the central word include, for example, taking the last word of the collocation candidate, or the word before a prepositional phrase, as the central word.
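The two heuristics above, together with the automatic setting of provisional values, can be sketched as follows. The preposition list, the `WORD_DICT` entry, the field names, and the placeholder translation are all hypothetical; only the "FEED" / "N" / "B" values echo the example given for FIG. 8.

```python
# Hypothetical, deliberately short preposition list.
PREPOSITIONS = {"of", "in", "for", "on", "with", "to"}

# Hypothetical word dictionary entry (stand-in for the word dictionary).
WORD_DICT = {
    "feed": {"pos": "N", "code": "B", "translation": "<translation of FEED>"},
}


def central_word(candidate):
    """Word before the first preposition, else the last word."""
    for i, word in enumerate(candidate):
        if i > 0 and word.lower() in PREPOSITIONS:
            return candidate[i - 1]
    return candidate[-1]


def provisional_entry(candidate, word_dict):
    """Build provisional dictionary information from the central word's entry."""
    head = central_word(candidate)
    info = word_dict.get(head.lower(), {})
    return {
        "term": " ".join(candidate),
        "headword": head,
        "pos": info.get("pos", ""),
        "code": info.get("code", ""),
        "translation": info.get("translation", ""),
    }


print(provisional_entry(("LINE", "FEED"), WORD_DICT))
print(central_word(("part", "of", "speech")))
```

Fields left empty because the central word is not in the word dictionary would simply be filled in by the user, as in the fully manual case.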

The user can change the headword while the provisional values are displayed in the window 1205.

When the headword is changed, the dictionary maintenance means 8 automatically sets provisional values for the part of speech, meaning code, and translation based on the dictionary information of the new headword.

After confirming the values displayed in the window 1205, the user can also change the headword, part of speech, meaning code, and translation individually.

As described above, for each collocation candidate extracted by the system, the user can examine its frequency of occurrence, its inclusion relations with other collocation candidates, and its contexts in the English text by means of the inclusion relation list and the KWIC list, and judge whether it is a collocation that should be registered. If so, the user can easily set the information and register it in the collocation dictionary 4.

In another embodiment of the present invention, step 302 or step 304 of FIG. 3 checks whether each extracted word string is already registered in the collocation dictionary 4 and automatically deletes those that are. Alternatively, registered word strings are not deleted automatically but are displayed in the window 1202 of the display means 12 in a way that shows they are registered, for example in a different display color, and the user deletes them. Either way, wasted dictionary registration work is avoided.

In another embodiment, instead of displaying the "Inclusion relation" window 1203, the "KWIC" window 1204, and the "Dictionary registration" window 1205 one at a time as in FIGS. 5, 6, and 7, the windows 1203, 1204, and 1205 are overlapped and displayed at the same time.

In yet another embodiment, in step 105 of FIG. 2, the user can designate as a collocation candidate not only one of the candidates displayed in the collocation candidate display window 1202 but also a word string displayed in the "Inclusion relation" window 1203, the "KWIC" window 1204, or the "Dictionary registration" window 1205. A word string displayed in the window (not shown) used when editing the English text with the text editing means 7 may likewise be designated as a collocation candidate.

In still another embodiment, the present invention is applied as a device for maintaining a translation dictionary for a language pair other than English-Japanese.

The present invention may also be applied as a device for maintaining a collocation dictionary used in a natural language processing system that performs processing other than translation (for example, database search).

[Effects of the Invention] According to the collocation registration method and device of the present invention, the collocation candidates extracted from the text are presented in a form that lets the user easily recognize unsuitable candidates that cannot be removed automatically, so suitable collocations can be registered in the dictionary without placing a heavy burden on the user.

[Brief explanation of drawings]

FIG. 1 is a block diagram of a collocation dictionary maintenance device according to an embodiment of the present invention; FIG. 2 is a flowchart of the processing of the collocation dictionary maintenance device of FIG. 1; FIG. 3 is a flowchart of the collocation candidate extraction process; FIGS. 4(1) and 4(2) are explanatory diagrams of the collocation candidate extraction methods; FIG. 5 shows an example of a screen presenting collocation candidates; FIG. 6 shows an example of a screen presenting the inclusion relation list; FIG. 7 shows an example of a screen presenting the KWIC list; and FIG. 8 shows an example of a screen presenting dictionary information.

(Explanation of reference signs) 50: collocation dictionary maintenance device; 1: text file; 2: collocation candidate file; 3: collocation candidate extraction rule file; 4: collocation dictionary; 5: word dictionary; 6: collocation candidate extraction means; 7: text editing means; 8: dictionary maintenance means; 9: list generation means; 10: control means; 11: input means; 12: display means.

Claims

[Claims]

1. A collocation registration method comprising: a collocation candidate extraction step of extracting collocation candidates from a given text based on predetermined collocation candidate extraction rules; a collocation candidate presentation step of presenting the extracted collocation candidates to a user and having the user designate one collocation candidate; an inclusion relation presentation step of selecting other collocation candidates that are in an inclusion relation with the collocation candidate designated by the user and presenting these collocation candidates to the user in a form that shows the inclusion relation; and a dictionary registration step of having the user enter predetermined information about the designated collocation candidate and registering that collocation candidate in a dictionary as a collocation together with the information.

2. The collocation registration method according to claim 1, further comprising a KWIC presentation step of extracting from the text the sentences that contain the collocation candidate designated by the user and presenting them to the user in the form of a KWIC list.

3. The collocation registration method according to claim 1 or claim 2, wherein the dictionary registration step estimates a central word according to the pattern of the collocation candidate designated by the user, retrieves the dictionary information of the central word from the dictionary, presents it to the user as a provisional value of the dictionary information of the collocation candidate, has the user enter whether the provisional value is appropriate, and, on receiving input that the provisional value is appropriate, registers the provisional value in the dictionary as the dictionary information of the collocation candidate.

4. The collocation registration method, wherein the dictionary registration step has the user designate one of the words constituting the designated collocation candidate as a central word, retrieves the dictionary information of the central word from the dictionary, presents it to the user as a provisional value of the dictionary information of the collocation candidate, has the user enter whether the provisional value is appropriate, and, on receiving input that the provisional value is appropriate, registers the provisional value in the dictionary as the dictionary information of the collocation candidate.

5. The collocation registration method according to claim 1, wherein the dictionary registration step retrieves from the dictionary the dictionary information of each word constituting the collocation candidate designated by the user, combines it, presents the result to the user as a provisional value of the dictionary information of the collocation candidate, has the user enter whether the provisional value is appropriate, and, on receiving input that the provisional value is appropriate, registers the provisional value in the dictionary as the dictionary information of the collocation candidate.

6. The collocation registration method, wherein the collocation candidate extraction step checks whether each collocation candidate extracted from the text is already registered in the dictionary and, if it is, removes it from the collocation candidates.

7. A collocation registration device comprising: a dictionary in which dictionary information of words and collocations is registered; a collocation candidate extraction rule file that stores collocation candidate extraction rules for extracting collocation candidates from a text; collocation candidate extraction means for extracting collocation candidates from the text based on the dictionary information and the collocation candidate extraction rules; collocation candidate display means for displaying the extracted collocation candidates on a display device; collocation candidate designation means for having a user designate one of the displayed collocation candidates; inclusion relation display means for selecting other collocation candidates that are in an inclusion relation with the collocation candidate designated by the user and displaying these collocation candidates on the display device in a form that shows the inclusion relation; information input means for having the user enter predetermined information about the designated collocation candidate; and dictionary registration means for registering the collocation candidate designated by the user in the dictionary as a collocation together with the entered information.

8. The collocation registration device according to claim 7, further comprising KWIC presentation means for extracting from the text the sentences that contain the collocation candidate designated by the user and presenting them to the user in the form of a KWIC list.

9. The collocation registration device according to claim 7, wherein the information input means includes provisional value presentation means for estimating a central word according to the pattern of the collocation candidate designated by the user, retrieving the dictionary information of the central word from the dictionary, and presenting it to the user as a provisional value of the dictionary information of the collocation candidate, and confirmation input means for having the user enter whether the provisional value is appropriate, and wherein the dictionary registration means, on receiving input that the provisional value is appropriate, registers the provisional value in the dictionary as the dictionary information of the collocation candidate.

10. The collocation registration device according to claim 7 or 8, wherein the information input means includes central word designation means for having the user designate one of the words constituting the designated collocation candidate as a central word, provisional value presentation means for retrieving the dictionary information of the designated central word from the dictionary and presenting it to the user as a provisional value of the dictionary information of the collocation candidate, and confirmation input means for having the user enter whether the provisional value is appropriate, and wherein the dictionary registration means, on receiving input that the provisional value is appropriate, registers the provisional value in the dictionary as the dictionary information of the collocation candidate.

11.
Temporary value presenting means, wherein the information input means extracts dictionary information of each word constituting the collocation candidate specified by the user from the dictionary, combines it, and presents it to the user as a provisional value of the dictionary information of the collocation candidate; and confirmation input means for prompting a user to input information as to whether the provisional value is appropriate, and the dictionary registration means selects the provisional value as the collocation candidate when there is an input that the provisional value is appropriate. 9. The collocation registration device according to claim 7 or 8, wherein the collocation registration device registers in a dictionary as dictionary information of . 12. The dictionary is a bilingual dictionary for language translation, and includes translated words as dictionary information, and the information input means extracts the translated words of each word constituting the collocation candidate specified by the user from the dictionary, combines them, and combines them into the above-mentioned A provisional value presentation means for presenting a provisional value of a translation of a compound word candidate to a user, and a confirmation input means for allowing the user to input information as to whether or not the provisional value is appropriate; 12. The collocation registration device according to claim 7, wherein the provisional value is registered in a dictionary as dictionary information of the collocation candidate when there is an input that the value is appropriate. 13. The collocation candidate extraction means includes a registration check means for checking whether each collocation candidate extracted from the text has already been registered in a dictionary, and a collocation candidate narrowing means for deleting registered collocation candidates from the collocation candidates. 
The collocation registration device according to any one of claims 7 to 12.
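
As an illustration only, the narrowing of registered candidates (claims 6 and 13), the selection of candidates in an inclusion relationship (claim 7), and the provisional value taken from a central word (claims 3 and 9) might be sketched as follows. This is a minimal sketch and not the implementation disclosed in the specification: the function names, the dictionary layout (a mapping from a space-joined word string to its dictionary information), and the rule that the central word is the last word of the candidate are all assumptions introduced for the example.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: a collocation candidate is a tuple of words.
Candidate = Tuple[str, ...]
# Hypothetical dictionary layout: space-joined entry -> dictionary information.
Dictionary = Dict[str, Dict[str, str]]


def contains(outer: Candidate, inner: Candidate) -> bool:
    """Return True if `inner` occurs as a contiguous word sequence in `outer`."""
    n, m = len(outer), len(inner)
    return any(outer[i:i + m] == inner for i in range(n - m + 1))


def narrow_candidates(candidates: List[Candidate],
                      dictionary: Dictionary) -> List[Candidate]:
    """Delete candidates that are already registered in the dictionary."""
    return [c for c in candidates if " ".join(c) not in dictionary]


def inclusion_list(selected: Candidate,
                   candidates: List[Candidate]) -> Dict[str, List[Candidate]]:
    """Group the other candidates by their inclusion relationship to `selected`."""
    others = [c for c in candidates if c != selected]
    return {
        "contains_selected": [c for c in others if contains(c, selected)],
        "contained_in_selected": [c for c in others if contains(selected, c)],
    }


def provisional_value(selected: Candidate,
                      dictionary: Dictionary) -> Optional[Dict[str, str]]:
    """Copy the central word's dictionary information as a provisional value.

    Assumption for this sketch: the central word is the last word of the
    candidate. The claims only say it is estimated from the candidate's pattern.
    """
    entry = dictionary.get(selected[-1])
    return dict(entry) if entry is not None else None


# Illustrative data, not taken from the specification.
dictionary: Dictionary = {
    "translation": {"pos": "noun"},
    "system": {"pos": "noun"},
    "machine translation": {"pos": "noun"},
}
candidates: List[Candidate] = [
    ("machine", "translation"),
    ("machine", "translation", "system"),
    ("translation", "system"),
]

remaining = narrow_candidates(candidates, dictionary)
relations = inclusion_list(("translation", "system"), remaining)
proposal = provisional_value(("translation", "system"), dictionary)
```

In this sketch the provisional value is only a proposal; under the claims it would be written to the dictionary only after the user confirms that it is appropriate.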