JP2632806B2

JP2632806B2 - Language analyzer

Info

Publication number: JP2632806B2
Application number: JP61110871A
Authority: JP
Inventors: 壽彦横川
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1986-05-16
Filing date: 1986-05-16
Publication date: 1997-07-23
Anticipated expiration: 2012-07-23
Also published as: JPS62267872A

Description

Technical Field

The present invention relates to a language analyzer, and in particular to a language analyzer useful for automatic translation apparatus.

Prior Art

When a Japanese sentence is produced from a corresponding sentence in a foreign language such as English, the morphemes of the input English sentence are analyzed, its syntax is parsed, its sentence structure is converted, and a Japanese translation is then generated.

With a cfg (context-free grammar), both the so-called bottom-up parsing method, which parses from the end of the sentence toward its beginning, and the top-down parsing method also output analyses that conform to the grammar for only part of the sentence. A drawback is therefore that many useless solutions that cannot ultimately be used are output. Many of these useless solutions are ones that a human reader would immediately recognize as wrong.

If syntactic analysis yields many results containing such obviously erroneous solutions, the efficiency of the subsequent steps, structure conversion and translation generation, falls. Structure conversion or translation generation is carried out even for the useless solutions, and the suitability of the results must be judged in each of those steps, which wastes processing time.

For example, it is desirable that the attachment target of a prepositional or adverbial phrase in an English sentence, that is, the word or phrase it modifies, can be chosen freely from context. One conceivable approach is therefore to keep every possible attachment target as a candidate solution whenever the modified word or phrase cannot be determined. Another approach gives priority to dictionary descriptions. In the latter, if a dictionary lookup finds an entry stating that a particular word takes a particular preposition, that co-occurrence relation is always adopted. This risks ignoring the intended meaning of the sentence. For example, in the English sentence "I saw a man in the park with a telescope.", if the dictionary entry for "saw" states that it takes the preposition "with", the prepositional phrase "with a telescope" is always attached to the verb "saw", and the other possibilities are ignored.

However, if all such possible attachment candidates are kept as solutions during parsing, the number of candidates grows, and the number of solutions produced in the subsequent structure conversion and translation generation becomes enormous. Such a large number of candidate solutions significantly slows the subsequent processing.

To improve the efficiency of the automatic translation process as a whole, it is therefore necessary to reduce the number of such useless solutions, thereby raising the efficiency of the analysis, and to make the analysis results more probable.

Object

In view of this need, an object of the present invention is to provide a language analyzer capable of performing syntactic analysis efficiently.

Configuration

To achieve the above object, the present invention is characterized by a language analyzer comprising dictionary means storing dictionary data for analyzing sentences of a predetermined language, and analysis means that looks up the dictionary means for an input sentence of the predetermined language to perform morphological analysis and, based on that result, performs syntactic analysis of the sentence. The analysis means has a table defining the conditions for selecting the attachment targets of the constituents of sentences in that language. It applies grammar rules to analyze the surface structure of the sentence; creates candidate syntax trees, each assigned a priority, in which the attachment targets of the constituents of the sentence are set provisionally; selects the high-priority candidates from among them; and corrects the priorities of the selected candidates by referring to the condition table and to dictionary data from the dictionary means. The invention is described concretely below on the basis of one embodiment.

FIG. 1 shows the overall configuration of an embodiment in which the language analyzer of the present invention is applied to an English-to-Japanese automatic translation apparatus. Needless to say, the invention applies effectively not only to an English-to-Japanese translation apparatus but also to any automatic translation apparatus that translates one language into another.

This embodiment has an input unit 10, through which the English text 12 to be translated into Japanese is entered. The input unit 10 may include, for example, a keyboard with character keys such as alphanumeric keys and with function keys, an optical character reader (OCR) that reads English text recorded on paper, and/or a file storage device that reads English text recorded on a storage medium such as a magnetic disk.

The English text entered through the input unit 10 is read into the pre-editing unit 14, where preprocessing for translation is performed. This consists mainly of identifying sentences and handling unknown words, and it functions as part of the morphological analysis.

The pre-edited English data is transferred to the morphological analysis unit 16 together with the information obtained during pre-editing. The morphological analysis unit 16 looks up the word dictionary 18, splits the text into sentences, and analyzes the morphemes of the English text. It handles unknown words, groups items such as proper nouns, time expressions, and numeric expressions, and performs sentence-level processing such as recognizing tag questions and apposition. The morphological analysis rules are stored in the analysis rule file 36.

The morphologically analyzed English data is transferred, together with the dictionary information obtained in the morphological analysis, to the syntactic analysis I unit 20. In this embodiment, the syntactic analysis I unit 20 is the functional unit that applies cfg grammar rules to the English data, analyzes the surface structure of each sentence bottom-up and right-to-left, and finds all syntactic possibilities.

The English data parsed by the syntactic analysis I unit 20 is sent, together with its analysis information, to the syntactic analysis II unit 22. Here, structural descriptions are applied to the surface parse results of syntactic analysis I to select solutions. This produces a probable parse tree for the English sentence and builds its structure. These syntactic analysis rules are likewise stored in the analysis rule file 36.

The parsed English data is transferred to the structure conversion unit 24 as parse-tree data. The structure conversion unit 24 creates the syntax tree of the corresponding Japanese sentence from the syntax tree that is the intermediate structure of the English sentence, converting it into a Japanese base structure from which a Japanese sentence is easy to produce.

The syntax-tree data representing the converted Japanese syntax tree is sent to the translation generation unit 26, which generates the translation. This function generates a Japanese sentence from the Japanese base structure. First, syntactic generation reorders the tree so that the word order matches that of Japanese. Morphological generation then produces the translated text by traversing the syntax tree top-down and left-to-right.

The generated Japanese sentence data, that is, the translation data, is sent to the post-editing unit 30. Using the information employed in the translation process, the post-editing unit 30 looks up the dictionary 18 and revises the translation data to complete a more natural Japanese sentence. This Japanese sentence data is transferred to the output unit 32 and output from it as the translated Japanese text 34. The output unit 32 includes, for example, a printer, a display, and/or a file storage device such as a magnetic disk.

This sequence of translation processes is controlled by the control unit 38, which supervises the apparatus as a whole. In this embodiment the word dictionary 18 stores dictionary data for English and Japanese words and describes not only vocabulary but also various information such as attachment relations (that is, co-occurrence relations), meaning, number, and part of speech. The analysis rule file 36 stores the rule data for morphological and syntactic analysis.

An operation display unit 40 is connected to the control unit 38. The operation display unit 40 lets the operator give various instructions to the apparatus. It has, for example, operation keys such as a translation instruction key and cursor keys, and a display and indicators that visibly show the input English text, the translated Japanese text, intermediate data such as dictionary information, and various prompts to the operator. Many of these operation and display functions may instead be provided by the keyboard of the input unit 10, if it has one, and by the display of the output unit 32, if it has one.

The syntactic analysis I unit 20 applies cfg grammar rules bottom-up and right-to-left to the morphologically analyzed English data and derives every possible syntactic solution for the sentence. Such a solution is generally understood in the form of a structure tree. For each sentence, the tree relates the words or phrases it contains to one another through subordinate or co-occurrence relations such as modification and case relations, and shows their mutual dependencies as, for example, parent, child, and grandchild. Each word or phrase occupies a node of the structure tree.

This embodiment may be configured so that, before syntactic analysis, the morphological and lexical features of the sentence are identified to determine its syntactic groupings. These groupings are called "units" and "blocks" here. The function of recognizing the syntactic groupings of the input English sentence as blocks is performed by the morphological analysis unit 16.

A "unit" is a group of words that forms the smallest unit of the translation process. During parsing it is treated like a single word, and the dictionary information of its individual components is not used.

A "block" is a syntactic grouping whose internal analysis takes precedence over analysis outside it and which, seen from outside, is treated like a unit. A block may be, for example, a clause or a phrase, or may correspond to an intermediate symbol used in the cfg grammar. Blocks can be nested; that is, a block may contain further blocks. The concept can also be extended to sentences, paragraphs, and whole texts, each regarded as a single block. This processing, which gives priority to partial analysis, is called "partial parsing" here. It reduces the useless parses mentioned above, improves the efficiency of the analysis, and yields more probable results.
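The idea of analyzing a block from the inside out and then treating it as a single unit can be illustrated with a minimal Python sketch. It assumes the sentence has already been bracketed into nested blocks (nested lists here) and uses a hypothetical lookup table, `REDUCTIONS`, in place of the real grammar; the data layout and all names are illustrative and do not come from the patent.

```python
# Hypothetical reduction table: the symbols inside one block are replaced
# by a single symbol, which the enclosing block then sees as one unit.
REDUCTIONS = {
    ("P", "NP"): "PP",
    ("Vt", "NP", "PP"): "VP",
    ("NP", "VP"): "SE",
}

def partial_parse(block):
    """Analyze inner blocks first, then reduce this block to one symbol."""
    symbols = tuple(
        partial_parse(item) if isinstance(item, list) else item
        for item in block
    )
    return REDUCTIONS[symbols]

# Nested lists stand for nested blocks (an assumed representation).
sentence = ["NP", ["Vt", "NP", ["P", "NP"]]]
result = partial_parse(sentence)
```

Because each inner block is reduced before its parent is examined, the outer analysis never sees the words inside a block, which is the property the text attributes to partial parsing.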

The word dictionary 18 stores dictionary information about English words and idioms. In this embodiment an entry is created for each inflected form of each word, with all of its information expanded. The part-of-speech information, for example, can hold several parts of speech.

The analysis rule file 36 stores, as tables, the start conditions that mark the beginning of a block and the end conditions that mark its end, along with the data for the cfg grammar rules and structural descriptions. These are used, for example, in the provisional tree structuring process 204 (FIG. 3) described later.

The morphological analysis unit 16 first divides the English text received from the pre-editing unit 14 into sentences, which are the units of translation. In doing so it detects spelling errors and unregistered words. It looks up the dictionary 18 sentence by sentence, fetches the dictionary information for each component, and performs the various grouping processes according to that information.

Besides recognizing blocks, the morphological analysis unit 16 creates the morphological analysis data by performing various processes, such as handling proper nouns, derived words, unknown words, abbreviations, numbers, time expressions, hyphenated words, and apostrophes, inferring apposition, and processing tag questions. The morphologically analyzed English data is transferred to the syntactic analysis I unit 20 together with the dictionary information obtained in the morphological analysis.

The syntactic analysis I unit 20 applies the context-free grammar rules stored in the analysis rule file 36 to the English data, analyzes the surface structure of the sentence bottom-up and right-to-left, and finds every possible syntax tree (111, FIG. 2). If the sentence contains blocks, the partial parsing described above is performed so that local analysis takes precedence. This improves the efficiency and accuracy of the analysis.
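As a rough illustration of bottom-up, right-to-left analysis with cfg rules, the following Python sketch fills a table of recognized spans starting from the last word of the sentence. It is a generic recognizer written for this explanation, not the patent's algorithm; the function names, the table layout, and the simplified rule set (the example rules with their indices dropped) are all assumptions.

```python
def match(rhs, start, table):
    """Return every end position reachable by matching rhs from start."""
    positions = {start}
    for symbol in rhs:
        positions = {end for pos in positions
                     for end in table[pos].get(symbol, ())}
    return positions

def build_table(symbols, rules):
    """Fill table[i][category] = set of end positions, from right to left."""
    n = len(symbols)
    table = [dict() for _ in range(n + 1)]  # table[n] stays empty
    for i in range(n - 1, -1, -1):          # rightmost position first
        table[i].setdefault(symbols[i], set()).add(i + 1)
        changed = True
        while changed:                      # repeat until nothing new at i
            changed = False
            for lhs, rhs in rules:
                new_ends = match(rhs, i, table) - table[i].get(lhs, set())
                if new_ends:
                    table[i].setdefault(lhs, set()).update(new_ends)
                    changed = True
    return table

# Simplified, hypothetical rule set modeled on the example sentence.
RULES = [
    ("SE", ("NP", "VP", "PP", "PP")),
    ("VP", ("Vt", "NP")),
]
table = build_table(["NP", "Vt", "NP", "PP", "PP"], RULES)
accepted = 5 in table[0].get("SE", set())
```

Every entry in `table` is a partial analysis, including ones that never become part of a complete sentence, which is why the text stresses the need to limit candidates in later steps.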

Once the surface-structure data defining the syntactic groupings and dependencies has been obtained, it is sent to the syntactic analysis II unit 22. This data takes the form of the list 200 of cfg analysis results (WFS) shown in FIG. 3 and is easily understood as the syntax trees described above. The syntactic analysis II unit 22 applies structural descriptions to the surface parse results of the syntactic analysis I unit 20, taking weights or priorities into account, and selects solutions. It thereby creates candidate parse trees that are probable for the English sentence and builds their structure (112). It then uses dictionary descriptions and the like to determine the attachment targets of prepositional and adverbial phrases and creates a top-priority table. This yields more probable parse trees (113 to 115).

Referring to FIG. 3, the processing in the syntactic analysis II unit 22 is described using the English sentence "I saw a man in the park with a telescope." as an example. The syntactic elements of this sentence are

NP1 - (Vt1 or Vt2) - NP2 - PP1 - PP2

where NP is a noun phrase, Vt is a transitive verb, and PP is a prepositional phrase, and the indices distinguish different vocabulary items or forms. For example, Vt1 stands for the past tense of the verb "see" and Vt2 for the present tense of the verb "saw".

The syntactic analysis I unit 20 then analyzes the surface structure of this sentence as, for example,

SE1 ← NP1 VP1 PP1 PP2
SE2 ← NP1 VP2 PP1 PP2
VP1 ← Vt1 NP2
VP2 ← Vt2 NP2

Other analyses are of course obtained as well. Here SE denotes a sentence and VP a verb phrase. This analysis result data is stored by the syntactic analysis II unit 22 in the cfg analysis result list 200.

The syntactic analysis II unit 22 first looks up the dictionary 18 according to the list 200 of cfg analysis results obtained from the syntactic analysis I unit 20 and executes the provisional tree structuring process. The cfg grammar rules stored in the analysis rule file 36 include data specifying selection priorities that depend on the structure of the sentence.

In this embodiment, for example, the cfg rules define the construction

SE ← NP NP [PP]n

with priority "4", where n is any integer greater than or equal to 0 and [ ] is a symbol indicating a subordinate or co-occurrence relation between morphemes. This expression means that the sentence has the structure [VP [NP PPa .....]] and contains n provisional prepositional phrases PPa. The suffix "a" marks a node whose attachment target has been decided only provisionally.

The cfg rules also define the construction

VP ← Vt NP

with priority "3". Its structure is [Vt [NP]]. Other constructions are defined as well, such as

SE ← NP VP (priority "10")
VP ← V NP (priority "5")

whose structures are [VP [NP]] and [V [NP]], respectively.

In this way, analysis is performed on the basis of the cfg rules (111), and all candidate parse trees are obtained according to the rules associated with the cfg rules (112). Where a prepositional or adverbial phrase has several possible attachment targets, one of them is set provisionally.

In the dictionary lookup, the dictionary 18 is consulted for the words contained in the analysis result list 200, and the entries are fetched and stored in the dictionary lookup buffer 202. In the provisional tree structuring process, tree structures are created from the priorities (weights) in the analysis result list 200 and the structure editing rules 206 contained in the analysis rules 36. The resulting priority values are stored in the tree structure stack (attachment provisional state) 216, and the tree structures are stored in the tree structure data (attachment provisional state) 218.

For the example sentence above, the provisional structure [Vt1 [NP1 NP2 PP1a PP2a]] is obtained, with a priority of, say, "25". This provisional attachment structure is shown in FIG. 4A. Another provisional structure, [Vt2 [NP1 NP2 PP1a PP2a]], is obtained with a different priority, say "20", and is shown in FIG. 4B. Other structures are of course obtained as well.

The solution candidates are then limited on the basis of the rule priorities and the like (113). Concretely, this is done by rearranging the tree structure stack (attachment provisional state) and the corresponding tree structure data (attachment provisional state) in descending order of the priorities in the stack.
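The reordering in step 113 can be sketched in Python as follows. The sketch keeps the priority stack and the tree data as two parallel lists and sorts them together. The cutoff parameter `keep` is an assumption added for illustration, since the text does not say how many candidates survive, and the function name is hypothetical.

```python
def narrow_candidates(priorities, trees, keep):
    """Sort both lists by descending priority and keep the first `keep`."""
    order = sorted(range(len(priorities)),
                   key=lambda i: priorities[i], reverse=True)
    top = order[:keep]
    return [priorities[i] for i in top], [trees[i] for i in top]

# Priorities "25" and "20" are the example values given for FIGS. 4A and 4B.
stack = [20, 25]
data = ["[Vt2[NP1 NP2 PP1a PP2a]]", "[Vt1[NP1 NP2 PP1a PP2a]]"]
stack, data = narrow_candidates(stack, data, keep=1)
```

Sorting an index list rather than the values themselves keeps each priority paired with its own tree, which mirrors the stack and data being rearranged together.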

Next, for the parse trees narrowed down in this way, a parse tree is extracted for each possible attachment target (114).

In this attachment determination process 208, priorities are corrected according to predetermined logical conditions 210. For example, the priority of an attachment target that is described in the co-occurrence information of the dictionary 18 is increased. An attachment that contradicts the dictionary description has its priority decreased. One not covered by the dictionary is left largely unchanged.

The amounts by which priorities are raised or lowered are best determined by simulation or similar means. As one example: +15 for a target node described in the co-occurrence information; +5 for attachment to the parent of the node itself; +8 for the nearest elder sibling of the node or a descendant of that sibling; +2 for any other sibling; and no change for the parent's siblings and their descendants.
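The first example scheme amounts to a lookup table keyed by how the candidate target relates to the node being attached. The Python sketch below encodes the listed values; the key names and the function are hypothetical, and classifying a target into one of these relations is assumed to have been done elsewhere.

```python
# Adjustment values from the example in the text; key names are illustrative.
ADJUSTMENT = {
    "cooccurrence_in_dictionary": 15,
    "parent": 5,
    "nearest_elder_sibling_or_descendant": 8,
    "other_sibling": 2,
    "parent_sibling_or_descendant": 0,
}

def adjusted_priority(priority, relation):
    """Return the priority after applying the adjustment for `relation`."""
    return priority + ADJUSTMENT[relation]

# A provisional priority of 25 with a dictionary-backed attachment.
corrected = adjusted_priority(25, "cooccurrence_in_dictionary")
```

Keeping the values in one table makes it easy to retune them, which fits the remark that the amounts should be settled by simulation.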

In another example: +10 when the attachment is described in the dictionary; for a preposition that tends to attach to verbs, +8 when it attaches to a verb and -1 when it attaches to a noun; and, where there are several possible attachment targets, +5 when the phrase attaches to the nearest one.

First, one sentence at a time is taken from the head of the tree structure data (attachment provisional state) 218. Next, the co-occurrence information in that sentence is checked against the word descriptions in the dictionary lookup buffer 202, and the tree structure and priority are determined from the result. Because several results can arise at this point, a priority table is used. The determined priority is stored in the tree structure stack (attachment determined state) 212, and the tree structure is stored in the tree structure data (attachment determined state) 214.

In this embodiment the range within which the attachment target may be changed is the range enclosed by the dotted line 230 in FIG. 5. In the figure each circle represents a node of the syntax tree, and the solid lines show the co-occurrence relations between nodes. If the node itself is i, the attachment target can be changed to:

1) the parent node e of node i and its ancestors b, a, and so on;
2) the siblings of node i's parent that are elder than the parent, c, d, and so on, and their descendants n, o, and so on;
3) the siblings of node i that are elder than node i, g and h, and their descendants l, m, and so on.
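The three-part range above can be computed mechanically from a tree. The Python sketch below does so for a hypothetical tree whose node labels follow the description; FIG. 5 itself is not reproduced here, so the exact shape, the extra younger siblings f and j, and the reading of "elder" as "earlier in the sibling order" are all assumptions.

```python
# Hypothetical tree: each parent maps to its children in sibling order.
CHILDREN = {
    "a": ["b"],
    "b": ["c", "d", "e", "f"],
    "c": ["n"],
    "d": ["o"],
    "e": ["g", "h", "i", "j"],
    "g": ["l"],
    "h": ["m"],
}
PARENT = {kid: parent for parent, kids in CHILDREN.items() for kid in kids}

def descendants(node):
    """Return all descendants of node in depth-first order."""
    found = []
    for child in CHILDREN.get(node, []):
        found.append(child)
        found.extend(descendants(child))
    return found

def elder_siblings(node):
    """Return the siblings listed before node under the same parent."""
    parent = PARENT.get(node)
    if parent is None:
        return []
    siblings = CHILDREN[parent]
    return siblings[:siblings.index(node)]

def allowed_targets(node):
    """Collect the nodes to which the attachment of `node` may be moved."""
    targets = []
    parent = PARENT.get(node)
    if parent is None:
        return targets
    ancestor = parent
    while ancestor is not None:          # 1) parent and its ancestors
        targets.append(ancestor)
        ancestor = PARENT.get(ancestor)
    for sibling in elder_siblings(parent) + elder_siblings(node):
        targets.append(sibling)          # 2) and 3) elder siblings...
        targets.extend(descendants(sibling))  # ...and their descendants
    return targets

targets_for_i = allowed_targets("i")
```

In this sketch the younger siblings f and j never appear in the result, reflecting that only elder siblings and their descendants fall inside the permitted range.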

After all the parse trees specifying possible attachment targets have been extracted in this way (114), more probable parse trees are obtained from all the trees produced by process 114 by taking into account the manner of attachment and the like (115). Concretely, this is done by rearranging the tree structure stack (attachment determined state) and the corresponding tree structure data (attachment determined state) in descending order of the priorities in the stack.

For the example sentence above, the parse tree [Vt1 [NP1 NP2 [PP1]] PP2] is obtained with a priority of, say, "48". It is shown in FIG. 6A. In this reading the prepositional phrase "in the park" attaches to the noun phrase "a man", and the prepositional phrase "with a telescope." attaches to the verb "saw" in the sense of "looked at".

Likewise, the parse tree [Vt1 [NP1 NP2] PP1 PP2] is obtained with a priority of, say, "45", the parse tree [Vt2 [NP1 NP2 [PP1]] PP2] with priority "43", and the parse tree [Vt1 [NP1 NP2 [PP1 [PP2]]]] with priority "41". These are shown in FIGS. 6B, 6C, and 6D, respectively.

In FIG. 6B, the prepositional phrases "in the park" and "with a telescope." both attach to the verb "saw" in the sense of "looked at". In FIG. 6C, "in the park" attaches to the noun phrase "a man" and "with a telescope." attaches to the verb "saw" in the sense of "cut with a saw". In FIG. 6D, "with a telescope." attaches to "in the park", which in turn attaches to the noun phrase "a man". Other results are of course obtained as well.

The English data parsed in this way is transferred to the structure conversion unit 24 and converted into the structure of a Japanese sentence, and the translation generation unit 26 generates the translation node by node. The nodes of the structure tree are processed top-down and left-to-right. The generated translation is post-processed by the post-editing unit 30, displayed on the operation display unit 40, and output by the output unit 32 as the Japanese text 34, for example in printed form.

Thus, according to this embodiment, probable solution candidates are first narrowed down using, for example, the priorities in the English grammar rules. All possible attachment targets of prepositional and adverbial phrases are then considered only for this comparatively small, limited set of candidates. This reduces the number of useless solutions, improves the efficiency of the analysis, and makes the analysis results more probable.

Effect

According to the present invention, probable solution candidates are first limited on the basis of the grammar rules, and all possible attachment targets of the sentence constituents are then considered for that limited number of candidates. This minimizes the generation of useless solutions and improves the efficiency and accuracy of the analysis.

[Brief description of the drawings]

FIG. 1 is a functional block diagram showing the overall configuration of an embodiment in which the language analyzer of the present invention is applied to an English-to-Japanese automatic translation apparatus. FIG. 2 is a flow diagram showing an example of the syntactic analysis process that limits attachment targets in the embodiment of FIG. 1. FIG. 3 is a functional block diagram summarizing the functions of that syntactic analysis process in the same embodiment. FIGS. 4A to 6D are explanatory diagrams showing examples of how parse trees are assembled for a specific input English sentence.

Explanation of reference numerals for main parts

10: input unit
16: morphological analysis unit
18: word dictionary
20, 22: syntactic analysis units
24: structure conversion unit
26: translation generation unit
32: output unit
36: analysis rules
38: control unit
40: operation display unit

Claims

(57) [Claims]

1. A language analysis apparatus comprising: dictionary means for storing dictionary data for analyzing sentences of a predetermined language; and analysis means for performing morphological analysis of an input sentence of the predetermined language by looking up the dictionary means, and for performing syntactic analysis of the sentence based on the result of the morphological analysis, wherein the analysis means applies grammar rules to analyze the surface structure of the sentence and, in doing so, takes into account priorities given in advance to the grammar rules to obtain, with priorities, candidates for provisional attachment targets of the constituents contained in the sentence; selects, from among those candidates, the provisional attachment target candidates having high priority; refers to at least one of information on the selected attachment target candidates and information on the manner of attachment to those candidates; corrects the priorities of the selected attachment target candidates based on priority correction information contained therein; and further narrows down and selects, from among the selected attachment target candidates, those whose corrected priority is high.