JPH07249040A

JPH07249040A - Document structure analyzing method for machine translation and machine translating method using the same, document structure analyzing device, and machine translating device

Info

Publication number: JPH07249040A
Application number: JP6038479A
Authority: JP
Inventors: Izuru Yagakinai; 出野垣内; Naoki Inoue; 直己井ノ上
Original assignee: Kokusai Denshin Denwa KK
Current assignee: KDDI Corp
Priority date: 1994-03-09
Filing date: 1994-03-09
Publication date: 1995-09-26

Abstract

PURPOSE: To provide a method that can analyze document structure using simple rules, reducing the workload of rule creation, by analyzing the document structure from extracted position information using a document structure position rule and a document structure local rule. CONSTITUTION: When a document is input, position information is extracted (S1). The document structure is then analyzed line by line using the document structure position rule 100 and the document structure local rule 200 (S2). When the conditions in the document structure position rule 100 are satisfied, the current line is given the same document structure name as the previous line. When no document structure name is given, candidates for the document structure name are searched for using the document structure local rule 200. For each candidate, the document structure position rule 100 and the document structure local rule 200 are used to judge which document structure name's conditions the current line meets, and the name whose conditions are met is given (S3). The document structure is analyzed line by line in this way, and the analysis ends (S4) when the final line is reached.

Description

Detailed Description of the Invention

[0001]

[Industrial Field of Application] The present invention relates to a document structure analysis method for machine translation, and to a machine translation method, a document structure analysis device, and a machine translation device that use it.

[0002]

[Prior Art] For machine translation systems, the use of the structure of the document to be translated has been proposed as part of a translation support environment, that is, comprehensive software for assisting translation. If document structure information has been added to a document in advance, the accuracy of machine translation improves; in English-to-Japanese translation, for example, noun phrases are no longer mistranslated as ordinary sentences or imperative sentences. For a general document to which no document structure information has been added beforehand, however, the document structure must be analyzed prior to machine translation, and conventionally this analysis has been performed using detailed rules.

[0003] A typical conventional document structure analysis method is known from Reference 1. In that method, detailed document structure general rules are created from keywords, such as the name of the first chapter in a document or "Introduction" marking a paragraph, and from rules such as "a document consists of a preface, a main text, and a postscript". These general rules include the conditions for each document structure name and the conditions under which a line takes the same document structure name as the preceding lines, and the document structure is analyzed using them.

Reference 1: Miwako Doi et al., "Development of Document Structure Extraction Techniques", IEICE Transactions D-II, Vol. J76-D-II, No. 9, pp. 2042-2052, September 1993 (IEICE).

[0004] A conventional document structure analysis method using the document structure general rules is described with reference to FIG. 6. In FIG. 6, when a document is input, line-by-line sentence analysis is first performed in step S31. In this analysis, the document structure general rules are used to judge whether the current line satisfies the predetermined conditions for taking the same document structure name as the previous line. If the conditions are satisfied, the current line is given the same document structure name as the previous line, and the same analysis is then performed on the next line. If the conditions are not satisfied, processing moves to step S32 for a different analysis. In step S32, the document structure general rules are used to judge, for each of a plurality of document structure names, whether the current line satisfies that name's conditions, and the document structure name corresponding to the satisfied conditions is given to the current line. When the analysis has reached the last line of the document, processing ends in step S33; otherwise, processing returns to step S32 and the sentence analysis of the next line is performed.

[0005]

[Problems to Be Solved by the Invention] Because the conventional document structure analysis method described above uses detailed document structure general rules, it has the following problems. First, the rules must be created by hand, and the document structure general rules require collecting keywords and writing detailed rules covering everything from the organization of the whole document and its chapters down to the structure of sentences, so the workload of rule creation is heavy. Second, because the rules are detailed, the structure of a document that does not strictly satisfy the conditions in the rules cannot be analyzed. Third, many rules are therefore needed in order to analyze the various similar documents, but the rules must then be adjusted as a whole so that they do not interfere with one another, which increases the workload further.

[0006] An object of the present invention is therefore to provide a method that can analyze document structure using simple rules so that the workload of rule creation is reduced, and also to provide a machine translation method, a document structure analysis device, and a machine translation device that use it.

[0007]

[Means for Solving the Problems] The invention of claim 1, which achieves the above object, is a document structure analysis method characterized by extracting position information on the start and end of lines from a document, and analyzing the document structure from the extracted position information using a predetermined document structure position rule for judging the document structure from the position information on the start and end of lines and a predetermined document structure local rule for judging the document structure from local information in the document.

The invention of claim 2 is a document structure analysis method characterized by extracting position information on the start and end of lines from a document, and analyzing the document structure from the extracted position information using a predetermined document structure position rule for judging the document structure from the position information on the start and end of lines and, in place of the document structure local rule, a predetermined document structure general rule for judging the document structure from overall information on the document.

The invention of claim 3 is a machine translation method characterized by analyzing the document structure by the document structure analysis method of claim 1 or claim 2, converting the document structure obtained by this analysis into a grammar item using a predetermined conversion table between document structure names and grammar items, analyzing a sentence for machine translation so that it becomes the grammar item obtained by this conversion, and translating the sentence based on the result of this analysis.

The invention of claim 4 is a document structure analysis device characterized by comprising: storage means storing a predetermined document structure position rule for judging the document structure from position information on the document; storage means storing a predetermined document structure local rule for judging the document structure from local information in the document; position information extraction means for obtaining, from document data, the start and end positions of the entire document and the start and end positions of each line; first analysis means for analyzing, for each line, the document structure of the current line and determining a document structure name such as "heading", based on the document structure position rule and the relationship between the position information of the current line and that of the preceding lines; and second analysis means for analyzing, for a line whose document structure name is not determined by the first analysis means, the document structure of that line and determining a document structure name such as "heading", based on the document structure local rule and the relationship between the position information of that line and that of the entire document.

The invention of claim 5 is a document structure analysis device characterized by comprising: storage means storing a predetermined document structure position rule for judging the document structure from position information on the document; storage means storing a predetermined document structure general rule for judging the document structure from general information on the document; position information extraction means for obtaining, from document data, the start and end positions of the entire document and the start and end positions of each line; first analysis means for analyzing, for each line, the document structure of the current line and determining a document structure name such as "heading", based on the document structure position rule and the relationship between the position information of the current line and that of the preceding lines; and second analysis means for analyzing, for a line whose document structure name is not determined by the first analysis means, the document structure of that line and determining a document structure name such as "heading", based on the document structure general rule and the relationship between the position information of that line and that of the entire document.

The invention of claim 6 is a machine translation device characterized by comprising: the document structure analysis device of claim 4 or claim 5; conversion means for converting a document structure name such as "heading" determined by the document structure analysis device into a grammar item such as "noun phrase", based on a predetermined conversion table between document structure names and grammar items; analysis means for analyzing a sentence so that it becomes the grammar item, such as a noun phrase, obtained by the conversion means; and translation means for translating the sentence into a sentence in another language from the analysis result obtained by the analysis means.

[0008]

[Operation] In the invention described above, position information on the start and end of lines is extracted from the document, and the document structure is analyzed using this position information. In many documents, once the positional relationship of a part to the document as a whole is known, it is possible to judge which broad formal category the part belongs to: (a) footer or header, (b) chapter or section heading, (c) itemized list, or (d) body text. Looking locally, for example at a few lines before and after, one can judge whether the document structure of a line is the item "heading" from the start position of the line, a particular form at the beginning of the sentence (a number or symbol), the length of the line, and the tendency of line start and end positions over the whole document. Furthermore, if the position information of a line is the same as that of the immediately preceding line, the document structure item name can be considered the same as well.

[0009] In the invention of claim 1, therefore, in place of the conventional document structure general rules, a predetermined document structure position rule for judging the document structure from the position information on the start and end of lines and a predetermined document structure local rule for judging the document structure from local information in the document are created in advance, and these are used to analyze the form of the document structure from the extracted position information. Because the document structure position rule and the document structure local rule are simple compared with the detailed conventional document structure general rules, the workload of rule creation is reduced.

In the invention of claim 2, the form of the document structure is analyzed from the extracted position information using a predetermined document structure position rule for judging the document structure from the position information on the start and end of lines and a predetermined document structure general rule for judging the document structure from overall information on the document. The document structure general rule in this case need not be as detailed as the conventional document structure general rules, so the workload of rule creation is reduced.

In the invention of claim 3, the document structure item name analyzed as described above is converted into a grammar item such as "noun phrase", this grammar item name is attached to the sentence, the structure of the sentence is analyzed for machine translation, and the sentence is then translated. In machine translation, when a grammar item is specified, that specification takes priority, and the sentence is analyzed and translated so that it becomes the specified grammar item.

In the invention of claim 4, the storage means store, in place of the conventional document structure general rules, a predetermined document structure position rule for judging the document structure from position information on the document and a predetermined document structure local rule for judging the document structure from local information in the document. The position information extraction means obtains, from the document data, the start and end positions of the entire document and the start and end positions of each line. The first analysis means analyzes, for each line, the document structure of the current line and determines a document structure name such as "heading", based on the document structure position rule and the relationship between the position information of the current line and that of the preceding lines. For a line whose document structure name is not determined by the first analysis means, the second analysis means analyzes the document structure of that line and determines a document structure name such as "heading", based on the document structure local rule and the relationship between the position information of that line and that of the entire document. Because the document structure position rule and the document structure local rule are simple compared with the detailed conventional document structure general rules, the workload of rule creation is reduced.

In the invention of claim 5, the storage means store, in place of the conventional document structure general rules, a predetermined document structure position rule for judging the document structure from position information on the document and a simple document structure general rule for judging the document structure from overall information on the document. The position information extraction means obtains, from the document data, the start and end positions of the entire document and the start and end positions of each line. The first analysis means analyzes, for each line, the document structure of the current line and determines a document structure name such as "heading", based on the document structure position rule and the relationship between the position information of the current line and that of the preceding lines. For a line whose document structure name is not determined by the first analysis means, the second analysis means analyzes the document structure of that line and determines a document structure name such as "heading", based on the document structure general rule and the relationship between the position information of that line and that of the entire document. The document structure general rule in this case need not be as detailed as the conventional document structure general rules, so the workload of rule creation is reduced.

In the invention of claim 6, the document structure name analyzed as described above is converted by the conversion means into a grammar item such as "noun phrase", the analysis means attaches this grammar item name to the sentence and analyzes the structure of the sentence for machine translation, and the translation means then translates it. In machine translation, when a grammar item is specified, that specification takes priority, and the sentence is analyzed and translated so that it becomes the specified grammar item. Consequently, cases in which, for example, an English noun phrase is mistranslated into Japanese as an imperative sentence or an ordinary sentence decrease, and translation accuracy improves.
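The conversion from document structure names to grammar items can be pictured as a simple lookup. The sketch below is only an illustration: the table name `GRAMMAR_ITEM_FOR`, the function `annotate`, and the choice to return `None` for names without an entry are assumptions, and the only mapping taken from the text is "heading" to "noun phrase".

```python
# Hypothetical conversion table between document structure names and grammar items.
# Only the "heading" -> "noun phrase" pair is mentioned in the text.
GRAMMAR_ITEM_FOR = {"heading": "noun phrase"}


def annotate(lines_with_names):
    """Attach a grammar item (or None when no entry exists) to each line.

    A downstream parser could treat a non-None item as a constraint that takes
    priority during analysis; that parser is outside the scope of this sketch.
    """
    return [(text, GRAMMAR_ITEM_FOR.get(name)) for text, name in lines_with_names]


annotated = annotate([
    ("B1.The Program lists", "heading"),
    ("This program lists the files.", "body"),
])
```

Here the heading line carries the constraint "noun phrase", while the body line carries no constraint and would be parsed normally.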

[0010]

[Embodiments] The present invention is described below, together with its embodiments, with reference to the drawings.

[0011] <Embodiment 1: Document Structure Analysis Method> A document structure analysis method using the document structure position rule 100 and the document structure local rule 200 is described with reference to FIGS. 1, 4, and 5. In FIG. 1, when a document is input, position information is first extracted in step S1. Specifically, for each line of the document, the start position and end position of the line (X_S, X_E) are obtained in units of characters. In addition, across the whole document, the most frequently occurring line start position and the most frequently occurring line end position among all lines are found, and these are defined as the start position and end position of the entire document. In the following description, the start and end positions of the entire document are taken to be (0, 40) for simplicity.
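The position information extraction of step S1 can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the function names, the treatment of leading whitespace as the start offset, and the decision to ignore blank lines when finding the most frequent positions are all assumptions.

```python
from collections import Counter


def line_extent(line):
    """Return (start, end) of the visible text in a line, counted in characters."""
    text = line.rstrip()                      # ignore trailing whitespace and newline
    start = len(text) - len(text.lstrip())    # number of leading whitespace characters
    return start, len(text)


def extract_positions(lines):
    """Step S1: per-line extents plus the most frequent start and end positions."""
    extents = [line_extent(line) for line in lines]
    # Assumption: blank lines are ignored when looking for the most frequent positions.
    non_blank = [e for e in extents if e[1] > e[0]]
    if not non_blank:
        return extents, (0, 0)
    doc_start = Counter(s for s, _ in non_blank).most_common(1)[0][0]
    doc_end = Counter(e for _, e in non_blank).most_common(1)[0][0]
    return extents, (doc_start, doc_end)


sample = ["x" * 40, "y" * 40, " B1.The Program lists", "", "z" * 40]
extents, doc_extent = extract_positions(sample)
```

With this sample, the full-width lines set the document extent to (0, 40), and the indented line is measured as (1, 21).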

[0012] Next, in step S2, the document structure is analyzed line by line using the position information, the document structure position rule 100, and the document structure local rule 200. An example of the document structure position rule 100 is shown in FIGS. 4(a) and 4(b). As FIG. 4(b) shows, this example is the rule that the line being analyzed is "body text" if the immediately preceding line is body text and the current line and the immediately preceding line begin at the same position (start position). An example of the document structure local rule 200 is shown in FIGS. 5(a) and 5(b). As FIG. 5(b) shows, this example is the rule that the line being analyzed is a "heading" when the start position of the line lies to the left of [start position of the entire document] plus [left indent of heading], the end position of the line lies to the left of [end position of the entire document] minus [right indent of heading], the length of the line being analyzed is not 0, and the line does not begin with one of the specific symbols used for itemized lists.

In the line-by-line analysis of step S2, the document structure position rule 100 and the document structure local rule 200 are used. For each line, the start position X_S of the current line is compared with the start position 0 of the entire document, and the end position X_E of the current line is compared with the end position 40 of the entire document; if an earlier line exists, the start positions of the current and previous lines are compared, as are their end positions. If the differences in start position and end position between the previous line and the current line are within set values, and the current line satisfies the conditions in the document structure position rule 100 for taking the previous line's document structure name, the current line is given the same document structure name as the previous line.

If no document structure name is given in step S2, processing moves to step S3, where a separate document structure analysis is performed. In step S3, the start position of the current line is compared with the start position of the entire document, the end position of the current line is compared with the end position of the entire document, and candidates for the document structure name are searched for using the document structure local rule 200. Then, for each candidate, the document structure position rule 100 and the document structure local rule 200 are used to judge whether the current line satisfies that name's conditions, and the document structure name whose conditions are satisfied is given. The document structure is analyzed line by line in this way; when the last line has been processed, the analysis ends in step S4, and otherwise processing returns to step S2 for the sentence analysis of the next line.
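The flow of steps S2 to S4 can be sketched as a single loop over lines. This is a hedged illustration under several assumptions that are not in the text: the indent values (5), the tolerances `max_start_diff` and `max_end_diff`, the bullet symbols, the label "blank" for empty lines, and the fallback that labels any remaining non-empty line as body text. Figures 4 and 5 are not reproduced here, so the two rule functions only follow the prose description of the examples.

```python
BODY, HEADING, BLANK = "body", "heading", "blank"


def extent(line):
    """(start, end) of the visible text, counted in characters."""
    text = line.rstrip()
    return len(text) - len(text.lstrip()), len(text)


def continues_previous(prev_name, prev_ext, cur_ext, max_start_diff=0, max_end_diff=40):
    """Step S2 (position rule): the previous line is body text and the positions are close."""
    return (prev_name == BODY
            and abs(cur_ext[0] - prev_ext[0]) <= max_start_diff
            and abs(cur_ext[1] - prev_ext[1]) <= max_end_diff)


def is_heading(line, ext, doc_ext, left_indent=5, right_indent=5, bullets=("-", "*")):
    """Step S3 (local rule): the four heading conditions described in the text."""
    content = line.strip()
    return (ext[0] < doc_ext[0] + left_indent
            and ext[1] < doc_ext[1] - right_indent
            and len(content) > 0
            and not content.startswith(bullets))


def analyze(lines, doc_ext):
    """Assign a document structure name to every line, ending after the last line (S4)."""
    names, extents = [], [extent(line) for line in lines]
    for i, line in enumerate(lines):
        if not line.strip():
            names.append(BLANK)                      # assumption: empty lines get their own label
        elif i > 0 and continues_previous(names[-1], extents[i - 1], extents[i]):
            names.append(names[-1])                  # S2: same name as the previous line
        elif is_heading(line, extents[i], doc_ext):
            names.append(HEADING)                    # S3: local rule matched
        else:
            names.append(BODY)                       # assumption: fallback, not from the text
    return names


document = ["x" * 40, "y" * 35, " B1.The Program lists", "z" * 40, "w" * 12]
names = analyze(document, (0, 40))
```

In this sample the third line starts one character to the right of the preceding body line, so the position rule does not apply and the local rule labels it a heading; the short final line keeps the body label because it starts at the same position as the line before it.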

[0013] A specific example follows. Suppose the input to the sentence analysis of step S2 is [(blank) B1.The Program lists]. Measured in characters, the start and end positions of this line are (1, 21). Suppose the immediately preceding line was given [body text] as its document structure in the earlier processing and has start and end positions (0, 35). Under [Example of document structure position rule] in FIG. 4, the start positions are not the same, so the document structure name of the current line is not [body text]. The current line has therefore not been given a document structure name, so in the document structure analysis of step S3 it is analyzed using the document structure local rule 200 and the document structure position rule 100. Under [Example of document structure local rule] in FIG. 5, the relationship between the start and end positions of the entire document and those of the current line satisfies the conditions; the length of the line being analyzed is 19, so the length condition (not 0) is satisfied; and the character string at the beginning of the line is [B1], which is not an itemized-list start symbol. The document structure name of the current line is therefore analyzed to be [heading].
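The worked example can be traced condition by condition. The sketch below assumes a heading left and right indent of 5 characters and a small set of bullet symbols; these values are not given in the text and are chosen only so that the trace can run. The function name `heading_conditions` is likewise hypothetical.

```python
def heading_conditions(line, ext, doc_ext, left_indent, right_indent, bullet_marks):
    """Evaluate each condition of the heading local rule separately."""
    start, end = ext
    content = line.strip()
    return {
        "start_left_of_limit": start < doc_ext[0] + left_indent,
        "end_left_of_limit": end < doc_ext[1] - right_indent,
        "non_empty": len(content) > 0,
        "not_bullet": not content.startswith(tuple(bullet_marks)),
    }


previous_extent = (0, 35)       # preceding line, already labeled body text
current_extent = (1, 21)        # " B1.The Program lists"
document_extent = (0, 40)

# Position rule (FIG. 4 example): requires identical start positions.
same_start = previous_extent[0] == current_extent[0]

# Local rule (FIG. 5 example), with assumed indents and bullet symbols.
checks = heading_conditions(" B1.The Program lists", current_extent,
                            document_extent, 5, 5, ["-", "*"])
is_heading = all(checks.values())
```

Because `same_start` is false, the line does not inherit the body label from the preceding line, and because every entry in `checks` is true, the local rule labels it a heading, matching the conclusion of the example.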

[0014] <Embodiment 2: Document Structure Analysis Device> Next, an embodiment of a document structure analysis device 300 that implements the document structure analysis method of Embodiment 1 is described with reference to FIG. 7. In FIG. 7, the document structure analysis device 300 comprises a position information extraction unit 1, a line-by-line sentence analysis unit 2, a document structure analysis unit 3, a storage unit 4 for the document structure position rule, a storage unit 5 for the document structure local rule, a processing control unit 6, a storage unit 7 for document data, a storage unit 8 for position information, a storage unit 9 for the analyzed document structure names, and an output unit 10. The processing control unit 6 manages the overall processing. The storage unit 7 stores the input document data. The output unit 10 outputs the document structure name of each line, which is the analysis result, attached to the document data of that line as necessary.

[0015] In FIG. 7, when a document is input from the storage unit 7, the position information extraction unit 1 first extracts position information in step S1. Specifically, for each line of the document, the start position and end position (X_S, X_E) of the line are obtained in units of characters and stored in the storage unit 8. In addition, across all lines of the document, the most frequently occurring line start position and line end position are found, and these are defined as the start position and end position of the entire document and stored in the storage unit 8. In the following description, the start and end positions of the entire document are taken to be (0, 40) for simplicity. The sentence analysis unit 2 uses the document structure position rule 100 and the document structure local rule 200. For each line, it compares the start position X_S of the current line with the start position 0 of the entire document and the end position X_E of the current line with the end position 40 of the entire document; if a preceding line exists, it also compares the start positions and the end positions of the current line and the preceding line. If the differences in start position and end position between the preceding line and the current line are within set values, and the current line satisfies the conditions in the document structure position rule 100 for taking the document structure name of the preceding line, the current line is given the same document structure name as the preceding line, and the name is stored in the storage unit 9. If no document structure name is given, the document structure analysis unit 3 performs a separate document structure analysis. The document structure analysis unit 3 compares the start position of the current line with the start position of the entire document and the end position of the current line with the end position of the entire document, and searches for candidate document structure names using the document structure local rule 200. Then, for each candidate, it uses the document structure position rule 100 and the document structure local rule 200 to determine whether the current line satisfies the conditions of that document structure name, and gives the line the document structure name whose conditions are satisfied. The document structure is analyzed line by line in this way. When the last line has been processed, the analysis ends in step S4; otherwise, processing returns to the sentence analysis unit 2 for the next line.
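The position information extraction described above can be sketched as follows. This is a minimal illustration only, not the implementation of the device: the function names `line_span` and `document_span` are hypothetical, each character is assumed to count as one unit, and the end position is assumed to be the column just past the last non-blank character (the text does not say whether the end position is inclusive).

```python
from collections import Counter

Span = tuple[int, int]


def line_span(line: str) -> Span:
    """Return (X_S, X_E) for one line, counted in characters.

    Assumption: X_S is the number of leading whitespace characters and
    X_E is the column just past the last non-blank character.
    A blank line yields (0, 0).
    """
    stripped = line.rstrip()
    if not stripped:
        return (0, 0)
    start = len(stripped) - len(stripped.lstrip())
    return (start, len(stripped))


def document_span(spans: list[Span]) -> Span:
    """Take the most frequent line start and line end as the document's."""
    non_blank = [s for s in spans if s != (0, 0)]
    if not non_blank:
        return (0, 0)
    start = Counter(s for s, _ in non_blank).most_common(1)[0][0]
    end = Counter(e for _, e in non_blank).most_common(1)[0][0]
    return (start, end)
```

With the simplification used in the text, a document whose body lines mostly run from column 0 to column 40 would give `document_span(...) == (0, 40)`.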

[0016] <Third Embodiment: Document Structure Analysis Method> With reference to FIGS. 2 and 4, a document structure analysis method using the document structure position rule 100 described above and a simplified document structure general rule 400 will be described. In FIG. 2, when a document is input, position information is first extracted in step S11. Specifically, for each line of the document, the start position and end position (X_S, X_E) of the line are obtained in units of characters. In addition, across all lines of the document, the most frequently occurring line start position and line end position are found, and these are defined as the start position and end position of the entire document. In the following description as well, the start and end positions of the entire document are taken to be (0, 40) for simplicity.

[0017] Next, in step S12, the document structure is analyzed line by line using the position information, the document structure position rule 100, and the simplified document structure general rule 400. In the line-by-line analysis of step S12, the document structure position rule 100 and the document structure general rule 400 are used. For each line, the start position X_S of the current line is compared with the start position 0 of the entire document and the end position X_E of the current line is compared with the end position 40 of the entire document; if a preceding line exists, the start positions and the end positions of the current line and the preceding line are also compared. If the differences in start position and end position between the preceding line and the current line are within set values, and the current line satisfies the conditions in the document structure position rule 100 for taking the document structure name of the preceding line, the current line is given the same document structure name as the preceding line. If no document structure name is given in step S12, processing moves to the next step S13, where a separate document structure analysis is performed. In step S13, the start position of the current line is compared with the start position of the entire document and the end position of the current line is compared with the end position of the entire document, and candidate document structure names are searched for using the document structure general rule 400. Then, for each candidate, the document structure position rule 100 and the document structure general rule 400 are used to determine whether the current line satisfies the conditions of that document structure name, and the line is given the document structure name whose conditions are satisfied. The document structure is analyzed line by line in this way. When the last line has been processed, the analysis ends in step S14; otherwise, processing returns to step S12 for the next line.
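The two-stage, line-by-line analysis of steps S12 and S13 can be sketched as follows. This is an illustration under stated assumptions, not the method as claimed: the actual contents of the document structure position rule 100 and the document structure general rule 400 are not reproduced here, so the set `CONTINUABLE`, the tolerance of 1 character, and the toy conditions in `classify` are all hypothetical stand-ins.

```python
from typing import Optional

Span = tuple[int, int]

# Hypothetical stand-in for the position rule: structure names that a
# following line is allowed to inherit.
CONTINUABLE = {"paragraph"}


def carry_over(prev_span: Span, prev_name: str, cur_span: Span,
               tolerance: int = 1) -> Optional[str]:
    """Stage 1 (step S12): reuse the previous line's name when the start
    and end positions differ by at most `tolerance` characters."""
    if prev_name not in CONTINUABLE:
        return None
    if (abs(cur_span[0] - prev_span[0]) <= tolerance
            and abs(cur_span[1] - prev_span[1]) <= tolerance):
        return prev_name
    return None


def classify(text: str, cur_span: Span, doc_span: Span) -> str:
    """Stage 2 (step S13): toy rules comparing the line with the
    document-wide start and end positions."""
    start_offset = cur_span[0] - doc_span[0]
    end_gap = doc_span[1] - cur_span[1]
    if start_offset > 0 and end_gap > 0:
        return "title"          # indented on both sides
    if end_gap > 0 and text.lstrip()[:1].isdigit():
        return "heading"        # short line starting with a number
    return "paragraph"


def analyze(lines: list[tuple[str, Span]], doc_span: Span) -> list[str]:
    """Label every line, trying stage 1 first and falling back to stage 2."""
    names: list[str] = []
    prev: Optional[tuple[Span, str]] = None
    for text, span in lines:
        name = carry_over(prev[0], prev[1], span) if prev else None
        if name is None:
            name = classify(text, span, doc_span)
        names.append(name)
        prev = (span, name)
    return names
```

The point of the ordering is that the cheap positional comparison settles most lines, so the rule lookup in stage 2 only runs for lines that do not simply continue the previous one.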

[0018] <Fourth Embodiment: Document Structure Analysis Device> Next, an example of a document structure analysis device 500 realized using the document structure analysis method of the third embodiment described above will be described with reference to FIG. 8. In FIG. 8, the document structure analysis device 500 comprises a position information extraction unit 1, a line-by-line sentence analysis unit 2, a document structure analysis unit 3, a storage unit 4 for the document structure position rule, a storage unit 11 for the document structure general rule, a processing control unit 6, a storage unit 7 for document data, a storage unit 8 for position information, a storage unit 9 for analyzed document structure names, and an output unit 10. Of these, the processing control unit 6 manages the overall processing, and the storage unit 7 stores the input document data. The output unit 10 outputs the document structure name of each line, which is the analysis result, appended to that line's document data as necessary.

[0019] In FIG. 8, when a document is input from the storage unit 7, the position information extraction unit 1 extracts position information. Specifically, for each line of the document, the start position and end position (X_S, X_E) of the line are obtained in units of characters and stored in the storage unit 8. In addition, across all lines of the document, the most frequently occurring line start position and line end position are found, and these are defined as the start position and end position of the entire document and stored in the storage unit 8. In the following description, the start and end positions of the entire document are taken to be (0, 40) for simplicity. The sentence analysis unit 2 uses the document structure position rule 100 and the simplified document structure general rule 400. For each line, it compares the start position X_S of the current line with the start position 0 of the entire document and the end position X_E of the current line with the end position 40 of the entire document; if a preceding line exists, it also compares the start positions and the end positions of the current line and the preceding line. If the differences in start position and end position between the preceding line and the current line are within set values, and the current line satisfies the conditions in the document structure position rule 100 for taking the document structure name of the preceding line, the current line is given the same document structure name as the preceding line, and the name is stored in the storage unit 9. If no document structure name is given, the document structure analysis unit 3 performs a separate document structure analysis. The document structure analysis unit 3 compares the start position of the current line with the start position of the entire document and the end position of the current line with the end position of the entire document, and searches for candidate document structure names using the document structure general rule 400. Then, for each candidate, it uses the document structure position rule 100 and the document structure general rule 400 to determine whether the current line satisfies the conditions of that document structure name, and gives the line the document structure name whose conditions are satisfied. The document structure is analyzed line by line in this way. When the last line has been processed, the analysis ends in step S4; otherwise, processing returns to the sentence analysis unit 2 for the next line.

[0020] Next, with reference to FIG. 3, an embodiment of a method of performing machine translation using the per-line document structure names obtained by the document structure analysis technique of the first and second embodiments or the third and fourth embodiments will be described. In FIG. 3, when a document is input, the document structure name of each line is first analyzed in step S21 by the method or device described in the first and second embodiments or the third and fourth embodiments. After this analysis, in the next step S22, the document structure name obtained is converted into a grammar item using a conversion table 600 that maps each document structure name to the grammar item expected from it. For example, in English-to-Japanese translation, if the input to step S21 is [B1. The Program lists] and the document structure obtained by the analysis is [heading], the document structure name [heading] is converted to obtain [noun phrase].
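The conversion of step S22 amounts to a table lookup. The sketch below is illustrative only: the single pair heading → noun phrase comes from the example in the text, while the names `GRAMMAR_ITEM_FOR` and `to_grammar_item`, and the choice to return `None` when the table has no entry, are assumptions rather than the contents of conversion table 600.

```python
from typing import Optional

# Only the "heading" -> "noun phrase" pair is taken from the text;
# a real table would hold one entry per document structure name.
GRAMMAR_ITEM_FOR: dict[str, str] = {
    "heading": "noun phrase",
}


def to_grammar_item(structure_name: str) -> Optional[str]:
    """Map a document structure name to its expected grammar item.

    Returns None when the table has no entry, which is assumed here to
    mean that the parser receives no grammar constraint for the line.
    """
    return GRAMMAR_ITEM_FOR.get(structure_name)
```

For the example in the text, `to_grammar_item("heading")` yields `"noun phrase"`, which is then passed to the parser together with the line [B1. The Program lists].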

[0021] Next, in step S23, the sentence of the line and its grammar item are paired, and the sentence is analyzed grammatically and semantically so that it takes the specified grammar item. For example, given [noun phrase] as the grammar item and [B1. The Program lists] as the sentence, the input sentence is analyzed so that it becomes a noun phrase. This reduces the cases, which occurred when no grammar item was specified, in which [B1. The Program lists] is wrongly analyzed as an ordinary sentence or an imperative sentence. Then, in the next step S24, the sentence analyzed in step S23 is converted into the grammatical or semantic structure of another language. In the above example, an analysis structure having noun phrase as its grammar item is obtained. In the next step S25, a sentence in the other language is generated from the grammatical or semantic structure converted in step S24. The translation is produced in this way. In the example, a sentence whose translation result is a noun phrase is produced, such as [B1. プログラムリスト] (B1. Program list).

[0022] <Sixth Embodiment: Machine Translation Device> Next, an embodiment of a machine translation device using the document structure analysis device 300 or 500 of the second or fourth embodiment will be described with reference to FIG. 9. In FIG. 9, the machine translation device comprises the document structure analysis device 300 or 500, a conversion unit 12, an analysis unit 13, a conversion unit 14, and a generation unit 15. When document data is input, the document structure analysis device 300 or 500 analyzes the document structure name of each line by the methods of the first to fourth embodiments. The conversion unit 12 converts the document structure name resulting from this analysis into a grammar item by referring to the conversion table that maps each document structure name to the grammar item expected from it. The analysis unit 13 uses the converted grammar item and the sentence data of the line to analyze the sentence of the line grammatically and semantically so that it takes the specified grammar item, and passes the result to the conversion unit 14. The conversion unit 14 converts the analyzed sentence into the grammatical or semantic structure of another language and passes it to the generation unit 15. The generation unit 15 generates a sentence in the other language from the converted grammatical or semantic structure; that is, it translates the sentence.
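The data flow between units 12 to 15 can be sketched as a chain of four stages. This is a structural illustration only: the class `Translator` and its field names are hypothetical, and the parser, transfer, and generation stages are injected as plain callables because their internals are not described at this level of detail.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Translator:
    # Each field stands in for one unit of FIG. 9; the callables themselves
    # are placeholders supplied by the caller.
    to_grammar_item: Callable[[str], Optional[str]]   # conversion unit 12
    parse: Callable[[str, Optional[str]], Any]        # analysis unit 13
    transfer: Callable[[Any], Any]                    # conversion unit 14
    generate: Callable[[Any], str]                    # generation unit 15

    def translate_line(self, text: str, structure_name: str) -> str:
        """Translate one line, constraining the parse by its grammar item."""
        item = self.to_grammar_item(structure_name)
        source_structure = self.parse(text, item)
        target_structure = self.transfer(source_structure)
        return self.generate(target_structure)
```

Keeping the grammar item as an explicit argument to `parse` mirrors the text: the document structure analysis influences translation only through the constraint handed to the analysis unit.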

[0023]

[Effects of the Invention] In the invention of claim 1, in place of the conventional document structure general rule, a predetermined document structure position rule for determining the document structure from line start and end position information and a predetermined document structure local rule for determining the document structure from local information of the document are created in advance, and these are used to analyze the document structure, in terms of form, from the extracted position information. Because the document structure position rule and the document structure local rule are simple compared with the detailed conventional document structure general rule, the workload of creating rules is reduced.

In the invention of claim 2, a predetermined document structure position rule for determining the document structure from line start and end position information and a predetermined document structure general rule for determining the document structure from overall information of the document are used to analyze the document structure, in terms of form, from the extracted position information. The document structure general rule in this case does not need to be as detailed as the document structure position rule. The burden of creating rules is therefore reduced.

In the invention of claim 3, the document structure name analyzed as described above is converted into a grammar item such as "noun phrase", the grammar item name is attached to the sentence, the structure of the sentence is analyzed for machine translation, and the sentence is then translated. In machine translation, when a grammar item is specified, that specification takes priority, and the sentence is analyzed and translated so that it takes the specified grammar item. A noun phrase is therefore no longer mistranslated as an imperative sentence or an ordinary sentence.

In the invention of claim 4, the storage means stores, in place of the conventional document structure general rule, a predetermined document structure position rule for determining the document structure from position information of the document and a predetermined document structure local rule for determining the document structure from local information of the document. The position information extraction means obtains the start and end positions of the entire document and the start and end positions of each line from the document data. The first analysis means analyzes, for each line, the document structure of the current line based on the relationship between the position information of the current line and that of the preceding lines and on the document structure position rule, and determines a document structure name such as heading. For a line whose document structure name is not determined by the first analysis means, the second analysis means analyzes the document structure of the line based on the relationship between the position information of the line and that of the entire document and on the document structure local rule, and determines a document structure name such as heading. Because the document structure position rule and the document structure local rule are simple compared with the detailed conventional document structure general rule, the workload of creating rules is reduced.

In the invention of claim 5, the storage means stores, in place of the conventional document structure general rule, a predetermined document structure position rule for determining the document structure from position information of the document and a simple document structure general rule for determining the document structure from overall information of the document. The position information extraction means obtains the start and end positions of the entire document and the start and end positions of each line from the document data. The first analysis means analyzes, for each line, the document structure of the current line based on the relationship between the position information of the current line and that of the preceding lines and on the document structure position rule, and determines a document structure name such as heading. For a line whose document structure name is not determined by the first analysis means, the second analysis means analyzes the document structure of the line based on the relationship between the position information of the line and that of the entire document and on the document structure general rule, and determines a document structure name such as heading. The document structure general rule in this case does not need to be as detailed as the conventional document structure general rule. The burden of creating rules is therefore reduced.

In the invention of claim 6, the document structure name analyzed as described above is converted by the conversion means into a grammar item such as "noun phrase", the analysis means attaches this grammar item name to the sentence and analyzes the structure of the sentence for machine translation, and the translation means then translates it. In machine translation, when a grammar item is specified, that specification takes priority, and the sentence is analyzed and translated so that it takes the specified grammar item. As a result, cases in which, for example, an English noun phrase is wrongly translated into Japanese as an imperative sentence or an ordinary sentence are reduced, and translation accuracy improves.

[Brief description of drawings]

FIG. 1 is a flowchart of a first embodiment of the present invention.

FIG. 2 is a flowchart of a third embodiment of the present invention.

FIG. 3 is a flowchart of a fifth embodiment of the present invention.

FIG. 4 is a diagram showing an example of a document structure position rule.

FIG. 5 is a diagram showing an example of a document structure local rule.

FIG. 6 is a flowchart of a conventional technique.

FIG. 7 is a block diagram of a second embodiment of the present invention.

FIG. 8 is a block diagram of a fourth embodiment of the present invention.

FIG. 9 is a block diagram of a sixth embodiment of the present invention.

[Explanation of symbols]

100: document structure position rule
200: document structure local rule
300, 500: document structure analysis device
400: simplified document structure general rule
1: position information extraction unit
2: sentence analysis unit
3: document structure analysis unit
4, 5, 7, 8, 9: storage unit
6: processing control unit

Claims

[Claims]

1. A document structure analysis method for machine translation, comprising: extracting line start and end position information from a document; and analyzing the document structure from the extracted position information using a predetermined document structure position rule for determining the document structure from line start and end position information and a predetermined document structure local rule for determining the document structure from local information of the document.

2. A document structure analysis method for machine translation, comprising: extracting line start and end position information from a document; and analyzing the document structure from the extracted position information using a predetermined document structure position rule for determining the document structure from line start and end position information and a predetermined document structure general rule for determining the document structure from overall information of the document.

3. A machine translation method comprising: analyzing a document structure by the document structure analysis method according to claim 1; converting the document structure name obtained by this analysis into a grammar item using a conversion table between predetermined document structure names and grammar items; analyzing the sentence for machine translation so that it takes the grammar item obtained by the conversion; and translating the sentence based on the analysis result.

4. A document structure analysis device comprising: storage means for storing a predetermined document structure position rule for determining a document structure from position information of a document and a predetermined document structure local rule for determining the document structure from local information of the document; position information extraction means for obtaining, from document data, the start position and end position of the entire document and the start position and end position of each line; first analysis means for analyzing, for each line, the document structure of the current line based on the relationship between the position information of the current line and that of the preceding lines and on the document structure position rule, and determining a document structure name such as heading; and second analysis means for analyzing, for a line whose document structure name is not determined by the first analysis means, the document structure of the line based on the relationship between the position information of the line and that of the entire document and on the document structure local rule, and determining a document structure name such as heading.

5. A document structure analysis device comprising: storage means for storing a predetermined document structure position rule for determining a document structure from position information of a document and a predetermined document structure general rule for determining the document structure from overall information of the document; position information extraction means for obtaining, from document data, the start position and end position of the entire document and the start position and end position of each line; first analysis means for analyzing, for each line, the document structure of the current line based on the relationship between the position information of the current line and that of the preceding lines and on the document structure position rule, and determining a document structure name such as heading; and second analysis means for analyzing, for a line whose document structure name is not determined by the first analysis means, the document structure of the line based on the relationship between the position information of the line and that of the entire document and on the document structure general rule, and determining a document structure name such as heading.

6. A machine translation device comprising: the document structure analysis device according to claim 4; conversion means for converting a document structure name such as heading, determined by the document structure analysis device, into a grammar item such as noun phrase based on a conversion table between predetermined document structure names and grammar items; analysis means for analyzing the sentence so that it takes the grammar item, such as noun phrase, obtained by the conversion means; and translation means for translating the sentence into a sentence in another language based on the analysis result obtained by the analysis means.