JPS60200359A

JPS60200359A - Simple sentence producer

Info

Publication number: JPS60200359A
Application number: JP59055509A
Authority: JP
Inventors: Akishige Masuyama; 増山　顕成
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1984-03-23
Filing date: 1984-03-23
Publication date: 1985-10-09

Abstract

PURPOSE: To make natural language processing for data collection, analysis and the like, and its debugging, easier by providing a word dividing means, a dependency analyzing means, etc., and decomposing a natural language sentence in which complicated factors are entangled into a simple form. CONSTITUTION: A sentence supplied from an input part 2 is divided into words by a word dividing part 3 through collation with the words in a dictionary 6 held in memory. The resulting word list is sent to a phrase composing part 4, which reads the word list one word at a time from the head of the sentence. The phrase list thus produced is sent to a dependency analyzing part 5, and a parse tree is produced with the help of the dictionary 6 and an indicating part 7. The parse tree is sent to a simple sentence extracting part 8, which produces simple sentences. A complicated sentence is thus converted into a simple form by the parts 3, 4, 5, 7 and 8 together with the dictionary 6, which makes natural language processing for data collection, analysis and the like, and its debugging, easier.

Description

DETAILED DESCRIPTION OF THE INVENTION

(1) Technical Field of the Invention

The present invention relates to the processing of natural language by a data processing device, and in particular to a simple sentence generation device that generates simple sentences from an input sentence.

(2) Prior Art and Problems

Natural language sentences generally have a very complex structure, so when they are processed by computer it is very difficult to use them directly as data or to collect data from them. Even a sentence with a complex structure, however, becomes easier to process by computer once it is decomposed into simple sentences (sentences consisting of one subject and one predicate), because syntactic ambiguity decreases and the syntactic pattern and the case of the verb become easier to determine. Decomposing a sentence into simple sentences, however, requires that the structure of the input sentence be determined, and determining that structure requires that the syntactic pattern or the case of the verb be known. Since the simple sentences are being sought in the first place in order to determine the syntactic pattern or the case of the verb, the problem has been that these steps cannot be carried out automatically.

For this reason, natural language processing by computer (data collection) has conventionally been carried out either by manually rewriting the target text into a simple form before input, or by selecting only text that is already in a simple form for input. The former requires a great deal of labor, and the latter risks omitting important information.

(3) Object of the Invention

In view of the above drawbacks of the prior art, the object of the present invention is to provide a scheme that decomposes natural language sentences in which complex factors are entangled into a simple form, so that natural language processing such as data collection and analysis, and its debugging, can be carried out easily.

(4) Structure of the Invention

This object is achieved by a device having: phrase synthesis means that divides input text data into words to form a word list and joins the word list into phrase units to generate a phrase list; dependency analysis means that analyzes the dependency relations among the phrases in the phrase list; instruction accepting means that requests an external judgment on the dependency relation between phrases and accepts the resulting decision; and simple sentence extraction means that builds a parse tree from the outputs of the dependency analysis means and the instruction accepting means and extracts simple sentences from that parse tree. The dependency analysis means determines, by referring to a grammar table on a storage device, whether the middle phrase of three consecutive phrases in the phrase list depends on the phrase that follows it, and, when the relationship among the three phrases is a particular one, requests an external judgment on that dependency relation.

(5) Embodiment of the Invention

FIG. 1 is a block diagram of an example of a device embodying the present invention, in which 1 is the simple sentence generation device, 2 is an input unit, 3 is a word division unit, 4 is a phrase synthesis unit, 5 is a dependency analysis unit, 6 is a storage device, 7 is an instruction unit, and 8 is a simple sentence extraction unit.

In FIG. 1, a sentence entered through the input unit 2 is matched in the word division unit 3 against the words held in a dictionary registered in the storage device 6. The resulting word list is passed to the phrase synthesis unit 4.
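The dictionary matching described above can be sketched as follows. The text does not say how candidate words are chosen, so this sketch assumes greedy longest-match lookup with a single-character fallback; the function name `divide_into_words` and the contents of `DICTIONARY` are hypothetical, and the input sentence is the one used in the worked example later in this description.

```python
def divide_into_words(sentence, dictionary):
    """Split `sentence` by greedy longest match against `dictionary` (a set of words)."""
    max_len = max((len(word) for word in dictionary), default=1)
    words, pos = [], 0
    while pos < len(sentence):
        # Try the longest candidate first; fall back to a single character
        # when nothing in the dictionary matches at this position.
        for length in range(min(max_len, len(sentence) - pos), 0, -1):
            candidate = sentence[pos:pos + length]
            if length == 1 or candidate in dictionary:
                words.append(candidate)
                pos += length
                break
    return words


# Hypothetical dictionary that covers only the worked example.
DICTIONARY = {"図形", "情報", "は", "X-Y", "座標", "に対して", "与え", "られ", "、",
              "この", "領域", "を", "画面", "と", "呼", "ぶ", "。"}

words = divide_into_words("図形情報はX-Y座標に対して与えられ、この座標領域を画面と呼ぶ。", DICTIONARY)
print(" | ".join(words))
```

Longest-match lookup is only one plausible strategy; a real dictionary would also need to return grammatical information for each word so that the later stages can use it.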

The phrase synthesis unit 4 reads the word list one word at a time from the head of the sentence and synthesizes phrases by the operations listed below. The processing flow is shown in FIG. 2.

(a) Consecutive nouns are joined together.

(b) A particle is attached to the immediately preceding word.

(c) An auxiliary verb is attached to the immediately preceding word.

(d) A formal noun with no meaning of its own (for example "とき" or "こと") is attached to the immediately preceding word.

(e) An adnominal is attached to the word on its right.

(f) A verb ending is attached to the immediately preceding word.

(g) Commas, periods, and dots are attached to the immediately preceding word.

(h) Everything inside parentheses is joined together, and the parenthesized whole is attached to the immediately preceding word.

(i) Units (bit, byte, α, H, etc.) are attached to the immediately preceding word.

(j) When words are joined, the grammatical attribute of the last word in the phrase is retained.

For example, for the word sequence | 学校 | で (continuative modification) | の (noun modification) | 勉強 |, the grammatical attribute of "の (noun modification)" is retained, giving the phrase | 学校での (noun modification) |.
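The joining rules listed above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the flow of FIG. 2: the part-of-speech labels (`noun`, `particle`, `adnominal`, and so on) and the name `synthesize_phrases` are hypothetical, the words are assumed to arrive already tagged, and punctuation and parentheses are assumed not to change the attribute that a phrase retains.

```python
# Hypothetical part-of-speech labels for words that attach to the word on their left.
ATTACH_LEFT = {"particle", "auxiliary", "formal_noun", "verb_ending",
               "punctuation", "unit", "open_paren"}
# Labels assumed not to change the attribute retained for the phrase.
NO_ATTRIBUTE = {"punctuation", "open_paren", "close_paren"}


def synthesize_phrases(words):
    """Join (surface, pos) pairs into phrases.

    Returns a list of dicts holding the phrase text and `head`, the last
    attribute-bearing word, whose grammatical attribute the phrase keeps.
    """
    phrases = []
    attach_next = False  # True right after an adnominal, which joins to its right
    depth = 0            # parenthesis nesting depth
    last_pos = None      # part of speech of the most recent word
    for surface, pos in words:
        inside = depth > 0
        if pos == "open_paren":
            depth += 1
        elif pos == "close_paren":
            depth = max(depth - 1, 0)
        join = bool(phrases) and (
            inside or attach_next or pos in ATTACH_LEFT
            or (pos == "noun" and last_pos == "noun")  # consecutive nouns
        )
        if join:
            phrases[-1]["text"] += surface
            if not inside and pos not in NO_ATTRIBUTE:
                phrases[-1]["head"] = (surface, pos)
        else:
            phrases.append({"text": surface, "head": (surface, pos)})
        attach_next = pos == "adnominal"
        last_pos = pos
    return phrases


example = [("学校", "noun"), ("で", "particle"), ("の", "particle"), ("勉強", "noun")]
for phrase in synthesize_phrases(example):
    print(phrase["text"], phrase["head"])
```

Keeping the last attribute-bearing word as `head` mirrors the rule that the attribute of the final word is retained, so a later stage can decide what the whole phrase modifies.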

Phrase attributes include noun, verb, noun modification, verb modification, continuative-modifying verb, and adnominal-modifying verb. Of these, noun modification and verb modification can co-occur, as with "と", and the verb at the end of a sentence is given the attribute "sentence end". The attribute of each phrase is determined by its particle or word ending. For example, particles such as "は", [illegible], and "に" indicate verb modification; "の", "における", and the like indicate noun modification; and a verb in the continuative form, such as "待ち", indicates continuative modification.
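As a rough illustration of how a phrase attribute could be derived from the final particle or ending, the sketch below encodes only the examples that are legible in the text above. The attribute names, the part-of-speech labels, and the function `phrase_attributes` are hypothetical, and anything not covered by those examples is reported as `unknown` rather than guessed.

```python
# Particles named in the text; the real tables would be far larger.
VERB_MODIFYING = {"は", "に"}
NOUN_MODIFYING = {"の", "における"}
BOTH = {"と"}  # noun modification and verb modification co-occur


def phrase_attributes(last_surface, last_pos, sentence_final=False):
    """Return the set of attributes for a phrase, given its last word."""
    if last_pos == "particle":
        if last_surface in BOTH:
            return {"noun_modification", "verb_modification"}
        if last_surface in VERB_MODIFYING:
            return {"verb_modification"}
        if last_surface in NOUN_MODIFYING:
            return {"noun_modification"}
        return {"unknown"}
    if last_pos == "verb_continuative":  # e.g. "待ち"
        return {"continuative_modification"}
    if last_pos == "verb":
        # The verb at the end of the sentence also gets "sentence end".
        return {"verb", "sentence_end"} if sentence_final else {"verb"}
    return {last_pos}


print(phrase_attributes("の", "particle"))
print(phrase_attributes("と", "particle"))
```

Returning a set rather than a single label is a design choice made here so that the co-occurring case of "と" can be represented directly.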

When the phrase list created by the phrase synthesis unit 4 in this way is sent to the dependency analysis unit 5, the dependency analysis unit 5 analyzes the sentence from its tail using "three analysis windows". The "three analysis windows" are three consecutive frames laid over the sentence so that each frame contains exactly one phrase; they are introduced to make the analysis easier to explain. The windows (frames) are named L, M, and R from the left. The analysis proceeds according to the combination of the attributes of the phrases in the windows, and a parse tree is built as a result. The analysis procedure using the "three analysis windows" is shown in Table 1 and FIG. 3. The combinations in Table 1 are stored in the storage device 6 as a grammar table. When the user's instruction is needed during analysis, the message output and the instruction from the user pass through the instruction unit 7.
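The window scan can be sketched as follows, with three windows called L, M and R here. Table 1 is not reproduced in this text, so the grammar table entries below are invented placeholders, and several behaviors are assumptions made only for illustration: the table is keyed by the attribute triple, a missing entry triggers a query to the user, and the left edge is padded with `None` so that the first phrase can also occupy the middle window. The sketch records only whether the middle phrase depends on the phrase that follows it; it does not build the parse tree.

```python
def analyze_dependencies(attrs, grammar_table, ask_user):
    """Slide the L/M/R windows from the sentence tail toward its head.

    attrs:         one attribute label per phrase, in sentence order
    grammar_table: maps (L, M, R) attribute triples to "yes", "no" or "ask"
    ask_user:      callback(L, M, R) -> bool, used when the table says "ask"
    Returns {index of M: True if M depends on the phrase after it}.
    """
    padded = [None] + list(attrs)  # assumption: pad so phrase 0 can be M
    decisions = {}
    for r in range(len(padded) - 1, 1, -1):
        left, mid, right = padded[r - 2], padded[r - 1], padded[r]
        verdict = grammar_table.get((left, mid, right), "ask")
        if verdict == "ask":
            verdict = "yes" if ask_user(left, mid, right) else "no"
        decisions[r - 2] = verdict == "yes"
    return decisions


# Invented table entry and attribute labels, for illustration only.
TABLE = {("verb_modification", "verb_modification", "verb"): "yes"}
queries = []


def auto_answer(left, mid, right):
    """Stand-in for the instruction unit: log the query and answer 'no'."""
    queries.append((left, mid, right))
    return False


result = analyze_dependencies(
    ["verb_modification", "verb_modification", "verb"], TABLE, auto_answer)
print(result, queries)
```

Routing undecidable combinations to a callback reflects the description's point that cases which are hard to judge mechanically are referred to the user instead of being encoded in the dictionary.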

The parse tree created in this way is sent to the simple sentence extraction unit 8, where simple sentences are generated. The sentence is split into simple sentences by the following procedure.

Table 1 (not reproduced in this text)

(1) Extract each noun phrase (including its adnominal modifiers), leaving only its last word.

(2) Split off the continuative modifications.

(3) Change continuative forms to the final form and output the simple sentences.
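The second and third steps of the splitting procedure above can be illustrated on a flat phrase list. This is a simplified sketch, not the device's method: it ignores the parse tree and the noun phrase extraction step, the `FINAL_FORM` lookup table and attribute labels are hypothetical, and the output shown by the patent itself appears only in FIG. 5, so the sentences produced here are illustrative.

```python
# Hypothetical table mapping a continuative form to its final (sentence-ending) form.
FINAL_FORM = {"与えられ": "与えられる"}


def split_simple_sentences(phrases):
    """Split a list of (text, attribute) phrases after each continuative phrase."""
    sentences, current = [], []
    for text, attr in phrases:
        if attr == "continuative":
            stem = text.rstrip("、")  # drop the comma that closed the clause
            current.append(FINAL_FORM.get(stem, stem) + "。")
            sentences.append("".join(current))
            current = []
        else:
            current.append(text)
    if current:
        sentences.append("".join(current))
    return sentences


phrases = [("図形情報は", "verb_modification"), ("X-Y座標に対して", "verb_modification"),
           ("与えられ、", "continuative"), ("この座標領域を", "verb_modification"),
           ("画面と", "verb_modification"), ("呼ぶ。", "sentence_end")]
for sentence in split_simple_sentences(phrases):
    print(sentence)
```

A lookup table is used for the conjugation change only to keep the sketch short; a fuller version would derive the final form from the verb's conjugation class.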

The process by which the units described above generate simple sentences is illustrated further below with a concrete example.

For example, suppose the input sentence is "図形情報はX-Y座標に対して与えられ、この座標領域を画面と呼ぶ" ("Graphic information is given with respect to X-Y coordinates, and this coordinate region is called the screen"). The word list then becomes "図形 | 情報 | は | X-Y | 座標 | に対して | 与え | られ | 、 | この | 座標 | 領域 | を | 画面 | と | 呼 | ぶ | 。", and the phrase list becomes "図形情報は | X-Y座標に対して | 与えられ、 | この座標領域を | 画面と | 呼ぶ。 |" (here | marks a boundary between words or phrases).

FIG. 4 shows the course of analyzing the phrase list with the "three analysis windows". In the figure, 9 is the input sentence, 10 is a phrase boundary, and 11 to 13 are the "three analysis windows", where 11 denotes L, 12 denotes M, and 13 denotes R. A parse tree is obtained as the result of this analysis.

FIG. 5 shows the parse trees and the extracted simple sentences; (a) to (c) are parse trees, and (d) shows the simple sentences.

(6) Effects of the Invention

As described in detail above, the simple sentence generation device of the present invention can be realized with a simple configuration. In addition, because points that are difficult to judge mechanically are referred to the user and the user's instruction is followed, the device has the advantage of requiring little dictionary information and the like, and little processing time is lost, so its effect is considerable.

[Brief Description of the Drawings]

FIG. 1 is a block diagram of an example of a device embodying the present invention, FIG. 2 is a flowchart showing the operation of the phrase synthesis unit, FIG. 3 is a flowchart showing the operation of the dependency analysis unit, FIG. 4 is a diagram showing the course of analyzing the phrase list using the "three analysis windows", and FIG. 5 is a diagram showing the parse trees and the extracted simple sentences.

1: simple sentence generation device, 2: input unit, 3: word division unit, 4: phrase synthesis unit, 5: dependency analysis unit, 6: dictionary, 7: instruction unit, 8: simple sentence extraction unit, 9: input sentence, 10: phrase boundary, 11 to 13: "three analysis windows"

Claims

[Claims]

A simple sentence generation device comprising: phrase synthesis means that divides input text data into words to form a word list and joins the word list into phrase units to generate a phrase list; dependency analysis means that analyzes the dependency relations among the phrases in the phrase list; instruction accepting means that requests an external judgment on the dependency relation between phrases and accepts the resulting decision; and simple sentence extraction means that builds a parse tree from the outputs of the dependency analysis means and the instruction accepting means and extracts simple sentences from the parse tree; characterized in that the dependency analysis means determines, by referring to a grammar table on a storage device, whether the middle phrase of three consecutive phrases in the phrase list depends on the phrase that follows it, and, when the relationship among the three phrases is a particular one, requests an external judgment on that dependency relation.