JPH08171563A - Natural language processing system - Google Patents

Natural language processing system

Info

Publication number
JPH08171563A
Authority
JP
Japan
Prior art keywords
syntax tree
processing
syntax
sentence
completed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP6313898A
Other languages
Japanese (ja)
Inventor
Takesuke Hiraoka
丈介 平岡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Meidensha Corp
Meidensha Electric Manufacturing Co Ltd
Original Assignee
Meidensha Corp
Meidensha Electric Manufacturing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Meidensha Corp, Meidensha Electric Manufacturing Co Ltd
Priority to JP6313898A
Publication of JPH08171563A

Landscapes

  • Machine Translation (AREA)
  • Document Processing Apparatus (AREA)

Abstract

PURPOSE: To reduce the amount of processing and shorten the processing time through efficient search of syntax trees when a probable syntax tree is selected using evaluation scores during syntax tree generation. CONSTITUTION: Processing proceeds preferentially from the generated syntax trees in the higher evaluation-score range, through process A (S11-S18) and iterative process B (S19-S31). Process A repeats up to the end of the sentence with only full-score syntax trees as targets of the dependency search, and ends when a syntax tree has been completed by the end of the sentence. Process B runs when process A yields no completed syntax tree: it repeats up to the end of the sentence with only syntax trees at or above a certain score as search targets, ends when a syntax tree with that score completes a "sentence", and otherwise repeats for syntax trees at the next lower score. Syntax trees in the same score range are processed by horizontal (breadth-first) search.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a natural language processing system, and more particularly to an algorithm for selecting syntax trees in syntactic and semantic analysis.

[0002]

[Prior Art] Natural language processing on computers, in applications such as word processors, machine translation, document databases, and hypertext, has been put into practical use.

[0003] Natural language analysis for these applications first performs morphological analysis, which divides the text to be analyzed into morphemes (the smallest units of word structure) and identifies the properties of each morpheme. This is followed by syntactic analysis, which analyzes the text according to the syntactic rules of the natural language, and then by semantic analysis and context analysis, which remove ambiguity and vagueness.

[0004] Syntactic analysis uses a grammar to determine whether the morphologically analyzed sentence is a correct sentence and, if it is, yields a tree structure (parse tree) as the result. Parsing algorithms that build this tree structure are either top-down, searching from the sentence toward the words, or bottom-up, working from the words toward the sentence.

[0005] In addition, when several partial tree structures could be built next at some stage of parsing, there is a horizontal (parallel, that is, breadth-first) type that builds all of them simultaneously, and a vertical (serial, that is, depth-first) type that selects just one and searches it.

[0006] In general, syntactic analysis considers only grammatical conformity, so syntactic ambiguity arises and many parse trees are generated. Semantic analysis is performed to select the correct parse tree from among them.

[0007] Semantic analysis uses not only the grammatical category of each word (corresponding to its part of speech) but also its semantic information. It checks for semantically inappropriate dependencies and deletes incorrect syntax trees.

[0008] For natural language sentences, however, such all-or-nothing processing alone makes it difficult to select exactly one correct syntax tree, so an evaluation score method is used.

[0009] In this method, each syntax tree is given an evaluation score, points are deducted from syntax trees that are caught by syntactic or semantic checks, and the tree with the highest score is finally selected from those that remain.

[0010] However, analyzing low-scoring syntax trees as well makes the search space too large and lowers analysis efficiency. A threshold is therefore set for judging the semantic suitability of a syntax tree, and trees whose scores fall below the threshold are deleted.

[0011] FIG. 3 shows a parsing algorithm based on bottom-up, horizontal search with evaluation scores added. A word is read (S1) and syntax trees are generated (S2). Whenever a word-to-word dependency arises during generation, a semantic check (S3) that includes score deduction is executed. These steps are repeated until all the words of the sentence have been read (S4), and the syntax tree with the highest evaluation score is selected (S5).

[0012] The semantic check (S3) determines whether the dependency between the words is semantically compatible (S3₁). If it is judged compatible, the syntax tree is left as is and continues to grow (S3₂). If the dependency is judged semantically inappropriate, points are deducted according to the category of the semantic check (S3₃), and when the tree's score falls below the threshold (S3₄) the tree is deleted (S3₅), which keeps the number of syntax trees from growing excessively. The threshold is calculated in advance from the number of words produced by morphological analysis (S6).
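The conventional semantic check (S3) described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Tree` class, the `semantic_check` function, the full score of 100, and the penalty and threshold values are all hypothetical names and numbers chosen for the example, since the patent does not give concrete penalties or the threshold formula.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tree:
    label: str
    score: int = 100  # assumed full score before any deduction


def semantic_check(tree: Tree, compatible: bool, penalty: int,
                   threshold: int) -> Optional[Tree]:
    """Return the tree if it survives the check, or None if it is deleted."""
    if compatible:
        return tree              # compatible dependency: keep growing as is
    tree.score -= penalty        # incompatible: deduct by check category
    if tree.score < threshold:
        return None              # below the threshold: delete the tree
    return tree
```

With a threshold of 60 and a penalty of 30, a tree survives one failed check (score 70) and is deleted by a second one (score 40), which is the all-trees-can-vanish risk that the later paragraphs discuss.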

[0013]

[Problems to Be Solved by the Invention] A general problem of syntactic and semantic analysis is that long sentences are difficult to analyze. As the number of words grows, a so-called combinatorial explosion occurs, efficiency drops sharply, and the analysis can no longer be completed within a practical time. The bottom-up, exhaustive horizontal search method cannot avoid this problem either.

[0014] The following describes this method applied to the simple example sentence 「太郎が走った。」 ("Taro ran."). FIG. 4 shows the order in which the syntax trees are generated, labeled with letters. The trees are generated by the analysis procedure below, as shown in the following table. The table also shows the labels corresponding to the steps.

[0015]

[Table 1]

[0016] (1) Start with 「太郎」 ("Taro") at the beginning of the sentence.

[0017] (2) Generate all syntax trees that can be formed from "Taro" alone. These are trees a and b in the table.

[0018] (3) Move on to the next word, 「が」 (ga).

[0019] (4) Generate all syntax trees that can be formed from the word "ga". These are trees c and d in the table.

[0020] (5) Combine all combinable syntax trees of the word "ga" and the word "Taro". This yields tree e, which combines b and d.

[0021] (6) Repeat steps (3) to (5). This time trees f to i are generated from 「走っ」 (hashi-), j is generated from e and g, and k is generated from j.

[0022] (7) Parsing ends once the last word, 「た」 (ta), has been processed. At this point l and m are generated from "ta", and n to u are generated by combination with the other words.

[0023] (8) From the generated syntax trees, select those that use all the words and reach "sentence". The tree selected here is s.
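Steps (1) to (8) can be illustrated with a small bottom-up, exhaustive chart builder. This is a sketch, not the patent's parser: the `LEXICON` and `RULES` tables are a hypothetical toy grammar (the actual grammar and Table 1 are not reproduced in this text), so the trees it builds do not correspond one-to-one to trees a to u.

```python
from itertools import product

# Hypothetical toy lexicon and binary rules, for illustration only.
LEXICON = {
    "太郎": ["N", "NP"],
    "が": ["P", "CASE"],
    "走っ": ["V", "VSTEM"],
    "た": ["AUX", "TENSE"],
}
RULES = {("NP", "CASE"): "PP", ("VSTEM", "TENSE"): "VP", ("PP", "VP"): "S"}


def parse_all(words):
    """Build every tree over every span, reading words left to right."""
    chart = {}  # (start, end) -> list of trees; a tree is (category, ...)
    for end, word in enumerate(words, start=1):
        # Steps (2)/(4): all trees the new word forms on its own.
        chart[(end - 1, end)] = [(cat, word) for cat in LEXICON[word]]
        # Step (5): combine with every adjacent tree built so far.
        for start in range(end - 2, -1, -1):
            cell = []
            for mid in range(start + 1, end):
                for left, right in product(chart[(start, mid)],
                                           chart[(mid, end)]):
                    parent = RULES.get((left[0], right[0]))
                    if parent is not None:
                        cell.append((parent, left, right))
            chart[(start, end)] = cell
    return chart


chart = parse_all(["太郎", "が", "走っ", "た"])
# Step (8): keep only trees that span all words and reach "S" (sentence).
sentences = [t for t in chart[(0, 4)] if t[0] == "S"]
```

Even with this tiny grammar, most of the trees stored in `chart` never contribute to a sentence, which is the waste that grows explosively with sentence length.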

[0024] As this example shows, many syntax trees are generated even when analyzing a simple sentence. For a long sentence the number of syntax trees becomes enormous, which makes analysis difficult.

[0025] The purpose of semantic analysis is to delete unsuitable syntax trees and ultimately select the one correct parse tree. Making the semantic checks stricter is effective in reducing the number of remaining syntax trees, but in some cases every syntax tree falls below the threshold and no analysis result is obtained at all.

[0026] Conversely, if the semantic checks are loosened or the threshold is too low, many high-scoring syntax trees survive to the end, which makes it hard to select the correct syntax tree, or the combinatorial explosion cannot be suppressed and efficiency drops. The conventional method, which relies on a threshold alone, thus leaves problems unsolved.

[0027] An object of the present invention is to provide a natural language processing system that, when selecting a probable syntax tree using evaluation scores during syntax tree generation, reduces the amount of processing and the processing time by searching the syntax trees efficiently.

[0028]

[Means for Solving the Problems] To solve the above problems, the present invention provides a natural language processing system that parses an input sentence word by word using a bottom-up, horizontal search mechanism to generate syntax trees, performs, each time a syntax tree is generated, a semantic check process that deducts points from the evaluation score of that syntax tree according to whether the word dependency is semantically compatible, and evaluates each syntax tree syntactically and semantically from its evaluation score. The system is characterized in that the semantic check process comprises process A and iterative process B. Process A repeats processing up to the end of the sentence with only syntax trees having a full evaluation score as targets of the dependency search, stores syntax tree data that has had points deducted in a storage area, and, when a syntax tree has been completed up to a "sentence" by the end of the sentence, ends processing with that syntax tree as the completed syntax tree. Iterative process B, when no "sentence" is completed in process A, repeats processing up to the end of the sentence with only those syntax trees, among the deducted syntax tree data, whose evaluation score is at or above a certain value as search targets, ends processing with a syntax tree as the completed syntax tree when a "sentence" is completed by a syntax tree having that evaluation score, and, when no completed syntax tree is obtained, processes the syntax trees at or above the next lower score.

[0029]

[Operation] Processing proceeds preferentially from the generated syntax trees in the higher evaluation-score range, and syntax trees in the same score range are processed by horizontal search.

[0030] As a result, the set of syntax trees to be searched never becomes empty, wasteful searching of syntax trees is reduced, and the processing time is shortened.

[0031]

[Embodiment] FIG. 1 is a flowchart showing one embodiment of the present invention. The embodiment is broadly divided into two stages: process A (S11 to S18) and process B (S19 to S31).

[0032] Process A proceeds in the same way as the conventional algorithm, but only syntax trees with a full score are processed, and syntax tree data that has had points deducted is stored in a storage area. If processing reaches the end of the sentence and a syntax tree has been completed up to "sentence", a syntax tree with an evaluation score of 100 has been obtained. It is output and processing ends.

[0033] If no "sentence" is completed in process A, processing moves to process B. Process B again analyzes the input sentence from the beginning, but it does so while recalling the syntax tree data saved in process A. Only syntax trees whose score is at or above a certain value, the "focus score", are processed. On reaching the end of the sentence, if a sentence has been completed by a syntax tree with the focus score, it is output and processing ends. Otherwise the focus score is reset to the next lower value and process B is repeated from the beginning of the input sentence.

[0034] This is therefore a method that searches syntax trees of equal evaluation score, in order from the highest score down, and it is called "contour point search" (等高点探索). Giving priority to higher scores makes the search vertical in that respect, while syntax trees with equal scores are processed simultaneously, which makes it horizontal in that respect.
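The ordering this paragraph describes, score levels visited from high to low with equal-score trees handled together, can be pictured with a small helper. This is only an illustrative sketch: `score_levels` and the `(score, tree)` pair representation are assumptions made for the example, not part of the patent.

```python
from itertools import groupby


def score_levels(scored_trees):
    """Yield (score, trees) batches from the highest score to the lowest."""
    ordered = sorted(scored_trees, key=lambda st: st[0], reverse=True)
    for score, group in groupby(ordered, key=lambda st: st[0]):
        # All trees in one batch share a score and are processed together.
        yield score, [tree for _, tree in group]
```

Iterating over the batches corresponds to the vertical (priority) direction, and processing every tree inside one batch corresponds to the horizontal direction.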

[0035] Processes A and B are described in detail below. Process A consists of steps (1) to (6), and process B of steps (7) to (11).

[0036] (1) Set the next candidate score to "undetermined" and start the analysis bottom-up, as in the conventional method (S11 and S12).

[0037] (2) Generate the syntax trees that the current word W forms on its own (S13). Because these trees contain no dependencies, their evaluation scores are all 100.

[0038] (3) Search for dependencies between the syntax trees generated by the analysis up to the word immediately before the current word and the syntax trees generated in S13, and calculate the evaluation scores. For each syntax tree that has had points deducted, pair its score with the place where the deduction occurred (that is, the position in the sentence of the word W whose dependency was formed) and save the pair temporarily as candidate data (S14).

[0039] (4) Pass only the syntax trees that have not had points deducted (that is, those with 100 points) to the dependency search for the next word W' (S15). If the next candidate score is undetermined, set it to the next-highest score among the syntax tree data saved in S14. If it is already determined, compare the two values, set the larger one as the new next candidate score, and pass it to the next step.

[0040] (5) Until processing reaches the end of the sentence, make the word W' the new current word and return to S12 (S16).

[0041] (6) If any syntax trees have been completed by the processing up to the end of the sentence, output them and end processing (S17 and S18).

[0042] (7) Set the next candidate score handed over from S17 or S29 as the "focus score", and reset the next candidate score to undetermined (S19).

[0043] (8) For the syntax trees at the next candidate score, check whether there are generated syntax trees saved by the search that advances from the beginning of the sentence to the next word (S20 to S22). If there are, perform the dependency search with the syntax trees that the current word W forms on its own. Of the syntax trees generated by the dependency search, keep those at or above the focus score and save the rest temporarily as new candidate data, as in process A (S23 and S24). If no syntax tree data has been handed over, record that there is no new candidate data (S25). The next candidate score is handled in the same way as in process A.

[0044] (9) From the (old, previously saved) candidate data for the current word W, select the candidates that reach the focus score, merge the candidate data for the current word, and hand the result over as generated syntax trees to the processing of the next word W' (S26).

[0045] (10) If this is not the end of the sentence, make W' the new current word and return to S21. When processing up to the end of the sentence has finished, check whether a syntax tree has been completed (S27 and S28).

[0046] (11) If a syntax tree has been completed, output it (S29). If not, return to S19 and examine the new candidate data (S30). If there is no candidate data at all, terminate with a parse failure (S31).
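The steps above can be condensed into a simplified sketch of processes A and B. It rests on several assumptions that are not in the patent: a partial tree is modeled as a tuple of labels, the caller supplies a hypothetical `extend(tree, i)` function that returns `(new_tree, penalty)` pairs for word `i`, and any tree that survives to the last word is treated as a completed "sentence". The first pass of the loop plays the role of process A, and each later pass plays the role of process B with a lower focus score.

```python
def contour_search(n_words, extend, full=100):
    """Return (score, completed_trees), or None if parsing fails."""
    stored = {i: [] for i in range(n_words)}  # deducted trees per word index
    floor = full                              # current focus score
    seed = [(full, ())]                       # empty tree, first pass only
    while True:
        frontier, seed = seed, []
        for i in range(n_words):
            grown = []
            for score, tree in frontier:
                for new_tree, penalty in extend(tree, i):
                    item = (score - penalty, new_tree)
                    # Below the focus score: save as candidate data.
                    (grown if item[0] >= floor else stored[i]).append(item)
            # Merge saved candidates that now reach the focus score.
            grown += [c for c in stored[i] if c[0] >= floor]
            stored[i] = [c for c in stored[i] if c[0] < floor]
            frontier = grown
        if frontier:
            return floor, [tree for _, tree in frontier]
        remaining = [s for cands in stored.values() for s, _ in cands]
        if not remaining:
            return None                       # no candidate data: failure
        floor = max(remaining)                # next candidate score
```

Because every tree below the focus score is saved rather than deleted, lowering the focus score never requires rebuilding work that an earlier pass already did, and the search stops at the first score level that yields a completed tree.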

[0047] Thus, as the syntax tree generation example in FIG. 2 shows, process A in this embodiment is a contour point search in which syntax trees are generated by dependency search against single-word syntax trees only for trees that have a full score.

[0048] If process A yields a completed syntax tree, the analysis ends at that point. Restricting the horizontal search to full-score trees reduces the number of syntax trees generated and greatly reduces the number of search operations.

[0049] When process A yields no completed syntax tree, a horizontal search is performed on only those deducted syntax trees whose score is at or above a given focus score. If a completed syntax tree exists, the analysis ends. If not, the search is repeated with only the syntax trees at or above the next-highest focus score, until a completed syntax tree is found.

[0050] In process B as well, performing the horizontal search on syntax trees in descending order of score range reduces the number of syntax trees generated and greatly reduces the number of search operations.

[0051] Moreover, the conventional threshold method cannot proceed once no syntax trees remain at or above the threshold, and it wastes more search effort when the threshold is lowered. In contrast, this embodiment is equivalent to lowering the threshold step by step from the highest score to the lowest, so the most probable syntax tree can be obtained without processing failures or wasteful searching.

[0052]

[Effects of the Invention] As described above, according to the present invention, an evaluation score is assigned to each generated syntax tree, points are deducted each time the semantic check judges a dependency unsuitable, and the syntax trees to be targeted by the dependency search are selected by score. Process A repeats processing up to the end of the sentence with only full-score syntax trees as dependency search targets, obtaining a completed syntax tree by horizontal search over the full-score trees. If no "sentence" is completed in process A, process B is repeated, obtaining a completed syntax tree by horizontal search with priority given to syntax trees in the higher score range. Consequently, the set of syntax trees to be searched never becomes empty, wasteful searching of syntax trees is reduced, and the processing time is shortened.

[Brief Description of the Drawings]

[FIG. 1] A flowchart showing one embodiment of the present invention.

[FIG. 2] An example of syntax tree generation in the embodiment.

[FIG. 3] A flowchart showing a conventional syntactic and semantic analysis algorithm.

[FIG. 4] An example of syntax tree generation in a conventional exhaustive horizontal search.

Claims (1)

[Claims]

[Claim 1] A natural language processing system that parses an input sentence word by word using a bottom-up, horizontal search mechanism to generate syntax trees, performs, each time a syntax tree is generated, a semantic check process that deducts points from the evaluation score of that syntax tree according to whether the word dependency is semantically compatible, and evaluates each syntax tree syntactically and semantically from its evaluation score, wherein the semantic check process comprises:
process A, which repeats processing up to the end of the sentence with only syntax trees having a full evaluation score as targets of the dependency search, stores syntax tree data that has had points deducted in a storage area, and, when a syntax tree has been completed up to a "sentence" by the end of the sentence, ends processing with that syntax tree as the completed syntax tree; and
iterative process B, which, when no "sentence" is completed in process A, repeats processing up to the end of the sentence with only those syntax trees, among the deducted syntax tree data, whose evaluation score is at or above a certain value as search targets, ends processing with a syntax tree as the completed syntax tree when a "sentence" is completed by a syntax tree having that evaluation score, and, when no completed syntax tree is obtained, processes the syntax trees at or above the next lower score.
JP6313898A 1994-12-19 1994-12-19 Natural language processing system Pending JPH08171563A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP6313898A JPH08171563A (en) 1994-12-19 1994-12-19 Natural language processing system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP6313898A JPH08171563A (en) 1994-12-19 1994-12-19 Natural language processing system

Publications (1)

Publication Number Publication Date
JPH08171563A true JPH08171563A (en) 1996-07-02

Family

ID=18046851

Family Applications (1)

Application Number Title Priority Date Filing Date
JP6313898A Pending JPH08171563A (en) 1994-12-19 1994-12-19 Natural language processing system

Country Status (1)

Country Link
JP (1) JPH08171563A (en)
