JPH05120274A - Natural language processing method - Google Patents

Natural language processing method

Info

Publication number
JPH05120274A
Authority
JP
Japan
Prior art keywords
sentence
syntax
class
sentences
line
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP3279736A
Other languages
Japanese (ja)
Inventor
Takeshi Mogi
健 茂木
Naoyuki Yoda
直之 余田
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sanyo Electric Co Ltd
Original Assignee
Sanyo Electric Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sanyo Electric Co Ltd
Priority to JP3279736A
Publication of JPH05120274A

Landscapes

  • Machine Translation (AREA)
  • Document Processing Apparatus (AREA)

Abstract

PURPOSE: To improve analysis accuracy by performing the syntactic and semantic analysis best suited to the class of each sentence, exploiting the syntactic features characteristic of each class. CONSTITUTION: The method comprises single-sentence extraction processing means, sentence class determination processing means, morphological analysis processing means, and syntactic and semantic analysis processing means. During syntactic and semantic analysis, the last of these uses the classes determined by the sentence class determination processing means to perform the analysis process (11) best suited to sentences of each class. Specifically, when the single-sentence extraction process (10) is performed, format information such as centering, margins, and indentation is recognized, and sentences are classified on that basis into title sentences, heading sentences, list sentences, ordinary sentences, and so on. When the syntactic and semantic analysis process (11) is performed, the syntactic features of each class are exploited, so the analysis best suited to each class is carried out.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a method for processing natural language sentences, and more particularly to a natural language processing method that performs syntactic and semantic analysis using information about the format of a document.

[0002]

[Prior Art] Conventional natural language processing methods process the character information of a document. They do not take into account the form in which character code strings appear in the document (the format), such as centering, margins, and indentation. In ordinary documents, however, the format is also part of the information the author intends to convey to the reader. For example, centering, margins, indentation, and the like indicate that a sentence is a title, a heading, or a list sentence (an enumeration of items, an itemized entry, and so on). Such sentences play a special role not only in format but also in syntax and content, and in syntactic and semantic analysis they often require processing different from that of ordinary sentences.

[0003] To take a simple example, in the document of FIG. 7 the very first line, "Various Access Conflicts at I/O Ports", is a heading and must therefore be interpreted as forming a single noun phrase: "Various (adjective) Access (noun) Conflicts (noun) at (preposition) I/O Ports (noun)". If only the character string is analyzed without regard to format, however, it is misinterpreted as "Various (adjective) Access (noun) Conflicts (verb) at (preposition) I/O Ports (noun)".

[0004] Thus, although format is information that cannot be ignored when analyzing a document, conventional natural language processing methods have not processed format information adequately. As a result, headings and list sentences were processed with no distinction from ordinary sentences, which often produced incorrect analysis results.

[0005]

[Problem to Be Solved by the Invention] The present invention has been made in view of the above circumstances. Its object is to improve the accuracy of analysis in two steps. First, when single sentences are cut out of a document, format features such as centering, margins, and indentation are recognized, and each sentence is classified on that basis into a class such as title sentence, heading sentence, list sentence, or ordinary sentence. Second, during syntactic and semantic analysis, the syntactic features of each class are exploited so that the analysis best suited to the class of each sentence is performed.

[0006]

[Means for Solving the Problem] The natural language processing method according to the present invention comprises single-sentence extraction processing means for extracting sentences from an input natural language document; sentence class determination processing means for classifying each sentence into one of a plurality of classes on the basis of its format and syntactic features; morphological analysis processing means for dividing the sentence into words and assigning parts of speech; and syntactic and semantic analysis processing means for analyzing the sentence syntactically and semantically. The method is characterized in that, during syntactic and semantic analysis, the syntactic and semantic analysis processing means uses the class determined by the sentence class determination processing means to perform the analysis best suited to sentences of each class.

[0007]

[Operation] According to the natural language processing method of the present invention, format information such as centering, margins, and indentation is recognized during single-sentence extraction, and sentences are classified on the basis of this format information into classes such as title sentence, heading sentence, list sentence, and ordinary sentence. Because the syntactic features of each class are exploited during syntactic and semantic analysis, the analysis best suited to each class is performed.

[0008]

[Embodiment] FIG. 1 shows a flowchart of natural language processing according to the present invention. The processing consists of a single-sentence extraction process (10) and an analysis process (11).

[0009] Here, a "sentence" cut out by the single-sentence extraction process (10) means the unit that is input to the syntactic and semantic analysis process (11).

[0010] FIG. 2 shows a flowchart of the single-sentence extraction process (10), and FIG. 5 shows a flowchart of the analysis process (11).

[0011] [Operation of the Embodiment] Before the embodiment itself is described, its principle of operation is briefly explained.

[0012] The single-sentence extraction process (10) is described with reference to the flowchart of FIG. 2. First, the line recognition process (21) recognizes the formal features of each line and assigns each line a line attribute such as those shown in FIG. 3. The formal features considered here include centering, margins, indentation, line spacing, typeface, character size, numbers and symbols, and colons.
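The line recognition step can be illustrated with a minimal Python sketch. It is not the patent's implementation: the attribute set of FIG. 3 is not reproduced in the text, so the `LineFeatures` fields, the `recognize_line` name, the 72-column page width, and the marker pattern are all assumptions chosen for illustration. Only features visible in plain text (indentation, centering, trailing colon, leading number or symbol) are covered; typeface and character size are omitted.

```python
import re
from dataclasses import dataclass

# Leading list markers treated as "numbers/symbols": -, *, •, "1.", "1)", "(1)".
# The exact marker set is an assumption, not taken from the patent.
MARKER_RE = re.compile(r"^(?:[-*•]|\d+[.)]|\(\d+\))(?=\s)")


@dataclass
class LineFeatures:
    indent: int            # number of leading spaces
    centered: bool         # left and right margins roughly equal
    ends_with_colon: bool  # line text ends with ':'
    marker: str            # leading number/symbol, "" if none


def recognize_line(raw: str, page_width: int = 72) -> LineFeatures:
    """Extract formal features of one line (hypothetical stand-in for step 21)."""
    text = raw.rstrip()
    body = text.lstrip(" ")
    indent = len(text) - len(body)
    right_margin = page_width - len(text)
    # A line counts as centered when both margins are non-zero and nearly equal.
    centered = indent > 0 and abs(indent - right_margin) <= 2
    match = MARKER_RE.match(body)
    return LineFeatures(indent, centered, body.endswith(":"),
                        match.group(0) if match else "")
```

A later step could map such features to line attributes (for example, a list line when the line is indented, carries a marker, and follows a line ending in a colon).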

[0013] Next, the sentence-end recognition process (22) checks whether the line currently being processed contains the end of a sentence (23). If a sentence end is recognized, the line attributes of the lines making up the sentence are gathered together, the sentence is then cut out in the extraction process (24), and the single-sentence extraction process ends. If no sentence end is recognized, subsequent lines are recognized until one is found.
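The loop of steps 22 to 24 can be sketched as follows. This is a simplified illustration, not the patented procedure: the sentence-end cues used here (terminal punctuation, a following blank line, a line that stops well short of the right margin, a following line that starts with a list marker) and the 0.6 fill threshold are assumptions, and the function names are hypothetical.

```python
import re

# A following line that starts with a list marker suggests the current sentence is over.
MARKER_RE = re.compile(r"^\s*(?:[-*•]|\d+[.)]|\(\d+\))\s")


def is_sentence_end(line: str, next_line: str, page_width: int = 72) -> bool:
    """Heuristic sentence-end test for one line (assumed cues, see lead-in)."""
    text = line.rstrip()
    if text.endswith((".", "?", "!", ":")):
        return True
    if not next_line.strip():            # blank line or end of document follows
        return True
    if len(text) < 0.6 * page_width:     # line ends partway across the page
        return True
    return bool(MARKER_RE.match(next_line))


def extract_sentences(lines, page_width=72):
    """Group lines into sentences; each sentence is the list of its lines."""
    sentences, current = [], []
    for i, line in enumerate(lines):
        if not line.strip():
            continue                     # skip blank separator lines
        current.append(line.rstrip())
        next_line = lines[i + 1] if i + 1 < len(lines) else ""
        if is_sentence_end(line, next_line, page_width):
            sentences.append(current)
            current = []
    return sentences
```

Keeping each sentence as a list of its source lines preserves the link back to the per-line attributes, which the sentence class determination step needs.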

[0014] The sentence class determination process (25) determines the sentence class according to rules such as those shown in FIG. 4, with reference to the line attributes of the lines making up the sentence.
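A rule table of this kind might look like the sketch below. The actual rules of FIG. 4 are not reproduced in the text; the only mapping the description states is list line (LATR5) to list sentence (SATR4). The enum members, the `RULES` table, and the fallback to an ordinary sentence for mixed attributes are therefore illustrative assumptions.

```python
from enum import Enum


class LineAttr(Enum):
    TITLE = "title line"
    HEADING = "heading line"
    LIST = "list line"        # LATR5 in the description
    BODY = "ordinary line"


class SentenceClass(Enum):
    TITLE = "title sentence"
    HEADING = "heading sentence"
    LIST = "list sentence"    # SATR4 in the description
    NORMAL = "ordinary sentence"


# Hypothetical one-to-one rules; FIG. 4 may contain more elaborate conditions.
RULES = {
    LineAttr.TITLE: SentenceClass.TITLE,
    LineAttr.HEADING: SentenceClass.HEADING,
    LineAttr.LIST: SentenceClass.LIST,
    LineAttr.BODY: SentenceClass.NORMAL,
}


def decide_sentence_class(line_attrs):
    """Map the attributes of a sentence's lines to a single sentence class."""
    kinds = set(line_attrs)
    if len(kinds) == 1:
        return RULES[kinds.pop()]
    # Mixed or missing attributes: fall back to the ordinary class (assumption).
    return SentenceClass.NORMAL
```

For a one-line list item, `decide_sentence_class([LineAttr.LIST])` yields `SentenceClass.LIST`.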

[0015] Next, the analysis process (11) is described with reference to the flowchart of FIG. 5. The analysis process (11) analyzes each sentence cut out by the single-sentence extraction process (10), and the analysis performed differs according to the class determined by the sentence class determination process (25).

[0016] First, the morphological analysis process (50) determines the part of speech of each word. If the part of speech of a word cannot be determined with certainty at this stage, several candidates are retained and the final decision is made in the syntactic and semantic analysis process.

[0017] Next, the syntactic and semantic analysis process (51) performs general syntactic and semantic analysis, such as grouping words into noun phrases and verb phrases and resolving attachment ambiguities.

[0018] Partway through the syntactic and semantic analysis process (51) there are analysis processes specific to each class (53, 54, 55). The flow of processing is routed to one of them (52) according to the class determined by the sentence class determination process (25), so that highly accurate analysis that depends on the form in which the text appears is performed.
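The branch at (52) amounts to a dispatch on the sentence class. The sketch below illustrates the idea with the heading from the prior-art discussion. It rests on several assumptions: the text identifies only (55) as the list-specific process, so treating titles and headings as the other two branches is a guess; the handler names, the tag labels, and the "prefer the noun reading in headings" rule are all hypothetical.

```python
# Each word carries its candidate parts of speech, most likely first,
# as left open by morphological analysis.
heading = [
    ("Various", ("ADJ",)),
    ("Access", ("NOUN", "VERB")),
    ("Conflicts", ("VERB", "NOUN")),
    ("at", ("PREP",)),
    ("I/O Ports", ("NOUN",)),
]


def resolve_default(sentence):
    """Ordinary sentences: keep the first (most likely) candidate."""
    return [(word, tags[0]) for word, tags in sentence]


def resolve_heading(sentence):
    """Titles and headings tend to be noun phrases: prefer a noun reading."""
    return [(word, "NOUN" if "NOUN" in tags else tags[0])
            for word, tags in sentence]


# Hypothetical class-specific handlers selected at the branch (52).
CLASS_HANDLERS = {"title": resolve_heading, "heading": resolve_heading}


def analyze(sentence, sentence_class):
    """Route a sentence to the handler for its class, or the default one."""
    handler = CLASS_HANDLERS.get(sentence_class, resolve_default)
    return handler(sentence)
```

With this table, `analyze(heading, "heading")` tags "Conflicts" as a noun, whereas `analyze(heading, "ordinary")` keeps the verb reading that the description calls a misinterpretation.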

[0019] [Processing Operation of the Embodiment] Using the document shown in FIG. 7, an example is now described in which the ambiguity in interpreting an -ing verb form in list sentences is resolved. Here, a list sentence is the form of sentence used for enumerating or itemizing. In terms of format, it has features such as: 1) it is indented; 2) it has fewer characters per line than an ordinary sentence; 3) the immediately preceding line ends with a colon.

[0020] Before proceeding, the ambiguity in interpreting an -ing verb form is explained. S1, S2, S3, and S4 in FIG. 7 all have the form "-ing verb form + noun phrase". In sentences of this kind the interpretation of the -ing form is generally ambiguous, and identifying its usage is difficult. As shown in FIG. 6, three interpretations of "-ing verb form + noun phrase" are possible.

[0021] When the -ing form is immediately followed by an article, as in S1 and S3 of FIG. 7, the -ing form is usually being used as a verb. But when the -ing form may be a transitive verb and is followed by a noun or adjective, as in S2 and S4, identifying its usage is very difficult.
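The article cue can be written as a small Python function. This is a sketch under assumptions: the function name and the `"verb"` label are hypothetical, the sentence is assumed to be pre-split into words, and the sample sentences in the usage note are invented because the text of S1 to S4 is not reproduced here.

```python
ARTICLES = {"a", "an", "the"}


def initial_ing_usage(tokens):
    """Return "verb" when a leading -ing form is directly followed by an article.

    Returns None when the sentence does not start with an -ing form or when
    the cue is absent, i.e. when the usage is still ambiguous.
    """
    if len(tokens) < 2 or not tokens[0].lower().endswith("ing"):
        return None
    if tokens[1].lower() in ARTICLES:
        return "verb"
    return None
```

For instance, `initial_ing_usage("Reading the status register".split())` returns `"verb"`, while `initial_ing_usage("Missing data bits".split())` returns `None`, leaving the decision to a later step.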

[0022] This embodiment explains how the present invention resolves such ambiguity. It is assumed that natural language processing up to the position marked X (70) in the document of FIG. 7 has already been completed.

[0023] First, the single-sentence extraction process (10) recognizes each line of the input document. FIG. 8 shows the result after the line recognition process (21) has recognized each line and assigned it an attribute. In FIG. 8, line (80) is judged to be a list line (LATR5) because the preceding line ends with a colon and because its leading indentation and symbol match those of the following lines.

[0024] Next, in the sentence-end recognition process (22), the end of line (80) is recognized as a sentence end because line (80) ends partway across the line and because the leading indentation and symbol of line (81) match those of the following lines.

[0025] The sentence recognized by the sentence-end recognition process (22) is cut out of the document by the extraction process (24), and the sentence class determination process (25) determines its sentence class from the line attributes of the lines making up the sentence. Rules such as those shown in FIG. 4 are used to derive the sentence class from the line attributes.

[0026] In the case of line (80), the single line by itself constitutes sentence (90), and the list-line attribute (LATR5) of line (80) is mapped to the list-sentence attribute (SATR4). Likewise, lines (81), (82), and (83) correspond to sentences (91), (92), and (93), respectively, and are given the list-sentence attribute (SATR4). FIG. 9 shows the sentences cut out of the document of FIG. 7 and the sentence class corresponding to each.

[0027] This completes the single-sentence extraction process (10). The analysis process (11) is performed next.

[0028] FIG. 10 shows the result of part-of-speech determination performed on sentence (93) by the morphological analysis process (50). Three interpretations of the word "missing" are possible: -ing verb (interpretation 1 in FIG. 6), adjective (interpretation 2 in FIG. 6), and -ing noun (interpretation 3 in FIG. 6). Because the part of speech cannot be determined with certainty at this stage, the final decision is made in the syntactic and semantic analysis process.

[0029] The syntactic and semantic analysis process (51) performs general analysis such as grouping sentence elements into phrases and determining attachment targets. Because sentence (93) has the list-sentence class attribute, however, the list-specific process (55) is also performed. FIG. 11 shows an example of a list-sentence-specific processing rule used in syntactic and semantic analysis.

[0030] Syntactically, the sentences making up a list stand in a kind of coordinate relationship, and they are often written in the same form of expression. For example, if the first sentence is a noun phrase, the second and subsequent sentences are in most cases noun phrases as well. Creating analysis rules based on this property of the list-sentence class makes correct analysis possible. The processing rule shown in FIG. 11 formalizes this property for use in syntactic analysis: among the list sentences making up a list, if the -ing form has usage X in at least half of the sentences that share the "-ing verb form + noun phrase" form, then the -ing form in the remaining sentences of that form also has usage X.
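The rule described for FIG. 11 can be sketched as a small propagation function. The function name, the usage labels, and the use of `None` for "still ambiguous" are assumptions made for illustration; the threshold of at least half follows the description.

```python
from collections import Counter


def propagate_ing_usage(usages):
    """Apply the list-sentence rule to sentences of the "-ing form + noun phrase" type.

    `usages` holds one entry per such sentence in the list: a usage label
    (e.g. "verb", "adjective", "noun") when already determined, None otherwise.
    If one usage covers at least half of all the sentences, the undetermined
    ones inherit it; otherwise the input is returned unchanged.
    """
    counts = Counter(u for u in usages if u is not None)
    if not counts:
        return list(usages)
    usage, count = counts.most_common(1)[0]
    if 2 * count >= len(usages):
        return [usage if u is None else u for u in usages]
    return list(usages)
```

In the situation of FIG. 7 as described, two of the four list sentences are resolved as verb usage by the article cue, so `propagate_ing_usage(["verb", None, "verb", None])` yields verb usage for all four.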

[0031] Consider sentence (93). In general, when the verb is immediately followed by an article, as in sentences (90) and (92), the -ing form takes the verb usage (interpretation 1 in FIG. 6). By the rule above, therefore, if sentences (90) and (92) take the verb usage (interpretation 1 in FIG. 6), sentences (91) and (93) can be taken to have the verb usage (interpretation 1 in FIG. 6) as well.

[0032]

[Effect of the Invention] According to the natural language processing method of the present invention, format information such as centering, margins, and indentation is recognized during single-sentence extraction, and sentences are classified on the basis of this format information into classes such as title sentence, heading sentence, list sentence, and ordinary sentence. Because the syntactic features of each class are exploited during syntactic and semantic analysis, the analysis best suited to each class is performed. Analysis can therefore be carried out with higher accuracy than with conventional methods.

[Brief Description of the Drawings]

FIG. 1 is a diagram showing the overall flow of processing.

FIG. 2 is a diagram showing the flow of the single-sentence extraction process.

FIG. 3 is a diagram showing the line attributes.

FIG. 4 is a diagram showing the rules for determining a sentence class from line attributes.

FIG. 5 is a diagram showing the flow of the syntactic analysis process.

FIG. 6 is a diagram showing the possible interpretations of the -ing form in the "-ing verb form + noun phrase" pattern.

FIG. 7 is a diagram showing an example input document.

FIG. 8 is a diagram showing the lines making up the input document and the corresponding line attributes.

FIG. 9 is a diagram showing the sentences making up the input document and the corresponding sentence classes.

FIG. 10 is a diagram showing the result of morphological analysis of a list sentence containing an -ing verb form.

FIG. 11 is a diagram showing the analysis process specific to list sentences.

[Explanation of Reference Numerals]

10 Single-sentence extraction process
11 Analysis process
21 Line recognition process
22 Sentence-end recognition process
24 Extraction process
25 Sentence class determination process
50 Morphological analysis process
51 Syntactic and semantic analysis process
55 List-sentence-specific process

Claims (1)

[Claims]

[Claim 1] A natural language processing method comprising: single-sentence extraction processing means for extracting sentences from an input natural language document; sentence class determination processing means for classifying each sentence into one of a plurality of classes on the basis of its format and syntactic features; morphological analysis processing means for dividing the sentence into words and assigning parts of speech; and syntactic and semantic analysis processing means for analyzing the sentence syntactically and semantically, wherein, during syntactic and semantic analysis, the syntactic and semantic analysis processing means uses the class determined by the sentence class determination processing means to perform the analysis best suited to sentences of each class.
JP3279736A 1991-10-25 1991-10-25 Natural language processing method Pending JPH05120274A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP3279736A JPH05120274A (en) 1991-10-25 1991-10-25 Natural language processing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP3279736A JPH05120274A (en) 1991-10-25 1991-10-25 Natural language processing method

Publications (1)

Publication Number Publication Date
JPH05120274A true JPH05120274A (en) 1993-05-18

Family

ID=17615179

Family Applications (1)

Application Number Title Priority Date Filing Date
JP3279736A Pending JPH05120274A (en) 1991-10-25 1991-10-25 Natural language processing method

Country Status (1)

Country Link
JP (1) JPH05120274A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100740978B1 (en) * 2004-12-08 2007-07-19 한국전자통신연구원 System and method for processing natural language request

Similar Documents

Publication Publication Date Title
Daud et al. Urdu language processing: a survey
Tabassum et al. A survey on text pre-processing & feature extraction techniques in natural language processing
Habash et al. Arabic tokenization, part-of-speech tagging and morphological disambiguation in one fell swoop
Kiss et al. Unsupervised multilingual sentence boundary detection
Evans et al. A framework for named entity recognition in the open domain.
US9875254B2 (en) Method for searching for, recognizing and locating a term in ink, and a corresponding device, program and language
CN109460552B (en) Method and equipment for automatically detecting Chinese language diseases based on rules and corpus
US11386269B2 (en) Fault-tolerant information extraction
CN110770735A (en) Transcoding of documents with embedded mathematical expressions
WO2008059111A2 (en) Natural language processing
Shanmugalingam et al. Word level language identification of code mixing text in social media using NLP
Paripremkul et al. Segmenting words in Thai language using Minimum text units and conditional random Field
Shanmugalingam et al. Language identification at word level in Sinhala-English code-mixed social media text
Khan et al. Urdu word segmentation using machine learning approaches
Camps et al. Stylometry for Noisy Medieval Data: Evaluating Paul Meyer's Hagiographic Hypothesis
Oo et al. An analysis of ambiguity detection techniques for software requirements specification (SRS)
Shahroz et al. RUTUT: roman Urdu to Urdu translator based on character substitution rules and unicode mapping
Boulaknadel et al. Amazighe Named Entity Recognition using a A rule based approach
CN113723085B (en) Pseudo-fuzzy detection method in privacy policy document
US20240169150A1 (en) Foreign language phrases learning system based on basic sentence pattern unit decomposition
Sonbhadra et al. Email classification via intention-based segmentation
Senanayake et al. Enhanced tokenizer for sinhala language
KS et al. Automatic error detection and correction in malayalam
JPS5892063A (en) Idiom processing system
JPH05120274A (en) Natural language processing method