JPH0283664A - Reference part qualifying and analyzing system - Google Patents

Reference part qualifying and analyzing system

Info

Publication number
JPH0283664A
Authority
JP
Japan
Prior art keywords
quotation
analysis
sentence
parts
syntactic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP63235471A
Other languages
Japanese (ja)
Inventor
Norikazu Ito
則和 伊藤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ricoh Co Ltd
Original Assignee
Ricoh Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ricoh Co Ltd
Priority to JP63235471A
Publication of JPH0283664A

Landscapes

  • Machine Translation (AREA)

Abstract

PURPOSE: To distinguish emphatic quotations from other quotations by checking whether an input text contains a quoted part, counting the words in the quoted part of any sentence that has one, and recognizing a quoted part consisting of a single word as an emphatic quotation. CONSTITUTION: An input text 4, obtained by file input, input from a keyboard 2, or input from an OCR 3, is pre-processed by spell checking 5 and pre-editing 6. An output text 11 produced by a translation part 7 is edited in post-editing 8 using translation information, and the input text 4 and the output text 11 are printed on a printer 12. The input text is checked for quoted parts, the words in the quoted part of any sentence that has one are counted, and the quoted part is judged to be an emphatic quotation when the word count is one. In this way, emphatic quotations, which were not previously distinguished, can be separated from other quotations.

Description

DETAILED DESCRIPTION OF THE INVENTION

Technical Field

The present invention relates to a method for recognizing and analyzing quoted parts, and more particularly to the morphological analysis unit and the syntactic analysis unit of a machine translation system.

Prior Art

In general, the unit of syntactic analysis is one sentence, and a short analysis range is desirable. The shorter the range, the fewer rule applications and rule combinations are needed, so the analysis is easier and less ambiguity arises during the analysis.

Real texts, however, do not consist only of short sentences; on the contrary, most consist of long ones. Consider therefore a block, that is, a partial analysis range within one sentence. If the analysis range within a sentence is delimited by specifying such parts, the range is restricted and the number of rule applications and combinations drops sharply. Wasted rule applications and combinations decrease, analysis becomes faster and more efficient, and ambiguity is resolved, which improves accuracy. If these partial analysis ranges can be specified automatically, both the accuracy and the speed of syntactic analysis will certainly improve.
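The claim that a restricted analysis range cuts the number of combinations can be illustrated with a rough count. The sketch below assumes, purely for illustration and not as part of the source, that ambiguity is approximated by the number of binary bracketings of a word sequence (a Catalan number). The function names are hypothetical.

```python
from math import comb


def catalan(n: int) -> int:
    """n-th Catalan number."""
    return comb(2 * n, n) // (n + 1)


def bracketings(words: int) -> int:
    """Number of binary bracketings of a sequence of `words` items."""
    return catalan(words - 1)


def bracketings_with_block(words: int, block_words: int) -> int:
    """Bracketings when `block_words` adjacent items are analyzed first
    as a block and then treated as a single item in the sentence."""
    inside = bracketings(block_words)
    outside = bracketings(words - block_words + 1)
    return inside * outside


# A 12-word sentence with a 5-word quoted block.
print(bracketings(12))                # 58786
print(bracketings_with_block(12, 5))  # 6006
```

Under this simplified model, fixing a five-word block in a twelve-word sentence leaves roughly a tenth of the candidate bracketings. A real rule-based parser prunes far more than this count suggests, so the numbers only indicate the direction of the effect.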

Object

The present invention has been made in view of the above circumstances. Its objects are: to make it possible to distinguish emphatic quotations from other quotations, which have not been distinguished until now; to make it possible to distinguish arbitrary quotations from title quotations, which have not been distinguished until now; to allow analysis to proceed without specifying an ineffective range, by recognizing emphatic quotations; and to allow more accurate, faster, and more efficient analysis for title quotations as well as for arbitrary quotations.

Constitution

To achieve the above objects, the present invention is characterized by the following.

(1) In the morphological analysis unit of a natural language analysis system such as a machine translation system, the input text is checked for quoted parts; for any sentence containing a quoted part, the words inside the quoted part are counted; and if there is one word, the quoted part is recognized as an emphatic quotation.

(2) In the morphological analysis unit of a natural language analysis system such as a machine translation system, the input text is checked for quoted parts; for any sentence containing a quoted part, the positions immediately before and immediately after the closing quotation mark are checked for a punctuation mark (period or comma); and a quoted part with such a punctuation mark is recognized as an arbitrary quotation, while a quoted part without one is recognized as a title quotation.

(3) In the syntactic analysis unit, which is the processing stage following the morphological analysis unit in a natural language analysis system such as a machine translation system, parsing grammar rules for the analysis are provided, and a sentence containing an emphatic quotation recognized in (1) is analyzed on the assumption that the quotation marks are absent.

(4) In the syntactic analysis unit, which is the processing stage following the morphological analysis unit in a natural language analysis system such as a machine translation system, parsing grammar rules for the analysis are provided. For the two kinds of quoted part recognized in (2), the inside of the quoted part is partially analyzed before the sentence, which is the unit of analysis, is analyzed. For an arbitrary quotation, the sentence is analyzed with the partial analysis result taken as-is as the syntactic role required of the quoted part in the sentence. For a title quotation, the sentence is analyzed with the quoted part given the syntactic role of a noun phrase, even if the partial analysis result is not a noun phrase.

The invention is described below with reference to an embodiment.

The present invention thus improves the method of analyzing a quoted part of a sentence as a block (partial analysis range).

Quoted parts are in fact of different kinds. The present invention divides them into three: emphatic quotations, arbitrary quotations, and title quotations. The morphological analysis unit distinguishes the three kinds and passes them to the syntactic analysis unit, which applies a different analysis process to each. This makes it possible to improve the efficiency and accuracy of syntactic analysis.

FIG. 1 is a block diagram showing an embodiment of a translation apparatus equipped with the dictionary lookup method according to the present invention. In the figure, 1 is a CRT, 2 is a keyboard, 3 is an OCR, 4 is an input document, 5 is a spell check unit, 6 is a pre-editing unit, 7 is the translation main unit, 8 is a post-editing unit, 9 is a dictionary, 10 is the grammar rules, 11 is an output document, and 12 is a printer. An input sentence obtained by file input, keyboard input, or OCR input is pre-processed by spell checking and pre-editing; the output sentence produced by the translation unit is edited in post-editing using translation information; and the input and output sentences are printed on the printer.

FIG. 2 shows the flow of the translation main unit. The translation main unit (translation unit) 7 consists broadly of four processes: morphological analysis, syntactic analysis, conversion, and generation. The morphological analysis unit first looks up the words of the input text in the dictionary. Using the information obtained for each word, the syntactic analysis unit parses the text according to the grammar rules and builds a tree structure from the result. The conversion unit transforms the tree structure of the input language into a tree structure of the output language. The generation unit translates the resulting tree structure node by node.

The present invention belongs to the morphological analysis unit and the syntactic analysis unit described above. The input text is assumed here to be English. The morphological analysis unit applies the processing shown in FIG. 3 to the input text.

FIG. 3 illustrates the flow of quotation recognition in morphological analysis. First, the text is checked for quotation marks. If there are two quotation marks, the span between them is recognized as a quoted part. Next, the words in the quoted part are counted.

If the word count is one, the quoted part is recognized as an emphatic quotation. For any other quoted part, the positions immediately before and immediately after the closing quotation mark are checked for a punctuation mark (period or comma). If a period or comma is present, the quoted part is recognized as an arbitrary quotation.

If neither a period nor a comma is present, the quoted part is recognized as a title quotation. The quoted-part information recognized in this way is passed from the morphological analysis unit to the syntactic analysis unit.
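The recognition flow of FIG. 3 can be sketched as follows. This is a minimal illustration, not the implementation described in the source: the function name `classify_quotations`, whitespace-based word counting, and the handling of curly quotation marks are assumptions, and an unpaired trailing quotation mark is simply ignored.

```python
from typing import List, Tuple

# Straight and curly double quotation marks (an assumption; the source
# only speaks of "quotation marks").
QUOTE_CHARS = '"\u201c\u201d'
PUNCTUATION = {".", ","}


def classify_quotations(sentence: str) -> List[Tuple[str, str]]:
    """Return (quoted_text, kind) pairs for one sentence.

    kind is "emphatic", "arbitrary", or "title".
    """
    positions = [i for i, ch in enumerate(sentence) if ch in QUOTE_CHARS]
    results = []
    # Pair the marks in order: (1st, 2nd), (3rd, 4th), ...
    for open_pos, close_pos in zip(positions[0::2], positions[1::2]):
        inner = sentence[open_pos + 1:close_pos]
        before = sentence[close_pos - 1]
        after = sentence[close_pos + 1] if close_pos + 1 < len(sentence) else ""
        if len(inner.split()) == 1:
            kind = "emphatic"    # exactly one word inside the marks
        elif before in PUNCTUATION or after in PUNCTUATION:
            kind = "arbitrary"   # period or comma next to the closing mark
        else:
            kind = "title"       # no period or comma next to the closing mark
        results.append((inner, kind))
    return results


print(classify_quotations('All you "need" is love.'))
print(classify_quotations('He says, "all you need is love."'))
print(classify_quotations('"All you need is love" are the words of the Beatles\' song.'))
```

The word count is tested first, so a one-word quoted part is emphatic even when a comma or period sits next to its closing mark, which matches the order of the checks in the text.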

FIG. 4 shows the flow of the syntactic analysis unit, which analyzes the text one sentence at a time. In this embodiment, the target sentence is analyzed bottom-up from the end of the sentence using context-free parsing grammar rules. Rules are applied exhaustively, and the analysis normally ends when the target sentence has been reduced to a single grammar code denoting a sentence or the like. The grammar code finally obtained is usually SE (sentence).

Now consider a sentence that contains a quoted part, which is a partial analysis range (block). There are three kinds of quoted part: emphatic quotation, arbitrary quotation, and title quotation. An emphatic quotation is a particular word enclosed in quotation marks for emphasis. The quoted part contains a single word, the quotation marks do not separate the words inside from the words outside in meaning, and the quoted part can hardly be called an analysis range that forms a unit to be delimited. An emphatic quotation is therefore not treated as a partial analysis range during analysis. An arbitrary quotation is a quoted part with a punctuation mark before or after its closing quotation mark. It is the typical kind of quotation, with a clear break in meaning between the inside and the outside of the quoted part.

A sentence containing such a quoted part is analyzed from the end of the sentence. When the analysis reaches the arbitrary quotation, which is a partial analysis range (block), the inside of the quotation is analyzed. When that analysis is complete, the grammar code it yields is taken as-is as the grammatical role the quoted part plays in the sentence, and the analysis continues. In other words, an arbitrary quotation serves to delimit the analysis range of the sentence.

A title quotation is a quoted part with no punctuation mark (period or comma) before or after its closing quotation mark; the content of the quoted part is presumed to be a title (noun phrase).

It is analyzed in the same way as an arbitrary quotation. The difference is that the analysis result of the title quotation is not used as-is: whatever the result, the quoted part is given the role of a noun phrase for the analysis of the sentence.

A concrete example follows.

a. All you "need" is love.

b. He says, "all you need is love."

c. "All you need is love" are the words of the Beatles' song.

First, the morphological analysis unit applies quotation recognition to the three sentences a, b, and c. The quotation marks are counted. If their number is even (usually two, occasionally four), the words in each quoted part are counted. If there is one word, the quoted part is recognized as an emphatic quotation; sentence a is an example. Otherwise, the positions before and after the closing quotation mark are checked for a punctuation mark (period or comma). Sentence b has a period there, so its quoted part is recognized as an arbitrary quotation; sentence c has neither a comma nor a period, so its quoted part is recognized as a title quotation. Processing then moves on to the syntactic analysis unit.

To analyze the English sentences a, b, and c, suppose for example that the following grammar rules are given.

A. Part-of-speech codes

prn  nominative pronoun
nou  noun
no2  possessive noun
det  definite article
vt1  transitive verb taking a phrase as object
vt2  transitive verb taking a that-clause as object
vi1  intransitive verb taking a complement
pre  preposition
cma  comma
prd  period

B. Grammar codes

NP  noun phrase
SN  subject noun phrase
CN  complement noun phrase
ON  object noun phrase
DP  prepositional phrase
CM  comma
PD  period
VC  predicate
QC  that-clause ("that" may be omitted)
WC  predicate lacking its object
SG  sentence without the sentence-final mark
SE  sentence including the sentence-final mark

C. Grammar rules

 1. SE → SG PD
 2. SG → SN VC
 3. QC → SG
 4. VC → vi1 CN
 5. VC → vt1 CN
 6. VC → vt2 (CM) QC
 7. WC → vt1
 8. SN → NP
 9. SN → det NP
10. SN → prn (SN WC)
11. CN → NP
12. CN → det NP
13. ON → NP
14. ON → det NP
15. NP → nou (DP)
16. NP → no2 NP
17. DP → pre ON
18. CM → cma
19. PD → prd

Elements in parentheses are optional. The number at the head of each line is the rule number.
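The rule notation above, with optional elements in parentheses, can be held as data and expanded into plain context-free rules. The sketch below is an illustration only: the rule strings follow the listing as it can be read from the garbled original (the comma code CM is a reconstruction, and rule 5 is transcribed as printed), and the names `RULES`, `expand`, and `EXPANDED` are hypothetical.

```python
import re
from itertools import product
from typing import List, Tuple

# (rule number, left-hand side, right-hand side); "(...)" marks optional elements.
RULES = [
    (1, "SE", "SG PD"), (2, "SG", "SN VC"), (3, "QC", "SG"),
    (4, "VC", "vi1 CN"), (5, "VC", "vt1 CN"), (6, "VC", "vt2 (CM) QC"),
    (7, "WC", "vt1"), (8, "SN", "NP"), (9, "SN", "det NP"),
    (10, "SN", "prn (SN WC)"), (11, "CN", "NP"), (12, "CN", "det NP"),
    (13, "ON", "NP"), (14, "ON", "det NP"), (15, "NP", "nou (DP)"),
    (16, "NP", "no2 NP"), (17, "DP", "pre ON"), (18, "CM", "cma"),
    (19, "PD", "prd"),
]


def expand(rhs: str) -> List[Tuple[str, ...]]:
    """Expand optional groups into every concrete right-hand side."""
    parts = re.findall(r"\([^)]*\)|\S+", rhs)
    choices = []
    for part in parts:
        if part.startswith("("):
            group = tuple(part[1:-1].split())
            choices.append([(), group])   # group absent or present
        else:
            choices.append([(part,)])
    return [sum(combo, ()) for combo in product(*choices)]


# One entry per concrete rule, keeping the original rule number.
EXPANDED = [(num, lhs, alt) for num, lhs, rhs in RULES for alt in expand(rhs)]

print(expand("prn (SN WC)"))  # [('prn',), ('prn', 'SN', 'WC')]
print(len(EXPANDED))          # 22
```

Keeping the original rule number on each expanded entry means a derivation can still cite rule 10 whether or not the optional group was used.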

Sentence a is analyzed from the end of the sentence. For simplicity, each word is assumed to have only the part-of-speech code that leads to the correct analysis. For sentence a, the analysis is the same as when there is no quoted part.

a. all(prn) you(prn) need(vt1) is(vi1) love(nou) .(prd)

1. (19) PD → prd  (.)
2. (15) NP → nou  (love)
3. (11) CN → NP  (love)
4. (4) VC → vi1 CN  (is love)
5. (7) WC → vt1  (need)
6. (10) SN → prn  (you)
7. (10) SN → prn SN WC  (all you need)
8. (2) SG → SN VC  (all you need is love)
9. (1) SE → SG PD  (all you need is love.)

The number at the head of each line is a serial number, and the number in parentheses is the rule number.

In sentence b, "all you need is love" is an arbitrary quotation. The analysis of this part is identical to steps 2 to 7 of sentence a. The quotation marks themselves are not analyzed.

b. he(prn) says(vt2) ,(cma) all(prn) you(prn) need(vt1) is(vi1) love(nou) .(prd)

1. (19) PD → prd  (.)
2. (15) NP → nou  (love)
   (steps 3 to 7 as in sentence a)
8. (2) SG → SN VC  (all you need is love)
9. (3) QC → SG  (all you need is love)
10. (18) CM → cma  (,)
11. (6) VC → vt2 CM QC  (says, all you need is love)
12. (10) SN → prn  (he)
13. (2) SG → SN VC  (he says, all you need is love)
14. (1) SE → SG PD  (he says, all you need is love.)

The analysis proceeds using SG (a sentence without a period), which is the analysis result for "all you need is love", as-is.

In sentence c, "all you need is love" is a title quotation. The analysis of this part is identical to steps 2 to 7 of sentence a. The quotation marks themselves are not analyzed.

c. all(prn) you(prn) need(vt1) is(vi1) love(nou) are(vi1) the(det) words(nou) of(pre) the(det) Beatles(no2) song(nou) .(prd)

1. (19) PD → prd  (.)
2. (15) NP → nou  (song)
3. (16) NP → no2 NP  (Beatles' song)
4. (14) ON → det NP  (the Beatles' song)
5. (17) DP → pre ON  (of the Beatles' song)
6. (15) NP → nou DP  (words of the Beatles' song)
7. (12) CN → det NP  (the words of the Beatles' song)
8. (4) VC → vi1 CN  (are the words of the Beatles' song)
9. (15) NP → nou  (love)
   (steps 10 to 14 as steps 3 to 7 of sentence a)
15. (2) SG → SN VC  (all you need is love)
16. (0) NP → SG  (SG is given the role of NP, a noun phrase)
17. (8) SN → NP  (all you need is love)
18. (2) SG → SN VC  (all you need is love are the words of the Beatles' song)
19. (1) SE → SG PD  (all you need is love are the words of the Beatles' song.)

The grammar code SG is obtained at step 15. If the analysis continued with this SG as-is, it would never reach SE, because no applicable rule exists. At step 16, therefore, the title quotation is given the grammar code NP as a new role. This is not a grammar rule; it can be regarded as attaching the label NP at step 16 to the SG obtained at step 15. With this processing, sentence c is analyzed successfully.
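The difference between the two kinds of block can be checked with a small recognizer. This is a sketch under stated assumptions, not the parser of the embodiment: it uses memoized top-down recognition instead of the bottom-up analysis from the end of the sentence, the rules are the listing above with optional elements expanded by hand (CM for the comma is a reconstruction), and the names `derives`, `covers`, and the block representation `(kind, inner_items)` are hypothetical. Emphatic quotations need no block: their words are passed inline, as if the quotation marks were absent.

```python
from functools import lru_cache
from itertools import combinations

# (left-hand side, right-hand side), optional elements expanded by hand.
RULES = [
    ("SE", ("SG", "PD")), ("SG", ("SN", "VC")), ("QC", ("SG",)),
    ("VC", ("vi1", "CN")), ("VC", ("vt1", "CN")),
    ("VC", ("vt2", "QC")), ("VC", ("vt2", "CM", "QC")),
    ("WC", ("vt1",)), ("SN", ("NP",)), ("SN", ("det", "NP")),
    ("SN", ("prn",)), ("SN", ("prn", "SN", "WC")),
    ("CN", ("NP",)), ("CN", ("det", "NP")),
    ("ON", ("NP",)), ("ON", ("det", "NP")),
    ("NP", ("nou",)), ("NP", ("nou", "DP")), ("NP", ("no2", "NP")),
    ("DP", ("pre", "ON")), ("CM", ("cma",)), ("PD", ("prd",)),
]


def derives(goal, items):
    """True if grammar code `goal` covers all of `items`.

    An item is a part-of-speech code (str) or a block (kind, inner_items).
    """
    items = tuple(items)

    @lru_cache(maxsize=None)
    def covers(sym, i, j):
        if j - i == 1:
            item = items[i]
            if isinstance(item, str):
                if item == sym:
                    return True
            else:
                kind, inner = item
                if kind == "title":
                    # Forced role: noun phrase, whatever the inner analysis gives.
                    if sym == "NP":
                        return True
                elif derives(sym, inner):
                    # Arbitrary quotation: the inner result is used as-is.
                    return True
        for lhs, rhs in RULES:
            if lhs != sym or len(rhs) > j - i:
                continue
            # Try every way of splitting items[i:j] among the rule's elements.
            for cuts in combinations(range(i + 1, j), len(rhs) - 1):
                bounds = (i, *cuts, j)
                if all(covers(s, a, b) for s, a, b in zip(rhs, bounds, bounds[1:])):
                    return True
        return False

    return covers(goal, 0, len(items))


INNER = ("prn", "prn", "vt1", "vi1", "nou")  # all you need is love
REST_C = ("vi1", "det", "nou", "pre", "det", "no2", "nou", "prd")

sentence_a = INNER + ("prd",)                        # marks ignored
sentence_b = ("prn", "vt2", "cma", ("arbitrary", INNER), "prd")
sentence_c = (("title", INNER),) + REST_C
sentence_c_as_is = (("arbitrary", INNER),) + REST_C  # no NP relabeling

for name, s in [("a", sentence_a), ("b", sentence_b),
                ("c", sentence_c), ("c as-is", sentence_c_as_is)]:
    print(name, derives("SE", s))
```

With these rules, sentences a, b, and c are recognized as SE, while sentence c with its quoted part used as-is is not, which mirrors the remark that the analysis cannot reach SE unless the SG is relabeled as NP. The sketch skips the inner analysis of a title quotation because its result is overridden anyway.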

Effect

As is clear from the above description, claim (1) makes it possible to distinguish emphatic quotations from other quotations, which were not previously distinguished.

Claim (2) makes it possible to distinguish arbitrary quotations from title quotations, which were not previously distinguished.

Claim (3) makes it possible, through the recognition of emphatic quotations, to perform analysis without specifying an ineffective range.

Claim (4) makes more accurate, faster, and more efficient analysis possible for title quotations as well as for arbitrary quotations.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram showing an embodiment of a translation apparatus equipped with the dictionary lookup method according to the present invention. FIG. 2 shows the flow of the translation main unit. FIG. 3 shows the flow of quotation recognition in the morphological analysis unit. FIG. 4 shows the flow of syntactic analysis.

1: CRT, 2: keyboard, 3: OCR, 4: input document, 5: spell check unit, 6: pre-editing unit, 7: translation main unit, 8: post-editing unit, 9: dictionary, 10: grammar rules, 11: output document, 12: printer.

Claims (1)

[Claims]

1. A quoted-part recognition method characterized in that, in the morphological analysis unit of a natural language analysis system such as a machine translation system, the input text is checked for quoted parts, the words in the quoted part of any sentence containing one are counted, and the quoted part is recognized as an emphatic quotation if it contains one word.

2. A quoted-part recognition method characterized in that, in the morphological analysis unit of a natural language analysis system such as a machine translation system, the input text is checked for quoted parts, and, for any sentence containing a quoted part, the positions immediately before and immediately after the closing quotation mark are checked for a punctuation mark, a quoted part with a punctuation mark being recognized as an arbitrary quotation and a quoted part without one being recognized as a title quotation.

3. A quoted-part analysis method characterized in that, in the syntactic analysis unit, which is the processing stage following the morphological analysis unit in a natural language analysis system such as a machine translation system, parsing grammar rules for the analysis are provided, and a sentence containing an emphatic quotation recognized according to claim (1) is analyzed on the assumption that the quotation marks are absent.

4. A quoted-part analysis method characterized in that, in the syntactic analysis unit, which is the processing stage following the morphological analysis unit in a natural language analysis system such as a machine translation system, parsing grammar rules for the analysis are provided; for the two kinds of quoted part recognized according to claim (2), the inside of the quoted part is partially analyzed before the sentence, which is the unit of analysis, is analyzed; for an arbitrary quotation, the sentence is analyzed with the partial analysis result taken as-is as the syntactic role required of the quoted part in the sentence; and for a title quotation, the sentence is analyzed with the quoted part given the syntactic role of a noun phrase, even if the partial analysis result is not a noun phrase.
JP63235471A 1988-09-20 1988-09-20 Reference part qualifying and analyzing system Pending JPH0283664A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP63235471A JPH0283664A (en) 1988-09-20 1988-09-20 Reference part qualifying and analyzing system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP63235471A JPH0283664A (en) 1988-09-20 1988-09-20 Reference part qualifying and analyzing system

Publications (1)

Publication Number Publication Date
JPH0283664A true JPH0283664A (en) 1990-03-23

Family

ID=16986573

Family Applications (1)

Application Number Title Priority Date Filing Date
JP63235471A Pending JPH0283664A (en) 1988-09-20 1988-09-20 Reference part qualifying and analyzing system

Country Status (1)

Country Link
JP (1) JPH0283664A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1999062000A2 (en) * 1998-05-26 1999-12-02 Teragram Corporation Spelling and grammar checking system
WO1999062000A3 (en) * 1998-05-26 2001-06-07 Teragram Corp Spelling and grammar checking system
US6424983B1 (en) 1998-05-26 2002-07-23 Global Information Research And Technologies, Llc Spelling and grammar checking system
