JPH01144162A - System and device for morpheme analysis using key word - Google Patents

System and device for morpheme analysis using key word

Info

Publication number
JPH01144162A
Authority
JP
Japan
Prior art keywords
natural language
language sentence
keyword
morphological analysis
morpheme analysis
Prior art date
1987-11-30
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP62303544A
Other languages
Japanese (ja)
Inventor
Takeshi Nishimura
健士 西村
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NEC Corp
Original Assignee
NEC Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
1987-11-30
Filing date
1987-11-30
Publication date
1989-06-06
Application filed by NEC Corp
Priority to JP62303544A
Publication of JPH01144162A
Legal status: Pending

Landscapes

  • Machine Translation (AREA)
  • Document Processing Apparatus (AREA)

Abstract

PURPOSE: To perform highly accurate morphological analysis of natural-language sentences whose topic is limited to a specific field, including sentences that contain words peculiar to that field or character strings mixing special symbols and numerical values that carry a field-specific meaning and are hard to recognize as units of division in ordinary morphological analysis, and to increase processing speed. CONSTITUTION: When a natural-language sentence whose topic is limited to a specific field is input, a keyword division part 2 checks whether any of the keywords stored in advance in a keyword storage part 1 matches part of the character string of the input sentence. If a matching part is found, delimiters recognizable in morphological analysis are inserted before and after it; if no match is found, nothing is inserted. The sentence divided by the delimiters, or the undivided sentence when no delimiter was inserted, is then sent to a morphological analysis part 3 for morphological analysis. Because the strings to be analyzed become shorter, morphological analysis is faster.

Description

[Detailed description of the invention]

The present invention relates to a method and device for morphologically analyzing […].

[Prior art]

A known conventional technique for morphologically analyzing natural-language sentences is the one described in Hozumi Tanaka, "Syntactic Analysis and Semantic Analysis," in Nobumasa Takahashi (ed.), "Japanese Information Processing" (1986). It consists of three processing phases: forced segmentation at character-type boundaries; dictionary lookup by the longest-match principle together with acquisition of connection information; and word segmentation by checking a connection table.
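The second phase (dictionary lookup by the longest-match principle) can be illustrated with a minimal Python sketch. The toy dictionary and the name `longest_match_segment` are hypothetical, and the connection information and connection-table check of the third phase are omitted. Characters that no dictionary entry covers fall out as single-character tokens, which is how an unregistered field-specific word gets broken apart.

```python
from typing import List, Optional, Set


def longest_match_segment(text: str, dictionary: Set[str], max_len: Optional[int] = None) -> List[str]:
    """Greedy left-to-right segmentation that always takes the longest dictionary hit.

    Characters not covered by any entry are emitted as one-character tokens.
    """
    if max_len is None:
        max_len = max((len(w) for w in dictionary), default=1)
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary hit.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                tokens.append(candidate)
                i += length
                break
        else:
            # No entry starts here: fall back to a single character.
            tokens.append(text[i])
            i += 1
    return tokens


# Hypothetical toy dictionary that does not contain the field-specific word 新同品.
toy_dictionary = {"パソコン", "新品", "同様", "品", "を", "売り", "ます"}
print(longest_match_segment("パソコン新同品を売ります", toy_dictionary))
# → ['パソコン', '新', '同', '品', 'を', '売り', 'ます']
```

The unknown word 新同品 comes out as three separate characters, which is the loss of segmentation accuracy described in the next section.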

[Problems to be solved by the invention]

With conventional morphological analysis technology, when the topic of the natural-language sentences to be analyzed is limited to a specific field, words peculiar to that field that appear in the input sentences reduce the accuracy of segmentation.

[Means for solving the problems]

The morphological analysis method using keywords of the present invention is characterized in that a natural-language sentence whose topic is limited to a specific field is input; a dictionary in which keywords of that field have been registered in advance is consulted; whenever a keyword registered in the dictionary appears, a delimiter recognizable in morphological analysis is inserted before and after it; and morphological analysis is then performed.

[Operation]

Natural-language sentences in a particular field contain words peculiar to that field. For example, when information on buying and selling goods is the field to be analyzed, in the sentence 「40万で買ったパソコン新同品を23万円で売ります。」 ("I will sell for 230,000 yen a like-new PC that I bought for 400,000"), the word 「新同品」 is peculiar to the field of buying and selling goods and means 「新品同様の品」 ("an item as good as new"). With conventional morphological analysis technology, even if the sentence is segmented correctly up to 「…パソコン」 ("…PC"), the word 「新同品」 can hardly be expected to be in the dictionary, so correct segmentation of the rest of the sentence cannot be guaranteed. In natural language processing applications restricted to a target field, field-specific words usually play an important role in capturing the meaning of a sentence, so they must be picked out accurately.

Field-specific words often have a unique meaning within a text and can be cut out without considering their dependence on the preceding and following words; 「新同品」 is one such word. Such words can therefore be handled with priority in the analysis method.

That is, only the keywords are first detected in the natural-language sentence, a delimiter is inserted as a mark before and after each detected keyword, and segmentation at those points is treated as settled. Writing the delimiter as @ for convenience, the above example becomes:

「40万で買ったパソコン@新同品@を23万円で売ります。」

Next, the conventional morphological analysis technique is applied to this sentence while recognizing @ as a delimiter; specifically, @ is treated as a word that can connect freely to whatever precedes and follows it.
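The delimiter-insertion step can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: `insert_delimiters` and the one-entry keyword set are hypothetical, "@" is used as the delimiter just as in the example, the delimiter is assumed never to occur in the input itself, and the longest keyword wins when several start at the same position.

```python
from typing import Iterable

DELIMITER = "@"  # stand-in delimiter, as in the example above


def insert_delimiters(sentence: str, keywords: Iterable[str], delimiter: str = DELIMITER) -> str:
    """Surround every keyword occurrence with the delimiter.

    Scans left to right; at each position the longest matching keyword wins,
    and the characters of a matched keyword are not rescanned.
    """
    ordered = sorted(keywords, key=len, reverse=True)
    out = []
    i = 0
    while i < len(sentence):
        match = next((k for k in ordered if k and sentence.startswith(k, i)), None)
        if match:
            out.append(delimiter + match + delimiter)
            i += len(match)
        else:
            out.append(sentence[i])
            i += 1
    return "".join(out)


# Hypothetical keyword dictionary for the goods-trading field.
field_keywords = {"新同品"}
print(insert_delimiters("40万で買ったパソコン新同品を23万円で売ります。", field_keywords))
# → 40万で買ったパソコン@新同品@を23万円で売ります。
```

A sentence with no keyword comes back unchanged, matching the no-match case of the method.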

With this method, field-specific words are no longer segmented erroneously. The method is particularly effective when something that would hardly be accepted as a unit in ordinary segmentation, such as a character string mixing special symbols and numerical values, carries a field-specific meaning. In addition, because the character strings subject to morphological analysis become substantially shorter, processing is faster.

In the example above, the task reduces to the morphological analysis of two short fragments: 「40万で買ったパソコン」 ("the PC I bought for 400,000") and 「を23万円で売ります。」 ("I will sell [it] for 230,000 yen").
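The reduction to short fragments can be shown with a minimal Python sketch. It assumes delimiters always come in pairs around keywords and never occur elsewhere, so after splitting, the odd-indexed pieces are keywords. `analyze_with_keywords` and `placeholder_analyzer` are hypothetical names, and the placeholder stands in for a real conventional analyzer.

```python
from typing import Callable, List


def analyze_with_keywords(
    delimited: str,
    analyze: Callable[[str], List[str]],
    delimiter: str = "@",
) -> List[str]:
    """Analyze only the fragments between keywords; keep each keyword whole.

    Assumes paired delimiters, so odd-indexed pieces after splitting are keywords.
    """
    tokens: List[str] = []
    for index, piece in enumerate(delimited.split(delimiter)):
        if index % 2 == 1:
            tokens.append(piece)            # keyword: segmentation already settled
        elif piece:
            tokens.extend(analyze(piece))   # ordinary fragment: conventional analysis
    return tokens


fragments_seen: List[str] = []


def placeholder_analyzer(fragment: str) -> List[str]:
    """Stand-in analyzer: records its input and returns it as a single token."""
    fragments_seen.append(fragment)
    return [fragment]


tokens = analyze_with_keywords("40万で買ったパソコン@新同品@を23万円で売ります。", placeholder_analyzer)
print(fragments_seen)  # → ['40万で買ったパソコン', 'を23万円で売ります。']
print(tokens)          # → ['40万で買ったパソコン', '新同品', 'を23万円で売ります。']
```

Only the two fragments reach the analyzer, while 新同品 passes through untouched. Restarting analysis after each delimiter in this way is also what claim (2) describes for the morphological analysis means.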

[Embodiments]

FIG. 1 is a block diagram showing an embodiment of the method of the present invention. Keyword division part 2 checks whether any of the keywords stored in advance in keyword storage part 1 matches part of the character string of the input natural-language sentence; if there is a matching part, delimiters recognizable in morphological analysis are inserted before and after it. If there is no matching part, nothing is inserted. The natural-language sentence divided by the delimiters, or the sentence exactly as input when no delimiter has been inserted, is sent to morphological analysis part 3 and analyzed using conventional technology.

FIG. 2 is a block diagram showing an embodiment of the device of the present invention. The input natural-language sentence is stored in input natural-language sentence memory 11. Controller 15 instructs keyword detector 12 to operate. Keyword detector 12 detects keywords by referring to keyword dictionary 20, in which keywords of the specific field are stored in advance, and to input natural-language sentence memory 11, and writes the position of each detected keyword in the input sentence into keyword position memory 14. Each time a keyword is detected in the input sentence, two values are written into keyword position memory 14: the start position of the keyword in the input sentence and the position of the character following the keyword's last character.

Keyword detector 12 and keyword dictionary 20 can be implemented in various ways; for example, a device based on the 68M algorithm introduced in "Realizing Five Kinds of Pattern-Matching Methods as C-Language Functions" by 伊[…], Takagi and Ushijima (Nikkei Byte, August 1987) may be used. In the earlier example, 「40万で買ったパソコン新同品を23万円で売ります。」, two values are written into keyword position memory 14 for the keyword 「新同品」: the position of 「新」 and the position of 「を」. When keyword detector 12 has finished scanning the contents of input natural-language sentence memory 11, controller 15 instructs address counter 13, comparator 16 and multiplexer 17 to operate.
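The output of keyword detector 12 can be approximated in software as follows. This minimal Python sketch does not implement the 68M algorithm cited above: it uses a naive left-to-right scan that tries the longest keyword first at each position. The name `detect_keyword_positions` and the 0-based indexing are assumptions. For each occurrence it returns the two values that go into keyword position memory 14.

```python
from typing import Iterable, List


def detect_keyword_positions(sentence: str, keywords: Iterable[str]) -> List[int]:
    """Return the values to be written into keyword position memory 14.

    For each keyword occurrence, record its start index and the index just past
    its last character (0-based). At each position the longest keyword wins,
    and the scan resumes after a matched keyword.
    """
    ordered = sorted((k for k in keywords if k), key=len, reverse=True)
    positions: List[int] = []
    i = 0
    while i < len(sentence):
        match = next((k for k in ordered if sentence.startswith(k, i)), None)
        if match is not None:
            positions.extend([i, i + len(match)])
            i += len(match)
        else:
            i += 1
    return positions


sentence = "40万で買ったパソコン新同品を23万円で売ります。"
positions = detect_keyword_positions(sentence, {"新同品"})
print(positions)                                       # → [11, 14]
print(sentence[positions[0]], sentence[positions[1]])  # → 新 を
```

The two values are the indices of 「新」 and 「を」, as in the text. With a large keyword dictionary, a multi-pattern matcher such as an Aho-Corasick automaton would be a natural replacement for the naive scan. It is named here as an alternative, not as the method the patent cites.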

Comparator 16 compares the value of address counter 13 with the values in keyword position memory 14. If one of the values in keyword position memory 14 equals the value of address counter 13, comparator 16 causes multiplexer 17 to output the character stored in delimiter memory 18 to morphological analyzer 19, instructs address counter 13 to wait one cycle without updating the address value, and deletes the matching value from keyword position memory 14. If no value in keyword position memory 14 equals the value of address counter 13, comparator 16 causes multiplexer 17 to output to morphological analyzer 19 the character in input natural-language sentence memory 11 at the address held by address counter 13.

In that case, comparator 16 also instructs address counter 13 to add 1 to the address value. This operation of comparator 16 is repeated until the value of address counter 13 exceeds the address of the last character of the natural-language sentence stored in input natural-language sentence memory 11. In the earlier example, the characters up to 「40万で買ったパソコン」 are sent one at a time from input natural-language sentence memory 11 to morphological analyzer 19. When address counter 13 points to 「新」, a delimiter is sent and the value for the position of 「新」 is deleted from keyword position memory 14. Next, 「新同品」 is sent one character at a time.

When 「を」 is reached, a delimiter is sent and the value for the position of 「を」 is deleted from keyword position memory 14. After that, 「を」 and the remaining 「23万円で売ります。」 are sent unchanged.
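The character-by-character loop run by address counter 13, comparator 16 and multiplexer 17 can be mimicked in software. This is a sketch under these assumptions: positions are 0-based, "@" stands in for the delimiter held in memory 18, and `stream_with_delimiters` is a hypothetical name. When the address matches a stored position, the delimiter is emitted and the value deleted while the address stays the same, so the character at that address is sent on the next cycle.

```python
from typing import List


def stream_with_delimiters(sentence: str, keyword_positions: List[int], delimiter: str = "@") -> str:
    """Software model of address counter 13, comparator 16 and multiplexer 17.

    keyword_positions stands in for keyword position memory 14; the returned
    string is what would be sent, character by character, to analyzer 19.
    """
    pending = list(keyword_positions)  # working copy of memory 14
    address = 0                        # address counter 13
    sent: List[str] = []               # output toward morphological analyzer 19
    last = len(sentence) - 1           # address of the last character
    while address <= last:             # stop once the counter passes the last character
        if address in pending:
            sent.append(delimiter)     # multiplexer selects delimiter memory 18
            pending.remove(address)    # delete the matched value from memory 14
            # Counter waits one cycle: address is not incremented.
        else:
            sent.append(sentence[address])  # multiplexer selects sentence memory 11
            address += 1
    # A boundary equal to len(sentence) (keyword at the very end) is never reached
    # by this loop, so no closing delimiter is emitted for it.
    return "".join(sent)


print(stream_with_delimiters("40万で買ったパソコン新同品を23万円で売ります。", [11, 14]))
# → 40万で買ったパソコン@新同品@を23万円で売ります。
```

Because the loop stops after the last character, a keyword ending the sentence gets no closing delimiter. The end of input terminates the final fragment anyway.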

A conventional morphological analyzer can be used as morphological analyzer 19.

[Effects of the invention]

With the morphological analysis method or device using keywords of the present invention, natural-language text whose topic is limited to a particular field can be morphologically analyzed with good accuracy, even when it contains words peculiar to that field or character strings that mix special symbols and numerical values, carry a field-specific meaning, and would hardly be accepted as units of segmentation in ordinary morphological analysis. In addition, because the character strings subject to morphological analysis may become shorter, processing is faster.

[Brief description of the drawings]

[…]
1: keyword storage part; 2: keyword division part; 3: morphological analysis part; 11: input natural-language sentence memory; 12: keyword detector; 13: address counter; 14: keyword position memory; 15: controller; 16: comparator; 17: multiplexer; 18: delimiter memory; 19: morphological analyzer; 20: keyword dictionary; 21: keyword division means; 22: morphological analysis means; 23: keyword storage means.
Agent: Patent Attorney 内原 晋
[Figures: FIG. 1, FIG. 2]

Claims (2)

[Claims]
(1) A morphological analysis method using keywords, characterized in that a natural-language sentence whose topic is limited to a specific field is input; a dictionary in which keywords of the specific field have been registered in advance is consulted; when a keyword registered in the dictionary appears, a preset delimiter recognizable in morphological analysis is inserted before and after the keyword in the natural-language sentence; and morphological analysis is performed on the natural-language sentence into which the delimiters have been inserted.
(2) A morphological analysis device using keywords, characterized by comprising: keyword storage means for storing keywords of a specific field in advance; keyword division means for checking whether a natural-language sentence input with its topic limited to the specific field contains any keyword of the specific field stored in the keyword storage means and, when a corresponding keyword is detected, inserting into the natural-language sentence a preset delimiter signal recognizable in morphological analysis before and after the detected keyword; and morphological analysis means which, on finding the delimiter during morphological analysis of the natural-language sentence into which the delimiters have been inserted by the keyword division means, recognizes that the character string being processed has ended and starts a new morphological analysis from the next character, repeating this process.

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP62303544A JPH01144162A (en) 1987-11-30 1987-11-30 System and device for morpheme analysis using key word

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP62303544A JPH01144162A (en) 1987-11-30 1987-11-30 System and device for morpheme analysis using key word

Publications (1)

Publication Number Publication Date
JPH01144162A 1989-06-06

Family

ID=17922279

Family Applications (1)

Application Number Title Priority Date Filing Date
JP62303544A Pending JPH01144162A (en) 1987-11-30 1987-11-30 System and device for morpheme analysis using key word

Country Status (1)

Country Link
JP (1) JPH01144162A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH05174011A (en) * 1991-12-25 1993-07-13 Sharp Corp Kana/kanji conversion processor

Similar Documents

Publication Publication Date Title
US5794177A (en) Method and apparatus for morphological analysis and generation of natural language text
JP2783558B2 (en) Summary generation method and summary generation device
WO1997004405A9 (en) Method and apparatus for automated search and retrieval processing
US6904564B1 (en) Method of summarizing text using just the text
JPS63244259A (en) Keyword extractor
JPH06259424A (en) Document display device and document summary device and digital copying device
JPH03286372A (en) Key word extracting device
JPH01144162A (en) System and device for morpheme analysis using key word
JP3627446B2 (en) Search formula creation device and search formula creation method
JP2830097B2 (en) Sentence search method
JPH05233689A (en) Automatic document abstracting method
JP2982076B2 (en) Text processing apparatus and method
JPH01295369A (en) Dividing and processing system for kanji/kana paragraph
JPH07152778A (en) Document retrieval device
JP4071657B2 (en) Text processing device
JP2894298B2 (en) Document search device
Cowie CRL’s Approach to MET
JP3884001B2 (en) Language analysis system and method
JP2001022752A (en) Method and device for character group extraction, and recording medium for character group extraction
JP2001125907A (en) Method and device for retrieving dictionary and recording medium recording dictionary retrieving program
JPH02127769A (en) System and device for detecting item
JPH05266072A (en) Topic extracting device
JPS63213061A (en) System for classifying declensional kana ending
JPH0546610A (en) Document summarizing device
JPH02105968A (en) Automatic test and correction system for japanese sentence error