JPH02148265A - Automatic indexing system - Google Patents

Automatic indexing system

Info

Publication number
JPH02148265A
Authority
JP
Japan
Prior art keywords
noun
nouns
word
words
unnecessary
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP63301036A
Other languages
Japanese (ja)
Inventor
Akiko Mikami
三上 明子
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NEC Corp
Original Assignee
NEC Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
1988-11-30
Filing date
1988-11-30
Publication date
1990-06-07
Application filed by NEC Corp
Priority to JP63301036A
Publication of JPH02148265A

Abstract

PURPOSE: To extract only the more important nouns as index words by taking nouns accompanied by case particles from the results of morphological analysis and removing unnecessary words from those nouns.

CONSTITUTION: A morphological analysis unit 3 uses a morphological analysis dictionary file 2 to divide a text 1 into words and assigns part-of-speech information to each word, generating word data 4. A noun extraction unit 6 uses a case particle extraction rule file 5 to extract nouns accompanied by case particles, generating noun data 7. Because only nouns accompanied by case particles are extracted, the nouns that carry important meaning in the text are obtained. An unnecessary word removal unit 9 uses an unnecessary word dictionary file 8 to remove unnecessary words, such as pronouns, from the noun data 7, and index words 10 are extracted. Thus, when the number of index words assigned to the target text is limited, the more important nouns are extracted as index words.

Description

[Detailed description of the invention] [Industrial application field] The present invention relates to an automatic indexing system.

[Prior art] Conventionally, an automatic indexing system morphologically analyzes the target text, extracts all of the nouns, and then removes unnecessary words from the extracted nouns to obtain the index words.

[Problems to be solved] In the conventional system described above, every noun other than an unnecessary word is extracted as an index word. As a result, a large number of index words are extracted for the target text, and there is no information for gauging the relative importance of the index words.

The conventional system therefore cannot extract only the more important nouns in the target text.

The present invention has been made in view of the above problems, and its object is to provide an automatic indexing system that can extract only the more important nouns as index words.

[Means for solving the problem] To achieve the above object, the automatic indexing system of the present invention comprises: morphological analysis means that morphologically analyzes the target text to create word data; noun extraction means that extracts nouns accompanied by case particles from the word data obtained by the morphological analysis means to create noun data; and unnecessary word removal means that removes unnecessary words from the noun data obtained by the noun extraction means to extract index words.

[Example] An embodiment of the present invention is described below with reference to the drawing.

FIG. 1 is a block diagram of an embodiment of the automatic indexing system according to the present invention. In this embodiment, the morphological analysis unit 3 first uses the morphological analysis dictionary file 2 to divide the text 1 into words and assigns part-of-speech information to each word, creating the word data 4.
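The word-splitting step can be illustrated with a greedy longest-match lookup against a dictionary. This is only a minimal sketch: the patent does not say which analysis algorithm is used, and `TOY_DICTIONARY`, `analyze`, and the part-of-speech labels are hypothetical names standing in for the dictionary file 2, the morphological analysis unit 3, and the word data 4.

```python
from typing import Dict, List, Tuple

Word = Tuple[str, str]  # (surface form, part-of-speech label)

# Hypothetical stand-in for the morphological analysis dictionary file 2.
TOY_DICTIONARY: Dict[str, str] = {
    "システム": "noun",
    "文章": "noun",
    "解析": "noun",
    "が": "case_particle",
    "を": "case_particle",
    "する": "verb",
}


def analyze(text: str, dictionary: Dict[str, str]) -> List[Word]:
    """Split text into words by greedy longest match (an assumed strategy)."""
    max_len = max((len(entry) for entry in dictionary), default=1)
    words: List[Word] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shorter ones.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                words.append((candidate, dictionary[candidate]))
                i += length
                break
        else:
            # No dictionary entry starts here: emit one unknown character.
            words.append((text[i], "unknown"))
            i += 1
    return words


word_data = analyze("システムが文章を解析する", TOY_DICTIONARY)
for surface, pos in word_data:
    print(surface, pos)
```

A production analyzer would resolve segmentation ambiguities with costs or statistics; the greedy rule is used here only to keep the data flow visible.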

Next, the noun extraction unit 6 uses the case particle extraction rule file 5 to extract the nouns accompanied by case particles and creates the noun data 7. Extracting only nouns accompanied by case particles makes it possible to obtain the nouns that carry important meaning in the text.
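The noun extraction step could look like the following sketch. It assumes that "accompanied by a case particle" means the noun is immediately followed by one, and that the rule file 5 reduces to a set of particles; the patent does not spell out the rule format, so `CASE_PARTICLES` and `extract_nouns` are hypothetical.

```python
from typing import List, Set, Tuple

Word = Tuple[str, str]  # (surface form, part-of-speech label)

# Hypothetical contents of the case particle extraction rule file 5.
CASE_PARTICLES: Set[str] = {"が", "を", "に", "へ", "と", "で", "から", "より"}


def extract_nouns(word_data: List[Word], case_particles: Set[str]) -> List[str]:
    """Return nouns that are immediately followed by a listed case particle."""
    nouns: List[str] = []
    # Pair every word with its successor; the last word has no successor.
    for (surface, pos), (next_surface, next_pos) in zip(word_data, word_data[1:]):
        if (
            pos == "noun"
            and next_pos == "case_particle"
            and next_surface in case_particles
        ):
            nouns.append(surface)
    return nouns


# Word data as the analysis step might produce it. The pronoun "これ" is
# tagged as a noun here, since pronouns are filtered out in a later step.
word_data: List[Word] = [
    ("これ", "noun"),
    ("が", "case_particle"),
    ("文章", "noun"),
    ("を", "case_particle"),
    ("解析", "noun"),
    ("する", "verb"),
]

noun_data = extract_nouns(word_data, CASE_PARTICLES)
print(noun_data)  # ['これ', '文章']
```

In this toy input, "解析" is dropped because it is followed by a verb rather than a case particle.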

Then the unnecessary word removal unit 9 uses the unnecessary word dictionary file 8 to remove unnecessary words from the noun data 7. Unnecessary words here are nouns such as pronouns.

Through the above operations, the index words 10 are extracted.
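The removal step amounts to filtering the noun data against a word list. The sketch below assumes the unnecessary word dictionary file 8 is a plain set of surface forms; the entries in `UNNECESSARY_WORDS` and the function name are hypothetical.

```python
from typing import List, Set

# Hypothetical contents of the unnecessary word dictionary file 8
# (pronouns and similar low-information nouns).
UNNECESSARY_WORDS: Set[str] = {"これ", "それ", "あれ", "もの", "こと"}


def remove_unnecessary_words(noun_data: List[str], unnecessary: Set[str]) -> List[str]:
    """Drop every noun listed in the unnecessary word dictionary, keeping order."""
    return [noun for noun in noun_data if noun not in unnecessary]


noun_data = ["これ", "文章"]  # example noun data 7
index_words = remove_unnecessary_words(noun_data, UNNECESSARY_WORDS)
print(index_words)  # ['文章']
```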

[Effects of the invention] As described above, the present invention has, for automatic indexing, means for morphologically analyzing the target text, means for extracting nouns accompanied by case particles from the result of the morphological analysis, and means for removing unnecessary words from the extracted nouns. Instead of extracting every noun in the text, it can therefore extract only the more important nouns.

Consequently, when the number of index words that can be assigned to the target text is limited, the more important nouns can be extracted as index words.
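The difference from the conventional approach can be seen by running both selection rules over the same word data. This is a hedged illustration on a toy input, not a measurement; the function names, tags, and word list are assumptions.

```python
from typing import List, Set, Tuple

Word = Tuple[str, str]  # (surface form, part-of-speech label)


def conventional_index(word_data: List[Word], unnecessary: Set[str]) -> List[str]:
    """Conventional rule: every noun except the unnecessary words."""
    return [s for s, pos in word_data if pos == "noun" and s not in unnecessary]


def proposed_index(word_data: List[Word], unnecessary: Set[str]) -> List[str]:
    """Described rule: only nouns followed by a case particle, minus unnecessary words."""
    return [
        s
        for (s, pos), (_, next_pos) in zip(word_data, word_data[1:])
        if pos == "noun" and next_pos == "case_particle" and s not in unnecessary
    ]


word_data: List[Word] = [
    ("これ", "noun"),
    ("が", "case_particle"),
    ("文章", "noun"),
    ("を", "case_particle"),
    ("解析", "noun"),
    ("する", "verb"),
]
unnecessary = {"これ"}  # hypothetical unnecessary word dictionary

print(conventional_index(word_data, unnecessary))  # ['文章', '解析']
print(proposed_index(word_data, unnecessary))      # ['文章']
```

The proposed rule always returns a subset of the conventional result, which is what makes it useful when only a few index words may be assigned.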

[Brief description of the drawing]

FIG. 1 is a block diagram showing an embodiment of the automatic indexing system according to the present invention.
1: Text
2: Morphological analysis dictionary file
3: Morphological analysis unit
4: Word data
5: Case particle extraction rule file
6: Noun extraction unit
7: Noun data
8: Unnecessary word dictionary file
9: Unnecessary word removal unit
10: Index words
Agent: Patent Attorney Kihei Watanabe

Claims (1)

[Claims] An automatic indexing system comprising: morphological analysis means that morphologically analyzes a target text to create word data; noun extraction means that extracts nouns accompanied by case particles from the word data obtained by the morphological analysis means to create noun data; and unnecessary word removal means that removes unnecessary words from the noun data obtained by the noun extraction means to extract index words.
JP63301036A 1988-11-30 1988-11-30 Automatic indexing system Pending JPH02148265A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP63301036A JPH02148265A (en) 1988-11-30 1988-11-30 Automatic indexing system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP63301036A JPH02148265A (en) 1988-11-30 1988-11-30 Automatic indexing system

Publications (1)

Publication Number Publication Date
JPH02148265A true JPH02148265A (en) 1990-06-07

Family

ID=17892081

Family Applications (1)

Application Number Title Priority Date Filing Date
JP63301036A Pending JPH02148265A (en) 1988-11-30 1988-11-30 Automatic indexing system

Country Status (1)

Country Link
JP (1) JPH02148265A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0778182A (en) * 1993-06-18 1995-03-20 Hitachi Ltd Keyword allocating system
