JPH02148265A - Automatic indexing system - Google Patents
Automatic indexing system
- Publication number
- JPH02148265A (application JP63301036A)
- Authority
- JP
- Japan
- Prior art keywords
- noun
- nouns
- word
- words
- unnecessary
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
Description
[Detailed Description of the Invention]
[Field of Industrial Application]
The present invention relates to an automatic indexing system.
[Prior Art]
Conventionally, an automatic indexing system performs morphological analysis on the target text, extracts all of the nouns, and then removes unnecessary words from the extracted nouns to obtain the index words.
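The conventional procedure can be sketched roughly as follows. This is a minimal illustration, not code from the patent: the `(surface, part-of-speech)` token format, the tag names, the sample sentence, and the function name `conventional_index_words` are all assumptions, and the tokens are written out by hand instead of coming from a real morphological analyzer.

```python
from typing import Iterable, List, Set, Tuple

Token = Tuple[str, str]  # (surface form, part-of-speech tag); hypothetical format


def conventional_index_words(tokens: Iterable[Token], unnecessary: Set[str]) -> List[str]:
    """Conventional approach: take every noun, then drop unnecessary words."""
    nouns = [surface for surface, pos in tokens if pos == "noun"]
    return [noun for noun in nouns if noun not in unnecessary]


# Hand-tokenized sample sentence (assumed): "これが索引語を文章から抽出する"
tokens = [
    ("これ", "noun"), ("が", "particle"),
    ("索引語", "noun"), ("を", "particle"),
    ("文章", "noun"), ("から", "particle"),
    ("抽出", "noun"), ("する", "verb"),
]
print(conventional_index_words(tokens, {"これ", "それ", "あれ"}))
```

Every noun that survives the filter is returned with equal standing, with nothing to indicate which ones matter more.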
[Problems to Be Solved]
In the conventional system described above, every noun other than an unnecessary word is extracted as an index word. As a result, a large number of index words are extracted for the target text, and there is no information for gauging the relative importance of the index words.
Consequently, the conventional system cannot extract only the more important nouns in the target text.
The present invention has been made in view of the above problems, and its object is to provide an automatic indexing system that can extract only the more important nouns as index words.
[Means for Solving the Problems]
To achieve the above object, the automatic indexing system of the present invention comprises: morphological analysis means that morphologically analyzes the target text to create word data; noun extraction means that extracts nouns accompanied by case particles from the word data obtained by the morphological analysis means to create noun data; and unnecessary word removal means that removes unnecessary words from the noun data obtained by the noun extraction means to extract index words.
[Embodiment]
An embodiment of the present invention is described below with reference to the drawings.
FIG. 1 is a block diagram of an embodiment of the automatic indexing system according to the present invention.
In the automatic indexing system of this embodiment, the morphological analysis unit 3 first uses the morphological analysis dictionary file 2 to divide the text 1 into words and attaches part-of-speech information to each word, creating the word data 4.
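As a rough illustration of this step, the sketch below segments a sentence by greedy longest-match lookup in a small in-memory dictionary and tags each word with its part of speech. The patent does not specify the analysis algorithm, so the longest-match strategy, the dictionary contents, the tag names, and the function name `analyze` are all assumptions; practical morphological analyzers typically use more elaborate methods.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for the morphological analysis dictionary file:
# surface form -> part-of-speech tag.
DICTIONARY: Dict[str, str] = {
    "これ": "noun", "索引語": "noun", "文章": "noun", "抽出": "noun",
    "が": "particle", "を": "particle", "から": "particle",
    "する": "verb",
}


def analyze(text: str, dictionary: Dict[str, str]) -> List[Tuple[str, str]]:
    """Split text into (word, part-of-speech) pairs by greedy longest match."""
    max_len = max(len(word) for word in dictionary)
    word_data = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then progressively shorter ones.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                word_data.append((candidate, dictionary[candidate]))
                i += length
                break
        else:
            # No dictionary entry starts here: emit one unknown character.
            word_data.append((text[i], "unknown"))
            i += 1
    return word_data


word_data = analyze("これが索引語を文章から抽出する", DICTIONARY)
print(word_data)
```

The resulting list of word and part-of-speech pairs plays the role of the word data 4 in this sketch.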
Next, the noun extraction unit 6 uses the case particle extraction rule file 5 to extract the nouns accompanied by case particles and creates the noun data 7. Extracting the nouns accompanied by case particles makes it possible to pick out the nouns that carry important meaning in the text.
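One way to read "a noun accompanied by a case particle" is a noun that is immediately followed by a case particle. The sketch below implements that reading only. The contents of the rule file are not given in the patent, so the `CASE_PARTICLES` set, the token format, and the function name `extract_case_marked_nouns` are assumptions made for illustration.

```python
from typing import List, Set, Tuple

Token = Tuple[str, str]  # (surface form, part-of-speech tag); hypothetical format

# Hypothetical stand-in for the case particle extraction rule file.
CASE_PARTICLES: Set[str] = {"が", "を", "に", "へ", "と", "で", "から", "より"}


def extract_case_marked_nouns(word_data: List[Token],
                              case_particles: Set[str] = CASE_PARTICLES) -> List[str]:
    """Return the nouns that are directly followed by a case particle."""
    noun_data = []
    # Pair each token with its successor; the last token has no successor
    # and therefore can never qualify.
    for (surface, pos), (next_surface, next_pos) in zip(word_data, word_data[1:]):
        if pos == "noun" and next_pos == "particle" and next_surface in case_particles:
            noun_data.append(surface)
    return noun_data


# Hand-tokenized sample sentence (assumed): "これが索引語を文章から抽出する"
word_data = [
    ("これ", "noun"), ("が", "particle"),
    ("索引語", "noun"), ("を", "particle"),
    ("文章", "noun"), ("から", "particle"),
    ("抽出", "noun"), ("する", "verb"),
]
print(extract_case_marked_nouns(word_data))
```

In this sample, 抽出 is tagged as a noun but is followed by a verb, so it is left out of the noun data, while the pronoun これ is still present at this stage.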
Next, the unnecessary word removal unit 9 uses the unnecessary word dictionary file 8 to remove unnecessary words from the noun data 7. Unnecessary words here are nouns such as pronouns.
Through the above operations, the index words 10 are extracted.
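The removal step amounts to filtering the noun data against a word list. The sketch below assumes the unnecessary word dictionary is a plain list with one word per line. That file format, the sample entries, and the function names `load_unnecessary_words` and `remove_unnecessary_words` are hypothetical and not taken from the patent.

```python
from typing import Iterable, List, Set


def load_unnecessary_words(lines: Iterable[str]) -> Set[str]:
    """Build the unnecessary word set from dictionary lines, skipping blanks."""
    return {line.strip() for line in lines if line.strip()}


def remove_unnecessary_words(noun_data: List[str], unnecessary: Set[str]) -> List[str]:
    """Keep the nouns that are not listed as unnecessary, preserving order."""
    return [noun for noun in noun_data if noun not in unnecessary]


# Assumed dictionary contents: a few pronouns, one per line.
unnecessary = load_unnecessary_words(["これ\n", "それ\n", "あれ\n", "\n"])

# Assumed noun data, as might come out of the noun extraction step.
noun_data = ["これ", "索引語", "文章"]
index_words = remove_unnecessary_words(noun_data, unnecessary)
print(index_words)
```

Here the pronoun これ is dropped and the remaining nouns become the index words.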
[Effects of the Invention]
As described above, the present invention provides, for automatic indexing, means for morphologically analyzing the target text, means for extracting nouns accompanied by case particles from the result of the morphological analysis, and means for removing unnecessary words from the extracted nouns. It therefore has the effect that, rather than extracting all of the nouns in the text, only the more important nouns can be extracted.
As a result, when the number of index words that may be assigned to a target text is limited, the more important nouns can be extracted as index words.
FIG. 1 is a block diagram showing an embodiment of the automatic indexing system according to the present invention.
1: Text
2: Morphological analysis dictionary file
3: Morphological analysis unit
4: Word data
5: Case particle extraction rule file
6: Noun extraction unit
7: Noun data
8: Unnecessary word dictionary file
9: Unnecessary word removal unit
10: Index words
Agent: Patent Attorney Kihei Watanabe
Claims (1)
An automatic indexing system comprising: morphological analysis means for morphologically analyzing a target text to create word data; noun extraction means for extracting nouns accompanied by case particles from the word data obtained by the morphological analysis means to create noun data; and unnecessary word removal means for removing unnecessary words from the noun data obtained by the noun extraction means to extract index words.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP63301036A JPH02148265A (en) | 1988-11-30 | 1988-11-30 | Automatic indexing system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP63301036A JPH02148265A (en) | 1988-11-30 | 1988-11-30 | Automatic indexing system |
Publications (1)
Publication Number | Publication Date |
---|---|
JPH02148265A (en) | 1990-06-07 |
Family
ID=17892081
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
JP63301036A Pending JPH02148265A (en) | 1988-11-30 | 1988-11-30 | Automatic indexing system |
Country Status (1)
Country | Link |
---|---|
JP (1) | JPH02148265A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0778182A (en) * | 1993-06-18 | 1995-03-20 | Hitachi Ltd | Keyword allocating system |
- 1988-11-30: JP application JP63301036A (patent JPH02148265A), status Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US6965857B1 (en) | Method and apparatus for deriving information from written text | |
EP0403660A1 (en) | Machine translation system | |
EP1078322B1 (en) | System for creating a dictionary | |
JP5028823B2 (en) | Synonym pair extraction apparatus and synonym pair extraction method | |
JPH02148265A (en) | Automatic indexing system | |
Hawwari et al. | Building an Arabic multiword expressions repository | |
Kanaan et al. | An improved algorithm for the extraction of triliteral Arabic roots | |
JP2536633B2 (en) | Compound word extraction device | |
Lamb | The nature of the machine translation problem | |
KR0123238B1 (en) | Morphemes analysis system | |
Sass | The verb argument browser | |
JP2812511B2 (en) | Keyword extraction device | |
Vernerová et al. | To Pay or to Get Paid: Enriching a Valency Lexicon with Diatheses. | |
JPH0244463A (en) | Original text input method in machine translation system | |
JPH05233689A (en) | Automatic document abstracting method | |
JP2893740B2 (en) | Machine translation equipment | |
JPH07152778A (en) | Document retrieval device | |
JPS6151270A (en) | Inter-language relation extracting system | |
JP3020230B2 (en) | Dictionary compact device and natural language processing method | |
JPS63219074A (en) | Natural language analyzing device | |
KR20220043546A (en) | Methods and systems for syntactic and semantic information extraction from plant procedures | |
JPS61281367A (en) | Noun phrase determining system for english analysis | |
JPH01124027A (en) | Automatic forming system for circular index | |
JPH0715692B2 (en) | Context processor | |
JPS5957373A (en) | Index preparation system |