JPH0391077A - Keyword index generation method

Info

Publication number: JPH0391077A
Application number: JP1228672A
Authority: JP
Inventors: Koji Akiyama; 幸司秋山; Masahiro Kawasaki; 正博川崎; Juichiro Yamazaki; 山崎　重一郎
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1989-09-04
Filing date: 1989-09-04
Publication date: 1991-04-16

Abstract

PURPOSE: To improve the retrieval rate of a desired text by reading out a word showing the abstract concept of a selected word and a word showing the concept to which a name belongs, and outputting those words to a keyword index update part together with the selected word. CONSTITUTION: A word development part 16 reads out, from a storing means 15, the word showing the abstract concept of the word selected by a word selection part 12 and the word showing the concept to which the name belongs, and outputs those words to a keyword index update part 13 together with the selected word. A selected word expressing a very concrete concept is thereby developed into a word showing a more abstract concept, or a name is developed into a word showing the concept to which it belongs, and not only the original word but also these developed word groups can be attached as keywords of a record unit text and registered in a keyword index 14. Thus, even a word showing a concept that does not appear in the record unit text can be registered in the keyword index, and the retrieval rate of the desired text can be further improved.

Description

[Detailed Description of the Invention] (Summary) The invention relates to a keyword index generation method used to retrieve a desired text from a text base, and aims to further improve the retrieval rate of the desired text. In a keyword index generation method in which a recording unit text of a document is divided into words by a word division means, only the words that can serve as keywords are selected from the divided words by a word selection unit, and the selected words are registered in a keyword index by a keyword index update unit, the invention comprises: a storage means that stores in advance, for each word that can serve as a keyword, a group of words representing the more abstract concepts of that word and the concepts to which a name belongs; and a word expansion unit that, based on the words selected by the word selection unit, reads from the storage means the words indicating the abstract concepts of the selected words and the words indicating the concepts to which the names belong, and outputs them together with the selected words to the keyword index update unit.

(Industrial Application Field) The present invention relates to a keyword index generation method, and more particularly to a method of generating a keyword index used to retrieve a desired text from a text base.

(Prior Art) Keyword search is a known method of retrieving a desired text from a text base in which a large volume of text is recorded in units of text (for example, a single sentence or a paragraph). In this method, a group of keywords representing the content is attached to each recording unit text, and an index is constructed that maps these keywords to the recording unit texts. The desired text is then retrieved by using a matching keyword from the keyword group.

However, attaching such keywords manually to a large volume of text requires an enormous number of man-hours. Conventionally, therefore, a method (the free-term method) has been used in which independent words are extracted from the word group obtained by word division processing using a computer and a dictionary, and these words are used as keywords.

[Problem to be solved by the invention]

In the free-term method described above, words appearing in the text are used as keywords as they are, so the keywords generally tend to be names of things (proper nouns) and words representing very concrete concepts. From the standpoint of a person searching the text base, on the other hand, the names of things and the words representing concrete concepts are often unknown, so words representing abstract concepts tend to be entered into the text base search device as keywords.

Consequently, when a keyword index generated by the free-term method is used, there is the drawback that the desired text is very likely not to be retrieved unless the searcher is well acquainted with the field of the text being searched, or personally looks up more specific keywords in a concept system (thesaurus).

The present invention has been made in view of the above points, and its object is to provide a keyword index generation method that can further improve the retrieval rate of a desired text.

[Means to solve the problem]

FIG. 1 is a block diagram showing the principle of the present invention. The present invention is a keyword index generation method in which a recording unit text of a document is divided into words by a word division means 11, only the words that can serve as keywords are selected from the divided words by a word selection unit 12, and the selected words are registered in a keyword index 14 by a keyword index update unit 13, the method being further provided with a storage means 15 and a word expansion unit 16.

Here, the storage means 15 stores in advance, for each word that can serve as a keyword, a group of words representing the more abstract concepts of that word and the concepts to which a name belongs.

The word expansion unit 16 reads from the storage means 15 the words indicating the abstract concepts of the words selected by the word selection unit 12 and the words indicating the concepts to which the names belong, and outputs them together with the selected words to the keyword index update unit 13.

[Effect]

In the present invention, the word group obtained by division in the same manner as in the conventional free-term keyword index generation method is expanded by the word expansion unit 16, which uses the concept classification system and the knowledge of the concepts to which variously named things belong, both prepared in advance in the storage means 15. A selected word representing a very concrete concept is expanded into words representing more abstract concepts, or a name is expanded into words representing the concept to which it belongs. Not only the original words but also these expanded word groups are attached as keywords of the recording unit text and registered in the keyword index 14.

Therefore, according to the present invention, words representing concepts that do not appear in the recording unit text can also be registered in the keyword index 14.

[Embodiment] FIG. 2 is a block diagram of a text retrieval device to which an embodiment of the method of the present invention is applied. In the figure, components identical to those in FIG. 1 are given the same reference numerals. In FIG. 2, 17 is a word division unit and 18 is a word dictionary; together they constitute the word division means 11. Reference numeral 19 denotes concept classification system knowledge, in which words representing more abstract concepts of an input word are stored in advance.

Reference numeral 20 denotes name-concept related knowledge, in which words representing the concept to which the name given by an input word belongs are stored in advance. The concept classification system knowledge 19 and the name-concept related knowledge 20 together constitute the storage means 15.

Further, 21 is a text input unit, 22 is a text registration unit, 23 is a text base, and 24 is a keyword search device.

Next, the operation of this embodiment will be described. The text to be searched is divided into suitable recording unit texts by the text input unit 21. Each recording unit text is registered in the text base 23 by the text registration unit 22 and is also converted into a word string by the word division unit 17 using the word dictionary 18.

From this word string, the word selection unit 12 selects only the words that can serve as keywords, and these are input to the word expansion unit 16. Using the concept classification system knowledge 19 and the name-concept related knowledge 20, the word expansion unit 16 obtains, from the words (keywords) selected and input by the word selection unit 12, the abstract words or the words representing the concepts to which the names belong, and sends the word group thus obtained (the expanded word group) to the keyword index update unit 13 together with the original words (keywords).
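The steps that precede expansion can be sketched in Python as follows. This is only an illustration under stated assumptions: the patent does not specify the segmentation algorithm, so greedy longest-match lookup is assumed here; the dictionary entries, the part-of-speech labels, the sample sentence, and the names `WORD_DICT`, `segment`, and `select_keywords` are all hypothetical; and treating nouns as the words that can serve as keywords is a simplification.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for word dictionary 18: surface form -> part of speech.
WORD_DICT: Dict[str, str] = {
    "富士通": "noun",
    "が": "particle",
    "MPU": "noun",
    "を": "particle",
    "開発": "noun",
}


def segment(text: str, dictionary: Dict[str, str]) -> List[Tuple[str, str]]:
    """Split text into (word, part_of_speech) pairs by greedy longest match.

    The algorithm is an assumption; the source only says that a word
    dictionary is used for word division.
    """
    words: List[Tuple[str, str]] = []
    max_len = max(len(entry) for entry in dictionary)
    i = 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                words.append((candidate, dictionary[candidate]))
                i += length
                break
        else:
            # No dictionary entry starts here: emit one unknown character.
            words.append((text[i], "unknown"))
            i += 1
    return words


def select_keywords(words: List[Tuple[str, str]]) -> List[str]:
    """Keep only the words that can serve as keywords (here: nouns)."""
    return [word for word, pos in words if pos == "noun"]


word_string = segment("富士通がMPUを開発", WORD_DICT)
selected = select_keywords(word_string)
print(selected)
```

The list `selected` plays the role of the output of the word selection unit 12, which is what the word expansion unit 16 receives.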

The keyword index update unit 13 registers the input word group in the keyword index 14, updating it. The keyword search device 24 is operated by a person searching the text base; it looks up the keyword specified by the searcher in the keyword index 14 and reads the recording unit text corresponding to that keyword from the text base 23. As described above, the keyword index 14 holds as keywords not only the words in a recording unit text but also generated words representing concepts that do not appear in it. Therefore, even if the keyword entered by the searcher does not appear in the desired recording unit text, that text can be retrieved as long as the keyword is a word representing an abstract concept of a word in the recording unit text, or the concept to which a name in it belongs.

Therefore, this embodiment can improve the retrieval rate of the desired recording unit text compared with the conventional method.

Next, the text retrieval operation of this embodiment will be described more concretely. Suppose, for example, that newspaper article headlines are being registered in the text base 23 and that the recording unit text produced by the text input unit 21 is the one shown in FIG. 3.

When word division and word selection are performed on this recording unit text in the conventional manner by the word division unit 17, the word dictionary 18, and the word expansion unit 16, the selected result is as shown in FIG. 4.

This embodiment further performs word expansion in the word expansion unit 16, using the concept classification system knowledge 19, in which hypernyms and related words are classified under each word representing a concept as shown in FIG. 5, and the name-concept related knowledge 20, which stores names and the words representing the concepts to which the names belong as shown in FIG. 6. As a result, "Fujitsu" in FIG. 4 is expanded to "computer manufacturer", as can be seen from FIG. 6; "FM-TOWNS" in FIG. 4 is expanded to "personal computer" from the name-concept related knowledge 20 of FIG. 6; and "MPU" in FIG. 4 is expanded to "LSI" and "computer" from the concept classification system knowledge 19 of FIG. 5. The word expansion unit 16 thus yields the word group shown in FIG. 7 as keywords.
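The expansion just described can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the two tables contain only the entries quoted in the text (the full contents of FIGS. 5 and 6 are not reproduced here), modeling them as dictionaries is an assumption, and the names `CONCEPT_CLASSIFICATION`, `NAME_CONCEPT`, and `expand_words` are hypothetical.

```python
from typing import Dict, List

# Stand-in for concept classification system knowledge 19 (FIG. 5):
# word -> hypernyms and related words. Only the quoted entry is included.
CONCEPT_CLASSIFICATION: Dict[str, List[str]] = {
    "MPU": ["LSI", "computer"],
}

# Stand-in for name-concept related knowledge 20 (FIG. 6):
# name -> words for the concept the name belongs to.
NAME_CONCEPT: Dict[str, List[str]] = {
    "Fujitsu": ["computer manufacturer"],
    "FM-TOWNS": ["personal computer"],
}


def expand_words(selected: List[str]) -> List[str]:
    """Return each selected word followed by its expansions, without duplicates."""
    keywords: List[str] = []
    for word in selected:
        candidates = (
            [word]
            + CONCEPT_CLASSIFICATION.get(word, [])
            + NAME_CONCEPT.get(word, [])
        )
        for candidate in candidates:
            if candidate not in keywords:
                keywords.append(candidate)
    return keywords


keywords = expand_words(["Fujitsu", "FM-TOWNS", "MPU"])
print(keywords)
# ['Fujitsu', 'computer manufacturer', 'FM-TOWNS', 'personal computer',
#  'MPU', 'LSI', 'computer']
```

The original words are kept alongside the expansions, matching the statement that the expanded word group is passed on together with the selected words.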

These word groups are registered in the keyword index 14 by the keyword index update unit 13 as shown in FIG. 8(B). The keyword index 14 holds pointer information to the recording unit texts stored in the text base 23. Therefore, when the recording unit text shown in FIG. 3 is stored at the position of number 10206 in the text base 23 as shown in FIG. 8(A), the word group (keyword group) of FIG. 7 is registered in the keyword index 14 together with the same text number 10206, as shown in FIG. 8(B).
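The registration step can be sketched in Python as follows. The layout of FIG. 8(B) is not reproduced in the text, so representing the keyword index as a mapping from keyword to a set of text numbers is an assumption, as are the class name `KeywordIndex` and its `register` method. The keyword list contains only the words named in the description, not the full word group of FIG. 7.

```python
from typing import Dict, List, Set


class KeywordIndex:
    """Hypothetical stand-in for keyword index 14: keyword -> text numbers."""

    def __init__(self) -> None:
        self.entries: Dict[str, Set[int]] = {}

    def register(self, keywords: List[str], text_no: int) -> None:
        """Register every keyword with a pointer (text number) to its text."""
        for keyword in keywords:
            self.entries.setdefault(keyword, set()).add(text_no)


index = KeywordIndex()
index.register(
    ["Fujitsu", "computer manufacturer", "FM-TOWNS", "personal computer",
     "MPU", "LSI", "computer"],
    10206,
)
print(index.entries["computer manufacturer"])  # {10206}
```

Using a set per keyword lets several recording unit texts share one keyword, and registering the same text twice leaves the index unchanged.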

The keyword search device 24 uses this keyword index 14 to obtain the recording unit text corresponding to the keyword given by the searcher. Thus, according to this embodiment, the recording unit text shown in FIG. 3 can be retrieved even when any of "computer manufacturer", "personal computer", "LSI", and "computer" is entered as a keyword, none of which could retrieve it with the conventional method.
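The lookup performed by the keyword search device 24 can be sketched in Python as follows. The sketch rests on assumptions: the index and the text base are modeled as plain dictionaries, the function name `search` is hypothetical, and the stored string is a placeholder because the actual text of FIG. 3 is not reproduced in the description.

```python
from typing import Dict, List, Set

# Stand-in for text base 23: text number -> recording unit text.
# The string is a placeholder, not the real content of FIG. 3.
TEXT_BASE: Dict[int, str] = {10206: "<recording unit text of FIG. 3>"}

# Stand-in for keyword index 14 after registration, limited to the
# keywords named in the description.
KEYWORD_INDEX: Dict[str, Set[int]] = {
    keyword: {10206}
    for keyword in (
        "Fujitsu", "FM-TOWNS", "MPU",            # words present in the text
        "computer manufacturer", "personal computer",
        "LSI", "computer",                       # words added by expansion
    )
}


def search(keyword: str) -> List[str]:
    """Return the recording unit texts registered under the given keyword."""
    text_numbers = KEYWORD_INDEX.get(keyword, set())
    return [TEXT_BASE[number] for number in sorted(text_numbers)]


for query in ("computer manufacturer", "personal computer", "LSI", "computer"):
    print(query, "->", search(query))
```

Each of the four abstract queries returns the same recording unit text, while a keyword that was never registered returns an empty list.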

(Effects of the Invention) As described above, according to the present invention, words indicating concepts that do not appear in the recording unit text are also registered in the keyword index. Compared with keyword index generation by the free-term method, the invention therefore has the advantages of improving the retrieval rate and reducing the number of texts missed in a search.

[Brief explanation of drawings]

FIG. 1 is a block diagram showing the principle of the present invention.
FIG. 2 is a block diagram of a text retrieval device to which an embodiment of the present invention is applied.
FIG. 3 is a diagram showing an example of a recording unit text cut out by the text input unit.
FIG. 4 is a diagram showing an example of the result of word division and word selection.
FIG. 5 is a diagram showing an example of the contents of the concept classification system knowledge.
FIG. 6 is a diagram showing an example of the contents of the name-concept related knowledge.
FIG. 7 is a diagram showing an example of the keyword group obtained by the word expansion unit.
FIG. 8 is a diagram showing an example of the contents of the text base and the keyword index in FIG. 2.
In the figures, 11 is the word division means, 12 is the word selection unit, 13 is the keyword index update unit, 14 is the keyword index, 15 is the storage means, 16 is the word expansion unit, 19 is the concept classification system knowledge, and 20 is the name-concept related knowledge.

Claims

[Claims] A keyword index generation method in which a recording unit text of a document is divided into words by a word division means (11), only the words that can serve as keywords are selected from the divided words by a word selection unit (12), and the selected words are registered in a keyword index (14) by a keyword index update unit (13), the method comprising: a storage means (15) that stores in advance, for each word that can serve as a keyword, a group of words representing the more abstract concepts of that word and the concepts to which a name belongs; and a word expansion unit (16) that, based on the words selected by the word selection unit (12), reads from the storage means (15) the words indicating the abstract concepts of the selected words and the words indicating the concepts to which the names belong, and outputs them together with the selected words to the keyword index update unit (13).