JPH10301952A

JPH10301952A - Document retrieving device and method for adding and displaying meaning icon

Info

Publication number: JPH10301952A
Application number: JP11094897A
Authority: JP
Inventors: Yasuo Tanosaki; 康雄田野崎
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1997-04-28
Filing date: 1997-04-28
Publication date: 1998-11-13

Abstract

PROBLEM TO BE SOLVED: To present the meaning of a document in an easily understood form so that documents can be retrieved easily. SOLUTION: The device has an icon table storing buffer 22b that stores plural sets of co-occurrence words in association with a headline keyword, and an icon image storing buffer 22g that stores an icon for each set of co-occurrence words. A headline keyword retrieving part 20c retrieves the portions of the document data that match the headline keyword. A similarity calculation part 20d calculates the similarity between each set of co-occurrence words associated with the retrieved headline keyword and the words appearing around the position of that headline keyword in the document data. A sentence icon adding processing part 20b attaches the icon of the co-occurrence word set with the maximum similarity to the position in the document data where the headline keyword was retrieved. A document list icon display part 20f displays the attached icons in association with the document icon.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

FIELD OF THE INVENTION

The present invention relates to a document retrieval technique for retrieving a specific document from a large volume of electronic document data.

[0002]

DESCRIPTION OF THE RELATED ART

In recent years, large volumes of electronic document data have come into circulation, and even individuals can now obtain a wide variety of document data over networks such as the Internet.

[0003] Systems that rapidly retrieve the document data a user wants from a large volume of document data, such as WWW (World Wide Web) servers, are in practical use as a means of obtaining documents. In such retrieval systems, either the user specifies a keyword and the system performs a full-text search to find documents containing it, or the documents are classified by field and stored in advance and the user specifies the field of the documents needed. Electronic mail systems and the like are another means by which users obtain documents.

[0004] In any of the above cases, the text of an obtained document generally mixes portions the user is looking for with portions the user does not need. As a single document grows larger, it becomes difficult for the user to pick out the needed portions. Conventional systems therefore mark the keyword the user specified at search time, for example by reverse video, so that the user can more easily judge whether a given portion is relevant.

[0005] In recent years, automatic document summarization systems and automatic indexing systems have also been developed to present the content of a whole document, or of part of it, in an easily understood form. These systems extract the portions of a document that satisfy predetermined conditions, that is, portions judged to be highly important, generate a summary or title from them, and present it to the user. However, because the predetermined conditions for generating the summary or title do not take into account that different users focus on different points, the presented summary or title has not always been sufficiently helpful to every user when consulting the content of the document.

[0006]

PROBLEMS TO BE SOLVED BY THE INVENTION

As described above, conventional document retrieval systems make it possible to obtain target documents efficiently by specifying a keyword or field, but the documents obtained this way generally contain many portions the user does not need in addition to the portions the user does need.

[0007] A system is therefore needed that lets the user judge the meaning of a document efficiently without reading it and immediately select the desired one, particularly when a single document contains a large amount of data.

[0008] Conventional approaches to this have been a method that marks the keyword portions specified by the user, and a method that analyzes the text and attaches a summary or title expressing its meaning. Both have the following problems.

[0009] With the method that marks the specified keywords in the text, the user can identify the portions to look at but must still read the sentences around each keyword before judging the content of the document, so the burden is not much reduced.

[0010] The method that generates a summary or title, in principle, extracts the portions of the original document judged to be highly important. Because users differ in what they consider important, it is difficult to attach an accurate summary or title that satisfies each individual user's needs. In addition, because the content of the document is expressed as character strings, it is hard for the user to grasp highly abstract meaning, and the desired document sometimes cannot be retrieved accurately.

[0011] The present invention has been made in view of the above circumstances, and its object is to provide a document retrieval device and a meaning-icon addition and display method that present the meaning of a document in an easily understood form and thereby make documents easier to retrieve.

[0012]

MEANS FOR SOLVING THE PROBLEMS

The present invention comprises: table storage means for storing a headline keyword in association with at least one set of co-occurrence words, each set containing at least one co-occurrence word that co-occurs with the headline keyword; icon data storage means for storing, in association with each set of co-occurrence words stored in the table storage means, icon data for displaying a semantically related icon; document data storage means for storing document data; headline keyword search means for searching the document data stored in the document data storage means for portions that match a headline keyword stored in the table storage means; similarity calculation means for calculating the similarity between each set of co-occurrence words stored in the table storage means for a headline keyword found in the document data by the headline keyword search means and the words appearing around the position of that headline keyword in the document data; icon addition means for referring to the similarities calculated by the similarity calculation means and attaching the icon data stored in the icon data storage means for the set of co-occurrence words with the highest similarity, in association with the position in the document data at which the headline keyword was found; and icon display means for displaying icons in association with the document data using the icon data attached by the icon addition means.
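The search, similarity, and icon-addition means above amount to a lookup-and-score loop. The Python sketch below is illustrative only: the five-word window on each side of the keyword, the overlap-count similarity, and the names `CooccurrenceSet` and `attach_icons` are assumptions, since this passage does not say how the surrounding words are delimited or how the similarity is computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CooccurrenceSet:
    """One co-occurrence word set of a headline keyword and the icon ID it maps to."""
    words: frozenset[str]
    icon_id: int


def similarity(cooc: frozenset[str], context: list[str]) -> int:
    # Assumed measure: how many distinct context words belong to the set.
    return len(cooc.intersection(context))


def attach_icons(
    words: list[str],
    table: dict[str, list[CooccurrenceSet]],
    window: int = 5,
) -> list[tuple[int, str, int]]:
    """Return (position, headline keyword, icon ID) for each keyword hit in `words`."""
    attached = []
    for pos, word in enumerate(words):
        sets = table.get(word)
        if not sets:
            continue  # not a headline keyword
        # Words on either side of the keyword, excluding the keyword itself.
        context = words[max(0, pos - window):pos] + words[pos + 1:pos + 1 + window]
        # max() keeps the first set listed when scores tie.
        best = max(sets, key=lambda s: similarity(s.words, context))
        attached.append((pos, word, best.icon_id))
    return attached
```

With a hypothetical keyword "bank" that has a river-related set (icon 1) and a finance-related set (icon 2), "we walked along the river bank near the water" yields icon 1 and "she opened an account at the bank for a loan" yields icon 2. In this sketch an icon is attached even when every score is zero; a practical version might require a minimum score.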

[0013] As a result, even when the same headline keyword appears in the text (document data), a different icon can be attached and displayed to represent its meaning depending on which other words appear in the text around it, so the icons present the content of the text more accurately and flexibly.

[0014] The icon display means may also display a list of the icons attached by the icon addition means for a plurality of items of document data, and the device may further comprise document selection display means that accepts an instruction designating one of the icons in the list displayed by the icon display means and displays the content of the document data corresponding to the designated icon.

[0015] Thus, in a document retrieval system, mail display system, or the like, when there are many candidate documents to display, listing them with icons attached on the basis of the content of the document data lets the user grasp an outline of each document's content without opening the document body. Furthermore, the user can grasp the outline from the icons and then, by designating an icon, access only the needed portion of the document body.

[0016] The icon display means may also change the shape of a displayed icon according to the number of sentences associated with that icon. Because a word that appears in several sentences of a single document is generally an important keyword, varying the icon's shape with the number of sentences lets the user identify important keywords from the list without viewing the body text. In addition, since each icon is associated with the original document data, designating it lets the user skim only the important content of the document.
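The text says only that the icon's shape changes with the number of associated sentences. One possible reading, sketched below under that assumption, is to enlarge the icon; the 25% step, the 2x cap, and the name `icon_size` are hypothetical choices, not values from the source.

```python
def icon_size(base_w: int, base_h: int, n_sentences: int,
              step: float = 0.25, max_scale: float = 2.0) -> tuple[int, int]:
    """Enlarge an icon according to how many sentences it stands for.

    Hypothetical rule: +25% per additional sentence, capped at twice the base size.
    """
    if n_sentences < 1:
        raise ValueError("an icon is associated with at least one sentence")
    scale = min(1.0 + step * (n_sentences - 1), max_scale)
    return round(base_w * scale), round(base_h * scale)
```

Capping the scale keeps a keyword repeated throughout a long document from crowding the rest of the list.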

[0017] The icon display means may also display, next to each icon in the list, the headline keyword corresponding to that icon. Compared with listing the icons alone, showing them with their headline keywords narrows down their meaning, so the user can grasp an outline of the document body more accurately. Compared with listing the passages of the document that contain the headline keywords, the display is more condensed and needs less screen area, so the user can grasp the outline of the document body efficiently.

[0018] The icon data stored in the icon data storage means may also be associated with sets of co-occurrence words belonging to a plurality of different headline keywords.

[0019] Because image data can in general represent several meanings, associating one icon with multiple sets of co-occurrence words reduces the amount of icon image data that must be stored in the icon data storage means.

[0020] The device may further comprise icon sorting means that, among the plural icon data attached to one item of document data by the icon addition means, groups together icon data that are identical and share the same source headline keyword; the icon display means then displays icons using the icon data grouped by the icon sorting means.

[0021] The same icon data (image) may represent different meanings when the source headline keywords differ. Grouping for the list display only those icons whose icon data and source keyword are both common therefore lets the icons represent the content of the document data accurately.

[0022] The icon display means may also display a frame representing the whole document and, inside it, the icons attached to that document's data, and the document selection display means displays the entire content of the document data when a point inside the frame but outside any icon is designated. This lets the user learn the content of a document more effectively when many icons are attached to it and choosing among them is difficult.
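The frame behavior described above is a two-level hit test: a click on an icon opens that icon's sentences, and a click elsewhere inside the frame opens the whole document. The sketch below assumes axis-aligned rectangles in screen coordinates; `Rect` and `resolve_click` are hypothetical names, not part of the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x: int
    y: int
    w: int
    h: int

    def contains(self, px: int, py: int) -> bool:
        return self.x <= px < self.x + self.w and self.y <= py < self.y + self.h


def resolve_click(frame: Rect, icons: list[tuple[Rect, list[int]]], px: int, py: int):
    """Decide what a click on a document's frame should open.

    Returns ("sentences", numbers) for a click on an icon, ("document", None)
    for a click elsewhere inside the frame, and None for a click outside it.
    """
    if not frame.contains(px, py):
        return None
    for rect, sentence_numbers in icons:
        if rect.contains(px, py):
            return ("sentences", sentence_numbers)
    return ("document", None)
```

Testing the icons before falling back to the frame means the whole-document view is reached only through empty space inside the frame, as the paragraph describes.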

[0023]

EMBODIMENTS OF THE INVENTION

Embodiments of the present invention are described below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of the document retrieval device according to this embodiment. The device is realized by a general-purpose computer whose operation is controlled by a program read from a recording medium such as a magnetic disk, and is implemented as one of the mail document display functions.

[0024] The document retrieval device of this embodiment presents the content of document data through icons that represent the document's meaning, chosen on the basis of the portions of the document that match headline keywords and the words appearing around them.

[0025] First, the hardware configuration of the document retrieval device of this embodiment is described. As shown in FIG. 1, the device comprises a control device 10, an input device 11, a display device 12, an external storage device 13, a memory 14, and a communication device 15, interconnected via a bus.

[0026] The control device 10 is a CPU connected to the various hardware devices via the bus; it controls each device and transfers data between them.

[0027] The input device 11 consists of a keyboard and a pointing device such as a mouse, and passes the various data and commands entered into the device to the control device 10. The display device 12 consists of a color liquid crystal display and its controller, and displays document lists, document bodies, and the like under the control of the control device 10.

[0028] The external storage device 13 consists of a storage medium such as a hard disk and its controller, and stores the document data to be displayed and information about the words contained in each document (word information data). The external storage device 13 contains a document data storage unit 13a, a meaning icon table storage unit 13b, and an icon body data storage unit 13c.

[0029] The document data storage unit 13a stores document data such as mail documents sent from external computers and received through the communication device 15. The storage format of the document data and word information data held in the document data storage unit 13a is described later (FIG. 3).

[0030] The meaning icon table storage unit 13b stores a meaning icon table. The table registers a plurality of headline keywords, which mark the portions of a document to examine when determining the icons that represent its meaning, and, for each headline keyword, sets each consisting of at least one co-occurrence word that semantically co-occurs with it together with the icon ID number of the icon corresponding to that set. The storage format of the meaning icon table is described later (FIG. 5).

[0031] The icon body data storage unit 13c stores the image data of each icon, together with its height and width in dots, keyed by the icon ID number. The icon data for a given icon ID number can be referenced from one or more sets of co-occurrence words registered in the meaning icon table, which saves storage space for image data.
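The saving described above comes from indirection: co-occurrence sets hold only an icon ID, and each image is stored once under its ID. The sketch below assumes one byte per dot and reduces each co-occurrence set to its icon ID; `IconImage` and `resolve_icons` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IconImage:
    width: int    # horizontal dot count
    height: int   # vertical dot count
    pixels: bytes

    def __post_init__(self) -> None:
        # This sketch assumes one byte per dot.
        if len(self.pixels) != self.width * self.height:
            raise ValueError("pixel data does not match width x height")


def resolve_icons(table: dict[str, list[int]],
                  store: dict[int, IconImage]) -> dict[tuple[str, int], IconImage]:
    """Map every (headline keyword, set index) to its image through the shared ID store.

    table maps each headline keyword to the icon IDs of its co-occurrence sets, in order.
    """
    return {
        (keyword, j): store[icon_id]
        for keyword, icon_ids in table.items()
        for j, icon_id in enumerate(icon_ids)
    }
```

When two keywords' sets share icon ID 7, both references resolve to the same stored image, so the image data is kept only once.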

[0032] The memory 14 consists of dynamic RAM and comprises a program section that stores the programs the control device 10 executes for its various control and processing tasks, and a buffer section that stores the data needed during processing. The detailed configurations of the program section and buffer section are described later (FIG. 2).

[0033] Under the control of the control device 10, the communication device 15 exchanges data with external computers and the like over a communication line; it consists of, for example, a LAN line and a LAN controller.

[0034] Next, the program section 20 and buffer section 22 provided in the memory 14 are described in detail with reference to FIG. 2. As shown in FIG. 2, the program section 20 contains a main processing unit 20a and, as subroutines called by it, a sentence icon addition processing unit 20b, a headline keyword search unit 20c, a similarity calculation unit 20d, an icon sort unit 20e, a document list icon display unit 20f, and a document selection display unit 20g.

[0035] The main processing unit 20a controls the overall processing and calls each of the other units as a subroutine. The main processing unit 20a may also include a word extraction function that extracts words from the document data being processed, for example by Japanese-language analysis.

[0036] The sentence icon addition processing unit 20b invokes the headline keyword search unit 20c to determine, for each word in the sentence being processed, whether a matching headline keyword is registered in the meaning icon table. It also invokes the similarity calculation unit 20d to compute, for each of the sets of co-occurrence words registered for that headline keyword, the similarity to the sentence containing the word that matches the headline keyword. Finally, it stores the icon associated with the set of co-occurrence words with the highest similarity, together with the source headline keyword, in association with the sentence.

[0037] The headline keyword search unit 20c is invoked by the sentence icon addition processing unit 20b and, for each sentence of the document being processed, searches the words in the sentence for portions that match headline keywords stored in the meaning icon table.

[0038] The similarity calculation unit 20d calculates the similarity between each set of co-occurrence words associated with a headline keyword found by the headline keyword search unit 20c and the words appearing around the position in the sentence where that headline keyword was found.

[0039] Among the icons attached to one document by the sentence icon addition processing unit 20b, the icon sort unit 20e groups those that have identical icon data and share the same source headline keyword, and builds an icon list management table from the result.
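The grouping step can be sketched as a dictionary keyed by the pair (icon ID, source keyword) that collects sentence numbers. The input layout (one list of `(icon ID, keyword)` pairs per sentence) and the name `group_icons` are assumptions; the comments indicate which icon list management table fields each part of the result would correspond to.

```python
def group_icons(sentence_icons: list[list[tuple[int, str]]]) -> list[tuple[int, str, list[int]]]:
    """Group one document's icons by (icon ID, source headline keyword).

    sentence_icons[i] lists the (icon ID, source keyword) pairs attached to sentence i.
    Each result row is one icon of the document: its ID (idIconDoc), its source
    keyword (bufKey), and its sentence numbers (iconSent, whose length is nIconSent).
    The number of rows corresponds to nIconDoc.
    """
    groups: dict[tuple[int, str], list[int]] = {}
    for sent_no, icons in enumerate(sentence_icons):
        for icon_id, keyword in icons:
            sentences = groups.setdefault((icon_id, keyword), [])
            if not sentences or sentences[-1] != sent_no:  # record each sentence once
                sentences.append(sent_no)
    return [(icon_id, keyword, sents) for (icon_id, keyword), sents in groups.items()]
```

Because the key includes the keyword, the same image attached from two different keywords stays in two rows, matching the rule that identical icon data may carry different meanings.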

[0040] The document list icon display unit 20f displays a list of documents, each shown with the icons that the sentence icon addition processing unit 20b attached to it and the icon sort unit 20e grouped.

[0041] The document selection display unit 20g accepts an instruction designating one of the icons listed by the document list icon display unit 20f and displays the content of the document (or sentence) corresponding to the designated icon.

[0042] The buffer section 22 contains a sentence word information storage buffer 22a, an icon table storage buffer 22b, a headline keyword count storage buffer 22c, a sentence-icon correspondence storage buffer 22d, a sentence icon count storage buffer 22e, an icon list management buffer 22f, an icon image storage buffer 22g, a source keyword storage buffer 22h, and a work variable storage area 22i.

[0043] The sentence word information storage buffer 22a stores word information data with the structure shown in FIG. 4. Internally it is a structure in which the total number of sentences is stored in the variable nSent, the j-th word of the i-th sentence in the array variable word[i][j], and the total number of words appearing in the i-th sentence in the array variable nWord[i].
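In a language with growable lists, the buffer described above can hold only the jagged word array and derive the two counts from it. The sketch below keeps the paragraph's variable names for readability; the class name `SentenceWordInfo` and the choice to compute `nSent` and `nWord` rather than store them are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class SentenceWordInfo:
    """Per-document word information: word[i][j] is the j-th word of sentence i."""
    word: list[list[str]] = field(default_factory=list)

    @property
    def nSent(self) -> int:
        # Total number of sentences.
        return len(self.word)

    def nWord(self, i: int) -> int:
        # Number of words in sentence i.
        return len(self.word[i])
```

Deriving the counts keeps them from drifting out of step with the word array, which fixed-size arrays in the original layout must track by hand.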

[0044] The icon table storage buffer 22b stores the entry of the meaning icon table (structure shown in FIG. 5) for a single headline keyword. Internally it is a structure in which the headline keyword is stored in the array variable keyWord, the k-th word of the j-th set of associated co-occurrence words in the array variable coword[j][k], the total number of co-occurrence words in the j-th set in the array variable nCoword[j], the ID number of the icon associated with the j-th set in the array variable idSicon[j], and the number of co-occurrence word sets in nSet.
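One headline keyword's entry can be modeled the same way, with the per-set word lists and icon IDs held in parallel lists. The class name `IconTableEntry` and the consistency check in `__post_init__` are assumptions; the field names follow the paragraph.

```python
from dataclasses import dataclass


@dataclass
class IconTableEntry:
    """One headline keyword's entry in the meaning icon table."""
    keyWord: str
    coword: list[list[str]]  # coword[j][k]: k-th word of the j-th co-occurrence set
    idSicon: list[int]       # idSicon[j]: icon ID for the j-th set

    def __post_init__(self) -> None:
        if len(self.coword) != len(self.idSicon):
            raise ValueError("each co-occurrence set needs exactly one icon ID")

    @property
    def nSet(self) -> int:
        # Number of co-occurrence word sets.
        return len(self.coword)

    def nCoword(self, j: int) -> int:
        # Number of co-occurrence words in set j.
        return len(self.coword[j])
```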

[0045] The headline keyword count storage buffer 22c stores the total number of headline keywords: the number of headline keywords in the meaning icon table held in the meaning icon table storage unit 13b of the external storage device 13 is stored in the variable nKeyWord.

[0046] The sentence-icon correspondence storage buffer 22d is an array that stores the icon ID numbers of the icons, generally several, attached to each sentence in a document. The ID number of the k-th icon of the i-th sentence is stored in the array variable idDicon[i][k].

[0047] The sentence icon count storage buffer 22e is an array that stores the number of icons attached to each sentence in a document. The number of icons attached to the i-th sentence is stored in the array variable nIcon[i].

[0048] The icon list management buffer 22f stores an icon list management table with the structure shown in FIG. 6. For each document, the table records the sentence numbers of the sentences associated with each icon and the total number of sentences associated with each icon. Internally it is a structure in which the total number of icons in the i-th document is stored in the variable nIconDoc[i]; the ID number of the j-th icon of the i-th document in the array variable idIconDoc[i][j]; the number of sentences associated with the j-th icon of the i-th document in the array variable nIconSent[i][j]; the sentence number, within its document, of the k-th sentence associated with the j-th icon of the i-th document in iconSent[i][j][k]; and the source headline keyword associated with the j-th icon of the i-th document in the array variable bufKey[i][j].

[0049] The icon image storage buffer 22g stores the icon image data, together with the height and width in dots, read from the icon body data storage unit 13c.

[0050] When an icon is attached to a keyword in a sentence, the source keyword storage buffer 22h stores that source keyword in association with the icon. It is an array: the source keyword of the k-th icon of the i-th sentence is stored in bufKey[i][k].

[0051] The buffer section 22 also reserves a work variable storage area 22i for working variables. It holds the variables used in the processing described later, such as the document counter iDoc, the sentence counter iSent, the variable jSet giving the number of a co-occurrence word set, the variable iWord giving a word number, the variable iKey representing a headline keyword, the variable iSrcIcon giving the icon number of the comparison-source sentence, the variable iDstIcon giving the comparison-target icon number, and the icon counter iIcon.

[0052] Next, the storage format of the document data and word information data in the external storage device 13 is described with reference to FIG. 3. Each item of document data consists of a header and a text data section; the header contains data representing document attributes such as the title, creation date and time, author, and ID number. Document data are stored in ID-number order (0, 1, 2, ..., N-1), and each item of document data is stored together with its associated word information data. In addition to the total number of sentences in the corresponding document, the word information data hold, for each sentence, the words extracted by morphological analysis of its text and the total number of words extracted.

[0053] FIG. 4 shows an example of the data format of the word information data corresponding to one document. As shown in FIG. 4, the word information data stores the total number of sentences in the document, the number of words in each sentence, and each word of the text.
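As a rough illustration of the FIG. 4 layout, the word information data for one document can be modelled as a list of word lists, one per sentence. This is a minimal Python sketch, not the patent's storage format: the class name `WordInfo`, the English example words, and the choice to derive nSent and nWord[] from list lengths instead of storing them are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WordInfo:
    """Word information data for one document (cf. FIG. 4)."""

    # words[i] holds the words extracted from sentence i by morphological analysis.
    words: List[List[str]] = field(default_factory=list)

    @property
    def n_sent(self) -> int:
        # Total number of sentences in the document (nSent).
        return len(self.words)

    def n_word(self, i_sent: int) -> int:
        # Number of words in sentence i_sent (nWord[iSent]).
        return len(self.words[i_sent])


info = WordInfo(words=[["open", "an", "account", "at", "the", "bank"],
                       ["walk", "along", "the", "river"]])
print(info.n_sent, info.n_word(0))  # 2 6
```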

[0054] The word information data may be provided together with the corresponding document data, or it may be obtained by a word extraction function that extracts words from the document data by Japanese language analysis.

[0055] FIG. 5 shows an example of the storage format of the meaning icon table. As shown in FIG. 5, the meaning icon table stores, for each headline keyword, at least one set of co-occurrence words that co-occur semantically with that keyword. Each set contains at least one co-occurrence word and is stored together with the number of co-occurrence words it contains and the icon ID number indicating its icon. For each headline keyword, the table also stores the number of co-occurrence word sets associated with icon ID numbers. The meaning icon table further stores the total number of headline keywords (not shown). The co-occurrence words registered in the meaning icon table are always words different from the corresponding headline keyword.
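A minimal sketch of the FIG. 5 table, assuming a Python dictionary keyed by headline keyword. The names `CoWordSet`, `icon_table` and `check_table`, the keyword "bank", and the icon ID values are hypothetical, chosen to show one keyword with two senses; the patent does not give concrete table contents.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CoWordSet:
    """One co-occurrence word set and its icon (cf. FIG. 5)."""

    cowords: List[str]  # co-occurrence words of this set
    icon_id: int        # icon ID number associated with this set

    @property
    def n_coword(self) -> int:
        # Number of co-occurrence words in the set (nCoword[jSet]).
        return len(self.cowords)


# Hypothetical contents. len(icon_table) is the total number of headline
# keywords; len(icon_table[key]) is the number of sets nSet for that keyword.
icon_table: Dict[str, List[CoWordSet]] = {
    "bank": [
        CoWordSet(["account", "deposit", "loan"], icon_id=10),
        CoWordSet(["river", "shore", "fishing"], icon_id=11),
    ],
}


def check_table(table: Dict[str, List[CoWordSet]]) -> None:
    # Enforce the rule that a co-occurrence word never equals its own keyword.
    for keyword, sets in table.items():
        for s in sets:
            if keyword in s.cowords:
                raise ValueError(f"{keyword!r} listed as its own co-occurrence word")


check_table(icon_table)
print(len(icon_table), len(icon_table["bank"]))  # 1 2
```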

[0056] FIG. 6 shows an example of the storage format of the icon list management table. As shown in FIG. 6, the icon list management table stores, for each document, the total number of icons added to the document. For each added icon, it stores, in association with one another, the icon ID number indicating the icon, the number of sentences associated with that icon, the sentence number of each such sentence, and the headline keyword from which the icon was added.
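The FIG. 6 entry for one document might be modelled as follows. This is a sketch under assumptions: the class names `IconEntry` and `DocumentIcons` are hypothetical, and the icon and sentence counts are derived from list lengths rather than stored as separate fields.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class IconEntry:
    """One icon added to a document (cf. FIG. 6)."""

    icon_id: int          # icon ID number
    source_keyword: str   # headline keyword the icon was added from
    sentences: List[int] = field(default_factory=list)  # sentence numbers

    @property
    def n_sentences(self) -> int:
        # Number of sentences associated with the icon.
        return len(self.sentences)


@dataclass
class DocumentIcons:
    """Icon list management table entry for one document."""

    doc_id: int
    icons: List[IconEntry] = field(default_factory=list)

    @property
    def n_icons(self) -> int:
        # Total number of icons added to the document.
        return len(self.icons)


entry = DocumentIcons(doc_id=0, icons=[IconEntry(10, "bank", [0, 3, 5]),
                                       IconEntry(11, "bank", [7])])
print(entry.n_icons, [icon.n_sentences for icon in entry.icons])  # 2 [3, 1]
```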

[0057] Next, the overall operation of the document retrieval device according to this embodiment will be described with reference to the flowchart shown in FIG. 7. The device searches the document being processed for the headline keywords registered in the meaning icon table and treats each occurrence as a noteworthy part of the document. Rather than determining an icon simply from the headline keyword found in the document, the device determines it from the occurrence of other words in the text around that keyword. That is, when a word registered in the meaning icon table as co-occurring with a headline keyword found in the document appears near that keyword, the device selects the icon (icon ID number) associated with that co-occurrence word as the icon representing the meaning of the sentence.

[0058] Here, it is assumed that N pieces of document data and word information data in the format shown in FIG. 3, with ID numbers 0 to N-1, have been stored in advance in the document data storage unit 13a of the external storage device 13.

[0059] The overall processing is controlled by the main processing unit 20a stored in the program section 20 of the memory 14. First, the main processing unit 20a stores 0 in the document number counter iDoc (step A1).

[0060] Next, the main processing unit 20a loads the word information data of the iDoc-th document in the document data storage unit 13a into the sentence word information storage buffer 22a (step A2). This fills the variables in buffer 22a: the variable nSent giving the total number of sentences, the array variable word[][] holding the words, and the array variable nWord[] giving the number of words in each sentence.

[0061] The main processing unit 20a then stores 0 in the sentence number counter iSent (step A3). It then starts the sentence icon addition processing unit 20b, which adds icons to the iSent-th sentence (step A4). The icon addition processing by the sentence icon addition processing unit 20b is described in detail later (FIGS. 8, 9, and 10).

[0062] The result is stored in the sentence-icon correspondence storage buffer 22d. The sentence number counter iSent is then incremented (step A5) and compared with the variable nSent, which represents the total number of sentences, to determine whether all sentences have been processed (step A6).

[0063] If the comparison holds, not all sentences have been processed yet, so the processing from step A4 is repeated. Otherwise, the main processing unit 20a starts the icon sorting unit 20e (step A7), which organizes the icons added to each sentence in step A4 and stores the result in the icon list management buffer 22f as the icon list management table. The icon sorting processing by the icon sorting unit 20e is described in detail later (FIGS. 11 and 12).

[0064] In the following step A8, the main processing unit 20a increments the document number counter iDoc and compares it with the number of documents N to determine whether all documents have been processed (step A9).

[0065] If the comparison holds, not all documents have been processed yet, so the processing from step A2 is repeated. Otherwise, the main processing unit 20a starts the document list icon display unit 20f, which displays a document list by referring to the contents of the icon list management buffer 22f created by the icon addition and icon sorting processing described above (step A10).

[0066] In the document list displayed by the document list icon display unit 20f, for example, a rectangular area is provided for each document, and icons representing the meaning of the document are displayed in that area. Each icon in the document list is displayed with its shape (for example, its size) varied according to the number of corresponding sentences. The document list display by the document list icon display unit 20f is described in detail later (FIGS. 13 and 14).

[0067] Next, the main processing unit 20a starts the document selection display unit 20g (step A11), which displays the contents of the document body in response to the user's instructions on the document list displayed by the document list icon display unit 20f.

[0068] When a point inside the frame representing a document's rectangular area, but outside any icon, is designated, the document selection display unit 20g displays the entire original document. When the inside of an icon is designated, it displays only the sentences associated with that icon in the icon list management buffer 22f. The display of a document (or of sentences) by the document selection display unit 20g is described in detail later (FIG. 15).
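The selection behaviour of the document selection display unit 20g amounts to a hit test. In this sketch the rectangle representation, the function names `contains` and `select_text`, and the dictionaries mapping icon IDs to screen rectangles and to sentence numbers are all assumptions; the patent leaves the details to FIG. 15.

```python
from typing import Dict, List, Optional, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height), hypothetical layout


def contains(rect: Rect, x: int, y: int) -> bool:
    # True if point (x, y) lies inside the rectangle.
    rx, ry, w, h = rect
    return rx <= x < rx + w and ry <= y < ry + h


def select_text(
    doc_rect: Rect,
    icon_rects: Dict[int, Rect],            # icon ID -> rectangle drawn in the frame
    icon_sentences: Dict[int, List[int]],   # icon ID -> sentence numbers (buffer 22f)
    sentences: List[str],
    x: int,
    y: int,
) -> Optional[List[str]]:
    if not contains(doc_rect, x, y):
        return None                         # click outside this document's frame
    for icon_id, rect in icon_rects.items():
        if contains(rect, x, y):
            # Inside an icon: only the sentences associated with that icon.
            return [sentences[i] for i in icon_sentences[icon_id]]
    return list(sentences)                  # elsewhere in the frame: whole document


doc_sentences = ["Sentence 0.", "Sentence 1.", "Sentence 2."]
print(select_text((0, 0, 200, 100), {10: (10, 10, 32, 32)}, {10: [1]},
                  doc_sentences, 20, 20))  # ['Sentence 1.']
```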

[0069] This concludes the overall flow of the main processing performed by the document retrieval device of this embodiment. Next, the processing by the sentence icon addition processing unit 20b in step A4 will be described in detail with reference to the flowchart shown in FIG. 8. The description assumes that icons are being added to the iSent-th sentence of the iDoc-th document.

[0070] First, the sentence icon addition processing unit 20b assigns the initial value 0 to the variable nIcon[iSent], which counts the icons added to the iSent-th sentence (step B1). It also stores the initial value 0 in the variable iWord, which counts word numbers in the word information data stored in the sentence word information storage buffer 22a (step B2).

[0071] The headline keyword search unit 20c is then started. It searches whether a headline keyword matching the word indicated by iWord in the word information data stored in buffer 22a is registered in the meaning icon table (step B3). The search returns true if a matching headline keyword is found. The detailed processing of the headline keyword search unit 20c is described later (FIG. 9).

[0072] If the headline keyword search returns true in step B3, the word word[iSent][iWord] is registered as a headline keyword in the meaning icon table, so the sentence icon addition processing unit 20b proceeds to step B5. If the return value is not true, the unit proceeds to step B15.

[0073] In step B5, the sentence icon addition processing unit 20b assigns the initial value 0 to the variable jSet, which represents the number of a co-occurrence word set corresponding to the headline keyword in the meaning icon table.

[0074] Next, in step B6, the similarity calculation unit 20d is started. It calculates the similarity between the co-occurrence word set indicated by jSet and the set of words appearing around the headline keyword, that is, in the sentence (text) containing it, and returns the result in the variable sim (step B6). The detailed processing of the similarity calculation unit 20d is described later (FIG. 10).

[0075] If jSet is 0, the sentence icon addition processing unit 20b proceeds to step B9. Otherwise, to find the highest similarity among the co-occurrence word sets corresponding to one headline keyword, it checks whether sim is greater than simMax (step B8). simMax holds the highest of the similarity values obtained so far by the similarity calculation in step B6 for the co-occurrence word sets.

[0076] If sim is greater than simMax, the sentence icon addition processing unit 20b sets simMax to the value of sim (step B9) and stores the current value of jSet in jSetMax, which indicates the number of the co-occurrence word set with the highest similarity (step B10).

[0077] If the condition in step B8 does not hold, the unit proceeds directly to step B11. In step B11, it increments jSet so that the next co-occurrence word set becomes the processing target.

[0078] Next, the sentence icon addition processing unit 20b compares jSet with the number of co-occurrence word sets nSet to determine whether the similarity has been calculated for all co-occurrence word sets corresponding to the headline keyword currently being processed (step B12).

[0079] If the comparison holds, the similarity calculation has not yet been completed for all co-occurrence word sets, so the unit returns to step B6 and repeats the series of steps.

[0080] If the comparison does not hold, the unit takes the ID number of the icon corresponding to jSetMax (the number of the co-occurrence word set stored in step B10), that is, the value of idSicon[jSetMax], and stores it in the variable idDicon[iSent][nIcon[iSent]] of the sentence-icon correspondence storage buffer 22d (step B13).

[0081] The unit also stores the headline keyword used here (the one obtained by the search in step B3) in the variable bufKey[iSent][nIcon[iSent]] of the addition-source keyword storage buffer 22h, in association with the icon (step B14), and increments nIcon[iSent], the number of icons added to the iSent-th sentence (step B15).

[0082] The unit then increments the word counter iWord, moving on to the next word of the iSent-th sentence currently being processed (step B16).

[0083] The unit compares iWord with nWord[iSent], the total number of words in the iSent-th sentence, to determine whether all words in the sentence have been processed (step B17).

[0084] If the comparison holds, not all words in the sentence have been processed, so the processing from step B3 is repeated. Otherwise, the processing by the sentence icon addition processing unit 20b ends and control returns to the main processing.
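Steps B1 to B17 can be condensed into the following Python sketch. It assumes a hypothetical table format (headline keyword mapped to a list of (co-occurrence words, icon ID) pairs) and inlines a compact version of the FIG. 10 similarity so the block runs on its own. A dictionary lookup replaces the linear keyword scan of FIG. 9, and in this sketch a word that is not a headline keyword simply moves the loop to the next word without touching the icon list.

```python
from typing import Dict, List, Tuple

# Hypothetical table format: headline keyword -> [(co-occurrence words, icon ID), ...]
IconTable = Dict[str, List[Tuple[List[str], int]]]


def similarity(sentence: List[str], i_word: int, cowords: List[str]) -> float:
    # Compact form of FIG. 10: the keyword position counts once, plus every
    # other word that equals some co-occurrence word; normalised by the
    # product of the sentence length and the set size.
    s = sum(1 for j, w in enumerate(sentence) if j == i_word or w in cowords)
    return s / (len(sentence) * len(cowords))


def add_icons_to_sentence(sentence: List[str], table: IconTable) -> List[Tuple[int, str]]:
    icons: List[Tuple[int, str]] = []              # B1: nIcon[iSent] == len(icons)
    for i_word, word in enumerate(sentence):       # B2, B16, B17
        sets = table.get(word)                     # B3: headline keyword search
        if not sets:
            continue                               # not a headline keyword
        j_set_max, sim_max = 0, 0.0
        for j_set, (cowords, _icon_id) in enumerate(sets):  # B5, B11, B12
            sim = similarity(sentence, i_word, cowords)     # B6
            if j_set == 0 or sim > sim_max:                 # B7, B8
                sim_max, j_set_max = sim, j_set             # B9, B10
        icons.append((sets[j_set_max][1], word))   # B13 icon ID, B14 keyword, B15
    return icons


icon_table: IconTable = {
    "bank": [(["account", "deposit", "loan"], 10),
             (["river", "shore", "fishing"], 11)],
}
print(add_icons_to_sentence(["went", "fishing", "by", "the", "river", "bank"],
                            icon_table))  # [(11, 'bank')]
```

Because the comparison in B8 is strict, a tie keeps the earlier co-occurrence word set.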

[0085] Next, the headline keyword search performed by the headline keyword search unit 20c in step B3 of FIG. 8 will be described in detail with reference to the flowchart shown in FIG. 9.

[0086] First, the headline keyword search unit 20c stores the total number of headline keywords held in the meaning icon table storage unit 13b in the variable nKeyWord of the headline keyword count storage buffer 22c (step C1).

[0087] The unit then stores the initial value 0 in the variable iKey, which counts the headline keywords in the meaning icon table storage unit 13b (step C2).

[0088] Next, the unit stores the element of the meaning icon table (held in the meaning icon table storage unit 13b) that corresponds to the iKey-th headline keyword in the icon table storage buffer 22b (step C3).

[0089] The unit then determines whether the word word[iSent][iWord] is equal to the headline keyword keyWord stored in the icon table storage buffer 22b (step C4).

[0090] If word[iSent][iWord] and keyWord are equal, the headline keyword search unit 20c ends its processing and returns true to the calling processing (step C5).

[0091] If word[iSent][iWord] is not equal to keyWord, the unit increments iKey to move on to the next headline keyword (step C6).

[0092] The unit then checks whether iKey is less than nKeyWord, the total number of headline keywords in the meaning icon table as stored in the headline keyword count storage buffer 22c, to determine whether all headline keywords registered in the meaning icon table have been processed (step C7).

[0093] If not all headline keywords have been processed, the headline keyword search unit 20c repeats the processing from step C3.

[0094] If all headline keywords have been processed, that is, if no headline keyword matching word[iSent][iWord] is registered in the meaning icon table, the headline keyword search unit 20c ends its processing and returns false to the calling processing (step C8).
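The linear search of steps C1 to C8 translates directly into code. This sketch assumes the headline keywords are available as a plain list; the function name is hypothetical, and returning the index iKey alongside the flag is an addition so that the caller can locate the matching table entry. In practice a hash-based lookup would avoid scanning every keyword for each word.

```python
from typing import List, Optional, Tuple


def find_headline_keyword(word: str, keywords: List[str]) -> Tuple[bool, Optional[int]]:
    """Linear search of FIG. 9: return (found, iKey)."""
    n_key_word = len(keywords)        # C1: nKeyWord
    i_key = 0                         # C2
    while i_key < n_key_word:         # C7: stop once every keyword is checked
        key_word = keywords[i_key]    # C3: entry loaded into buffer 22b
        if word == key_word:          # C4
            return True, i_key        # C5
        i_key += 1                    # C6
    return False, None                # C8


keywords = ["bank", "court", "spring"]
print(find_headline_keyword("court", keywords))  # (True, 1)
print(find_headline_keyword("tree", keywords))   # (False, None)
```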

[0095] Next, the similarity calculation performed by the similarity calculation unit 20d in step B6 of FIG. 8 will be described in detail with reference to the flowchart shown in FIG. 10.

[0096] In the similarity calculation of this embodiment, the similarity sim is obtained by computing the inner product of the word vectors of a co-occurrence word set in the meaning icon table and of the iSent-th sentence of the word information data stored in the sentence word information storage buffer 22a, and then normalizing it.

[0097] Specifically, for the jSet-th co-occurrence word set corresponding to the iKey-th headline keyword found by the headline keyword search, the unit counts the words that also appear in the iSent-th sentence of the word information data in buffer 22a. It then normalizes this count by dividing it by the product of the number of words in the sentence and the number of words in the co-occurrence word set, which gives the similarity sim.

[0098] First, the similarity calculation unit 20d assigns the initial value 0 to the variable S, which holds the number of words from which the similarity is calculated (step D1). It also assigns the initial value 0 to the variable jWord, which counts word numbers in the iSent-th sentence of the word information data stored in buffer 22a (step D2).

[0099] The unit then checks whether jWord is equal to iWord (step D3). If they are equal, the word indicated by jWord is the headline keyword itself, so the unit adds 1 to S (step D4) and proceeds to step D11.

[0100] If jWord is not equal to iWord, the unit proceeds to step D5. In step D5, it assigns the initial value 0 to the variable jCoword, which counts the co-occurrence words in the jSet-th set among the co-occurrence word sets stored in the icon table storage buffer 22b by the search of the headline keyword search unit 20c, that is, among the sets contained in the element for the headline keyword that matches the word indicated by iWord.

[0101] The unit then checks whether the word word[iSent][jWord] of the sentence containing the headline keyword is equal to the co-occurrence word coword[jSet][jCoword] (step D6). If the two words are equal, it adds 1 to S (step D7) and proceeds to step D10.

[0102] If the two words are not equal, the unit proceeds to step D8, where it increments jCoword to move on to the next co-occurrence word.

[0103] The unit then compares jCoword with the number of co-occurrence words nCoword[jSet] to determine whether all co-occurrence words in the jSet-th set have been processed (step D9).

[0104] If not all co-occurrence words have been processed, the similarity calculation unit 20d repeats the processing from step D6; otherwise, it proceeds to step D10.

[0105] In step D10, the unit increments jWord to move on to the next word. It then compares jWord with nWord[iSent], the total number of words in the iSent-th sentence, to determine whether all words have been processed (step D11). If not, it repeats the processing from step D3.

[0106] If all words have been processed, the unit proceeds to step D12. By this point, every word of the iSent-th sentence has been compared with the headline keyword and with all co-occurrence words of the jSet-th set, and the number of words found in common is stored in S.

[0107] In step D12, the unit normalizes S by dividing it by the product of nWord[iSent], the total number of words in the iSent-th sentence, and nCoword[jSet], the total number of co-occurrence words in the jSet-th set: sim = S / (nWord[iSent] × nCoword[jSet]). This gives the similarity sim between the iSent-th sentence and the jSet-th co-occurrence word set. The similarity calculation unit 20d then ends its processing and returns sim to the icon addition processing.
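The loop of steps D1 to D12 can be written out as follows. This is a sketch of the calculation described above in Python; the function name and argument order are assumptions, and the sentence is represented as a plain list of words.

```python
from typing import List


def similarity(sentence: List[str], i_word: int, cowords: List[str]) -> float:
    """sim between sentence iSent and co-occurrence word set jSet (FIG. 10)."""
    s = 0                                        # D1
    for j_word in range(len(sentence)):          # D2, D10, D11
        if j_word == i_word:                     # D3: the headline keyword itself
            s += 1                               # D4
            continue
        for coword in cowords:                   # D5, D8, D9
            if sentence[j_word] == coword:       # D6
                s += 1                           # D7
                break                            # on to the next sentence word (D10)
    return s / (len(sentence) * len(cowords))    # D12


sentence = ["went", "fishing", "by", "the", "river", "bank"]
print(round(similarity(sentence, 5, ["river", "shore", "fishing"]), 4))   # 0.1667
print(round(similarity(sentence, 5, ["account", "deposit", "loan"]), 4))  # 0.0556
```

Here S is 3 for the riverside set (the keyword plus "fishing" and "river") and 1 for the financial set, each divided by 6 × 3, so the riverside set would be chosen in step B8.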

[0108] In this way, the icon addition processing of the sentence icon addition processing unit 20b determines, for each word of the iSent-th sentence, whether a matching headline keyword is registered in the meaning icon table. If it is, the similarity between the sentence containing that keyword and each of the co-occurrence word sets registered for the keyword is calculated, and the icon associated with the most similar set is stored in association with the iSent-th sentence, together with the headline keyword.

[0109] Consequently, if several words of the iSent-th sentence match headline keywords registered in the meaning icon table, an icon is associated with the sentence for each of them, based on the similarity with that keyword's co-occurrence word sets. That is, one sentence can be associated with multiple icons.

[0110] Next, the icon sorting performed by the icon sorting unit 20e in step A7 of FIG. 7 will be described in detail with reference to the flowcharts shown in FIGS. 11 and 12.

[0111] Here, an icon list management table is created in which, for each icon (icon ID number) that the sentence icon addition processing unit 20b added to the sentences of the document with document number iDoc, the sentence numbers of the corresponding sentences are collected.
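The grouping can be sketched in Python as below. Steps E1 to E7 cover only the start of the sorting loop, and the comparison against destination icons (iDstIcon) belongs to later steps of FIGS. 11 and 12. This sketch therefore assumes that those steps gather every later occurrence of the same icon ID into one entry and mark it with -1, keeping the keyword of the first occurrence. The function name `sort_icons` and the dictionary-based entries are hypothetical.

```python
from typing import Dict, List, Union

Entry = Dict[str, Union[int, str, List[int]]]


def sort_icons(id_dicon: List[List[int]], buf_key: List[List[str]]) -> List[Entry]:
    """Group the per-sentence icons of one document into icon list entries.

    id_dicon[s][k] is the icon ID of the k-th icon of sentence s (buffer 22d);
    buf_key[s][k] is the headline keyword it was added from (buffer 22h).
    """
    ids = [row[:] for row in id_dicon]   # work on a copy; checked slots become -1
    entries: List[Entry] = []            # E1: len(entries) plays nIconDoc[iDoc]
    for src_sent in range(len(ids)):                    # E2: iSrcSent
        for src_icon in range(len(ids[src_sent])):      # E3: iSrcIcon
            id_src = ids[src_sent][src_icon]            # E4
            if id_src == -1:                            # E5: already checked
                continue
            sentences = [src_sent]
            entries.append({"icon_id": id_src,                       # E6
                            "keyword": buf_key[src_sent][src_icon],  # E7
                            "sentences": sentences})
            ids[src_sent][src_icon] = -1
            # Assumed later steps: collect every remaining occurrence of id_src.
            for dst_sent in range(src_sent, len(ids)):
                for dst_icon in range(len(ids[dst_sent])):
                    if ids[dst_sent][dst_icon] == id_src:
                        ids[dst_sent][dst_icon] = -1
                        if sentences[-1] != dst_sent:
                            sentences.append(dst_sent)
    return entries


print(sort_icons([[10], [11, 10], [], [10]],
                 [["bank"], ["court", "bank"], [], ["bank"]]))
# [{'icon_id': 10, 'keyword': 'bank', 'sentences': [0, 1, 3]}, {'icon_id': 11, 'keyword': 'court', 'sentences': [1]}]
```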

[0112] First, the icon sorting section 20e assigns the initial value 0 to the variable nIconDoc[iDoc], which counts the number of icons in the document (step E1), and assigns the initial value 0 to the variable iSrcSent, which represents the sentence number of the comparison source (step E2). The icon sorting section 20e further assigns the initial value 0 to the variable iSrcIcon, which represents the icon number within the comparison-source sentence (step E3).

[0113] The icon sorting section 20e assigns to the variable idSrc, which represents the ID number of the comparison-source icon, the value of the variable idDicon[iSrcSent][iSrcIcon], which holds the ID number of the iSrcIcon-th icon of the iSrcSent-th sentence (step E4).

[0114] The icon sorting section 20e then checks whether the assigned value of idSrc is -1, which indicates that the icon has already been checked (step E5). If idSrc is -1, control passes to step E9.

[0115] If idSrc is not -1, the icon sorting section 20e stores the value of idSrc, that is, the icon ID number, in the variable idIconDoc[iDoc][nIconDoc[iDoc]], which holds icon ID numbers (step E6).

[0116] The icon sorting section 20e also stores the headline keyword held in the variable bufKey[iSrcSent][iSrcIcon] of the addition source keyword storage buffer 22h in the variable keySrc[iDoc][nIconDoc[iDoc]], which holds the keyword from which the icon was added (step E7).

[0117] The icon sorting section 20e then assigns 0 to the variable nIconSent[iDoc][nIconDoc[iDoc]], which counts the total number of sentences in the document that correspond to one icon (step E8).

[0118] Further, the icon sorting section 20e assigns the sentence number iSrcSent to the variable A, given by Equation (1) below, which stores the sentence number of a sentence corresponding to the icon (step E9).

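[0119]

(Equation 1)

$$A = \mathit{iconSent}[\mathit{iDoc}][\mathit{nIconDoc}[\mathit{iDoc}]][\mathit{nIconSent}[\mathit{iDoc}][\mathit{nIconDoc}[\mathit{iDoc}]]] \qquad (1)$$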

[0120] The icon sorting section 20e adds 1 to the variable iSrcIcon, changing the icon number within the comparison-source sentence that is to be processed (step E10). By checking whether this value is equal to or less than nIcon[iSrcSent], the number of icons added to the comparison-source sentence, it determines whether all icon numbers of the comparison-source sentence have been processed (step E11). If they have not all been processed, the icon sorting section 20e assigns the value of iSrcSent to the variable iDstSent, which represents the sentence number of the comparison destination (step E12), and then assigns the value of iSrcIcon to the variable iDstIcon, which represents the icon number of the comparison destination (step E13).

[0121] Next, the icon sorting section 20e checks whether the value of idDicon[iDstSent][iDstIcon], the ID number of the comparison-destination icon, equals idSrc, the comparison-source icon ID number stored earlier, and whether the addition source keyword stored in bufKey[iDstSent][iDstIcon] equals the keyword stored earlier in bufKey[iSrcSent][iSrcIcon] (step E14).

[0122] If both the icon ID numbers and the addition source keywords of the comparison-destination and comparison-source icons are equal, the icon sorting section 20e assigns -1 (minus one), the value indicating that the icon has been checked, to idDicon[iDstSent][iDstIcon] (step E15).

[0123] Next, the icon sorting section 20e checks whether the sentence number iDstSent equals the value of the variable A, which holds the sentence number previously stored for the icon (step E16). If they are equal, control passes directly to step E19. This prevents the same sentence from being associated with an icon more than once.

[0124] If iDstSent is not equal to A, the icon sorting section 20e increments nIconSent[iDoc][nIconDoc[iDoc]], which counts the total number of sentences corresponding to the icon (step E17), assigns the sentence number iDstSent to the variable A for the new value of nIconSent[iDoc][nIconDoc[iDoc]] (step E18), and passes control to step E19.

[0125] If the comparison in step E14 shows that the icon ID numbers or the addition source keywords of the comparison-destination and comparison-source icons differ, and also after step E18, the icon sorting section 20e increments iDstIcon to update the comparison-destination icon number (step E19).

[0126] The icon sorting section 20e checks whether iDstIcon is equal to or less than nIcon[iDstSent], the number of icons attached to the sentence, to determine whether all comparison-destination icon numbers have been processed (step E20).

[0127] If not all comparison-destination icon numbers have been processed, the icon sorting section 20e repeats the sequence of steps from step E13. If they have all been processed, the icon sorting section 20e increments the sentence number iDstSent (step E21).

[0128] The icon sorting section 20e then checks whether iDstSent is equal to or less than nSent, the total number of sentences in the document, to determine whether all sentence numbers have been processed (step E22). If not, the icon sorting section 20e assigns 0 to iDstIcon, the comparison-destination icon number (step E23), and passes control to step E13.

[0129] If all sentence numbers have been processed, the icon sorting section 20e increments nIconDoc[iDoc], which counts the number of icons in the document (step E24), passes control to step E4, and repeats the processing described above.

[0130] If the comparison in step E11 shows that all icon numbers of the comparison-source sentence have been processed, the icon sorting section 20e increments the sentence number iSrcSent to move on to the next sentence (step E25).

[0131] The icon sorting section 20e checks whether iSrcSent is equal to or less than nSent, the total number of sentences in the document, to determine whether all sentence numbers have been processed (step E26).

[0132] If not all sentence numbers have been processed, control passes to step 10c and the processing described above is repeated. When all sentence numbers have been processed, the icon sorting section 20e ends the icon sorting process and returns to the main process.

[0133] Through the icon sorting process described above, the icon sorting section 20e creates an icon list management table such as the one shown in FIG. 6 and stores it in the icon list management buffer 22f. That is, the total number of icons (nIconDoc[iDoc]) is obtained for each document indicated by iDoc, and for each icon (icon ID number) indicated by idSrc, the number of sentences given by nIconSent[iDoc][nIconDoc[iDoc]] and the addition source keyword (headline keyword) are stored. In addition, for each icon ID number, the sentence numbers held in the corresponding variables A (iconSent) are stored.

[0134] Next, the document list display process performed by the document list icon display section 20f in step A10 of FIG. 7 will be described in detail with reference to the flowchart shown in FIG. 13.

[0135] First, the document list icon display section 20f stores the initial value 0 in the document number counter iDoc (step F1). It then draws a rectangle representing the document on the display screen of the display device 12, at a position corresponding to the value of iDoc (step F2). The rectangle drawn here is called a document rectangular frame.

[0136] Next, the document list icon display section 20f assigns the initial value 0 to the icon number counter iIcon (step F3). Referring to nIconSent[iDoc][iIcon], the total number of sentences associated with the icon in the icon list management buffer 22f, it determines the enlargement coefficient c of the rectangular area in which the icon is displayed (step F4). In the document search device of this embodiment, for example, the square root of nIconSent[iDoc][iIcon] is used as the enlargement coefficient c. Consequently, the more sentences an icon is associated with, the larger c becomes.

[0137] Next, the document list icon display section 20f loads the image data, vertical size information, and horizontal size information corresponding to the ID number stored in idIconDoc[iDoc][iIcon] from the icon body data storage section 13c into the icon image storage buffer 22g (step F5).

[0138] The document list icon display section 20f also calculates the size of the rectangular area in which the icon is displayed within the document rectangular frame (step F6). The height and width of this area are obtained by multiplying the vertical and horizontal size values stored in the icon image storage buffer 22g by the enlargement coefficient c obtained earlier.

[0139] The document list icon display section 20f checks whether c is greater than 1 (step F7). If it is, the image data stored in the icon image storage buffer 22g is enlarged so that, when displayed, it has the same size as the rectangular area obtained above (step F8).
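Steps F4 and F6 to F8 might look like this in code. The patent does not say how the enlargement in step F8 is performed; nearest-neighbour scaling of a small pixel grid is used here purely as a stand-in, and the names `icon_rect_size`, `scale_nearest`, and `Pixels` are hypothetical.

```python
import math
from typing import List, Tuple

Pixels = List[List[int]]  # rows of pixel values; a stand-in for the icon image data


def icon_rect_size(base_w: int, base_h: int, n_sentences: int) -> Tuple[int, int, bool]:
    """Steps F4, F6, F7: rectangle size and whether the image must be enlarged.

    Assumes n_sentences >= 1; sizes are rounded to whole pixels.
    """
    c = math.sqrt(n_sentences)  # enlargement coefficient from step F4
    return round(base_w * c), round(base_h * c), c > 1


def scale_nearest(pixels: Pixels, new_w: int, new_h: int) -> Pixels:
    """Step F8 (assumed method): nearest-neighbour enlargement to new_w x new_h."""
    old_h, old_w = len(pixels), len(pixels[0])
    return [
        [pixels[y * old_h // new_h][x * old_w // new_w] for x in range(new_w)]
        for y in range(new_h)
    ]
```

A caller would compute the rectangle first and call `scale_nearest` only when the third value returned by `icon_rect_size` is true, mirroring the branch at step F7.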

[0140] The document list icon display section 20f then renders the image data stored in the icon image storage buffer 22g on the display device 12, inside the rectangular area of the size obtained in step F6, at the position in the document rectangular frame drawn in step F2 that corresponds to the value of iIcon (step F9). The frame of the rectangular area in which the icon is displayed is called an icon rectangular frame. The icon rectangular frame is the outer boundary of a logical rectangular area and is generally not displayed.

[0141] Further, the document list icon display section 20f displays, in association with the icon rectangular frame, the character string of the addition source keyword stored in keySrc[iDoc][iIcon], the variable holding the keyword from which the icon was added (step F10).

[0142] Next, the document list icon display section 20f increments iIcon to change the icon number to be processed (step F11). It then checks whether iIcon is equal to or greater than nIconDoc[iDoc], the total number of icons contained in the iDoc-th document, to determine whether all icon numbers have been processed (step F12).

[0143] If not all icon numbers have been processed, the sequence of steps from step F4 is repeated. Otherwise, the document list icon display section 20f increments iDoc to move on to the next document (step F13).

[0144] Next, the document list icon display section 20f checks whether iDoc is equal to or less than the number of documents N to determine whether all documents have been processed (step F14). If not, control passes to step F2 and the processing is repeated for the next document. If all documents have been processed, the document list icon display section 20f ends its processing and returns to the main process.

[0145] The document list icon display section 20f stores in the buffer section 22 data indicating the extent of each document rectangular frame and each icon rectangular frame displayed on the display screen 30 (for example, the coordinates of two diagonally opposite corners of the frame), for example in association with each element (document, icon) of the icon list management table stored in the icon list management buffer 22f. This frame extent data is referred to during the document selection process performed by the document selection display section 20g, described later.

[0146] FIG. 14 shows an example of the display screen 30 produced by the document list display process of the document list icon display section 20f. The display screen 30 shown in FIG. 14 has an end icon 32 for inputting an instruction to end the document list display. As the document list, document rectangular frames 0 to 3 corresponding to four documents are shown, and each document rectangular frame contains several icon rectangular frames in which icons are displayed.

[0147] In the document list shown in FIG. 14, each document is represented by several icons, so the user can grasp an outline of the document's contents from the types (images) of the icons and their combination. Furthermore, the size of each icon indicates how many sentences relate to that type of icon.

[0148] Next, the document selection and display process performed by the document selection display section 20g in step A11 of FIG. 7 will be described in detail with reference to the flowchart shown in FIG. 15.

[0149] First, the document selection display section 20g has the user designate a point on the display screen 30 shown in FIG. 14 using the input device 11 (for example, a pointing device such as a mouse) and inputs the coordinates of the designated point (step G1).

[0150] Next, the document selection display section 20g determines whether the designated point lies inside any of the document rectangular frames displayed by the document list icon display section 20f (step G2). The determination refers to the data indicating the extent of each document rectangular frame, which was stored in the buffer section 22 during the document list display process.

[0151] If the point lies inside a document rectangular frame, the document selection display section 20g obtains the document number corresponding to that frame and stores it in iDoc (step G3). It then determines whether the designated point lies inside any of the icon rectangular frames displayed by the document list icon display section 20f (step G4), again referring to the data indicating the extent of each icon rectangular frame that was stored in the buffer section 22 during the document list display process.

[0152] If the designated point lies inside an icon rectangular frame, the document selection display section 20g obtains the corresponding icon number and stores it in iIcon (step G5). It then varies i from 0 to nIconSent[iDoc][iIcon] - 1 and displays the text of each sentence iconSent[iDoc][iIcon][i] of the iDoc-th document stored in the document data storage section 13a (step G6).

[0153] If, in step G4, the designated point lies outside every icon rectangular frame, the document selection display section 20g displays the full text of the iDoc-th document stored in the document data storage section 13a (step G7).

[0154] If the point designated in step G1 lies outside every document rectangular frame, it is determined whether the point lies inside the end icon 32 displayed in advance, as shown in FIG. 14 (step G8). If it does, the document selection display section 20g ends its processing and returns to the main process. If the point lies outside the end icon 32, processing returns to step G1.

[0155] This concludes the processing flow of the document search device in this embodiment. The present invention is not limited to the embodiment described above. For example, in the document list displayed by the document list icon display section 20f, the icons may be rendered from moving image data so that they change over time, instead of from still image data. The icon data may also be stored on a medium with a larger storage capacity than the external storage device 13, such as a DVD-RAM.

[0156] In the above description, the word information data groups words by sentence, and icon addition is performed sentence by sentence. Alternatively, word information data may be generated for other units, such as paragraphs, and icon addition may be performed per unit.

[0157] In the document list display described above, the icon size changes according to the number of sentences corresponding to the icon. Alternatively, all icons may share a common size, and some other attribute reflecting the number of corresponding sentences may be added to the list display. For example, characters (numerals) or marks indicating the number of sentences may be shown alongside each icon, or each icon may be displayed in a predetermined color according to the number of sentences.

[0158] The methods described in the above embodiment can be written, as programs executable by a computer, to recording media such as magnetic disks (floppy disks, hard disks, etc.), optical disks (CD-ROM, DVD, etc.), and semiconductor memories and applied to various devices, or transmitted over communication media and applied to various devices. A computer implementing this device reads the program recorded on the recording medium and executes the processing described above, with its operation controlled by the program. Various other modifications can be made without departing from the spirit of the invention.

【０１５９】[0159]

[Effects of the Invention] As described in detail above, according to the present invention, a headline keyword is stored together with at least one set of co-occurrence words associated with it, each set containing at least one word that co-occurs with the headline keyword; icon data for displaying a semantically related icon is stored in association with each stored set of co-occurrence words; the document data is searched for parts that match the headline keyword; the similarity is calculated between each set of co-occurrence words associated with a found headline keyword and the words appearing around the position of that headline keyword in the document data; with reference to the calculated similarity, the icon data associated with the set of co-occurrence words having the maximum similarity is added in association with the position in the document data where the headline keyword was found; and the added icon data is used to display an icon in association with the document data. As a result, even when the same headline keyword occurs in the text (document data), a different icon representing its meaning is added and displayed depending on which other words appear in the text surrounding that keyword. These icons present the meaning of the text in an easily understood form and make documents easier to retrieve.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the hardware configuration of a document search device according to an embodiment of the present invention.

FIG. 2 is a diagram showing the detailed configuration of the program section 20 and the buffer section 22 provided in the memory 14.

FIG. 3 is a diagram showing the storage format of the document data and word information data stored in the external storage device 13.

FIG. 4 is a diagram showing an example of the format of the word information data corresponding to one document in this embodiment.

FIG. 5 is a diagram showing the structure of the meaning icon table in this embodiment.

FIG. 6 is a diagram showing an example of the data storage format of the icon list management table in this embodiment.

FIG. 7 is a flowchart showing the overall flow of operation of the document search device in this embodiment.

FIG. 8 is a flowchart showing the details of the processing performed by the sentence icon addition processing section.

FIG. 9 is a flowchart showing the details of the processing performed by the headline keyword search section.

FIG. 10 is a flowchart showing the details of the processing performed by the similarity calculation section.

FIG. 11 is a flowchart showing the details of part of the processing performed by the icon sorting section.

FIG. 12 is a flowchart showing the details of part of the processing performed by the icon sorting section.

FIG. 13 is a flowchart showing the details of the processing performed by the document list icon display section.

FIG. 14 is a diagram showing an example of the display screen produced by the processing of the document list icon display section.

FIG. 15 is a flowchart showing the details of the processing performed by the document selection display section.

[Explanation of symbols]

10 ... Control device
11 ... Input device
12 ... Display device
13 ... External storage device
13a ... Document data storage section
13b ... Meaning icon table storage section
13c ... Icon body data storage section
14 ... Memory
15 ... Communication device
20a ... Main processing section
20b ... Sentence icon addition processing section
20c ... Headline keyword search section
20d ... Similarity calculation section
20e ... Icon sorting section
20f ... Document list icon display section
20g ... Document selection display section
22a ... Sentence word information storage buffer
22b ... Icon table storage buffer
22c ... Headline keyword count storage buffer
22d ... Sentence-icon correspondence storage buffer
22e ... Sentence icon count storage buffer
22f ... Icon list management buffer
22g ... Icon image storage buffer
22h ... Addition source keyword storage buffer
22i ... Work variable storage area

Claims

[Claims]

1. A document search apparatus comprising: table storage means for storing a headline keyword in association with at least one set of co-occurrence words, each set consisting of at least one word that co-occurs with the headline keyword; icon data storage means for storing icon data for displaying a semantically related icon in association with each set of co-occurrence words stored in the table storage means; document data storage means for storing document data; headline keyword search means for searching the document data stored in the document data storage means for a portion matching a headline keyword stored in the table storage means; similarity calculation means for calculating, for each set of co-occurrence words stored in the table storage means in association with the headline keyword found by the headline keyword search means, the similarity between that set and the words appearing around the position of the headline keyword in the document data; icon adding means for referring to the similarities calculated by the similarity calculation means and adding the icon data stored in the icon data storage means in association with the set of co-occurrence words having the maximum similarity, in association with the position at which the headline keyword was found in the document data; and icon display means for displaying, using the added icon data, an icon in association with the document data.

2. The document search apparatus according to claim 1, wherein the icon display means displays a list of the icons added by the icon adding means for a plurality of pieces of document data, the apparatus further comprising document selection display means for, when an instruction designating one of the icons displayed in the list by the icon display means is input, displaying the contents of the document data corresponding to the designated icon.

3. The document search apparatus according to claim 2, wherein the icon display means changes the shape of a displayed icon according to the number of sentences associated with that icon.

4. The document search apparatus according to claim 2, wherein the icon display means displays, alongside each of the icons displayed in the list, the headline keyword corresponding to that icon.

5. The document search apparatus according to claim 1, wherein a piece of icon data stored in the icon data storage means is associated with sets of co-occurrence words corresponding to a plurality of different headline keywords.

6. The document search apparatus according to claim 1, further comprising icon sorting means for merging, among the plurality of pieces of icon data added to one piece of document data by the icon adding means, those pieces that are the same icon data and share a common headline keyword as their addition source, wherein the icon display means displays icons using the icon data merged by the icon sorting means.

7. The document search apparatus according to claim 2, wherein the icon display means displays a frame representing an entire document and displays, inside the frame, the icons added to the document data of that document, and wherein, when a portion inside the frame other than an icon is designated, the entire contents of the document data are displayed.

8. A semantic icon addition and display method comprising: storing a headline keyword in association with at least one set of co-occurrence words, each set consisting of at least one word that co-occurs with the headline keyword; storing icon data for displaying a semantically related icon in association with each stored set of co-occurrence words; searching document data for a portion matching the headline keyword; calculating, for each set of co-occurrence words associated with the headline keyword found in the document data, the similarity between that set and the words appearing around the position of the headline keyword in the document data; referring to the calculated similarities and adding the icon data associated with the set of co-occurrence words having the maximum similarity, in association with the position at which the headline keyword was found in the document data; and displaying, using the added icon data, an icon in association with the document data.

9. A computer-readable recording medium storing a program for controlling a computer to: store a headline keyword in association with at least one set of co-occurrence words, each set consisting of at least one word that co-occurs with the headline keyword; store icon data for displaying a semantically related icon in association with each stored set of co-occurrence words; search document data for a portion matching the headline keyword; calculate, for each set of co-occurrence words associated with the headline keyword found in the document data, the similarity between that set and the words appearing around the position of the headline keyword in the document data; refer to the calculated similarities and add the icon data associated with the set of co-occurrence words having the maximum similarity, in association with the position at which the headline keyword was found in the document data; and display, using the added icon data, an icon in association with the document data.
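The core flow of claims 1, 8 and 9 (find a headline keyword, score each of its co-occurrence word sets against the nearby words, attach the icon of the best-scoring set) can be illustrated with a minimal Python sketch. The claims do not fix a similarity formula, a window size, or a tokenizer, so this sketch assumes a simple overlap count over a window of words on each side of the keyword; the table contents and the names `IconEntry`, `ICON_TABLE`, `similarity` and `add_icons` are hypothetical illustrations, not the patent's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IconEntry:
    """One co-occurrence word set and the icon it selects (hypothetical layout)."""
    icon_id: str
    cooccurrence: frozenset


# Hypothetical icon table: headline keyword -> candidate co-occurrence sets.
ICON_TABLE = {
    "bank": [
        IconEntry("icon_money", frozenset({"loan", "deposit", "interest", "account"})),
        IconEntry("icon_river", frozenset({"river", "water", "shore", "fishing"})),
    ],
}


def similarity(cooccurrence, context):
    """Assumed measure: how many co-occurrence words appear in the context."""
    return len(cooccurrence & set(context))


def add_icons(words, table=ICON_TABLE, window=5):
    """Attach one icon to each headline-keyword hit in a tokenized document.

    Returns (word position, headline keyword, icon_id) tuples. `window`
    (words on each side counted as "around" the keyword) is an assumption.
    """
    added = []
    for pos, word in enumerate(words):
        candidates = table.get(word)
        if not candidates:
            continue  # not a headline keyword
        context = words[max(0, pos - window):pos] + words[pos + 1:pos + 1 + window]
        # max() keeps the first candidate on ties, including when every score is 0.
        best = max(candidates, key=lambda e: similarity(e.cooccurrence, context))
        added.append((pos, word, best.icon_id))
    return added


print(add_icons("the bank raised the interest rate on every deposit account".split()))
# [(1, 'bank', 'icon_money')]
print(add_icons("we sat on the bank of the river fishing".split()))
# [(4, 'bank', 'icon_river')]
```

The same headline keyword "bank" receives a different icon depending on its surroundings, which is the disambiguation the claims rely on. A real system would likely also decide what to do when no co-occurrence word appears at all; this sketch simply keeps the first candidate.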
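The icon sorting of claim 6, the count-dependent icon appearance of claim 3 and the keyword label of claim 4 can be sketched together in Python. This is an illustration under stated assumptions: each hit is taken to be a (sentence number, headline keyword, icon_id) tuple, icon size stands in for "shape", and the pixel values and the names `sort_icons`, `icon_size` and `icon_list` are hypothetical.

```python
def sort_icons(added):
    """Merge icon hits that share both icon_id and source headline keyword.

    `added` holds (sentence number, headline keyword, icon_id) tuples for one
    document (an assumed layout). Returns {(icon_id, keyword): [sentence numbers]}
    in first-appearance order, counting each sentence once per group.
    """
    groups = {}
    for sentence_no, keyword, icon_id in added:
        sentences = groups.setdefault((icon_id, keyword), [])
        if sentence_no not in sentences:
            sentences.append(sentence_no)
    return groups


def icon_size(n_sentences, base=16, step=4, limit=48):
    """Assumed rule: grow the icon by `step` pixels per extra sentence, up to `limit`."""
    return min(base + step * (n_sentences - 1), limit)


def icon_list(groups):
    """One display row per merged icon, with its headline keyword alongside."""
    return [f"{icon_id} [{keyword}] size={icon_size(len(s))}"
            for (icon_id, keyword), s in groups.items()]


hits = [(0, "bank", "icon_money"), (2, "bank", "icon_money"),
        (2, "bank", "icon_money"), (3, "loan", "icon_money"),
        (5, "bank", "icon_river")]
for row in icon_list(sort_icons(hits)):
    print(row)
# icon_money [bank] size=20
# icon_money [loan] size=16
# icon_river [bank] size=16
```

Grouping on the pair (icon_id, keyword) rather than on icon_id alone keeps icons apart when one piece of icon data is shared by several headline keywords, as claim 5 allows: here "icon_money" reached from "bank" and from "loan" stays as two separate entries.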