JP2003196130A

JP2003196130A - File management device and computer program

Info

Publication number: JP2003196130A
Application number: JP2001392621A
Authority: JP
Inventors: Naoya Uematsu; 直也植松
Original assignee: JustSystems Corp
Current assignee: JustSystems Corp
Priority date: 2001-12-25
Filing date: 2001-12-25
Publication date: 2003-07-11

Abstract

PROBLEM TO BE SOLVED: To reduce the complexity of classifying and managing a large quantity of files by content.

SOLUTION: An analysis processing unit 22 analyzes the structure of a stored file. A generation processing unit 24 converts the stored file into XML so as to reflect the result of the analysis, and an importance determination unit 60 determines the importance of each element of the stored file. A mark assigning unit 62 assigns a mark to each element determined to be important and reflects it in the XML-converted stored file, and a summary creation unit 64 creates a summary while removing unnecessary elements.

COPYRIGHT: (C)2003,JPO

Description

Detailed Description of the Invention

[0001]

Technical Field

The present invention relates to file management technology, and in particular to technology for efficiently managing a large number of files in a form that makes them easy to search.

[0002]

Description of the Related Art

In recent years, with the spread of PCs (personal computers), documents of every kind are increasingly being digitized. Files are created electronically using document creation software such as word processors and accumulate on hard disks. In corporate environments, computers are often connected via a network and large numbers of document files are shared among multiple users. Now that the Internet is widespread, data received from outside the company, such as web pages and e-mail, is also increasing. So that anyone can find a desired file among such a large number of files, a designated administrator sometimes classifies them in advance.

[0003]

Problems to Be Solved by the Invention

One method of classifying a plurality of files is to group them according to their contents and store each group in a separate folder. However, not only is it difficult to define file groups unambiguously; even when a particular administrator classifies a large number of files in advance, the classification criteria depend on the administrator's subjectivity, which can actually make searching harder. Because both management and retrieval are difficult in this way, it is by no means rare for valuable material to lie dormant without being reused.

[0004] On the other hand, among files shared by multiple users, some are highly useful to many users while others are useful to only a small fraction of users in the first place. Some are reused frequently and others are not. It is therefore likely that most of a large collection of files is unnecessary to any given user. Classifying files by type while such files are mixed together does not necessarily make searching easier. If only the truly useful files could be found easily, many users would be able to work more efficiently.

[0005] The present inventor made the present invention based on the above recognition, and its object is to provide a technique for managing a large number of files in a highly convenient way.

[0006]

Means for Solving the Problems

One aspect of the present invention relates to a file management device. This device manages a plurality of files electronically recorded so as to be usable by a computer, and includes: a structure analysis unit that analyzes the structure of a file; an importance determination unit that determines, from the result of the analysis, whether the structure is characteristic; and a mark assigning unit that, when the importance determination unit determines that the structure of the file is characteristic, indicates this with a predetermined mark and associates the mark with the file.

[0007] The "plurality of files" is the group of files that will later be subject to summary creation; the files need not be specially managed for summary creation. The files referred to here are assumed to be text data such as document files, but this is not a limitation: they may be image data or audio data, or any combination of these.

[0008] In analyzing the structure of a file, known techniques such as syntactic analysis, semantic analysis, and layout analysis may be used. A "characteristic structure" may be, for example, one that contains a large amount of image data or a large amount of audio data, but is not limited to these.

[0009] Another aspect of the present invention also relates to a file management device. This device manages a plurality of files electronically recorded so as to be usable by a computer, and includes: a structure analysis unit that decomposes a file into elements; an importance determination unit that determines whether each decomposed element is characteristic within the file; and a mark assigning unit that assigns a predetermined mark to each element determined to be characteristic by the importance determination unit, so that the element is made explicit in the file.

[0010] Morphological analysis is one conceivable method of decomposing a file into elements, but this is not a limitation. Naturally, it may be combined with the syntactic analysis, semantic analysis, layout analysis, and similar methods mentioned above. More than one kind of mark may be assigned; typically the degree to which an element is characteristic is expressed in several levels, such as "1, 2, 3". An attribute of the element may also be used as the mark. For example, when morphological analysis identifies an element as a proper noun, the mark "proper noun" is assigned to that element.

[0011] The importance determination unit may also detect the position at which an element appears in the file and refer to that position to determine whether the element is characteristic. For example, suppose the device is applied to a document file that has a title. Words contained in a title are generally likely to be keywords, so such words are determined to be characteristic.

[0012] The importance determination unit may also determine whether an element is characteristic by comparing its frequency of appearance in the file with its frequency of appearance in other files. For example, a word that appears a certain number of times in the file is determined to be characteristic. Alternatively, other files managed in the same device may be consulted and the frequency of appearance of the element under consideration compared; an element that appears frequently in other files may be determined to be characteristic or, conversely, to be common rather than characteristic. Many such criteria exist, and in general the user of the device sets them as appropriate.
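The frequency comparison described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `characteristic_words` and the two thresholds `min_count` and `max_other_ratio` are hypothetical stand-ins for the user-set criteria, and the rule chosen here (frequent in this file, not widespread in other files) is just one of the possible criteria the text allows.

```python
from collections import Counter


def characteristic_words(target_words, other_files_words, min_count=2, max_other_ratio=0.5):
    """Return words frequent in the target file but not widespread in other files.

    min_count and max_other_ratio are hypothetical, user-adjustable criteria.
    """
    counts = Counter(target_words)
    n_others = len(other_files_words)
    result = []
    for word, count in counts.items():
        if count < min_count:
            continue  # appears too rarely in this file
        # Fraction of the other files that also contain the word.
        containing = sum(1 for words in other_files_words if word in words)
        ratio = containing / n_others if n_others else 0.0
        if ratio <= max_other_ratio:
            result.append(word)
    return sorted(result)


target = ["index", "summary", "index", "the", "the", "summary", "file"]
others = [["the", "file", "report"], ["the", "meeting"], ["the", "index"]]
print(characteristic_words(target, others))
```

Here "the" is rejected because it occurs in every other file, and "file" is rejected because it occurs only once in the target.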

[0013] The device may further include an index storage unit that records, as an index, combinations of marked objects and their marks. One conceivable form is a table associating marked elements with their frequencies of appearance, but this is not a limitation.

[0014] The index storage unit may also create the index by extracting the objects determined to be characteristic by the importance determination unit, thereby omitting the objects determined not to be characteristic and compressing the file. That is, elements in the file determined to be uncharacteristic, in other words unimportant, are deleted. In this case a summary of the file may be created as the index. Naturally, techniques such as the morphological and syntactic analysis described above are then used so that the result reads as well-formed text.

[0015] The index storage unit may also determine the degree of compression used when creating the index of a file according to the remaining storage capacity of the storage medium on which the index is to be saved. The index storage unit may also accept from the user an instruction regarding the degree of compression, such as "reduce the file size to 30%" or "keep it within 200 words".
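One way to picture how the degree of compression might be chosen is the sketch below. The source does not specify a formula, so everything here is an assumption: `decide_compression_ratio` is a hypothetical helper, a user instruction takes priority when given, and otherwise each index is allowed a hypothetical share (`budget_percent`) of the remaining capacity.

```python
def decide_compression_ratio(file_size, free_bytes, user_ratio=None, budget_percent=1):
    """Return the fraction of the original size that the index may occupy.

    user_ratio models an instruction such as "reduce the file size to 30%".
    budget_percent is a hypothetical policy: the share of free space one index may use.
    """
    if user_ratio is not None:
        # An explicit user instruction wins; clamp it to a sensible range.
        return max(0.0, min(1.0, user_ratio))
    if file_size <= 0:
        return 1.0
    allowed_bytes = free_bytes * budget_percent // 100
    return min(1.0, allowed_bytes / file_size)


print(decide_compression_ratio(1000, 50_000, user_ratio=0.3))
print(decide_compression_ratio(1000, 50_000))
```

With little free space the ratio shrinks, so a smaller index is produced; with ample free space no compression is forced.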

[0016] The device may also include an update checking unit that checks the update history of a file, and when an update to the file is confirmed, the index storage unit may create the index of the file again.

[0017] Any combination of the above components, and any conversion of the components and expressions of the present invention between a method, a device, a system, a computer program, a recording medium storing a computer program, and the like, is also effective as an aspect of the present invention.

[0018]

Description of Embodiments

(Base technology) The file search device of this base technology searches a plurality of files for files similar to a passage of text specified by the user as a search condition. This makes file search easy even if the files have not been classified by content in advance, and reduces the burden of managing a large number of files.

[0019] FIG. 1 is a functional block diagram showing the configuration of the file search device of the base technology. The file search device 10 has: a processing unit 20 that performs the processing needed to generate the index data consulted when searching a plurality of files for a desired file; a search unit 30 that performs search processing based on conditions specified by the user; a holding unit 40 that holds the plurality of files to be searched (hereinafter "stored files") and the data needed for search processing; and an input/output processing unit 50 that handles data input and output between the device and the outside.

[0020] In hardware terms, the file search device 10 can be realized with components such as a computer's CPU and memory; in software terms, it can be realized by programs with file management and file search functions. The figure depicts the functional blocks realized by their cooperation. These functional blocks can therefore be realized in various forms by combinations of hardware and software.

[0021] The processing unit 20 processes the plurality of stored files held by the holding unit 40 and extracts a plurality of characteristic character strings from each. These characteristic character strings are taken to form a concept that concisely represents the content of the stored file, and this concept is recorded as index data. The processing unit 20 includes an analysis processing unit 22 that performs linguistic analysis of the character strings contained in a stored file, and a generation processing unit 24 that generates index data based on the analysis results.

[0022] The analysis processing unit 22 includes a preprocessing unit 26 and a character string extraction unit 27. The preprocessing unit 26 performs preprocessing before linguistic analysis. For example, it may detect the file format or document format of the stored file to be processed and, based on this, convert the stored file into an unstructured format such as plain text so that it is easy to analyze. It may also divide a single stored file into a plurality of blocks to make it suitable for analysis. Techniques such as morphological analysis, syntactic analysis, and semantic analysis may be used for this.

[0023] The character string extraction unit 27 extracts a plurality of character strings from the stored file being processed. It may extract words contained in a word dictionary described later, or it may recognize character strings delimited by spaces or blanks as words.

[0024] The generation processing unit 24 includes a statistical processing unit 28 and an index generation unit 29. The statistical processing unit 28 counts the frequency of appearance of each extracted character string in the stored file, and also counts its frequency of appearance across all of the stored files held by the file storage unit 42. Similarity between character strings is taken into account here. For example, differences among words with similar meanings, defined as near-synonyms, synonyms, or controlled terms, are absorbed when counting frequencies of appearance.
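Absorbing synonym differences while counting can be illustrated with a small sketch. It assumes a hypothetical `canonical` mapping that plays the role of the synonym and controlled-term dictionaries; the source does not say how those dictionaries are structured.

```python
from collections import Counter


def count_with_synonyms(words, canonical):
    """Count word frequencies after mapping each word to its canonical term.

    canonical is a hypothetical dict standing in for the synonym dictionaries.
    """
    return Counter(canonical.get(word, word) for word in words)


canonical = {"automobile": "car", "auto": "car"}
counts = count_with_synonyms(["car", "automobile", "auto", "road"], canonical)
print(counts["car"], counts["road"])
```

All three spellings contribute to a single count for "car", so the frequency reflects the concept rather than the surface form.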

[0025] The index generation unit 29 generates index data based on the frequencies of appearance counted by the statistical processing unit 28. The index data is structured as a list of the extracted character strings, each weighted according to its frequency of appearance. A character string is weighted more heavily the more frequently it appears in the stored file being processed, and less heavily if it appears frequently across all of the stored files held by the file storage unit 42. As a result, character strings specific to the stored file can be brought out by statistical means. When a stored file has been divided into a plurality of blocks by the preprocessing unit 26, index data is generated for each block.
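The weighting rule above (heavier for strings frequent in this file, lighter for strings frequent everywhere) resembles the well-known TF-IDF scheme. The sketch below uses a smoothed TF-IDF formula as a stand-in; the source does not give a formula, so the exact expression and the name `build_index` are assumptions.

```python
import math
from collections import Counter


def build_index(doc_words, corpus):
    """Return {string: weight} for one stored file.

    Weight = term frequency * log((1 + N) / (1 + document frequency)),
    a smoothed TF-IDF variant chosen here only for illustration.
    """
    term_freq = Counter(doc_words)
    n_docs = len(corpus)
    index = {}
    for term, freq in term_freq.items():
        # Number of stored files in which the string appears at least once.
        doc_freq = sum(1 for words in corpus if term in words)
        index[term] = freq * math.log((1 + n_docs) / (1 + doc_freq))
    return index


corpus = [["xml", "file", "file"], ["file", "mail"], ["file", "web"]]
idx = build_index(corpus[0], corpus)
print(idx)
```

In this example "file" appears in every stored file and receives weight zero, while "xml" appears only in the first file and receives a positive weight.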

[0026] The holding unit 40 includes a file storage unit 42, an index storage unit 44, a dictionary storage unit 46, and a related data storage unit 48. The file storage unit 42 holds the plurality of stored files. These include files in a variety of formats, such as document files generated by document creation software such as word processors and files generated using markup languages such as HTML (Hyper Text Markup Language) and XML (eXtensible Markup Language), and their content need not be text. The stored files themselves do not need to be classified or put into a standard form in advance for the purpose of searching.

[0027] The index storage unit 44 holds the index data generated by the processing unit 20 in association with the stored files. The dictionary storage unit 46 holds data consulted during linguistic analysis and statistical processing by the processing unit 20, such as a word dictionary, a near-synonym dictionary, a synonym dictionary, and a controlled-term dictionary. The related data storage unit 48 holds data used optionally in processing by the search unit 30. For example, it holds a related-term dictionary consulted in order to replace a word specified as a search condition with a broader term, a narrower term, a related term, or the like. The processing unit 20 may generate such data by extracting it from the stored files.

[0028] The search unit 30 receives a search condition from the user and extracts matching stored files from the file storage unit 42. The search unit 30 includes a comparison processing unit 32 that compares the search condition with the index data, and a result processing unit 34 that presents to the user the stored files that match the search condition based on the comparison results.

[0029] The comparison processing unit 32 includes a condition setting unit 36 and a similarity determination unit 37. The condition setting unit 36 acquires the search condition from the user. The search condition may take the form of text written in natural language or of a file containing some character strings. The search condition is sent to the processing unit 20 and processed in the same way as in the index data generation described above, producing a concept for the search condition.

[0030] The similarity determination unit 37 detects the similarity between the search condition and each stored file by comparing the concept of the search condition with the concepts recorded as index data. During the comparison, other character strings related to those in the search condition may be added to supplement the search condition, based on the various dictionaries held by the dictionary storage unit 46 and the related data storage unit 48.

[0031] A vector space model is used to compare the search condition with the index data. That is, the concept of the search condition and the concept of the index data are each expressed as a vector in a multidimensional space, and the vectors are compared. When a concept contains n character strings, an n-dimensional vector space is formed, and each component is weighted according to the frequency of appearance of the corresponding character string. The closeness of the vectors formed in this way is the similarity between the search condition and the stored file.
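As a rough illustration of the vector comparison, the sketch below represents each concept as a dict of weighted strings and measures closeness by cosine similarity. The source only says that the closeness of the vectors is used; cosine similarity is a common choice assumed here, and the function name is hypothetical.

```python
import math


def cosine_similarity(concept_a, concept_b):
    """Cosine of the angle between two sparse weight vectors ({string: weight})."""
    dot = sum(weight * concept_b.get(term, 0.0) for term, weight in concept_a.items())
    norm_a = math.sqrt(sum(w * w for w in concept_a.values()))
    norm_b = math.sqrt(sum(w * w for w in concept_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty concept is treated as similar to nothing
    return dot / (norm_a * norm_b)


query = {"xml": 1.0, "summary": 2.0}
stored = {"xml": 1.0, "summary": 1.0, "index": 1.0}
print(cosine_similarity(query, stored))
```

Strings missing from one concept simply contribute zero to the dot product, so the two vectors do not need to share the same set of dimensions in advance.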

[0032] The result processing unit 34 includes a list generation unit 38 and a display processing unit 39. The list generation unit 38 generates a list of stored files in descending order of similarity. The number of stored files in the list may be adjusted so that it is limited to a suitable number.
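The list generation step can be sketched in a few lines. The function name `ranked_list` and the default `limit` are hypothetical; the source states only that the list is ordered by similarity and may be limited in length.

```python
def ranked_list(similarities, limit=10):
    """Return file names sorted by descending similarity, truncated to `limit` entries."""
    ranked = sorted(similarities.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:limit]]


print(ranked_list({"a.doc": 0.2, "b.doc": 0.9, "c.doc": 0.5}, limit=2))
```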

[0033] The display processing unit 39 displays the list of stored files on the screen as the search result. The list may consist of file names and summaries of their contents.

[0034] The input/output processing unit 50 is an interface for data input and output between the file search device 10 and the outside, such as instructions for various kinds of processing, input of search conditions, and output of search results. When the file search device 10 is implemented as a stand-alone device, it is the interface between the user and the device; when the file search device 10 is implemented as a network server, it is the communication interface connecting the device to client terminals over the network.

[0035] FIG. 2 is a flowchart showing the index data generation process in the base technology. First, a stored file to be processed is selected from the plurality of files (S10), preprocessing is applied to it (S12), and character strings are extracted from it by processing such as morphological analysis (S14). Statistical data such as frequency of appearance is calculated for each extracted character string (S16), and index data is generated from it (S18). If stored files for which index data has not yet been generated remain in the file storage unit 42 (S20Y), steps S10 to S18 are applied to the remaining files, and this is repeated until all stored files have been processed (S20).
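The loop of FIG. 2 can be outlined as below. This is a simplified sketch, not the device's actual processing: lower-casing stands in for preprocessing, whitespace splitting stands in for morphological analysis, and raw counts stand in for the index data. All function names are hypothetical.

```python
from collections import Counter


def preprocess(text):
    """S12: placeholder preprocessing (here, just lower-casing)."""
    return text.lower()


def extract_strings(text):
    """S14: whitespace splitting as a stand-in for morphological analysis."""
    return text.split()


def generate_all_indexes(stored_files):
    """Run S10 to S18 for every stored file and return {file name: index data}."""
    indexes = {}
    pending = list(stored_files)           # files without index data yet
    while pending:                         # S20: repeat while files remain
        name = pending.pop(0)              # S10: select the next file
        text = preprocess(stored_files[name])
        strings = extract_strings(text)
        stats = Counter(strings)           # S16: frequency of appearance
        indexes[name] = dict(stats)        # S18: index data (raw counts here)
    return indexes


print(generate_all_indexes({"a.txt": "XML file XML", "b.txt": "mail"}))
```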

[0036] FIG. 3 is a flowchart showing the search process in the base technology. First, when the user specifies text serving as the search condition in natural-language form (S30), the processing unit 20 extracts character strings from the search condition and generates index data (S32). That index data is matched against each of the index data held by the index storage unit 44 to determine their similarity (S34), a list of stored files is generated in order of similarity (S36), and the list is displayed on the screen as the search result (S38).

[0037] The embodiment is described below in contrast with the base technology above. Functional blocks that work the same way as those in the base technology are given the same names and reference numerals, and their description is omitted where appropriate.

[0038] (Embodiment) In this embodiment, an index file that is easy for users to use when searching for files is created. Specifically, a summary is used as the index file. When a summary of a file is created, the structure of the file is analyzed, and parts determined to be characteristic as a result are given a mark that expresses the degree to which they are characteristic as an attribute called importance. The summary is then created using the assigned marks as cues. In the following description, the file to be summarized is converted into XML format, which makes it easy to make attributes such as the structural analysis results and importance explicit within the file.

[0039] FIG. 4 is a functional block diagram showing the overall configuration of a search system 130 including a file management server 120 according to the embodiment. In the search system 130, the file management server 120 is connected to a plurality of user terminals 122 via a network 124. The file management server 120 holds a plurality of stored files, such as business documents shared among multiple users in the course of work. When a user operates on or searches the stored files on the file management server 120, the user terminal 122 issues the instruction to the file management server 120. The user terminal 122 is an information processing device such as a PC. The network 124 is, for example, a LAN (Local Area Network) installed within a company.

[0040] FIG. 5 is a functional block diagram showing the configuration of the file management server 120. The file management server 120 has a processing unit 20, a search unit 30, a holding unit 40, and an input/output processing unit 50. The input/output processing unit 50 sends and receives data to and from the user terminals 122 via the network 124.

[0041] The processing unit 20 includes the analysis processing unit 22 and the generation processing unit 24, which are configured as in the base technology, and an importance determination unit 60, a mark assigning unit 62, a summary creation unit 64, and an update monitoring unit 100, which are specific to this embodiment.

[0042] The importance determination unit 60 determines whether each element making up a file is characteristic, based on the results of processing by the analysis processing unit 22 and the generation processing unit 24. The mark assigning unit 62 assigns, to each element determined to be characteristic by the importance determination unit 60, a mark indicating the degree to which it is characteristic as its importance. The importance may be weighted; that is, marks may be assigned with importance divided into several levels.

[0043] The summary creation unit 64 creates a summary of the target file by keeping the elements determined to be characteristic and deleting other elements as appropriate. The summary creation unit 64 further includes a size determination unit 68, which determines the size of the summary to be created. The size determination unit 68 may accept a summary size instruction from the user, such as "within 100 characters", or may determine the summary size according to the remaining available capacity of the recording medium on which the summary is to be stored, for example a hard disk.
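A size-limited summary could be assembled along the following lines. This is a sketch under assumptions the source does not state: elements are whole sentences with a numeric importance, the most important elements are admitted first until the character budget is used up, and the kept elements are emitted in their original order. The name `summarize` is hypothetical.

```python
def summarize(elements, max_chars):
    """Build a summary of at most max_chars characters.

    elements: list of (text, importance) pairs in document order;
    importance 0 means the element was not judged characteristic.
    """
    # Visit elements from most to least important; ties keep document order.
    order = sorted(range(len(elements)), key=lambda i: -elements[i][1])
    kept, used = set(), 0
    for i in order:
        text, importance = elements[i]
        if importance <= 0:
            continue  # uncharacteristic elements are always dropped
        cost = len(text) + (1 if kept else 0)  # one space joins kept elements
        if used + cost <= max_chars:
            kept.add(i)
            used += cost
    return " ".join(elements[i][0] for i in sorted(kept))


elements = [
    ("Quarterly report.", 3),
    ("Thanks to all.", 0),
    ("Sales rose 10%.", 2),
    ("See appendix.", 1),
]
print(summarize(elements, max_chars=40))
```

A greedy selection like this is simple but not optimal; it merely shows how an instruction such as "within 100 characters" can drive which marked elements survive.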

[0044] The update monitoring unit 100 monitors the file storage unit 42 and, when a file held there that is subject to summary creation is updated, instructs the analysis processing unit 22, generation processing unit 24, importance determination unit 60, mark assigning unit 62, and summary creation unit 64 to perform the processing described above so that the summary is created again.
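Detecting which files need a new summary can be sketched by comparing two snapshots of last-modified times. How the update monitoring unit actually detects updates is not described in the source; the snapshot dictionaries and the name `files_needing_resummary` are assumptions made for illustration.

```python
def files_needing_resummary(previous, current):
    """Return names of files that are new or whose timestamp changed.

    previous, current: dicts mapping file name -> last-modified timestamp.
    """
    return sorted(
        name for name, mtime in current.items()
        if previous.get(name) != mtime
    )


before = {"a.doc": 100, "b.doc": 200}
after = {"a.doc": 100, "b.doc": 250, "c.doc": 300}
print(files_needing_resummary(before, after))
```

Each returned file would then be passed through analysis, XML conversion, importance determination, marking, and summary creation again.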

[0045] The holding unit 40 includes a file storage unit 42, an index storage unit 44, a dictionary storage unit 46, and a related data storage unit 48, which have the same configuration as in the base technology, and a summary storage unit 66, which is specific to this embodiment. The summary storage unit 66 records and holds the summaries created by the summary creation unit 64.

[0046] The search unit 30 includes a comparison processing unit 32, which has the same configuration as in the base technology, and a result presentation unit 112, which presents search results to the user terminal 122.

[0047] The procedure for creating a summary with the above configuration is described with reference to the flowchart shown in FIG. 6. First, the analysis processing unit 22 analyzes the structure of the accumulated file using techniques such as morphological analysis, layout analysis, and syntactic analysis (S100). Next, the generation processing unit 24 converts the accumulated file to XML so that it reflects the result of the analysis (S102). The importance determination unit 60 then determines the importance of each element of the converted accumulated file (S104). Many criteria are possible for this determination, such as the frequency or position of appearance in the file, the part of speech, or a combination of these. The mark assigning unit 62 assigns a mark, as shown in FIG. 8, to each element determined to be important and reflects it in the XML-converted accumulated file (S106). Based on that file, the summary creation unit 64 deletes unnecessary elements and creates the summary (S108).
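The determination in S104 can be sketched as a simple scoring rule that combines position and frequency of appearance. The rule below (two points for appearing in the title, one for appearing at least `min_count` times) is purely an illustrative assumption; the publication names the criteria but gives no formula, and the function name and threshold are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def determine_importance(title_terms: List[str],
                         body_terms: List[str],
                         min_count: int = 3) -> Dict[str, int]:
    """Map each characteristic term to an importance level from 1 to 3."""
    freq = Counter(title_terms) + Counter(body_terms)
    levels: Dict[str, int] = {}
    for term, count in freq.items():
        score = 0
        if term in title_terms:   # position criterion: appears in the title
            score += 2
        if count >= min_count:    # frequency criterion: appears often
            score += 1
        if score > 0:             # terms scoring 0 are not characteristic
            levels[term] = score
    return levels
```

With this rule a term that is both in the title and frequent reaches level 3, a title-only term level 2, and a frequent body-only term level 1.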

[0048] FIG. 7 shows part of the original text of an accumulated file for which a summary is to be created, FIG. 8 shows that file after simple conversion to XML, and FIG. 9 shows the file with the results of the importance determination additionally incorporated in accordance with XML grammar. In FIG. 7, the title "Company A and Company B partner in memory business" is written at the center of the first line, the date "2001.12.01" at the right end of the second line, and the body text from the third line onward: "Company A, a major Japanese computer maker, announced that it has reached a provisional agreement with Company B of the United States on a partnership in the memory business. The agreement ...".

[0049] FIG. 8 is a document that has simply been converted to XML, with no importance determination, so only three kinds of attributes are shown: "title", "date", and "body". FIG. 9 reflects the result of the determination by the importance determination unit 60. Elements determined to be characteristic are given an "importance" tag, and the degree of importance is described in three levels, "3, 2, 1", in descending order.

[0050] For example, consider the term "memory business" in the first line of FIG. 7. Based on the term's part of speech, frequency of appearance, and position of appearance, the importance determination unit 60 determines its importance to be the highest level, "3", and the mark assigning unit 62 assigns this importance "3" to the term "memory business". In the XML-format file, as shown in the third line of FIG. 9, tags indicating the attribute then enclose the term, as in <importance level="3">memory business</importance>, making explicit that its importance is "3". The element declarations, attribute declarations, and other descriptions required by the grammar rules when creating an XML-format file are omitted here. Attributes such as frequency of appearance and part of speech may also be included in the importance tag, for example <importance level="3" part_of_speech="noun" frequency="10">memory business</importance>.
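The marking step can be sketched with Python's standard `xml.etree.ElementTree` module. The helper name `mark_term` and the English tag and attribute names (`importance`, `level`, `part_of_speech`, `frequency`) are assumed renderings of the Japanese names in the figures. The sketch only handles a term that occurs in the element's leading text, not in the tail of an existing child.

```python
import xml.etree.ElementTree as ET


def mark_term(element: ET.Element, term: str, level: int, **attrs: str) -> bool:
    """Wrap the first occurrence of `term` in element.text in an <importance> tag."""
    text = element.text or ""
    start = text.find(term)
    if start < 0:
        return False                          # term not present in the leading text
    mark = ET.Element("importance", {"level": str(level), **attrs})
    mark.text = term                          # the marked term itself
    mark.tail = text[start + len(term):]      # text that followed the term
    element.text = text[:start]               # text that preceded the term
    element.insert(0, mark)                   # the mark becomes the first child
    return True
```

For example, calling `mark_term(title, "memory business", 3, part_of_speech="noun", frequency="10")` on a `<title>` element turns the plain term into an importance-tagged child while leaving the surrounding text in place.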

[0051] After the importance has been incorporated as shown in FIG. 9, a summary is created according to the conditions set in the size determination unit 68. For example, assuming the user has given the instruction "within 30 characters", the summary "Company A reaches a provisional agreement with Company B over a memory business partnership." is created here (29 characters in the original Japanese) and held in the summary storage unit 66 in association with the accumulated file on which it is based.
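A size-limited summary can be sketched as a greedy selection: keep the most important elements first, as long as they fit, then emit the kept elements in their original order. This is only one plausible strategy, shown under assumptions. The publication does not say how elements are chosen, and the function name and the sample fragments in the usage note are hypothetical.

```python
from typing import List, Tuple


def make_summary(elements: List[Tuple[str, int]], max_chars: int) -> str:
    """Build a summary from (text, importance) pairs within max_chars characters."""
    # Visit elements from highest to lowest importance, earlier ones first on ties.
    order = sorted(range(len(elements)), key=lambda i: (-elements[i][1], i))
    kept = set()
    used = 0
    for i in order:
        text, level = elements[i]
        if level <= 0:
            continue                      # unmarked elements are always dropped
        if used + len(text) <= max_chars:
            kept.add(i)
            used += len(text)
    # Restore document order so the summary still reads naturally.
    return "".join(elements[i][0] for i in sorted(kept))
```

With the fragments `("Company A ", 3)`, `("a major Japanese computer maker, ", 1)`, `("reached an agreement ", 2)`, `("with Company B.", 3)` and a limit of 50 characters, the low-importance fragment no longer fits and is dropped.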

[0052] FIG. 10 shows an appearance frequency table giving the appearance frequencies of the words in the accumulated file shown in FIG. 7. The table is held in the index storage unit 44 and used during searches. The entries "名", "固", and "動" in the part-of-speech column stand for "common noun", "proper noun", and "verb", respectively, and "名・動" indicates that a word is used as both a noun and a verb, as with "agreement" and "agree".

[0053] FIG. 7 shows only part of the accumulated file, but the table shows that, over the whole file, the term "memory business", for example, appears 10 times. In addition, a "special notes" entry records particularly important positions of appearance, for example the fact that a term appears in the title. Although not illustrated, similar tables describing appearance frequencies are held in the index storage unit 44 for the other accumulated files managed by the file management server 120.
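An appearance frequency table of this kind can be sketched with `collections.Counter`. The row layout (term, part of speech, count, special note) follows the description of FIG. 10, but the class and function names and the wording of the note are assumptions, and the tokens are expected to come from a morphological analyzer that is not shown.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Tuple

Token = Tuple[str, str]  # (term, part of speech)


@dataclass
class FrequencyRow:
    term: str
    part_of_speech: str
    count: int
    note: str  # "special notes" column, e.g. appearance in the title


def build_frequency_table(title_tokens: List[Token],
                          body_tokens: List[Token]) -> List[FrequencyRow]:
    """Count term appearances and flag terms that also appear in the title."""
    counts: Counter = Counter()
    pos_of: Dict[str, str] = {}
    title_terms = {term for term, _ in title_tokens}
    for term, pos in title_tokens + body_tokens:
        counts[term] += 1
        pos_of.setdefault(term, pos)   # keep the first part of speech seen
    return [
        FrequencyRow(term, pos_of[term], n, "in title" if term in title_terms else "")
        for term, n in counts.most_common()  # most frequent terms first
    ]
```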

[0054] As described above, according to the embodiment, a highly accurate summary can be created by making the characteristic parts of a file explicit. In particular, converting the file to XML format allows the attributes of each element to be embedded in the file itself, which supports not only summary creation but also search functions that make use of those elements.

[0055] The present invention has been described above based on an embodiment. The embodiment is illustrative; those skilled in the art will understand that various modifications are possible in the combinations of its constituent elements and processing steps, and that such modifications also fall within the scope of the present invention. Some such modifications are given below.

[0056] In the embodiment, the accumulated file subject to summary creation and the created summary are held in the same device, but they may instead be held in different devices. For example, as shown in FIG. 11, a database server 140 may be provided separately from the file management server 120, with the file storage unit 42 placed in the database server 140.

[0057] In the embodiment, files are converted to XML format in order to create the summary, but the invention is not limited to this. XML is nonetheless very effective for realizing the present invention, because when a file is described in XML it can be divided into elements whose attributes can be described flexibly. In addition, the file containing attribute-bearing tags shown in FIG. 8 may be retained, since it is useful for file searches with conditions such as "files whose title contains the term 'memory business'".
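A search condition such as "files whose title contains the term 'memory business'" can be sketched against retained XML files with `xml.etree.ElementTree`. The element names (`doc`, `title`, `body`) and the function name are assumed English stand-ins for the tags in the figures.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def search_by_field(documents: Dict[str, str], field: str, word: str) -> List[str]:
    """Return names of XML documents whose <field> element contains `word`."""
    hits = []
    for name, xml_text in documents.items():
        root = ET.fromstring(xml_text)
        for node in root.iter(field):
            # itertext() also sees text nested inside importance tags.
            if word in "".join(node.itertext()):
                hits.append(name)
                break
    return hits
```

Because the match is restricted to one field, a document that mentions the term only in its body is not returned by a title search.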

[0058]

[Effect of the Invention] According to the present invention, a large number of files can be managed in a highly convenient way.

[Brief description of drawings]

FIG. 1 is a functional block diagram showing the configuration of a file search device in the base technology.

FIG. 2 is a flowchart showing the process of generating index data in the base technology.

FIG. 3 is a flowchart showing the search process in the base technology.

FIG. 4 is a functional block diagram showing the overall configuration of a search system according to this embodiment.

FIG. 5 is a functional block diagram showing the configuration of a file management server according to this embodiment.

FIG. 6 is a flowchart showing the procedure for creating a summary.

FIG. 7 is a diagram showing part of the original text of an accumulated file that is subject to summary creation.

FIG. 8 is a diagram showing an accumulated file simply converted to XML format.

FIG. 9 is a diagram showing an XML-format accumulated file that incorporates the results of the importance determination.

FIG. 10 is a diagram showing a table of the appearance frequencies of elements in a file.

FIG. 11 is a functional block diagram showing the overall configuration of a search system according to a modification of this embodiment.

[Explanation of symbols]

20 processing unit, 22 analysis processing unit, 24 generation processing unit, 42 file storage unit, 44 index storage unit, 60 importance determination unit, 62 mark assigning unit, 64 summary creation unit, 66 summary storage unit, 68 size determination unit, 100 update monitoring unit, 120 file management server.

Claims

[Claims]

1. A file management device for managing a plurality of electronically recorded files usable by a computer, comprising: a structure analysis unit that analyzes the structure of a file; an importance determination unit that determines, based on the result of the analysis, whether or not the structure is characteristic; and a mark assigning unit that, when the importance determination unit determines that the structure of the file is characteristic, indicates that fact with a predetermined mark and associates the mark with the file.

2. A file management device for managing a plurality of electronically recorded files usable by a computer, comprising: a structure analysis unit that decomposes a file into elements; an importance determination unit that determines whether or not each decomposed element is characteristic in the file; and a mark assigning unit that assigns a predetermined mark to each element determined to be characteristic by the importance determination unit so that the element is made apparent in the file.

3. The file management device according to claim …, wherein the importance determination unit detects the position at which the element appears in the file and refers to that position to determine whether or not the element is characteristic.

4. The file management device according to claim 2 or 3, wherein the importance determination unit compares the frequency with which the element appears in the file with its frequency of appearance in other files to determine whether or not the element is characteristic.

5. The file management device according to claim 1, further comprising an index storage unit that records, as an index, a combination of the object to which the mark is assigned and the mark.

6. The file management device according to claim 5, wherein the index storage unit extracts objects determined to be characteristic by the importance determination unit, omits objects determined not to be characteristic, and compresses the file to create the index.

7. The file management device according to claim 6, wherein the index storage unit determines the degree of compression used when creating an index for the file according to the remaining storage capacity of the storage medium on which the created index is to be stored.

8. The file management device according to claim 6, wherein the index storage unit receives an instruction from a user regarding the degree of compression used when creating an index for the file.

9. The file management device according to any one of claims 5 to 8, further comprising an update checking unit that checks the update history of the file, wherein the index storage unit creates the index for the file again when an update of the file is confirmed.

10. A computer program for managing a plurality of electronically recorded files usable by a computer, the program causing a computer to execute: a step of analyzing the structure of a file; a step of determining, based on the result of the analysis, whether or not the structure is characteristic; and a step of, when the structure of the file is determined to be characteristic, indicating that fact with a predetermined mark and associating the mark with the file.

11. A computer program for managing a plurality of electronically recorded files usable by a computer, the program causing a computer to execute: a step of decomposing a file into elements; a step of determining whether or not each decomposed element is characteristic in the file; and a step of assigning a predetermined mark to each element determined to be characteristic so that the element is made apparent in the file.