JP5836893B2

JP5836893B2 - File management apparatus, file management method, and program

Info

Publication number: JP5836893B2
Application number: JP2012148590A
Authority: JP
Inventors: 光晴大峡
Original assignee: Hitachi Solutions Ltd
Current assignee: Hitachi Solutions Ltd
Priority date: 2012-07-02
Filing date: 2012-07-02
Publication date: 2015-12-24
Anticipated expiration: 2032-07-02
Also published as: JP2014010758A

Description

The present invention relates to a file management apparatus, a file management method, and a program, and relates, for example, to a technique for virtually classifying files on a computer.

In recent years, with the development of computers, it has become routine for multiple users to share files across multiple computers connected by a network. For example, files on a file server may be shared by multiple users. Files are generally managed using folders with a fixed hierarchical structure (physical folders). A file may have to be stored in a folder determined by the organization's operational rules. Such a rule specifies, for example, that files are stored in a predetermined folder for each file type or for each department. Many other patterns are possible, such as creating a folder for each fiscal year in which files were created or a folder for each product. This kind of folder management is used not only when several people share files but also when a single user manages files.

Depending on the user's work, there are cases where the user wants to gather several files stored in different physical folders and use them for a single purpose. In such a case, the user must, for example, find the necessary files in each folder and copy them into one folder, which is a burden on the user. Moreover, repeating this work multiplies identical files on the file server and consumes its capacity. Furthermore, if only some of those copies are modified, similar files end up scattered across the file server, and it becomes unclear which file is the latest.

To address this, methods have been considered that manage the metadata (attribute information) of a document (file) in association with the document. For example, Patent Document 1 proposes a virtual folder system. A virtual folder system is a system that provides folders (virtual folders) that hold the files and folders matching a condition, regardless of where the files actually reside. For example, if metadata is set on files and a search condition on that metadata is defined for a virtual folder, the files that match the search condition can be placed in the virtual folder. When the virtual folder is referenced, only the files that satisfy the search condition are displayed. For example, when managing sales documents, "document type" (contract document, purchase order, quotation, and so on) is first defined as an attribute. An attribute is a term that names a kind of metadata, such as "document type" or "business partner". If a document type is assigned to every file and the search condition "document type is 'contract document'" is assigned to a virtual folder, referencing that virtual folder yields a list of contract documents. Because a virtual folder system classifies files semantically in this way, documents can be used effectively. In addition, because files can be managed virtually in a variety of folders regardless of the physical folder structure, the system avoids both the waste of capacity caused by needless copies of files and the problem of not knowing which version is the latest.
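The idea of a virtual folder as a stored search condition can be sketched as follows. This is a minimal illustration, not the system of Patent Document 1: the class name `VirtualFolder`, the record layout, and the file paths are all hypothetical, and the condition is assumed to be a simple equality test on one attribute.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VirtualFolder:
    """A virtual folder holds a search condition, not the files themselves."""
    name: str
    attribute: str  # metadata attribute to test (hypothetical key name)
    value: str      # value the attribute must equal

    def list_files(self, metadata_table):
        # The condition is evaluated each time the folder is referenced,
        # so the listing reflects the metadata as it is at that moment.
        return [row["path"] for row in metadata_table
                if row.get(self.attribute) == self.value]


# Hypothetical metadata records; the physical location plays no role.
METADATA = [
    {"path": "/sales/2011/a.pdf", "document_type": "contract document"},
    {"path": "/legal/b.pdf", "document_type": "contract document"},
    {"path": "/sales/2012/c.pdf", "document_type": "quotation"},
]

contracts = VirtualFolder("Contracts", "document_type", "contract document")
print(contracts.list_files(METADATA))
```

Because nothing is copied, the two contract documents stay in their physical folders and appear together only through the condition.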

Patent Document 1: JP 2003-323326 A

Non-Patent Document 1: Teruo Koyama, "Extracting compound terms from Japanese text", Journal of Information and Knowledge Society, vol. 19, no. 4, pp. 306-315, 2010

However, with the technique of Patent Document 1, the user must define the virtual folders, and this work is a burden on the user. The user must also decide the criteria by which files are classified. To do this, the user needs to know what files exist on the file server and must further judge from what viewpoint they should be classified. In general, grasping the contents of an entire file server and classifying them appropriately is difficult.

The present invention has been made in view of this situation, and provides a technique for automatically performing virtual classification of files stored on a file server, accurately and in a way that is easy for the user to use.

To achieve the above object, the file management apparatus of the present invention is an apparatus that generates virtual folders for virtually classifying files. The apparatus extracts keywords from, for example, the character strings constituting file metadata and the search queries in a search log, and registers them in a storage device. The apparatus then automatically determines the conditions for the files to be stored in each virtual folder, based on keywords that appear frequently in the group of metadata and the group of search queries used when generating the virtual folders.

That is, the file management apparatus according to the present invention includes a processor that executes a program for generating virtual folders for classifying a plurality of physical files, and a storage device that stores metadata management information for managing the metadata of the plurality of physical files. Here, a virtual folder is a virtual folder for managing link information of a plurality of physical files and physical folders, irrespective of where the physical files or the physical folders storing them are located.

The processor first extracts a plurality of keywords from the character strings constituting the metadata of the plurality of files in the metadata management information, and acquires information on the appearance frequency of each extracted keyword. The processor also extracts a plurality of keywords from the character strings constituting the plurality of search queries included in search log data, and acquires information on the appearance frequency of each of those keywords. The processor then generates a specified number of virtual upper folders using keywords whose appearance frequency is equal to or greater than a predetermined value. Further, the processor generates virtual lower folders associated with a virtual upper folder using one of the following: another keyword that contains the keyword used for the virtual upper folder; a keyword that was searched together with the keyword used for the virtual upper folder; or a keyword that is used together with the keyword of the virtual upper folder in a character string constituting the metadata. Finally, the processor outputs a virtual classification display that shows the relationship between the generated virtual upper folders and virtual lower folders and the contents of the virtual upper folders and virtual lower folders.
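The step of generating "a specified number of virtual upper folders using keywords whose appearance frequency is equal to or greater than a predetermined value" can be sketched as below. This is only an illustration under assumptions: the function name `select_upper_keywords`, the tie-breaking by keyword text, and all frequencies except the 292 quoted later for the acceptance-inspection keyword are hypothetical.

```python
def select_upper_keywords(keyword_freq, min_freq, max_folders):
    """Return up to max_folders keywords whose frequency is at least min_freq.

    Keywords are ordered by descending frequency; ties are broken by the
    keyword text so that the result is deterministic (an assumption).
    """
    eligible = [(kw, n) for kw, n in keyword_freq.items() if n >= min_freq]
    eligible.sort(key=lambda item: (-item[1], item[0]))
    return [kw for kw, _ in eligible[:max_folders]]


# Hypothetical appearance frequencies of extracted keywords.
freq = {"contract": 310, "acceptance inspection": 292, "quotation": 40, "memo": 3}

upper = select_upper_keywords(freq, min_freq=10, max_folders=2)
print(upper)
```

Here `min_freq` plays the role of the predetermined value and `max_folders` the role of the specified number of folders.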

According to the present invention, virtual folders for searching a group of files stored on a file server can be created automatically, accurately and in a way that is easy for the user to use. As a result, even a user with little knowledge of the contents of the files stored on the file server can build virtual folders with little effort. Because the virtual folders are generated from frequently occurring keywords, search keywords, and co-occurrence keywords, the resulting virtual folders are more convenient for the user.

Further features (problems, configurations, and effects) related to the present invention will become apparent from the description in this specification and the accompanying drawings. Aspects of the present invention are achieved and realized by elements, combinations of various elements, the following detailed description, and the appended claims.

It should be understood that the description in this specification is merely a typical example and does not limit the claims or the applications of the present invention in any sense.

A diagram showing the schematic configuration of a system (file management apparatus) according to an embodiment of the present invention.
A diagram showing an example of the metadata file.
A diagram showing an example of the search log data.
A diagram showing an example of the virtual folder data.
A diagram showing an example of the extracted keyword data.
A diagram showing an example of the search keyword management data.
A diagram showing an example of the co-occurrence keyword data.
A diagram showing an example of the virtual upper folder data.
A diagram showing an example of the virtual lower folder data.
A flowchart for explaining the keyword registration process.
A flowchart for explaining the extracted keyword registration process.
A flowchart for explaining the search keyword registration process.
A flowchart for explaining the co-occurrence keyword registration process.
A flowchart for explaining the whole of the virtual folder generation process.
A flowchart for explaining a part of the virtual folder generation process.
A diagram showing an example of the data used in the extracted keyword registration process.
A diagram showing an example of the virtual classification screen.

Embodiments of the present invention are described below with reference to the accompanying drawings. Note, however, that the embodiments are merely examples for realizing the present invention and do not limit its technical scope. Components common to the drawings are given the same reference numerals.

In the following description, the information of the present invention is described in table form. However, this information does not necessarily have to be expressed in a table data structure; it may be expressed as a list, a DB, a queue, or some other data structure. For this reason, a "table", "list", "DB", "queue", and so on may be referred to simply as "information" to show that the description does not depend on the data structure.

When describing the content of each piece of information, the expressions "identification information", "identifier", "name", and "ID" can be used, and these are interchangeable.

The following description uses "program" as the grammatical subject. However, because a program performs its defined processing by being executed by a processor while using memory and a communication port (communication control device), the description may equally be read with the processor as the subject. Processing disclosed with a program as the subject may also be regarded as processing performed by a computer, such as a management server, or by an information processing apparatus. Part or all of a program may be implemented by dedicated hardware, and programs may be modularized. The various programs may be installed on each computer from a program distribution server or from storage media.

<Configuration of the virtual classification device>
FIG. 1 is a functional block diagram showing the schematic configuration of a virtual classification device (which may also be called a file management apparatus or a document processing apparatus) according to an embodiment of the present invention. The virtual classification device includes a central processing unit (processor) 100 that performs the necessary arithmetic and control processing, an input/output device 110 for inputting and outputting data, a program memory 120 that stores the programs required for processing in the central processing unit 100, a storage device 130 that stores data processed by the central processing unit 100, and a data memory 140 that stores data to be processed by the central processing unit 100.

The input/output device 110 includes an output device composed of a display device 111 for displaying data, a printer (not shown), and the like; a keyboard 112 for performing operations on the displayed data, such as selecting a menu; and a pointing device 113 such as a mouse.

The program memory 120 stores a search program 121 that searches the metadata, a keyword registration program 122 that extracts keywords from the metadata and the search log, a virtual folder generation program 123 that generates virtual folders based on the keywords, and a virtual classification program 124 that displays the virtual folders on the screen and displays the contents of the files stored in each virtual folder. Each processing program is stored in the program memory 120 as program code, and each process is realized by the central processing unit 100 executing the corresponding program code.

The storage device 130 stores a metadata file 131 that holds the metadata of each file, search log data 132 in which the log of search queries is stored, and virtual folder data 133 in which the definition information of the virtual folders generated from the keywords is stored. The storage device 130 may be a storage system located remotely and accessed via a network.

The data memory 140 stores extracted keyword data 141, search keyword management data 142, co-occurrence keyword data 143, virtual upper folder data 144, and virtual lower folder data 145. These data are described in detail later.

The processing programs, data, and other items described above can also be provided stored on various recording media such as a CD-ROM, DVD-ROM, MO, floppy (registered trademark) disk, or USB memory.

<Metadata>
FIG. 2 is a diagram showing an example of the metadata file 131 in the storage device 130. In the embodiment of the present invention, each file registered in the metadata file 131 (files 001, 002, 003, ...) is managed in the metadata file 131 together with its metadata 202. Accordingly, a file for which no metadata 202 has been registered does not appear in the metadata file 131.

The metadata file 131 is managed, for example, in table form, with one file corresponding to one row. Its fields are an ID 201 that uniquely identifies the file and the metadata 202 registered for the file.

The metadata 202 has one column for each attribute managed by this system. FIG. 2 shows, for example, the metadata of files obtained by scanning paper sales documents with a scanner. In the example of FIG. 2, the attributes include the file path 203 of the file, the document type 204, and the customer name 205. Many metadata patterns other than those shown in FIG. 2 are possible, for example date-related metadata such as a file's access date and last update date, and person-related metadata such as a file's creator and updater.

<Search log data>
FIG. 3 is a diagram showing an example of the search log data 132 in the storage device 130. The search log data 132 records the search queries entered by users together with their date and time 303. Each log entry can hold multiple search queries. FIG. 3 shows an example in which two are held (search query A 301 and search query B 302), meaning that the user performed an AND search with search query A 301 and search query B 302. In the example of FIG. 3, the first row is the log of an AND search for "contract document" and "document management system". When search query B 302 is blank, the entry is the log of a search performed with search query A 301 alone.

In this embodiment, the search log data 132 is described as consisting of two search queries, but the number of search queries may be one, or three or more.

The search log data 132 has, for example, a plurality of registration information files, one for each attribute. Accordingly, a search that uses both the document type and the customer name, for example, may be registered redundantly, once under each attribute (document type and customer name).

<Virtual folder data>
FIG. 4 is a diagram showing an example of the virtual folder data 133 in the storage device 130. The virtual folder data holds the information of the virtual folders that are finally generated.

The virtual folder data 133 is created per attribute and describes the definitions of the virtual folders generated by the virtual folder generation program described later. Here, a virtual folder is a folder that holds the files and folders matching a condition, regardless of where the files and folders (physical files and physical folders) actually reside. A virtual folder does not store the files or folders themselves; it stores one or more shortcuts. When a physical file or folder is changed, newly created, or deleted, the result is reflected in the virtual folder and the contents of the virtual folder change. Note that a virtual folder is a different concept from a mere shortcut or alias. More specifically, a file shortcut is not a folder and therefore cannot group multiple files, and a folder shortcut merely allows a physical folder to be referenced from another location. An alias is nearly synonymous with a shortcut, but is a technique that allows an item to be referenced from another location under a different name. Neither a shortcut nor an alias holds the files (or folders) that match a condition.

FIG. 4 shows an example in which the attribute is the document type and two levels of virtual folders are recorded: virtual upper folders 401 and virtual lower folders 402.

The virtual upper folder 401 is defined by a single keyword that encompasses the contents of the virtual lower folders 402. The character string assigned to the virtual upper folder 401 represents a search condition. More specifically, files whose value for the target attribute in the metadata file contains the character string assigned to the virtual upper folder are the search targets. For example, for the first entry in FIG. 4, files whose document type contains the character string "contract" (契約) are the search targets.

The virtual lower folder 402 is defined by keywords that refine the contents of the virtual upper folder 401, and follows one of the following three patterns.

The first is the case where the virtual lower folder consists of a character string that contains the keyword of the virtual upper folder 401. For example, the virtual upper folder 401 is "contract" (契約) and the virtual lower folder 402 is "contract document" (契約書). In this case, files whose document type contains the character string "contract document" are the search targets.

The second is the case where the keyword of the virtual upper folder 401 appears as search query A 301 in the search log data 132. For example, the virtual upper folder 401 is "contract" and the virtual lower folder 402 is "contract, legal affairs". In this case, files whose document type contains both "contract" and "legal affairs" are the search targets. In other words, this pattern covers files that contain both the wording of the virtual upper folder (e.g., contract) and wording that is often used together with it in searches (e.g., legal affairs).

The third, like the second, consists of two keywords. One is the keyword of the virtual upper folder 401. The other is a different keyword that appears together with the keyword of the virtual upper folder 401 in the metadata of the metadata file 131. For example, when the metadata is "basic contract document preparation request (product ABC)", the virtual lower folder 402 is "contract, product ABC". In this case, files whose document type contains both "contract" and "product ABC" are the search targets. This pattern covers files that contain a pair of words that tend to appear together, independently of any search.
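All three lower-folder patterns reduce to the same test: the target attribute must contain every keyword of the folder as a substring. The following sketch illustrates this under assumptions; the helper names, the record layout, and the sample rows are hypothetical, and the English strings stand in for the Japanese values in the figures.

```python
def matches(attribute_value, keywords):
    """True if the attribute value contains every keyword as a substring."""
    return all(kw in attribute_value for kw in keywords)


def files_in_folder(metadata_table, attribute, keywords):
    """Return the IDs of the files that satisfy the folder's search condition."""
    return [row["id"] for row in metadata_table
            if matches(row[attribute], keywords)]


# Hypothetical rows of the metadata file (document type only).
ROWS = [
    {"id": "001", "document_type": "contract document"},
    {"id": "002", "document_type": "contract review (legal affairs)"},
    {"id": "003", "document_type":
        "basic contract document preparation request (product ABC)"},
    {"id": "004", "document_type": "quotation"},
]

# Virtual upper folder: a single keyword.
upper = files_in_folder(ROWS, "document_type", ["contract"])
# Pattern 1: a longer string that contains the upper keyword.
pattern1 = files_in_folder(ROWS, "document_type", ["contract document"])
# Pattern 2: upper keyword plus a keyword often searched together with it.
pattern2 = files_in_folder(ROWS, "document_type", ["contract", "legal affairs"])
# Pattern 3: upper keyword plus a keyword co-occurring in the metadata.
pattern3 = files_in_folder(ROWS, "document_type", ["contract", "product ABC"])

print(upper, pattern1, pattern2, pattern3)
```

Each lower folder selects a subset of what its upper folder selects, which is what makes the two-level hierarchy consistent.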

The virtual folder data 133 may be generated after an instruction to execute the virtual classification process is input, may be generated automatically when a predetermined number of files have accumulated, or may be generated automatically at predetermined time intervals for the files accumulated in that interval. When the virtual folder data 133 is generated, the user may also specify the keywords to be used for folder generation.

<Extracted keyword data>
FIG. 5 is a diagram showing an example of the extracted keyword data 141 in the data memory 140. The extracted keyword data 141 lists characteristic words (extracted keywords) 501, derived from the character string information in the metadata file 131 (for example, the file path 203, document type 204, and customer name 205), together with their frequencies 502. The frequency 502 is the number of files in the metadata file 131 in the storage device 130 that contain the keyword. The example of FIG. 5 shows that, among the files registered in the metadata file 131, 292 files contain the keyword "acceptance inspection" (検収).
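The frequency 502 is a per-file count: a file contributes one to a keyword's frequency if any of its metadata strings contains that keyword. A minimal sketch follows. It assumes the keyword list has already been produced by some extraction method (the method of Non-Patent Document 1 is not reproduced here), and the function name, attribute keys, and sample rows are hypothetical.

```python
def keyword_file_counts(metadata_table, keywords, attributes):
    """Count, for each keyword, the files whose metadata contains it."""
    counts = {}
    for kw in keywords:
        counts[kw] = sum(
            1
            for row in metadata_table
            # A file is counted once even if several attributes match.
            if any(kw in row.get(attr, "") for attr in attributes)
        )
    return counts


# Hypothetical rows of the metadata file.
METADATA = [
    {"id": "001", "document_type": "acceptance inspection report",
     "customer": "Company A"},
    {"id": "002", "document_type":
        "delivery note and acceptance inspection request form",
     "customer": "Company B"},
    {"id": "003", "document_type": "quotation", "customer": "Company A"},
]

counts = keyword_file_counts(
    METADATA,
    keywords=["acceptance inspection", "quotation", "Company A"],
    attributes=["document_type", "customer"],
)
print(counts)
```

Restricting `attributes` to a single attribute would give the per-attribute variant of the data.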

The extracted keyword data 141 has, for example, a plurality of registration information files, one for each attribute. Because such extracted keyword data 141 can be generated by the method described in Non-Patent Document 1, a description of the generation method is omitted.

<Search keyword management data>
FIG. 6 is a diagram showing an example of the search keyword management data 142 in the data memory 140. The search keyword management data 142 is generated from the search log data 132 in the storage device 130. Search query A 601 and search query B 602 represent a combination of search query A 301 and search query B 302 in the search log data 132. Because they represent a combination, entries in which the character strings of search query A 301 and search query B 302 are swapped are treated as the same data. To achieve this, the two strings are sorted by character code so that the order of search query A 601 and search query B 602 is uniform. For example, whether search query A 301 and search query B 302 are "contract, document" or "document, contract", search query A 601 and search query B 602 are unified as "contract, document". The appearance frequency 603 is the number of times the combination of search query A 301 and search query B 302 appears in the search log data 132. When either search query A 301 or search query B 302 is blank, search query B 602 is left blank.
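The normalization and counting described above can be sketched as follows. This is an illustration under assumptions: the function name and the representation of a log entry as a `(query_a, query_b)` tuple with an empty string for a blank query are hypothetical. Python compares strings by Unicode code point, which is used here as the "character code" ordering.

```python
from collections import Counter


def build_search_keyword_counts(search_log):
    """Count unordered query combinations found in the search log.

    Each log entry is (query_a, query_b); a blank query is an empty string.
    The returned keys are (query A, query B) pairs in character-code order,
    with query B blank when only one query was entered.
    """
    counts = Counter()
    for query_a, query_b in search_log:
        terms = sorted(t for t in (query_a, query_b) if t)
        if not terms:
            continue  # nothing was searched; skip the entry
        first = terms[0]
        second = terms[1] if len(terms) > 1 else ""
        counts[(first, second)] += 1
    return counts


# Hypothetical search log entries.
LOG = [
    ("contract", "document"),
    ("document", "contract"),  # same combination, swapped order
    ("contract", ""),
    ("", "contract"),          # blank query A: still a single-query entry
]

counts = build_search_keyword_counts(LOG)
print(counts)
```

Sorting before counting is what makes "contract, document" and "document, contract" fall into one row with a frequency of two.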

The search keyword management data 142 has, for example, a plurality of registration information files, one for each attribute. The search keyword management data 142 may also be limited to the above information obtained from search logs within a predetermined period.

<Co-occurrence keyword data>
FIG. 7 is a diagram showing an example of the co-occurrence keyword data 143 in the data memory 140. The co-occurrence keyword data 143 lists an extracted keyword 701 obtained from the extracted keyword data 141 in the data memory 140; a co-occurrence keyword 702, which is a different keyword that appears together with the character string of the extracted keyword 701 in the metadata file 131 in the storage device 130; and the frequency 703 of the combination of the extracted keyword 701 and the co-occurrence keyword 702. The frequency 703 is the number of files in the metadata file 131 that contain the combination of keywords. The example of FIG. 7 shows that, among the files registered in the metadata file 131, 80 files contain both the keyword "delivery note" (納品書) and the keyword "acceptance inspection" (検収). As another example, the document types 204 of document 011 and document 008 in FIG. 2 are "delivery note and acceptance inspection request form" (納品書兼検収依頼書) and "review sheet for contract documents, etc." (契約書等審査票). When independent words (keywords) are separated in this way by "兼" ("and also"), "等" ("etc."), or symbols such as "／" and "＋", those keywords are likely to appear together in the same metadata and become co-occurrence keywords.
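Counting the frequency 703 amounts to counting, for each pair of keywords, the metadata values that contain both. The sketch below illustrates this under assumptions: it treats pairs as unordered and stores each pair once in sorted order, which is a simplification of the table in FIG. 7, and the function name and sample values are hypothetical. Keyword extraction itself (Non-Patent Document 1) is not reproduced.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(values, keywords):
    """Count how many metadata values contain each pair of keywords."""
    counts = Counter()
    for value in values:
        # Keywords present in this value, in a fixed order so that each
        # unordered pair always maps to the same key.
        present = sorted(kw for kw in keywords if kw in value)
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts


# Hypothetical document-type values from the metadata file.
VALUES = [
    "delivery note and acceptance inspection request form",
    "delivery note",
    "acceptance inspection report",
]
KEYWORDS = ["delivery note", "acceptance inspection"]

pairs = cooccurrence_counts(VALUES, KEYWORDS)
print(pairs)
```

Only the first value contains both keywords, so the pair is counted once even though each keyword individually appears in two values.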

なお、共起キーワードデータ１４３は、例えば、属性ごとに複数の登録情報ファイルを有している。また、共起キーワードデータ１４３は、非特許文献１に記載された方法によって生成することができるため、その生成方法についての説明は省略する。 The co-occurrence keyword data 143 has, for example, a plurality of registration information files for each attribute. Further, since the co-occurrence keyword data 143 can be generated by the method described in Non-Patent Document 1, description of the generation method is omitted.

＜仮想上位フォルダデータ＞
図８は、データメモリ１４０内の仮想上位フォルダデータ１４４の一例を示す図である。仮想上位フォルダデータ１４４は、仮想上位フォルダの検索条件となる文字列の候補であり、検索キーワード管理データ１４２から抽出されたキーワード８０１と、抽出キーワードデータ１４１において、当該キーワードの頻度を表す抽出キーワード頻度８０２と、検索キーワード管理データ１４２において、当該キーワードの頻度を表す検索頻度８０３と、抽出キーワード頻度８０２及び検索頻度８０３に基づいて算出されるスコア８０４が記載されたデータである。スコア８０４は、当該キーワードの仮想上位フォルダとしての適合の度合いを表している。仮想フォルダ生成プログラム１２３は、スコア８０４に基づいて仮想上位フォルダを決定する。 <Virtual upper folder data>
FIG. 8 is a diagram illustrating an example of the virtual upper folder data 144 in the data memory 140. The virtual upper folder data 144 records a keyword 801, which is a candidate character string for the search condition of a virtual upper folder and is extracted from the search keyword management data 142; an extracted keyword frequency 802, which is the frequency of that keyword in the extracted keyword data 141; a search frequency 803, which is the frequency of that keyword in the search keyword management data 142; and a score 804 calculated from the extracted keyword frequency 802 and the search frequency 803. The score 804 represents how well the keyword is suited to serve as a virtual upper folder. The virtual folder generation program 123 determines the virtual upper folders based on the score 804.

なお、仮想上位フォルダデータ１４４は、例えば、属性ごとに複数の登録情報ファイルを有している。 Note that the virtual upper folder data 144 has, for example, a plurality of registration information files for each attribute.

＜仮想下位フォルダデータ＞
図９は、データメモリ１４０内の仮想下位フォルダデータ１４５の一例を示す図である。仮想下位フォルダデータ１４５は、仮想下位フォルダの検索条件の文字列の組み合わせとなる、キーワードＡ９０１及びキーワードＢ９０２と、抽出キーワードデータ１４１において、当該キーワードの件数を表す抽出キーワード頻度９０３と、検索キーワード管理データ１４２において、当該キーワードの組み合わせを含むデータの件数を表す検索頻度９０４と、共起キーワードデータ１４３において、当該キーワードの組み合わせを含むデータの件数を表す共起頻度９０５と、抽出キーワード頻度９０３及び検索頻度９０４及び共起頻度９０５に基づいて算出されるスコア９０６が記載されたデータである。 <Virtual subfolder data>
FIG. 9 is a diagram illustrating an example of the virtual lower folder data 145 in the data memory 140. The virtual lower folder data 145 records a keyword A 901 and a keyword B 902, which together form the combination of character strings used as the search condition of a virtual lower folder; an extracted keyword frequency 903, which is the number of occurrences of the keyword in the extracted keyword data 141; a search frequency 904, which is the number of entries in the search keyword management data 142 that contain the keyword combination; a co-occurrence frequency 905, which is the number of entries in the co-occurrence keyword data 143 that contain the keyword combination; and a score 906 calculated from the extracted keyword frequency 903, the search frequency 904, and the co-occurrence frequency 905.

キーワードＡ９０１には、仮想上位フォルダに含まれるキーワード（例：契約）が記入されている。キーワードＢ９０２には、キーワードＡ９０１に対して共起キーワードとなるキーワードが記入されている。キーワードＢ９０２に「−」が記入される場合は、共起キーワードが存在しないときである。 In keyword A901, a keyword (eg, contract) included in the virtual upper folder is entered. In keyword B902, a keyword that is a co-occurrence keyword with respect to keyword A901 is entered. The case where “-” is entered in the keyword B902 is when the co-occurrence keyword does not exist.

スコア９０６は、当該キーワードの組み合わせの仮想下位フォルダとしての適合の度合いを表している。仮想フォルダ生成プログラム１２３は、スコア９０６に基づいて仮想下位フォルダを決定する。 The score 906 represents the degree of matching of the keyword combination as a virtual lower folder. The virtual folder generation program 123 determines a virtual lower folder based on the score 906.

なお、仮想下位フォルダデータ１４５は、例えば、属性ごとに複数の登録情報ファイルを有している。 The virtual lower folder data 145 includes, for example, a plurality of registration information files for each attribute.

＜仮想分類画面＞
図１７は、仮想分類プログラム１２４が生成する仮想分類の表示画面（ＧＵＩ）の一例を示す図である。図１７に示されるように、ＧＵＩのウインドウでは、左側のペインに、ファイルを検索するための検索機能１７０１と、仮想フォルダによるツリー表示１７０２が表示され、右側のペインに、検索機能、あるいは仮想フォルダを選択されることによって、該当するファイルの検索結果１７０３が表示される。 <Virtual classification screen>
FIG. 17 is a diagram illustrating an example of a virtual classification display screen (GUI) generated by the virtual classification program 124. As shown in FIG. 17, the left pane of the GUI window displays a search function 1701 for searching for files and a tree display 1702 of virtual folders, and the right pane displays a search result 1703 listing the matching files when a search is executed or a virtual folder is selected.

検索プログラム１２１は、検索結果を表示する際、記憶装置１３０におけるメタデータファイル１３１を使用する。 The search program 121 uses the metadata file 131 in the storage device 130 when displaying the search result.

仮想分類プログラム１２４は、仮想フォルダをＧＵＩ画面に表示する際、記憶装置１３０における仮想フォルダデータ１３３を使用する。また、仮想分類プログラム１２４は、仮想フォルダが選択されると、検索プログラム１２１を実行する。すなわち、仮想フォルダに付与された文字列で検索プログラムを実行するのと同一の処理を行う。検索プログラムは、記憶装置１３０におけるメタデータファイル１３１から、検索クエリの文字列を含むファイルを検索結果１７０３に表示する。検索クエリが２つの場合には、２つの文字列を共に含むファイルが検索結果として表示される。本実施形態では、検索クエリが３つ以上の場合については、詳細な説明はしないが、検索クエリが２つの場合と同様に処理可能である。 The virtual classification program 124 uses the virtual folder data 133 in the storage device 130 when displaying the virtual folder on the GUI screen. The virtual classification program 124 executes the search program 121 when a virtual folder is selected. That is, the same processing as executing the search program with the character string assigned to the virtual folder is performed. The search program displays a file including a search query character string in the search result 1703 from the metadata file 131 in the storage device 130. When there are two search queries, a file including both of the two character strings is displayed as a search result. In the present embodiment, the case where there are three or more search queries is not described in detail, but can be processed in the same manner as in the case where there are two search queries.
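The selection behaviour described above can be sketched as a conjunctive substring search over one attribute. This is only a minimal illustration: the record layout, the `search` function name, and the sample values are assumptions made for the example and do not come from the patent.

```python
def search(metadata, attribute, queries):
    """Return the records whose `attribute` value contains every query string."""
    return [
        record for record in metadata
        if all(q in record.get(attribute, "") for q in queries)
    ]

# Hypothetical sample records standing in for the metadata file 131.
metadata = [
    {"file_name": "doc001", "document_type": "contract (product ABC)"},
    {"file_name": "doc002", "document_type": "contract (product XYZ)"},
    {"file_name": "doc003", "document_type": "delivery note"},
]

# Selecting a virtual upper folder: one character string.
upper = search(metadata, "document_type", ["contract"])
# Selecting a virtual lower folder: both character strings must be present.
lower = search(metadata, "document_type", ["contract", "product ABC"])
```

Because the filter uses `all(...)`, three or more query strings are handled the same way as two, which matches the remark that such cases can be processed similarly.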

更新ボタン１７０７が押下されると、キーワード登録プログラム１２２、仮想フォルダ生成プログラム１２３、仮想分類プログラム１２４が順に実行され、表示画面（ＧＵＩ）が更新される。 When an update button 1707 is pressed, the keyword registration program 122, the virtual folder generation program 123, and the virtual classification program 124 are executed in order, and the display screen (GUI) is updated.

検索機能部分には、検索対象の属性を選択するためのプルダウン１７０４、検索クエリを入力するテキストボックス１７０５、検索処理を実行するための検索実行ボタン１７０６がある。仮想分類表示部分には、仮想上位フォルダ１７０８と仮想下位フォルダ１７０９が表示される。 The search function part includes a pull-down 1704 for selecting a search target attribute, a text box 1705 for inputting a search query, and a search execution button 1706 for executing a search process. In the virtual classification display portion, a virtual upper folder 1708 and a virtual lower folder 1709 are displayed.

仮想フォルダによるツリー表示１７０２は、記憶装置１３０における仮想フォルダデータが定義されている場合のみ表示される。定義されていない場合には表示されない。 The virtual folder tree display 1702 is displayed only when virtual folder data in the storage device 130 is defined. Not displayed if not defined.

検索プログラム１２１によって実行された検索クエリは、記憶装置１３０における検索ログデータに記憶される。 The search query executed by the search program 121 is stored in the search log data in the storage device 130.

検索の方法には大きく２つある。属性をプルダウン１７０４によって選択し、キーワード１７０５を入力して該当文書を検索する方法と、属性をプルダウン１７０４によって選択し、表示される仮想分類１７０２から１つの仮想フォルダを選択して該当文書を検索する方法である。前者の方法は仮想フォルダとは無関係であり、その場合、仮想分類１７０２のツリー表示では、どのフォルダも開かれていない状態となっている。 There are broadly two search methods. In the first, an attribute is selected with the pull-down 1704 and a keyword is entered in the text box 1705 to search for the matching documents. In the second, an attribute is selected with the pull-down 1704 and one virtual folder is selected from the displayed virtual classification 1702 to search for the matching documents. The first method is independent of the virtual folders, and in that case no folder is open in the tree display of the virtual classification 1702.

図１７では、属性「文書種別」が選択され、また、仮想上位フォルダ「契約」の子フォルダ「契約，製品ＡＢＣ」が選択された状態が示されている。検索結果１７０３には、属性「文書種別」において、「契約」と「製品ＡＢＣ」を共に含むファイルが表示されている。また、ファイル名１７１１、文書種別１７１２、取引先名１７１３などの属性のメタデータが表示されている。また、仮想上位フォルダ１７０８が選択されると、検索結果１７０３には、仮想上位フォルダの文字列を含むファイルが表示される。検索結果１７０３において、ユーザによってファイルが選択されると、オペレーティングシステムによって関連付けられたアプリケーションが起動し、当該ファイルが開かれる。 FIG. 17 shows a state where the attribute “document type” is selected and the child folder “contract, product ABC” of the virtual upper folder “contract” is selected. The search result 1703 displays a file including both “contract” and “product ABC” in the attribute “document type”. Further, metadata of attributes such as a file name 1711, a document type 1712, and a supplier name 1713 are displayed. When the virtual upper folder 1708 is selected, the search result 1703 displays a file including the character string of the virtual upper folder. In the search result 1703, when a file is selected by the user, an application associated by the operating system is activated and the file is opened.

仮想分類処理によって、例えば図１７に示すようなユーザインターフェース（ＧＵＩ）を表示し、ユーザはそれを用いることにより、物理的に異なるフォルダに格納されたファイルを、仮想フォルダ毎に参照することが可能となる。そして、ユーザは、ファイルの実体が保存された物理フォルダを考慮せずとも、意味的な分類によってファイルを参照できる。また、ユーザは、ＧＵＩ上で属性を選択することも可能であり、属性毎に異なる仮想フォルダツリーが構成され、探したい観点でファイルの検索が可能となる。 The virtual classification process displays a user interface (GUI) such as the one shown in FIG. 17, and by using it the user can refer to files stored in physically different folders on a per-virtual-folder basis. The user can thus refer to files by semantic classification without having to consider the physical folder in which each file is actually stored. The user can also select an attribute on the GUI; a different virtual folder tree is built for each attribute, so files can be searched from the viewpoint the user has in mind.

＜文書処理装置における処理概要＞
上述の構成を有する文書処理装置において行われる処理（図１７のＧＵＩ上での操作に対応する処理）の概要についてまず説明する。この際の動作主体は、特に断らない限りは中央処理装置１００であり、中央処理装置１００が各種プログラムを読み込み、実行する。 <Outline of processing in document processing apparatus>
First, an outline of processing (processing corresponding to the operation on the GUI in FIG. 17) performed in the document processing apparatus having the above-described configuration will be described. The operating subject at this time is the central processing unit 100 unless otherwise specified, and the central processing unit 100 reads and executes various programs.

まず、仮想分類プログラム１２４が実行される。仮想分類プログラム１２４は、記憶装置１３０からメタデータファイル１３１と仮想フォルダデータ１３３を読み込み、仮想フォルダデータ１３３に記載された仮想フォルダの定義に基づいて仮想フォルダ（図１７参照）を表示する。 First, the virtual classification program 124 is executed. The virtual classification program 124 reads the metadata file 131 and the virtual folder data 133 from the storage device 130, and displays the virtual folder (see FIG. 17) based on the definition of the virtual folder described in the virtual folder data 133.

次に、仮想分類プログラム１２４は、ユーザからの入力を受け付け、検索処理または、仮想フォルダが選択されると、メタデータファイル１３１から該当するファイルを検索し、検索結果１７０３に表示する。この際、使用された検索クエリは、記憶装置１３０における検索ログデータ１３２として保存する。 Next, the virtual classification program 124 receives an input from the user, and when a search process or a virtual folder is selected, the corresponding file is searched from the metadata file 131 and displayed in the search result 1703. At this time, the used search query is stored as search log data 132 in the storage device 130.

更新ボタン１７０７が押下されると、キーワード登録プログラム１２２、仮想フォルダ生成プログラム１２３、仮想分類プログラム１２４が順に実行される。 When an update button 1707 is pressed, the keyword registration program 122, the virtual folder generation program 123, and the virtual classification program 124 are executed in order.

キーワード登録プログラム１２２は、記憶装置１３０におけるメタデータファイル１３１と検索ログデータ１３２を読み込み、メタデータファイルから特徴的な単語（キーワード）を抽出し、抽出キーワードデータ１４１としてデータメモリ１４０に格納する。また、使用された検索クエリの統計情報を検索キーワード管理データ１４２としてデータメモリ１４０に格納する。また、メタデータファイル１３１において、抽出キーワードデータ１４１に登録されているキーワードと共に使用されている別のキーワードの統計情報を、共起キーワードデータ１４３としてデータメモリ１４０に格納する。 The keyword registration program 122 reads the metadata file 131 and the search log data 132 in the storage device 130, extracts characteristic words (keywords) from the metadata file, and stores them as extracted keyword data 141 in the data memory 140. Further, the statistical information of the used search query is stored in the data memory 140 as the search keyword management data 142. Further, in the metadata file 131, statistical information of another keyword used together with the keyword registered in the extracted keyword data 141 is stored in the data memory 140 as co-occurrence keyword data 143.

仮想フォルダ生成プログラム１２３は、データメモリ１４０から、抽出キーワードデータ１４１、検索キーワード管理データ１４２、共起キーワードデータ１４３を読み込み、これらのキーワードの特徴に基づいて仮想フォルダの定義情報を生成し、記憶装置１３０に仮想フォルダデータ１３３として格納する。この際、仮想上位フォルダの候補となるキーワードが格納されたデータを仮想上位フォルダデータ１４４としてデータメモリ１４０に格納する。また、仮想下位フォルダの候補となるキーワードが格納されたデータを仮想下位フォルダデータ１４５としてデータメモリ１４０に格納する。 The virtual folder generation program 123 reads extracted keyword data 141, search keyword management data 142, and co-occurrence keyword data 143 from the data memory 140, generates virtual folder definition information based on the characteristics of these keywords, and stores the storage device. 130 is stored as virtual folder data 133. At this time, data in which keywords that are candidates for a virtual upper folder are stored is stored in the data memory 140 as virtual upper folder data 144. In addition, data in which keywords that are candidates for virtual lower folders are stored is stored in the data memory 140 as virtual lower folder data 145.

仮想分類プログラム１２４は、記憶装置１３０からメタデータファイル１３１と仮想フォルダデータ１３３を読みこみ、仮想フォルダデータ１３３に記載された仮想フォルダの定義に基づいて仮想フォルダを表示する。そして、仮想分類プログラム１２４は、ユーザからの入力を受け付け、仮想フォルダが選択されると、メタデータファイル１３１から仮想フォルダに格納されるファイルを検索し、該当するファイルを表示する。それぞれの処理について、以下詳細に説明する。 The virtual classification program 124 reads the metadata file 131 and the virtual folder data 133 from the storage device 130, and displays the virtual folder based on the definition of the virtual folder described in the virtual folder data 133. The virtual classification program 124 receives an input from the user. When a virtual folder is selected, the virtual classification program 124 searches for a file stored in the virtual folder from the metadata file 131 and displays the corresponding file. Each process will be described in detail below.

＜キーワード登録処理＞
図１０は、キーワード登録プログラム１２２が実行するキーワード登録処理を説明するためのフローチャートである。ここでは、動作主体がキーワード登録プログラム１２２であるとして説明する。 <Keyword registration process>
FIG. 10 is a flowchart for explaining the keyword registration process executed by the keyword registration program 122. Here, a description will be given assuming that the operation subject is the keyword registration program 122.

ステップ１００１において、キーワード登録プログラム１２２は、仮想フォルダ生成対象の属性を１つ選択する。以降、属性として「文書種別」を選択した場合で説明する。なお、仮想フォルダを生成しなくてもよい属性は読み込む必要はない。 In step 1001, the keyword registration program 122 selects one attribute for virtual folder generation target. Hereinafter, a case where “document type” is selected as an attribute will be described. Note that it is not necessary to read an attribute that does not require generation of a virtual folder.

ステップ１００２において、キーワード登録プログラム１２２は、後述する抽出キーワード登録処理を行い、抽出キーワードデータ１４１を生成する。 In step 1002, the keyword registration program 122 performs extracted keyword registration processing described later, and generates extracted keyword data 141.

ステップ１００３において、キーワード登録プログラム１２２は、後述する検索キーワード登録処理を行い、検索キーワード管理データ１４２を生成する。 In step 1003, the keyword registration program 122 performs a search keyword registration process, which will be described later, and generates search keyword management data 142.

ステップ１００４において、キーワード登録プログラム１２２は、後述する共起キーワード登録処理を行い、共起キーワードデータ１４３を生成する。 In step 1004, the keyword registration program 122 performs co-occurrence keyword registration processing, which will be described later, and generates co-occurrence keyword data 143.

ステップ１００５において、キーワード登録プログラム１２２は、仮想フォルダ生成対象の属性すべてについて処理を行ったか否かを判定し、まだ処理していない属性があればステップ１００２に戻り、すべて処理済であれば処理を終了する。 In step 1005, the keyword registration program 122 determines whether all attributes targeted for virtual folder generation have been processed. If an unprocessed attribute remains, the process returns to step 1002; if all attributes have been processed, the process ends.

＜抽出キーワード登録処理＞
図１１は、キーワード登録プログラムが実行する、抽出キーワード登録処理を説明するためのフローチャートである。ここでは、動作主体がキーワード登録プログラム１２２であるとして説明する。 <Extracted keyword registration process>
FIG. 11 is a flowchart for explaining extracted keyword registration processing executed by the keyword registration program. Here, a description will be given assuming that the operation subject is the keyword registration program 122.

ステップ１１０１において、キーワード登録プログラム１２２は、記憶装置１３０からメタデータファイル１３１（ステップ１００１で選択された属性のメタデータ）を全て読み込む。 In step 1101, the keyword registration program 122 reads all the metadata files 131 (the metadata of the attribute selected in step 1001) from the storage device 130.

ステップ１１０２において、キーワード登録プログラム１２２は、読み込んだメタデータファイルからファイルを１つ選択しメタデータを読み込む。例えば、属性「文書種別」の値が「検収通知書１」であるデータを読み込んだ場合を考える。 In step 1102, the keyword registration program 122 selects one file from the read metadata file and reads the metadata. For example, let us consider a case where data having the attribute “document type” value “acknowledgment notice 1” is read.

ステップ１１０３において、キーワード登録プログラム１２２は、ステップ１１０２で読み込んだデータに対して形態素解析を行う。形態素解析の詳細については非特許文献１に開示されている。図１６Ａは、「検収通知書１」に対して形態素解析を行った結果を表す。「検収通知書１」は、「検収」、「通知」、「書」、「１」の４つの文字列に分割される。また、品詞の行には、それぞれの文字列が、名詞または未知語であることと、付属的な内容が記載されている。未知語とは、形態素解析の結果、品詞が不明と判定された文字列である。形態素解析は、内部で使用している辞書を元にして、入力文字列の品詞を判定しているため、辞書に登録されていない文字列は未知語として判定される。具体的には、製品名や個人名などの固有名詞が未知語となり得る。また、形態素解析は日本語の解析に利用されるため、英数字や記号などが辞書登録されていない場合がある。前述した例では、「１」が未知語と判定された場合を示した。 In step 1103, the keyword registration program 122 performs morphological analysis on the data read in step 1102. Details of morphological analysis are disclosed in Non-Patent Document 1. FIG. 16A shows the result of morphological analysis performed on “acknowledgment notice 1”. The “acknowledgment notice 1” is divided into four character strings “acceptance”, “notification”, “book”, and “1”. In addition, in the part of speech line, each character string is a noun or an unknown word, and ancillary contents are described. An unknown word is a character string whose part of speech is determined to be unknown as a result of morphological analysis. In the morphological analysis, the part of speech of the input character string is determined based on the dictionary used internally. Therefore, the character string not registered in the dictionary is determined as an unknown word. Specifically, proper nouns such as product names and personal names can be unknown words. In addition, since morphological analysis is used for Japanese analysis, alphanumeric characters and symbols may not be registered in the dictionary. In the example described above, the case where “1” is determined to be an unknown word is shown.

ステップ１１０４において、キーワード登録プログラム１２２は、ステップ１１０３の形態素解析の結果をもとに、名詞または未知語が１つ以上連続した文字列を抽出し、この文字列を抽出キーワードとする。このような品詞パターンの文字列をキーワードとして抽出する手法は一般によく用いられている。抽出されたキーワードをさらに詳細に分析し、よりキーワードの抽出精度を高める技術も多数提案されている。 In step 1104, the keyword registration program 122 extracts a character string in which one or more nouns or unknown words are continued based on the result of the morphological analysis in step 1103, and uses the character string as an extracted keyword. A technique for extracting a character string of such a part-of-speech pattern as a keyword is generally used. Many techniques for analyzing extracted keywords in more detail and improving the keyword extraction accuracy have been proposed.
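As a rough sketch of step 1104, the following enumerates every contiguous sequence of noun or unknown-word tokens as a candidate keyword. No real morphological analyzer is called: the tokens are supplied by hand as `(surface, part of speech)` pairs, and the part-of-speech labels and the `extract_keywords` name are assumptions for illustration. Enumerating all sub-sequences of a run (rather than only the longest run) is one reading of “one or more consecutive” nouns or unknown words.

```python
def extract_keywords(tokens):
    """tokens: list of (surface, pos) pairs from a morphological analyzer.

    Returns every contiguous sequence of one or more tokens whose part of
    speech is 'noun' or 'unknown', joined into a single string.
    """
    keywords = []
    run = []  # surfaces of the current unbroken noun/unknown run
    for surface, pos in tokens + [("", "end")]:  # sentinel flushes the last run
        if pos in ("noun", "unknown"):
            run.append(surface)
            continue
        # The run has ended: emit all of its contiguous sub-sequences.
        for i in range(len(run)):
            for j in range(i + 1, len(run) + 1):
                keywords.append("".join(run[i:j]))
        run = []
    return keywords

# Hand-written analysis result for 「検収通知書１」.
tokens = [("検収", "noun"), ("通知", "noun"), ("書", "noun"), ("１", "unknown")]
keywords = extract_keywords(tokens)
```

Candidates that contain the digit, such as 「検収通知書１」, are still present at this point; removing them is the job of the filtering step that follows.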

ステップ１１０５において、キーワード登録プログラム１２２は、予め定義されたルールに従って、抽出キーワードのフィルタリングを行う。仮想フォルダ生成プログラム１２３による仮想フォルダ生成処理では、抽出されたキーワードを基に仮想フォルダの生成が行われるため、仮想フォルダとして不適であるキーワードが含まれると、不適当な仮想フォルダが生成してしまう可能性がある。このため、仮想フォルダとして不適と考えられるキーワードをこの処理で除外する。例えば「検収通知書１」というキーワードから仮想フォルダを定義する場合、ユーザにとっての分類のわかりやすさの観点から、数字は除外した方が望ましいと考えられる。なお、フィルタリングを実現するには、予め除外すべき文字や特殊な名詞を辞書やＤＢに登録しておき、それを参照して除外すべき文字か否か判断する。除外すべき文字としては、米印、矢印等の記号や、数字である（ただし、数字はキーワードとして必要な場合もあるため、常に除外対象とするのは不適である。従って、最終的にユーザに除外するか否かについて確認するようにしても良い）。図１６Ｃは、図１６Ｂにおけるキーワードから数字を含むキーワードを除外した例を示している。また、名詞の中で特殊なパターンも除外すべきである。例えば、代名詞、ナイ形容詞語幹、一部の接尾辞などである。ナイ形容詞語幹とは、「申し訳」、「大人げ」などの「〜ない」の形をとる名詞である。また、除外すべき名詞接尾辞としては、例えば、「〜君」、「〜さん」などの人名に続く敬称や、「休みがち」の「がち」や、「勝ったも同然」の「同然」などの形容動詞語幹などがある。 In step 1105, the keyword registration program 122 filters the extracted keywords according to predefined rules. Because the virtual folder generation process performed by the virtual folder generation program 123 generates virtual folders from the extracted keywords, including a keyword that is unsuitable as a virtual folder may cause an inappropriate virtual folder to be generated. Keywords considered unsuitable as virtual folders are therefore excluded in this step. For example, when a virtual folder is defined from the keyword “acknowledgment notice 1” (検収通知書１), excluding the number is considered preferable so that the classification is easy for the user to understand. Filtering is implemented by registering the characters and special nouns to be excluded in a dictionary or DB in advance and consulting it to decide whether a keyword should be excluded. Characters to be excluded include symbols such as the reference mark (※) and arrows, and numbers (numbers are sometimes needed as keywords, however, so always excluding them is not appropriate; the user may therefore be asked in the end to confirm whether to exclude them). FIG. 16C shows an example in which the keywords containing numbers have been excluded from the keywords in FIG. 16B. Special patterns among nouns should also be excluded, for example pronouns, nai-adjective stems, and some suffixes. A nai-adjective stem is a noun that is followed by 「〜ない」, such as 「申し訳」 and 「大人げ」. Noun suffixes to be excluded include, for example, honorifics that follow a person's name, such as 「〜君」 and 「〜さん」, and adjectival-verb stems such as the 「がち」 in 「休みがち」 and the 「同然」 in 「勝ったも同然」.
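The filtering rules of step 1105 might be expressed along the following lines. This is a sketch under assumptions: the patent only says that excluded characters and special nouns are registered in a dictionary or DB beforehand, so the concrete contents of `SYMBOLS` and `EXCLUDED_WORDS`, and the `exclude_digits` switch standing in for the user confirmation, are all hypothetical.

```python
import re

# Hypothetical exclusion lists (the real ones would come from a dictionary or DB).
DIGITS = re.compile(r"[0-9０-９]")          # half-width and full-width digits
SYMBOLS = re.compile(r"[※→←↑↓]")            # reference mark and arrows
EXCLUDED_WORDS = {"これ", "それ", "申し訳", "大人げ", "君", "さん", "がち", "同然"}

def filter_keywords(keywords, exclude_digits=True):
    """Drop keywords that are unsuitable as virtual folder names."""
    kept = []
    for kw in keywords:
        if kw in EXCLUDED_WORDS or SYMBOLS.search(kw):
            continue
        # Digits are sometimes meaningful, so their exclusion is optional.
        if exclude_digits and DIGITS.search(kw):
            continue
        kept.append(kw)
    return kept

candidates = ["検収", "検収通知書", "検収通知書１", "書１", "１", "※", "さん"]
filtered = filter_keywords(candidates)
```

Passing `exclude_digits=False` keeps the numbered keywords, which corresponds to the case where the user decides that numbers are needed.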

ステップ１１０６において、キーワード登録プログラム１２２は、データメモリにおける抽出キーワードデータ１４１を更新する。すなわち、キーワード登録プログラム１２２は、ステップ１１０２からステップ１１０５の過程で取得した抽出キーワードを登録する。抽出キーワードデータ１４１に、すでに登録されている抽出キーワードがあった場合は、頻度を１加算する。抽出キーワードデータ１４１に、まだ登録されていない抽出キーワードであった場合は、その抽出キーワードを頻度１として登録する。 In step 1106, the keyword registration program 122 updates the extracted keyword data 141 in the data memory. That is, the keyword registration program 122 registers the extracted keyword acquired in the process from step 1102 to step 1105. If there is an extracted keyword already registered in the extracted keyword data 141, the frequency is incremented by one. If the extracted keyword is not yet registered in the extracted keyword data 141, the extracted keyword is registered as frequency 1.

ステップ１１０７において、キーワード登録プログラム１２２は、全メタデータに対してステップ１１０２からステップ１１０６までの処理を行ったか否かを判定し、まだ行っていないメタデータがある場合にはステップ１１０２に戻り、すべてのメタデータが処理済の場合は処理を終了する。 In step 1107, the keyword registration program 122 determines whether or not the processing from step 1102 to step 1106 has been performed on all the metadata. If there is metadata that has not been performed yet, the keyword registration program 122 returns to step 1102 and all the metadata is processed. If the metadata has been processed, the process is terminated.

＜検索キーワード登録処理＞
図１２は、キーワード登録プログラムが実行する、検索キーワード登録処理を説明するためのフローチャートである。ここでは、動作主体がキーワード登録プログラム１２２であるとして説明する。 <Search keyword registration process>
FIG. 12 is a flowchart for explaining search keyword registration processing executed by the keyword registration program. Here, a description will be given assuming that the operation subject is the keyword registration program 122.

ステップ１２０１において、キーワード登録プログラム１２２は、ステップ１００１で選択された属性について、記憶装置１３０から検索ログデータ１３２を読み込む。 In step 1201, the keyword registration program 122 reads the search log data 132 from the storage device 130 for the attribute selected in step 1001.

ステップ１２０２において、キーワード登録プログラム１２２は、読み込んだ検索ログデータ１３２からデータを１つ選択する。例えば、検索クエリＡが「契約書」、検索クエリＢが「文書管理システム」、日時が「２００９／０１／２２ ２３：１２：０５」の場合が考えられる。 In step 1202, the keyword registration program 122 selects one entry from the read search log data 132. For example, consider an entry in which search query A is “contract” (契約書), search query B is “document management system” (文書管理システム), and the date and time is “2009/01/22 23:12:05”.

ステップ１２０３において、キーワード登録プログラム１２２は、読み込んだデータをもとに、データメモリ１４０内の検索キーワード管理データ１４２を更新する。具体的には、読み込んだデータにおける検索クエリＡと検索クエリＢの組み合わせが、検索キーワード管理データ１４２内に含まれていれば、検索キーワード管理データ１４２における該当データの頻度を１だけ加算する。含まれていなければ、読み込んだデータのエントリを追加し、頻度を１として登録する。 In step 1203, the keyword registration program 122 updates the search keyword management data 142 in the data memory 140 based on the read data. Specifically, if the combination of the search query A and the search query B in the read data is included in the search keyword management data 142, the frequency of the corresponding data in the search keyword management data 142 is incremented by one. If not included, an entry of the read data is added and the frequency is registered as 1.
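The update in step 1203 amounts to counting how often each combination of search query A and search query B occurs in the log. A minimal sketch, assuming each log entry is a dictionary with hypothetical keys `query_a`, `query_b`, and `datetime`, and treating the pair as ordered (the text does not say whether order matters):

```python
from collections import Counter

def register_search_keywords(search_log):
    """Count occurrences of each (query A, query B) combination."""
    counts = Counter()
    for entry in search_log:
        # A missing key in a Counter starts at 0, so this both adds a new
        # entry with frequency 1 and increments an existing entry by 1.
        counts[(entry["query_a"], entry["query_b"])] += 1
    return counts

# Hypothetical log entries; only the first mirrors the example in the text.
search_log = [
    {"query_a": "契約書", "query_b": "文書管理システム", "datetime": "2009/01/22 23:12:05"},
    {"query_a": "契約書", "query_b": "文書管理システム", "datetime": "2009/01/23 09:00:00"},
    {"query_a": "納品書", "query_b": "", "datetime": "2009/01/23 09:05:00"},
]
search_keyword_data = register_search_keywords(search_log)
```

The same counting pattern applies to the extracted keyword frequency updated in step 1106, with a single keyword as the key instead of a pair.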

ステップ１２０４において、キーワード登録プログラム１２２は、検索ログデータ１３２内の全データを処理したか否かを判定する。全データを処理していなければステップ１２０２に戻り、全データを処理済であれば処理を終了する。 In step 1204, the keyword registration program 122 determines whether all data in the search log data 132 has been processed. If all data has not been processed, the process returns to step 1202, and if all data has been processed, the process ends.

＜共起キーワード登録処理＞
図１３は、キーワード登録プログラムが実行する、共起キーワード登録処理を説明するためのフローチャートである。ここでは、動作主体がキーワード登録プログラム１２２であるとして説明する。 <Co-occurrence keyword registration process>
FIG. 13 is a flowchart for explaining the co-occurrence keyword registration process executed by the keyword registration program. Here, a description will be given assuming that the operation subject is the keyword registration program 122.

ステップ１３０１において、キーワード登録プログラム１２２は、ステップ１００１で選択された属性について、記憶装置１３０からメタデータファイル１３１をすべて読み込み、また、データメモリ１４０から抽出キーワードデータ１４１を読み込む。 In step 1301, the keyword registration program 122 reads all the metadata files 131 from the storage device 130 and the extracted keyword data 141 from the data memory 140 for the attribute selected in step 1001.

ステップ１３０２において、キーワード登録プログラム１２２は、読み込んだ抽出キーワードデータの中から、抽出キーワードをひとつ読み込む。ここでは、例えば、抽出キーワードを「契約書」として説明する。 In step 1302, the keyword registration program 122 reads one extracted keyword from the read extracted keyword data. Here, for example, the extracted keyword is described as “contract”.

ステップ１３０３において、キーワード登録プログラム１２２は、メタデータファイル１３１からメタデータを１つ読み込む。例えば、メタデータを「契約書（検索システム）」として説明する。 In step 1303, the keyword registration program 122 reads one piece of metadata from the metadata file 131. For example, the metadata is described as “contract (search system)”.

ステップ１３０４において、キーワード登録プログラム１２２は、抽出キーワードが、メタデータ内に含まれているか否かを判定する。含まれていない場合はステップ１３０８に進む。含まれている場合はステップ１３０５に進む。抽出キーワード「契約書」、メタデータ「契約書（検索システム）」の場合は、メタデータ内に、「契約書」という文字列が含まれるためステップ１３０５に進む。 In step 1304, the keyword registration program 122 determines whether the extracted keyword is included in the metadata. If not included, the process proceeds to step 1308. If it is included, the process proceeds to step 1305. In the case of the extracted keyword “contract” and metadata “contract (search system)”, since the character string “contract” is included in the metadata, the process proceeds to step 1305.

ステップ１３０５において、キーワード登録プログラム１２２は、メタデータを形態素解析する。上記の例の場合には、「契約」「書」「（」「検索」「システム」「）」のように分解され、それぞれについて品詞情報が付与される。 In step 1305, the keyword registration program 122 performs morphological analysis on the metadata. In the case of the above example, it is decomposed like “contract” “book” “(” “search” “system” “)”, and part-of-speech information is given to each.

ステップ１３０６において、キーワード登録プログラム１２２は、形態素解析後の各単語を基に、ステップ１３０２で選択された抽出キーワードを含まず、かつ含まれず、かつ隣接していないキーワードを抽出する。キーワードの抽出方法は、前述した抽出キーワード登録処理におけるキーワード抽出方法と同様であり、名詞または未知語が連続した文字列をキーワードとみなす。上記の例の場合には、「契約書」を含まず、かつ「契約書」に含まれず、かつ「契約書」と隣接していないキーワードは、「検索」「システム」「検索システム」の３パターンが考えられる。他の例として、抽出キーワード「契約書」、メタデータ「基本契約書」の場合がある。この場合、形態素解析の結果、「基本契約書」は、「基本」「契約」「書」のように分解される。抽出キーワード「契約書」を含まないキーワードとして、「基本」「契約」「書」がある。この中で、「契約」と「書」は「契約書」に含まれるため不適である。また、「基本」は「契約書」と隣接したキーワードであるため不適である。さらに、他の例として、抽出キーワード「納品書」、メタデータ「納品書兼検収依頼書」の場合、「納品書兼検収依頼書」は、形態素解析の結果、「納品」「書」「兼」「検収」「依頼」「書」のように分解される。この中で、「納品書」と「検収依頼書」では、「検収依頼書」が「納品書」の文言を含まず、２つが「兼」で区切られているため、共起キーワードとして適していると判断される。 In step 1306, based on the words obtained by the morphological analysis, the keyword registration program 122 extracts keywords that do not contain the extracted keyword selected in step 1302, are not contained in it, and are not adjacent to it. The keyword extraction method is the same as in the extracted keyword registration process described above: a character string of consecutive nouns or unknown words is regarded as a keyword. In the above example, the keywords that do not contain 「契約書」 (“contract”), are not contained in it, and are not adjacent to it are the three patterns 「検索」 (“search”), 「システム」 (“system”), and 「検索システム」 (“search system”). As another example, consider the extracted keyword 「契約書」 and the metadata 「基本契約書」 (“basic contract”). Morphological analysis decomposes 「基本契約書」 into 「基本」 (“basic”), 「契約」 (“contract”), and 「書」 (“document”). The keywords that do not contain the extracted keyword 「契約書」 are 「基本」, 「契約」, and 「書」. Of these, 「契約」 and 「書」 are unsuitable because they are contained in 「契約書」, and 「基本」 is unsuitable because it is adjacent to 「契約書」. As a further example, for the extracted keyword 「納品書」 (“delivery note”) and the metadata 「納品書兼検収依頼書」 (“delivery note and acceptance request form”), morphological analysis decomposes the metadata into 「納品」, 「書」, 「兼」, 「検収」, 「依頼」, and 「書」. Here, 「検収依頼書」 (“acceptance request form”) does not contain the wording of 「納品書」, and the two are separated by 「兼」, so it is judged to be suitable as a co-occurrence keyword.

ステップ１３０７において、キーワード登録プログラム１２２は、データメモリ１４０における共起キーワードデータ１４３を更新する。具体的には、ステップ１３０２で選択した抽出キーワードと、ステップ１３０６で抽出したキーワードの組み合わせを登録する。上記例の場合には、３パターンのデータを登録する。１つは、「契約書」と「検索」、２つ目は、「契約書」と「システム」、３つ目は、「契約書」と「検索システム」である。これらのデータが、共起キーワードデータ１４３内に含まれていれば、共起キーワードデータ１４３における該当データの頻度を１だけ加算して登録する。含まれていなければ、そのデータのエントリを新たに追加し、頻度を１として登録する。 In step 1307, the keyword registration program 122 updates the co-occurrence keyword data 143 in the data memory 140. Specifically, it registers each combination of the extracted keyword selected in step 1302 and a keyword extracted in step 1306. In the above example, three entries are registered: “contract” and “search”, “contract” and “system”, and “contract” and “search system”. If a combination is already included in the co-occurrence keyword data 143, the frequency of the corresponding entry is incremented by 1. If it is not included, a new entry is added for it with a frequency of 1.
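Steps 1306 and 1307 can be sketched together as follows. The tokens are again supplied by hand rather than produced by a real analyzer, and the function name, the part-of-speech labels, and the token-span interpretation of “adjacent” (a candidate is rejected if its token span overlaps or touches an occurrence of the extracted keyword) are assumptions made for this illustration.

```python
from collections import Counter

NOUNISH = ("noun", "unknown")

def cooccurring_keywords(tokens, keyword):
    """Candidates that neither contain, are contained in, nor adjoin `keyword`."""
    surfaces = [s for s, _ in tokens]
    n = len(tokens)
    # Token spans [s, e) whose joined surface equals the extracted keyword.
    hits = [(s, e) for s in range(n) for e in range(s + 1, n + 1)
            if "".join(surfaces[s:e]) == keyword]
    found = []
    for i in range(n):
        for j in range(i + 1, n + 1):
            if not all(pos in NOUNISH for _, pos in tokens[i:j]):
                break  # any longer span from i would include the same token
            cand = "".join(surfaces[i:j])
            if keyword in cand or cand in keyword:
                continue  # contains the keyword or is contained in it
            if any(i <= e and j >= s for s, e in hits):
                continue  # overlaps or touches an occurrence of the keyword
            found.append(cand)
    return found

# Hand-written analysis result for 「契約書（検索システム）」.
tokens = [("契約", "noun"), ("書", "noun"), ("（", "symbol"),
          ("検索", "noun"), ("システム", "noun"), ("）", "symbol")]

cooccurrence = Counter()
for cand in set(cooccurring_keywords(tokens, "契約書")):
    cooccurrence[("契約書", cand)] += 1  # one count per metadata entry
```

Iterating over a `set` keeps the count at one per metadata entry even if a candidate appears twice in it, in line with the frequency being a number of files.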

ステップ１３０８において、キーワード登録プログラム１２２は、全メタデータに対して処理を行ったか否かを判定する。全メタデータを処理していなければ、ステップ１３０３に戻り、全メタデータを処理済であればステップ１３０９に進む。 In step 1308, the keyword registration program 122 determines whether or not processing has been performed on all metadata. If all the metadata has not been processed, the process returns to step 1303, and if all the metadata has been processed, the process proceeds to step 1309.

ステップ１３０９において、キーワード登録プログラム１２２は、全抽出キーワードを処理したか否かを判定する。全抽出キーワードを処理していなければ、ステップ１３０２に戻り、全抽出キーワードを処理済であれば処理を終了する。 In step 1309, the keyword registration program 122 determines whether all extracted keywords have been processed. If all the extracted keywords have not been processed, the process returns to step 1302, and if all the extracted keywords have been processed, the process ends.

＜仮想フォルダ生成処理＞
図１４は、仮想フォルダ生成プログラム１２３が実行する仮想フォルダ生成処理を説明するためのフローチャートである。仮想フォルダ生成処理では、データメモリ１４０における抽出キーワードデータ１４１、検索キーワード管理データ１４２、共起キーワードデータ１４３を基に、仮想上位フォルダデータ１４４と仮想下位フォルダデータ１４５を生成し、さらにそのデータを基に、記憶装置１３０における仮想フォルダデータ１３３を生成する。 <Virtual folder generation process>
FIG. 14 is a flowchart for explaining virtual folder generation processing executed by the virtual folder generation program 123. In the virtual folder generation process, the virtual upper folder data 144 and the virtual lower folder data 145 are generated based on the extracted keyword data 141, the search keyword management data 142, and the co-occurrence keyword data 143 in the data memory 140. Then, virtual folder data 133 in the storage device 130 is generated.

In step 1401, the virtual folder generation program 123 generates the virtual upper folder data (see FIG. 8) and sorts it in descending order of score. Specifically, the virtual folder generation program 123 first reads the extracted keyword data 141 and the search keyword management data 142 in the data memory 140 and merges them. In the merge, the character strings of the extracted keyword in the extracted keyword data 141 and of search query A and search query B in the search keyword management data 142 are aggregated, and each distinct character string is registered as one entry. For example, if the extracted keyword "contract document" has a frequency of 100 and the combination of search query A and search query B is "contract document" and "Company A" with a frequency of 80, the entry is registered like the data in the first row of FIG. 8. The score 804 for that entry is described below. A character string is included in the aggregation if it matches either search query A or search query B. For example, if the search keyword management data contains the combination "contract document" and "Company A" with a frequency of 100 and the combination "contract document" and "Company B" with a frequency of 50, the search frequency 803 in the virtual upper folder data 144 is 150. The score 804 is obtained by weighted addition of the extracted keyword frequency 802 and the search frequency 803. The example of FIG. 8 shows the result of the addition with a metadata frequency weight of 1 and a search frequency weight of 5. Weighted addition is used because the importance of a keyword to the user differs depending on the data from which it was obtained. In the example of FIG. 8, the search frequency is given five times the weight of the metadata frequency, because a character string used in a search was deliberately specified by the user and is therefore considered more important. Once scores have been calculated for all data, the data is sorted in descending order of score. The virtual upper folder data is generated for every attribute that is a target of virtual folder generation.
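The merge and weighted addition of step 1401 can be sketched as follows. This is only an illustration under stated assumptions: the function name, the dictionary and tuple layouts, and the English keyword strings are hypothetical stand-ins, and only the weights (1 for the metadata frequency, 5 for the search frequency) and the frequencies come from the example in the text.

```python
from collections import defaultdict

# Weights from the FIG. 8 example in the text: metadata 1, search 5.
W_METADATA = 1
W_SEARCH = 5

def build_upper_folder_data(extracted, search_log):
    """Merge keyword sources and score them (hypothetical data layout).

    extracted:  {keyword: extracted keyword frequency}
    search_log: [(query_a, query_b_or_None, frequency), ...]
    """
    search_freq = defaultdict(int)
    for query_a, query_b, freq in search_log:
        # A string counts toward the search frequency if it appears
        # as either search query A or search query B.
        for query in {query_a, query_b} - {None}:
            search_freq[query] += freq

    rows = []
    for keyword in set(extracted) | set(search_freq):
        meta = extracted.get(keyword, 0)
        search = search_freq.get(keyword, 0)
        rows.append({
            "keyword": keyword,
            "extracted_freq": meta,
            "search_freq": search,
            "score": W_METADATA * meta + W_SEARCH * search,
        })
    # Sort in descending order of score, as in step 1401.
    rows.sort(key=lambda row: row["score"], reverse=True)
    return rows

rows = build_upper_folder_data(
    {"contract document": 100},
    [("contract document", "Company A", 100),
     ("contract document", "Company B", 50)],
)
print(rows[0])
```

With these inputs the search frequency for "contract document" is 100 + 50 = 150, matching the text, and the stated weights give a score of 1 × 100 + 5 × 150 = 850.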

In steps 1402 through 1409, the keywords that serve as search conditions for the virtual folders are determined based on the virtual upper folder data 144 and are generated as the virtual folder data 133.

In step 1402, the virtual folder generation program 123 selects one attribute that is a target of virtual folder generation.

In step 1403, the virtual folder generation program 123 selects one keyword from the virtual upper folder data 144. Here, the keyword with the highest frequency among the unprocessed keywords is selected.

In step 1405, the virtual folder generation program 123 registers the keyword adopted in step 1404 in the virtual upper folder data 144 as the keyword of a virtual upper folder. At this time, as described above, an already registered virtual upper folder and the data of its virtual lower folders are deleted as necessary.

In step 1406, the virtual folder generation program 123 generates virtual lower folders based on the keyword of the virtual upper folder registered in step 1405; that is, it registers the virtual lower folder data 145. The generation of the virtual lower folder data is described below.

In step 1407, the virtual folder generation program 123 determines whether all keywords in the virtual upper folder data have been processed. If an unprocessed keyword remains, the process proceeds to step 1408; if all keywords have been processed, the process proceeds to step 1409.

In step 1408, the virtual folder generation program 123 determines whether the number of virtual upper folders in the virtual upper folder data 144 has reached a specified value. If the specified value has not been reached, the process returns to step 1403; if it has been reached, the process proceeds to step 1409.

In step 1409, the virtual folder generation program 123 determines whether all attributes that are targets of virtual folder generation have been processed. If not all attributes have been processed, the process returns to step 1402; if all attributes have been processed, the process ends.

Through the above processing, the virtual folder data 133 shown in FIG. 4 is generated.
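The control flow of steps 1402 through 1409 can be sketched roughly as below. The adoption test of step 1404 is not described in this passage, so it appears here only as a hypothetical callback `adopt`; `make_lower` is likewise a placeholder for the virtual lower folder generation of step 1406. The sketch assumes the keywords for each attribute are already in the order in which step 1403 would select them, and it omits the deletion of already registered folders mentioned in step 1405.

```python
def generate_virtual_folders(keywords_by_attr, max_upper, adopt, make_lower):
    """Sketch of the loop in steps 1402-1409 (hypothetical signature).

    keywords_by_attr: {attribute: [keyword, ...]} in selection order
    max_upper:        specified maximum number of virtual upper folders
    adopt:            placeholder for the step 1404 decision
    make_lower:       placeholder for step 1406
    """
    result = {}
    for attr, keywords in keywords_by_attr.items():   # steps 1402 and 1409
        folders = {}
        for keyword in keywords:                      # steps 1403 and 1407
            if not adopt(keyword, folders):           # step 1404 (assumed)
                continue
            # Steps 1405 and 1406: register the upper folder keyword
            # together with its virtual lower folders.
            folders[keyword] = make_lower(keyword)
            if len(folders) >= max_upper:             # step 1408
                break
        result[attr] = folders
    return result

out = generate_virtual_folders(
    {"title": ["contract", "estimate", "minutes"]},
    max_upper=2,
    adopt=lambda keyword, folders: True,   # adopt everything in this demo
    make_lower=lambda keyword: [],         # no lower folders in this demo
)
print(out)
```

In this demo the loop stops after two upper folders because the specified value of 2 is reached before the third keyword is examined.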

<Virtual lower folder generation process>
FIG. 15 is a flowchart for explaining the virtual lower folder generation process executed by the virtual folder generation program 123. In the virtual lower folder generation process, keywords to be used for virtual lower folders are selected based on the keyword specified for the virtual upper folder. The description here assumes that the operating entity is the virtual folder generation program 123.

In step 1501, the virtual folder generation program 123 generates virtual lower folder data based on the keyword of the virtual upper folder and sorts it in descending order of score. Specifically, it first reads the extracted keyword data 141, the search keyword management data 142, and the co-occurrence keyword data 143 in the data memory 140 and merges them in the same way as when generating the virtual upper folder data 144. In the merge, the character strings of the extracted keyword, search query A, search query B, and co-occurrence keyword in the extracted keyword data 141, the search keyword management data 142, and the co-occurrence keyword data 143 are aggregated, and each distinct entry is registered once. There are two patterns of aggregation: one in which keyword B in the virtual lower folder data 145 is Null, and one in which keyword B is not Null. First, the pattern in which keyword B is Null is described. In this case, data is aggregated when the extracted keyword in the extracted keyword data 141, or search query A of a record in the search keyword management data 142 whose search query B is Null, contains the keyword of the virtual upper folder. The first row of FIG. 9 is an example. It shows the result of aggregating the data for the character string "contract document" (契約書), which contains the virtual upper folder keyword "contract" (契約). In this example, the extracted keyword frequency 903 is 100 and the search frequency 904 is 80. Keyword B 902 and the co-occurrence frequency 905 are not used. The score 906 indicates the degree of suitability as a virtual lower folder: the higher the score 906, the more suitable the entry is as a virtual lower folder. The calculation of the score 906 is described below. Next, the pattern in which keyword B is not Null is described. In this case, data in the search keyword management data 142 in which either search query A or search query B is the keyword of the virtual upper folder is aggregated with data in the co-occurrence keyword data 143 whose extracted keyword is the keyword of the virtual upper folder. The second row of FIG. 9 is an example. The score 906 is obtained by weighted addition of the extracted keyword frequency 903, the search frequency 904, and the co-occurrence frequency 905. The example of FIG. 9 shows the result of the addition with a metadata frequency weight of 2, a search frequency weight of 10, and a co-occurrence frequency weight of 1. The reason for using weighted addition is the same as for the generation of virtual upper folders. Once scores have been calculated for all data, the data is sorted in descending order of score.
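The score 906 of step 1501 can be illustrated with a small sketch. The class and field names are hypothetical, and the frequencies in the second row are made up for illustration. Only the weights (2, 10, 1) and the first row's frequencies (100 and 80) come from the text.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LowerRow:
    """One entry of the virtual lower folder data (hypothetical layout)."""
    keyword_a: str
    keyword_b: Optional[str]        # None models the Null pattern
    extracted_freq: int = 0
    search_freq: int = 0
    cooccurrence_freq: int = 0

    def score(self, w_meta=2, w_search=10, w_cooc=1):
        # Weighted addition with the weights of the FIG. 9 example.
        return (w_meta * self.extracted_freq
                + w_search * self.search_freq
                + w_cooc * self.cooccurrence_freq)

rows = [
    # Keyword B present: frequencies here are invented for the demo.
    LowerRow("contract", "Company A", search_freq=30, cooccurrence_freq=40),
    # Keyword B Null: frequencies as in the first row described in the text.
    LowerRow("contract document", None, extracted_freq=100, search_freq=80),
]
# Sort in descending order of score, as at the end of step 1501.
rows.sort(key=lambda row: row.score(), reverse=True)
for row in rows:
    print(row.keyword_a, row.keyword_b, row.score())
```

Under the stated weights the Null-pattern row scores 2 × 100 + 10 × 80 = 1000 and the invented second row scores 10 × 30 + 1 × 40 = 340, so the former sorts first.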

In steps 1502 through 1508, the virtual lower folders of the virtual upper folder are determined based on the virtual lower folder data 145 generated in step 1501 and are stored in the storage device 130 as the virtual folder data 133. The entries in the virtual lower folder data 145 are processed in descending order of score.

In step 1502, the virtual folder generation program 123 determines whether all keywords in the virtual lower folder data 145 have been processed. If all keywords have been processed, the process ends; if an unprocessed keyword remains, the process proceeds to step 1503.

In step 1503, the virtual folder generation program 123 selects one unprocessed entry (referred to as P) from the virtual lower folder data 145 (see FIG. 9).

In step 1504, the virtual folder generation program 123 determines whether the score of P is equal to or greater than a certain value. If the condition is satisfied, the process proceeds to step 1505; otherwise, the process ends.

In step 1505, the virtual folder generation program 123 determines whether the character strings of keywords A and B in P both have at least a certain length. If this condition is satisfied, the process proceeds to step 1506; otherwise, the process returns to step 1502.

In step 1506, the virtual folder generation program 123 determines whether the virtual lower folder data 145 contains a longer keyword that includes the character strings of both keywords A and B of P and whose score is comparable or higher. Here, "comparable or higher" means that a score somewhat smaller than the score of P is also accepted: if the score of P is SC, the condition is that the score is (SC − α) or greater. For example, the condition holds when keywords A and B of P are (contract, consignment) with a score 906 of 612 and the virtual lower folder data 145 contains keywords A and B (contract, business consignment) with a score 906 of 645. In this case, the condition of step 1506 is satisfied, so the process returns to step 1502; that is, P is not generated as a virtual lower folder. This is because, when the scores are roughly equal, the longer keyword string is better suited to a virtual lower folder. If the condition of step 1506 is not satisfied, the process proceeds to step 1507. Step 1506 prevents a large number of virtual lower folders with similar keywords from being generated. In the above example, the keywords (contract, consignment) and (contract, business consignment) have comparable scores of 612 and 645. In this case, no virtual lower folder is generated for (contract, consignment), and a virtual lower folder is generated for (contract, business consignment), because the latter is more specific and easier to understand. Because no virtual lower folder is generated for (contract, consignment), documents containing the keywords (contract, consignment), excluding documents containing "business consignment", are classified directly under the virtual upper folder for the keyword "contract".
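The test in step 1506 can be sketched as a small predicate. Several details are assumptions rather than statements from the text: entries are modeled as `(keyword_a, keyword_b, score)` tuples, "contains both character strings" is interpreted pairwise (A within A, B within B), "longer" is interpreted as a larger combined length, and the value of α is hypothetical.

```python
def is_superseded(p, rows, alpha):
    """Return True if a longer, comparably scored entry covers p.

    p, rows: (keyword_a, keyword_b_or_None, score) tuples (assumed layout)
    alpha:   tolerance below p's score that still counts as comparable
    """
    a, b, score = p
    b = b or ""
    for other_a, other_b, other_score in rows:
        other_b = other_b or ""
        # "Longer" also excludes p itself from the comparison.
        longer = len(other_a) + len(other_b) > len(a) + len(b)
        contains_both = a in other_a and b in other_b
        if longer and contains_both and other_score >= score - alpha:
            return True
    return False

rows = [
    ("contract", "consignment", 612),
    ("contract", "business consignment", 645),
]
alpha = 50  # hypothetical tolerance; the text does not give a value

print(is_superseded(rows[0], rows, alpha))  # (contract, consignment)
print(is_superseded(rows[1], rows, alpha))  # (contract, business consignment)
```

With the scores from the text, (contract, consignment) is superseded by (contract, business consignment), while the longer entry is kept, which is the behavior the passage describes.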

In step 1507, the virtual folder generation program 123 stores the keyword of P as a virtual lower folder in the virtual folder data 133 in the storage device 130.

In step 1508, the virtual folder generation program 123 determines whether the number of virtual lower folders for the target virtual upper folder has reached a specified value. If the condition is not satisfied, the process returns to step 1502; if it is satisfied, the process ends.
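Taken together, steps 1502 through 1508 amount to a filtered scan over the score-sorted entries. The following sketch shows one way to read that flow. The function name, the tuple layout, the threshold values, and the `superseded` callback (standing in for the step 1506 test) are all hypothetical, and the length check is assumed to skip a Null keyword B, which the text does not state.

```python
def select_lower_folders(rows, min_score, min_len, max_folders, superseded):
    """Sketch of steps 1502-1508 (hypothetical signature).

    rows: [(keyword_a, keyword_b_or_None, score)] sorted by descending score
    """
    chosen = []
    for p in rows:                                   # steps 1502 and 1503
        keyword_a, keyword_b, score = p
        if score < min_score:                        # step 1504: end processing
            break
        keywords = [k for k in (keyword_a, keyword_b) if k is not None]
        if any(len(k) < min_len for k in keywords):  # step 1505: skip P
            continue
        if superseded(p, rows):                      # step 1506: skip P
            continue
        chosen.append(p)                             # step 1507: store P
        if len(chosen) >= max_folders:               # step 1508: limit reached
            break
    return chosen

rows = [
    ("contract", "business consignment", 645),
    ("contract", "consignment", 612),
    ("contract", "A", 600),
    ("contract", "estimate", 90),
]
chosen = select_lower_folders(
    rows,
    min_score=100,
    min_len=2,
    max_folders=5,
    # Stand-in for step 1506: treat only (contract, consignment) as covered.
    superseded=lambda p, all_rows: p[1] == "consignment",
)
print(chosen)
```

In this demo the second entry is dropped by the step 1506 stand-in, the third by the length check, and the scan ends at the fourth because its score is below the threshold.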

<Summary>
As described above, according to the present embodiment, a plurality of keywords are extracted, using morphological analysis or the like, from the character strings constituting the metadata (in particular, each piece of attribute information) of a plurality of physical files managed in the metadata file; search query keywords are extracted from the log data of the search queries that users have used; and keywords that frequently appear together with the keywords extracted from the metadata are also extracted from the metadata. From these keywords, a score indicating the degree of suitability as a virtual folder is calculated. A specified number of virtual upper folders are then generated using keywords whose score is equal to or greater than a predetermined value, and virtual lower folders associated with each virtual upper folder are generated using other keywords that include the keyword of that virtual upper folder. In addition, a virtual classification display (FIG. 17), which shows the relationship between the generated virtual upper folders and virtual lower folders and the contents of those folders, is displayed on the screen as a GUI. This makes it easy to virtually classify files automatically and allows file management using virtual folders to be realized efficiently. In general, when people create virtual folders, they are thought to tend to define them with character strings that appear frequently, character strings that are frequently used in searches, or character strings that often appear together with frequently appearing character strings. The present invention is therefore considered to perform processing in line with human thinking, which enables classification close to a human classification policy. Furthermore, because virtual upper folders are generated so as to encompass various character strings, similar virtual upper folders are rarely generated, which yields a refined result. Moreover, because virtual lower folders are generated based on keywords that appear frequently, or that are frequently used in searches, among the files contained in a virtual upper folder, files can be searched efficiently even at a finer granularity.

When virtual upper folders are generated, keywords whose character string length is equal to or greater than a predetermined value are used. This prevents an excessive number of virtual upper folders from being generated. The user can specify this character string length and can change it if, after the automatic virtual classification process, the number of virtual folders generated differs from what the user expected.

In addition, a plurality of keywords are extracted for each piece of attribute information in the metadata, and score information indicating the degree of suitability of each extracted keyword as a virtual folder is calculated. Virtual upper folders and virtual lower folders are then created for each of the plurality of pieces of attribute information, and, in response to the user's selection of an attribute, the virtual classification display (FIG. 17) corresponding to the selected attribute is output. Virtual folders can thus be generated for each attribute, and a consistent virtual classification display can be presented to the user. The virtual classification display is therefore highly convenient for the user.

Furthermore, when a virtual lower folder is generated, if there are a plurality of keywords that contain the keyword corresponding to the virtual upper folder name, are longer than that keyword, and have appearance frequency ratios within a predetermined range, the keyword with the longest character string is used as the folder name of the virtual lower folder. This makes it possible to generate virtual lower folders that show the characteristics of the folder more specifically, so that the user can more easily grasp the tendency of the file classification.

Note that the present invention is not limited to the embodiment as described, and at the implementation stage the constituent elements can be modified and embodied without departing from the gist of the invention. Various inventions can also be formed by appropriately combining the plurality of constituent elements disclosed in the embodiment. For example, some constituent elements may be deleted from all the constituent elements shown in the embodiment. Furthermore, constituent elements from different embodiments may be combined as appropriate.

In addition, some or all of the configurations, functions, processing units, processing means, and the like described in the embodiment may be realized in hardware, for example by designing them as an integrated circuit. The above configurations, functions, and the like may also be realized in software, with a processor interpreting and executing a program that realizes each function. Information such as the programs, tables, and files that realize each function can be stored in a recording or storage device such as a memory, hard disk, or SSD (Solid State Drive), or on a recording or storage medium such as an IC card, SD card, or DVD.

Furthermore, in the above embodiment, the control lines and information lines shown are those considered necessary for the explanation, and not all the control lines and information lines of a product are necessarily shown. All the components may be connected to each other.

100 ... Central processing unit (processor)
110 ... Input/output device
111 ... Display device
112 ... Keyboard
113 ... Pointing device (mouse)
120 ... Program memory
121 ... Search program
122 ... Keyword registration program
123 ... Virtual folder generation program
124 ... Virtual classification program
130 ... Storage device
131 ... Metadata file
132 ... Search log data
133 ... Virtual folder data
140 ... Data memory
141 ... Extracted keyword data
142 ... Search keyword management data
143 ... Co-occurrence keyword data
144 ... Virtual upper folder data
145 ... Virtual lower folder data

Claims

A file management device for classifying and managing a plurality of physical files,
A processor for executing a program for generating a virtual folder for classifying the plurality of physical files;
A storage device for storing metadata management information for managing metadata of the plurality of physical files and search log information for managing search history;
The virtual folder is a substantive folder for managing link information of the plurality of physical files and physical folders regardless of the location of the plurality of physical files or the plurality of physical folders storing them,
The processor is
Extracting a plurality of metadata keywords from the constituent character strings constituting the metadata of the plurality of files in the metadata management information, and obtaining first appearance frequency information indicating the appearance frequency of each extracted metadata keyword,
Extracting a plurality of search keywords from each search character string constituting the plurality of search histories of the search log information, obtaining second appearance frequency information indicating an appearance frequency of each of the extracted search keywords,
Calculating a first score that is a score of each keyword by weighted addition of the frequency of each keyword indicated by the first and second appearance frequency information;
Generating a specified number of virtual upper folders using keywords whose first score is equal to or greater than a predetermined value;
Displaying the created virtual upper folder on a display screen;
A file management apparatus.

In claim 1,
The processor is
Generating a virtual lower folder associated with the virtual upper folder using at least one of: an inclusion keyword, which is another keyword including the keyword used for generating the virtual upper folder; a combination keyword, which is another keyword included in the same metadata as the keyword used for generating the virtual upper folder; and a co-occurrence keyword, which is a keyword searched simultaneously with the keyword used for generating the virtual upper folder;
Performing, on the display screen, a virtual classification display that shows the relationship between the generated virtual upper folder and the virtual lower folder and the contents of the virtual upper folder and the virtual lower folder;
A file management apparatus.

In claim 2,
The processor is
Obtaining third appearance frequency information indicating an appearance frequency in which a combination of the keyword used for generating the virtual upper folder and the co-occurrence keyword is used in the search;
Calculating a second score, which is a score of each keyword used for generating the virtual subfolder, by weighted addition of the frequencies of each such keyword indicated by the first and second appearance frequency information for the inclusion keyword and by the second and third appearance frequency information for the combination keyword and the co-occurrence keyword;
Generating a specified number of virtual subfolders using a keyword having the second score equal to or greater than a predetermined value;
A file management apparatus.

In claim 3,
The file management apparatus, wherein the processor generates the virtual subfolder with a keyword having a character string having a predetermined length or more.

In claim 3,
A file management device wherein, when there is a lower concept keyword composed of a longer character string that includes the target keyword subject to the generation process of the virtual subfolder, the processor generates the virtual subfolder with the lower concept keyword, without using the target keyword, if the second score of the lower concept keyword is equal to or greater than (the second score of the target keyword − a predetermined value).

In claim 1,
A file management device wherein, when determining whether to generate a virtual upper folder for a lower concept keyword that includes a character string constituting an existing virtual upper folder, the processor compares the first score of the character string constituting the existing virtual upper folder with the first score of the lower concept keyword and configures a virtual upper folder using the one with the larger first score.

In claim 6,
A file management device wherein, when the comparison between the first score of the character string constituting the existing virtual upper folder and the first score of the lower concept keyword shows that the first score of the lower concept keyword is larger than the first score of the character string constituting the existing virtual upper folder, the processor deletes the existing virtual upper folder and configures a virtual upper folder with the lower concept keyword.

A file management method for classifying and managing a plurality of physical files into virtual folders,
The virtual folder is a substantive folder for managing link information of the plurality of physical files and physical folders regardless of the location of the plurality of physical files or the plurality of physical folders storing them,
A processor that executes processing for generating the virtual folder reads metadata management information for managing metadata of the plurality of physical files from a storage device, extracts a plurality of metadata keywords from each constituent character string constituting the metadata of the plurality of files in the metadata management information, and obtains first appearance frequency information indicating the appearance frequency of each extracted metadata keyword;
The processor reads search log information for managing a search history from the storage device, extracts a plurality of search keywords from each search character string constituting the plurality of search histories of the search log information, and obtains second appearance frequency information indicating the appearance frequency of each extracted search keyword;
The processor calculates a first score that is a score of each keyword by weighted addition of the frequency of each keyword indicated by the first and second appearance frequency information;
The processor generates a predetermined number of virtual upper folders using a keyword having the first score equal to or greater than a predetermined value;
The processor displaying the created virtual upper folder on a display screen;
A file management method characterized by comprising:

In claim 8, further comprising:
An inclusion keyword that is another keyword including the keyword used by the processor to generate the virtual upper folder, a combined keyword that is another keyword included in the same metadata as the keyword used to generate the virtual upper folder, And generating a virtual lower folder associated with the virtual upper folder using at least one of the co-occurrence keywords that are searched simultaneously with the keyword used for generating the virtual upper folder;
The processor performs a virtual classification display on the display screen for displaying the relationship between the generated virtual upper folder and the virtual lower folder and the contents of the virtual upper folder and the virtual lower folder;
A file management method characterized by comprising:

A program for classifying and managing multiple physical files into virtual folders,
The virtual folder is a substantive folder for managing link information of the plurality of physical files and physical folders regardless of the location of the plurality of physical files or the plurality of physical folders storing them,
In a processor that executes processing for generating the virtual folder,
A process of reading metadata management information for managing the metadata of the plurality of physical files from a storage device, extracting a plurality of metadata keywords from the constituent character strings constituting the metadata of the plurality of files in the metadata management information, and obtaining first appearance frequency information indicating the appearance frequency of each extracted metadata keyword;
A process of reading search log information for managing a search history from the storage device, extracting a plurality of search keywords from the search character strings constituting the plurality of search histories of the search log information, and obtaining second appearance frequency information indicating the appearance frequency of each extracted search keyword;
A process of calculating a first score that is a score of each keyword by weighted addition of the frequency of each keyword indicated by the first and second appearance frequency information;
A process of generating a predetermined number of virtual upper folders using a keyword having the first score equal to or greater than a predetermined value;
A process of displaying the created virtual upper folder on a display screen;
A program characterized by causing the processor to execute the above processes.