JP2009237979A

JP2009237979A - Information processing device and method, and program

Info

Publication number: JP2009237979A
Application number: JP2008084252A
Authority: JP
Inventors: Yasuhiro Kirihata; 康裕桐畑
Original assignee: Hitachi Software Engineering Co Ltd
Current assignee: Hitachi Software Engineering Co Ltd
Priority date: 2008-03-27
Filing date: 2008-03-27
Publication date: 2009-10-15

Abstract

PROBLEM TO BE SOLVED: To curb the increase in storage cost caused by a growing number of files stored across a plurality of file servers, and to manage all files on the existing file servers in a unified manner.

SOLUTION: A proxy server that manages the files on a plurality of file servers is installed between those file servers and a user terminal connected through a network. The proxy server holds a management table covering all files on the file servers. Files with identical data contents are registered as entries linked to the same entity file and are handled as logically different files in the management table, which saves the space that duplicated data would otherwise occupy. In addition, file access logs are managed in one place, and users are prompted to clean up files according to access frequency.

COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to an information processing apparatus and method, and a program; for example, to a configuration that curbs the growth of data stored on a plurality of file servers and reduces the storage cost of keeping files.

With the arrival of the information explosion era, the rise in storage cost that accompanies growing data volumes is becoming a major problem for organizations and departments that store and manage electronic data within an organization or company, such as data centers and corporate information system departments.

Various methods have long been considered for curbing the growth of electronic data. For example, one way to store data efficiently on limited storage resources is to compress it with a data compression technique such as ZIP or LZA. Besides ZIP and LZA, there are many other compression algorithms, such as CAB, gzip, and StuffIt.

Data centers and the like also need to back up their data. One backup method, described for example in Patent Document 1, is differential backup: in addition to a full backup of all target data, it stores only the data that differs from that full backup.

Patent Document 1: JP 2006-92553 A

However, because storage is shared among multiple people, the same file is sometimes saved several times in the same storage, and space is wasted on duplicated data. The file compression and differential backup methods described above cannot eliminate this kind of file duplication.

On the other hand, some storage products have a duplicate-file elimination function. However, building that function into the file server itself means the existing file servers cannot be reused, so in terms of introduction cost it is not necessarily a realistic option. In other words, when cost is considered, what is needed is a method that eliminates duplicate files while reusing the existing file servers as they are.

The present invention has been made in view of this situation, and provides a configuration that can eliminate duplicate files within file servers without modifying the existing file servers.

To solve the above problem, the present invention assumes an environment in which a plurality of existing file servers stand side by side, and installs a proxy server (information processing apparatus) for file server management at a position that relays access to this group of file servers. When the files stored on the file servers it supervises include duplicates, the proxy server presents them to the user terminal as multiple files while keeping only one stored entity, thereby reducing duplicate files. In addition, it periodically checks the file access log and, for files that have not been accessed for a certain period, sends the user an email prompting file cleanup, providing a mechanism that encourages or enforces file cleanup.

That is, an information processing apparatus according to the present invention comprises: file management means for managing a plurality of link files that have different file names, each being a link file for accessing the entity of one file (referred to as file F001) stored on a file server; and file access management means for, when any of the plurality of link files is accessed from a user terminal, obtaining the entity of that one file (file F001) from the file server and providing it to the user terminal.

As registration information for every file stored on the file servers, the file management means manages a logical address as a link file, a real address indicating where each file's entity is stored, a hash value of each file's entity, and the number of duplicates of each file's entity.

The file access management means provides a file open function, a read function, a write function, a save function, and a delete function in response to file operation requests from the user terminal.

Further, upon a file save request from the user terminal, the file access management means obtains the hash value of the file to be saved and checks, based on that hash value, whether an identical file exists. If an identical file exists, the file management means manages only the registration information of the file to be saved; if not, it manages both the registration information and the file data of the file to be saved.

Further, upon a file copy request from the user terminal, the file management means registers new file information corresponding to the copy destination file.

An information processing apparatus according to another aspect of the present invention comprises: means for determining, in response to a file save request from a user terminal, whether a file identical to the file to be saved (file F001) is already stored on any of a plurality of file servers; and means for managing, when such an identical file exists, a second file link (bbb.txt) that is a file path to the identical file and that differs from a first file link (aaa.txt) to the identical file from another user terminal.

Further features of the present invention will become apparent from the following best mode for carrying out the invention and the accompanying drawings.

According to the present invention, duplicate files within file servers can be eliminated without modifying the existing file servers. The existing file servers can therefore be reused, and the cost of building the system can be kept low.

Embodiments of the present invention are described below with reference to the accompanying drawings. Note, however, that this embodiment is merely one example of realizing the present invention and does not limit its technical scope. Components common to the drawings are given the same reference numerals.

<Configuration of file management system>
FIG. 1 is a diagram showing the schematic configuration of a file management (information processing) system according to an embodiment of the present invention. The system comprises a user terminal 101, a plurality of file servers 102, and a file management proxy 103, all connected via a network 104.

A general application 105 and a file browser 106 are installed on the user terminal 101. The file browser 106 can access the local disk of the user terminal 101 and can also access the file management proxy 103. On Windows, for example, it can be implemented as a dedicated file browser that combines the built-in Explorer with a shell plug-in providing access to the file management proxy 103.

The connected file servers 102 store the files that user terminals access and use. However, the network is configured so that the user terminal 101 cannot access any file server 102 directly; the only thing it can reach is the file management proxy 103. Such a network configuration can be realized, for example, by appropriate routing settings on a router.

The file management proxy 103 runs a file management service 107, a service that centrally manages these file servers 102, and a management DB 108, a database for managing the files stored on these file servers.

<File management on file servers>
FIG. 2 is a conceptual diagram of the file server management method used by the file management proxy 103. The file management proxy 103 manages a plurality of files, including aaa.txt, bbb.txt, and ccc.txt, in a directory structure.

FIG. 2 shows a directory structure in which aaa.txt is stored directly under the \doc folder, and bbb.txt and ccc.txt are stored directly under the \doc\tmp folder. aaa.txt and bbb.txt are duplicate files with exactly the same binary contents, and their entity file F001 is stored on file server A 202. In other words, aaa.txt and bbb.txt have different file names, but both act as pointers to the entity file F001. The entity file of ccc.txt, F002, is stored on file server B 203.

In the logical directory structure managed by the file management proxy 103, aaa.txt and bbb.txt are treated as separate files, but in substance both are linked to the file F001 stored on file server A 202. Consolidating duplicate files into a single entity file in this way reduces the total storage consumed by the files held across the file servers.

<File management table>
FIG. 3 is a diagram showing the configuration of the file management table stored in the management DB 108 of the file management proxy 103. Each tuple (row) of the file management table corresponds to one file in the logical file group managed by the file management service 107 and consists of six attributes.

The virtual file path 301 is the logical file path managed within the file management service, and is unique across all files in the file server group.

The real file path 302 is the path of the file as actually stored on a file server. The file management service uses this real file path to access the file on the file server.

The duplicate count 303 is an attribute giving the number of duplicate files other than the file itself. For example, a duplicate count of 0 means there are no duplicates, and a duplicate count of n (> 0) means that, including the file itself, there are n + 1 duplicate files in total.

The hash value 304 is the hash value of the file. Duplicate files have the same hash value. The creation date and time 305 is an attribute indicating when the file was created, and the last access date and time 306 is an attribute indicating when the file was last accessed.
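As an illustration only, the six attributes above could be modelled as a small record type. The sketch below is not taken from the patent: the class name `FileEntry`, the helper `total_copies`, the server path format, and the placeholder hash strings are all assumptions chosen to mirror the FIG. 2 example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class FileEntry:
    """One tuple of the file management table (FIG. 3)."""
    virtual_path: str      # 301: logical path shown to the user, unique per file
    real_path: str         # 302: location of the entity file on a file server
    dup_count: int         # 303: number of OTHER entries with the same contents
    hash_value: str        # 304: hash of the file contents
    created: datetime      # 305: creation date and time
    last_access: datetime  # 306: last access date and time


def total_copies(entry: FileEntry) -> int:
    """A duplicate count of n means n + 1 logical files share one entity."""
    return entry.dup_count + 1


# Hypothetical contents mirroring FIG. 2: aaa.txt and bbb.txt share entity F001.
t0 = datetime(2008, 3, 27)
table = {
    e.virtual_path: e
    for e in [
        FileEntry(r"\doc\aaa.txt", r"\\serverA\F001", 1, "hash-1", t0, t0),
        FileEntry(r"\doc\tmp\bbb.txt", r"\\serverA\F001", 1, "hash-1", t0, t0),
        FileEntry(r"\doc\tmp\ccc.txt", r"\\serverB\F002", 0, "hash-2", t0, t0),
    ]
}
```

Keying the table by virtual path reflects the uniqueness of attribute 301; a real management DB would presumably also index the hash value, since the save procedure searches by it.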

<Open file management table>
The open file management table is used, when a file is accessed from the user terminal 101, to manage the file ID needed for that access and the file's open status and conditions. FIG. 4 is a diagram showing the configuration of the open file management table held by the file management service.

Generally, when the user terminal and the file server both run Windows, the CIFS protocol is used to access the file server. In this case, opening a file in order to access it yields a file ID, an identifier that specifies the opened file. Using that file ID, an application on the user terminal can read and write the file on the file server by specifying an offset from the beginning of the file and a read or write size. This embodiment assumes file server access over CIFS and defines the open file management table below on that basis, but this assumption does not limit the present invention.

As shown in FIG. 4, each tuple of the open file management table consists of, for example, six attributes. The virtual file path 401 is the logical file path visible from the user terminal 101. The virtual file ID 402 is the file ID returned to the application 105 on the user terminal 101. The real file ID 403 is the file ID sent by the file server 102 when the file on the file server 102 is opened. The access mode 404 is an attribute value indicating the access mode used when the file was opened, and takes values such as READ_WRITE, READ_ONLY, and APPEND. The sharing mode 405 is an attribute that determines whether other processes may open the file while it is open. It takes the values SHARE_READ, SHARE_WRITE, and NON_SHARE, meaning "shared read permitted", "shared write permitted", and "sharing prohibited", respectively. The update flag 406 indicates whether the file has been updated: TRUE means "updated" and FALSE means "not updated".

<Operation of file management system>
FIG. 5 is a flowchart for explaining the processing performed when a file is opened. First, when a user tries to open a file using the file browser 106, the file browser 106 sends the file open request to the file management service 107 running on the file management proxy 103 (step S501). After receiving this request, the file management service 107 searches the open file management table (see FIG. 4) using the virtual file path 401 contained in the request as the key, and obtains the tuple for the target file (step S502). If no tuple exists, the file has not yet been opened and can be opened. Even if the file is already open, it can still be opened when the sharing mode attribute has a value that permits shared opening, such as SHARE_READ or SHARE_WRITE. In this way the file management service 107 checks whether the target file can be opened (step S503).

If the file cannot be opened, the file management service 107 returns a file open error to the file browser 106 (step S509). If it can be opened, the file management service 107 searches the file management table (see FIG. 3) in the management DB 108 using the virtual file path 301 as the key, and attempts to obtain the real file path 302 (step S504).

The file management service 107 runs this search and checks whether the real file exists, based on whether a corresponding tuple exists (step S505). If the real file does not exist, there is no corresponding tuple and the real file path cannot be obtained. In that case, the service returns a file open error to the file browser and ends the processing (step S506).

If it is determined in step S505 that the real file exists, the file management service 107 opens the file on the file server 102 using the obtained real file path 302 (step S507). The file management service 107 then registers in the open file management table (see FIG. 4) the real file ID 403 of the opened file, the virtual file ID 402 (a management file ID corresponding to the virtual directory held by the file management proxy 103), and attributes such as the access mode 404, the sharing mode 405, and the update flag 406, and returns a file-open success to the file browser 106 together with the virtual file ID 402 (step S508). The file management service 107 assigns the virtual file ID 402 arbitrarily, so long as it is unique within the open file management table. Alternatively, the open file management table may omit the virtual file ID field and use the registered tuple number (row number) directly as the virtual file ID.
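The open procedure of FIG. 5 can be sketched roughly as follows. This is a simplified illustration under several assumptions: the class `OpenService`, the exception `FileOpenError`, and the dictionary-based tables are hypothetical; the file server is simulated by a counter that hands out real file IDs; and the sharing check only tests whether existing opens use a shareable mode, which is all the description states.

```python
import itertools

SHAREABLE = {"SHARE_READ", "SHARE_WRITE"}


class FileOpenError(Exception):
    """Stands for the file open error returned in steps S506 and S509."""


class OpenService:
    def __init__(self, file_table):
        self.file_table = file_table   # virtual path -> real path (FIG. 3, reduced)
        self.open_table = {}           # virtual file ID -> row (FIG. 4)
        self._virtual_ids = itertools.count(1)
        self._real_ids = itertools.count(1000)  # stand-in for IDs issued by a file server

    def open(self, virtual_path, access_mode="READ_ONLY", share_mode="SHARE_READ"):
        # S502-S503: the file is openable if it is not open yet, or if every
        # existing open of it uses a shareable mode.
        for row in self.open_table.values():
            if row["virtual_path"] == virtual_path and row["share_mode"] not in SHAREABLE:
                raise FileOpenError("already open without sharing")  # S509
        # S504-S506: resolve the virtual path to the real path.
        real_path = self.file_table.get(virtual_path)
        if real_path is None:
            raise FileOpenError("no entry in the file management table")
        # S507: open the entity on the file server (simulated here).
        real_id = next(self._real_ids)
        # S508: register the open and hand the virtual file ID back.
        virtual_id = next(self._virtual_ids)
        self.open_table[virtual_id] = {
            "virtual_path": virtual_path,
            "real_path": real_path,
            "real_id": real_id,
            "access_mode": access_mode,
            "share_mode": share_mode,
            "updated": False,
        }
        return virtual_id
```

Using the virtual file ID as the key of `open_table` corresponds to the variant mentioned above in which the row identifier itself serves as the virtual file ID.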

FIG. 6 is a flowchart for explaining the read processing performed after a file is opened. After the file is opened, the virtual file ID 402 is passed from the file management service 107 to the application 105 (step S601). The application 105 generates a file read request with the virtual file ID 402 as an argument and sends it to the file management service 107 (step S602).

On receiving the read request, the file management service 107 uses the real file ID 403 corresponding to the virtual file ID 402 to read the file data from the file server 102 and cache it (step S603). The file management service 107 then returns the cached data to the application 105 whenever a read occurs (step S604). Various caching algorithms are conceivable, such as caching only the data requested by the user terminal, caching the entire file, or caching a suitable neighborhood of the requested region, but this embodiment does not prescribe one; an existing algorithm of the kind generally used by proxy servers is assumed. In this way, the file entity is handed to the application 105 with the file management proxy 103 acting as intermediary.
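A minimal sketch of the read path, assuming the whole-file caching policy (one of the options the description mentions, chosen here only for brevity). `FileServerStub` and `ReadProxy` are hypothetical names, and the stub stands in for a real file server reachable by real file ID.

```python
class FileServerStub:
    """Stand-in for a file server; counts how often it is actually read."""

    def __init__(self, files):
        self.files = files   # real file ID -> bytes
        self.reads = 0

    def read_all(self, real_id):
        self.reads += 1
        return self.files[real_id]


class ReadProxy:
    def __init__(self, server, id_map):
        self.server = server
        self.id_map = id_map   # virtual file ID -> real file ID
        self.cache = {}        # real file ID -> cached file contents

    def read(self, virtual_id, offset, size):
        real_id = self.id_map[virtual_id]
        if real_id not in self.cache:
            # S603: fetch from the file server once and cache the result.
            self.cache[real_id] = self.server.read_all(real_id)
        # S604: serve the requested byte range from the cache.
        return self.cache[real_id][offset:offset + size]
```

With this policy, repeated reads of the same open file reach the file server only once; a range-based cache would trade memory for more round trips.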

FIG. 7 is a flowchart for explaining the processing performed when a file is closed. When the application 105 closes an open file, a close request is sent to the file management service 107 (step S701). On receiving the close request, the file management service 107 flushes the cache for the corresponding file (that is, if the file has been updated, it stores the updated file held in memory on the file server) and performs the close processing (step S702). Finally, the file management service 107 deletes the corresponding entry from the open file management table (see FIG. 4) (step S703).
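The close steps might look like the sketch below. The function name `close_file` and the dictionary layouts are assumptions for illustration; the file server is again simulated by a plain dictionary keyed by real file path.

```python
def close_file(open_table, cache, server_files, virtual_id):
    """Flush and close one open file (steps S702 and S703)."""
    row = open_table[virtual_id]
    if row["updated"]:
        # S702: write the updated in-memory contents back to the file server.
        server_files[row["real_path"]] = cache[virtual_id]
    # Drop the cached contents whether or not they were written back.
    cache.pop(virtual_id, None)
    # S703: remove the entry from the open file management table.
    del open_table[virtual_id]
```

Only files whose update flag is set are written back, so closing a file that was merely read leaves the entity on the file server untouched.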

FIG. 8 is a flowchart for explaining the processing performed when a file is saved. When the file browser 106 sends a file save request to the file management proxy 103 (step S801), the file management service 107 calculates the hash of the file to be saved (step S802).

Next, the file management service 107 searches the file management table (see FIG. 3) using the calculated hash value as the key (step S803), and checks from the search result whether an identical file exists (step S804).

If a file with the same hash value exists, the file management service 107 creates a new entry whose real file path attribute is the path of the existing file stored on the file server 102, registers it in the file management table, and increments the duplicate count attribute of every duplicate file by one (step S805). If no such file exists, the file management service 107 registers a new entry in the file management table and saves the file on a file server 102 that has free space (step S806).
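The save procedure of FIG. 8 can be illustrated with the following sketch. The patent does not name a hash algorithm, so SHA-256 from Python's `hashlib` is an assumption, as are the class name `DedupStore`, the `F001`-style entity naming, and the choice to reject a save to an already registered virtual path (a case the description does not cover). The file servers are simulated by an in-memory dictionary.

```python
import hashlib


class DedupStore:
    def __init__(self):
        self.entries = {}   # virtual path -> {"real_path", "hash", "dup_count"}
        self.blobs = {}     # real path -> bytes (stands in for the file servers)
        self._seq = 0

    def save(self, virtual_path, data):
        if virtual_path in self.entries:
            # Overwriting is outside the described flow; rejected in this sketch.
            raise FileExistsError(virtual_path)
        digest = hashlib.sha256(data).hexdigest()                         # S802
        same = [e for e in self.entries.values() if e["hash"] == digest]  # S803-S804
        if same:
            # S805: register a link to the existing entity and bump every duplicate.
            for e in same:
                e["dup_count"] += 1
            entry = {"real_path": same[0]["real_path"], "hash": digest,
                     "dup_count": len(same)}
        else:
            # S806: store the data as a new entity and register it.
            self._seq += 1
            real_path = f"F{self._seq:03d}"
            self.blobs[real_path] = data
            entry = {"real_path": real_path, "hash": digest, "dup_count": 0}
        self.entries[virtual_path] = entry
        return entry
```

Like the described method, this sketch treats equal hash values as equal contents; a byte-for-byte comparison on a hash match would be an extra safeguard that the source does not mention.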

FIG. 9 is a flowchart for explaining the processing performed when a file is copied. When the file browser 106 sends a file copy request to the file management service 107 (step S901), the file management service 107 creates a new entry whose attribute is the real file path of the copy-source file stored on the file server 102, and registers it in the file management table. The file management service 107 then increments the duplicate count attribute of every duplicate file by one (step S902). No copy of the actual file is made.
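A copy therefore touches only the table. The sketch below assumes a dictionary-based table keyed by virtual path; the function name `copy_file` and the refusal to overwrite an existing destination are illustrative choices, not part of the description.

```python
def copy_file(table, src, dst):
    """Register dst as another link to the entity behind src (step S902)."""
    if dst in table:
        raise FileExistsError(dst)   # overwrite is not covered by the description
    source = table[src]
    # Every entry that already points at the same entity gains one duplicate.
    siblings = [e for e in table.values() if e["real_path"] == source["real_path"]]
    for e in siblings:
        e["dup_count"] += 1
    # The new entry has as many "other" duplicates as there were siblings.
    table[dst] = {"real_path": source["real_path"], "hash": source["hash"],
                  "dup_count": len(siblings)}
```

No file data moves between servers, so the cost of a copy is independent of file size.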

FIG. 10 is a flowchart for explaining the processing performed when a file is deleted. The file browser 106 sends a file deletion request to the file management service 107 (step S1001). On receiving the deletion request, the file management service 107 searches the file management table (step S1002) and checks whether the file to be deleted has duplicates (step S1003).

If the file has duplicates, the file management service 107 deletes the file management table entry corresponding to the file to be deleted and decrements the duplicate count attribute of the other duplicate files by one (step S1004). If the file has no duplicates, the file management service 107 deletes both the file management table entry corresponding to the file and the file itself on the file server 102 (step S1005).
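Deletion is the mirror image of copying: the entity is removed only when its last link goes away. In this sketch the function name `delete_file` and the dictionary layouts are assumptions, and `blobs` stands in for the file servers.

```python
def delete_file(table, blobs, virtual_path):
    """Delete one logical file (steps S1002 to S1005)."""
    entry = table.pop(virtual_path)            # S1002: look up and drop the entry
    if entry["dup_count"] > 0:                 # S1003: does it have duplicates?
        # S1004: the entity stays; the remaining links each lose one duplicate.
        for e in table.values():
            if e["real_path"] == entry["real_path"]:
                e["dup_count"] -= 1
    else:
        # S1005: last link removed, so the entity file is deleted as well.
        del blobs[entry["real_path"]]
```

The duplicate count thus works like a reference count offset by one: an entity is deleted when an entry with a count of 0 is removed.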

FIG. 11 is a flowchart for explaining the file access audit processing. The file management service 107 periodically checks the last access date and time attribute in the file management table (step S1101). It then checks whether any file has gone unaccessed for a certain period (step S1102). If so, the file management service 107 lists those files as infrequently accessed files and sends the user an email prompting cleanup (step S1103). If not, the processing simply ends. Periodically prompting users to clean up rarely used files in this way amounts to a regular inventory of the data and enables efficient use of the file servers.
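The audit check can be sketched as a filter over the last access timestamps. The function names, the idle threshold, and the reminder wording are all hypothetical, and the sketch only composes the message text; how the email is addressed and sent is not specified in the description and is left out.

```python
from datetime import datetime, timedelta


def find_stale(last_access, now, max_idle):
    """S1101-S1102: list virtual paths not accessed for at least max_idle."""
    return sorted(path for path, t in last_access.items() if now - t >= max_idle)


def reminder_text(paths):
    """S1103: compose the body of a cleanup reminder for the listed files."""
    lines = ["The following files have not been accessed recently:"]
    lines += [f"  {p}" for p in paths]
    lines.append("Please delete or archive any you no longer need.")
    return "\n".join(lines)
```

A scheduler would call `find_stale` at a fixed interval and send a reminder only when the returned list is non-empty, matching the two branches of step S1102.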

<Summary>
As described above, the present invention manages a plurality of existing file servers centrally through a proxy, eliminates file duplication across the file servers, and provides users with a transparent file access mechanism. This saves storage space on the file servers, and because the plurality of file servers appears to users as a single file server, it achieves location transparency for files across the file servers. Combining this with conventional data compression allows the file servers' storage space to be used even more efficiently.

Further, the present invention requires no change to the configuration of the file servers, so the existing file servers can be used as they are. Keeping the introduction cost down is therefore a major advantage.

From a security standpoint as well, the proxy can manage access logs in one place, and an access right control function can be added. Conventionally, access rights had to be configured on each file server individually; managing them collectively at the proxy can be expected to reduce management cost.

Moreover, because management is centralized at the proxy, not only can management cost be expected to fall, but access log collection and file access control can also be consolidated in one place, which improves security. And since the existing file servers can be used as they are, the system can be introduced at low cost.

The present invention can also be realized by software program code that implements the functions of the embodiment. In this case, a storage medium on which the program code is recorded is provided to a system or apparatus, and the computer (or CPU or MPU) of that system or apparatus reads the program code stored on the storage medium. The program code read from the storage medium then itself realizes the functions of the embodiment described above, and the program code itself and the storage medium storing it constitute the present invention. Storage media for supplying such program code include, for example, a floppy (registered trademark) disk, CD-ROM, DVD-ROM, hard disk, optical disk, magneto-optical disk, CD-R, magnetic tape, nonvolatile memory card, and ROM.

Alternatively, an OS (operating system) or the like running on the computer may perform part or all of the actual processing based on the instructions of the program code, and the functions of the embodiments described above may be realized by that processing. Further, after the program code read from the storage medium has been written into memory on the computer, the CPU or the like of the computer may perform part or all of the actual processing based on the instructions of the program code, and the functions of the embodiments described above may be realized by that processing.

Furthermore, the program code of the software that implements the functions of the embodiments may be distributed via a network and stored in storage means such as a hard disk or memory of a system or apparatus, or on a storage medium such as a CD-RW or CD-R. At the time of use, the computer (or CPU or MPU) of the system or apparatus may then read and execute the program code stored in that storage means or storage medium.

A diagram showing the schematic configuration of the file management system according to an embodiment of the present invention.
A conceptual diagram of the file server management scheme in the file management proxy.
A configuration diagram of the file management table stored in the management DB.
A configuration diagram of the open file management table held by the file management service.
A flowchart for explaining the processing performed when a file is opened.
A flowchart for explaining the read processing performed after a file is opened.
A flowchart for explaining the processing performed when a file is closed.
A flowchart for explaining the processing performed when a file is saved.
A flowchart for explaining the processing performed when a file is copied.
A flowchart for explaining the processing performed when a file is deleted.
A flowchart for explaining the file access audit processing.

Explanation of symbols

101 ... User terminal
102 ... File server
103 ... File management proxy
104 ... Network
105 ... Application
106 ... File browser
107 ... File management service
108 ... Management DB
301 ... Virtual file path
302 ... Real file path
303 ... Duplicate count
304 ... Hash value
305 ... Creation date and time
306 ... Last access date and time
401 ... Virtual file path
402 ... Virtual file ID
403 ... Real file ID
404 ... Access mode
405 ... Sharing mode
406 ... Update flag (updated or not)

Claims

An information processing apparatus that manages a plurality of network-connected file servers, comprising:
file management means for managing a plurality of link files that have different file names, each being a link file for accessing a single file entity stored in the file servers; and
file access management means for acquiring the one file entity from the file server and providing it to a user terminal when any of the plurality of link files is accessed from the user terminal.

The information processing apparatus according to claim 1, wherein the file management means manages, as registration information for all files stored in the file servers, a logical address serving as the link file, a real address indicating the storage location of the file entity of each file, a hash value of the file entity of each file, and a duplicate count of the file entity of each file.
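
The registration information listed in this claim can be pictured as one table row per logical file. The following is a minimal sketch only, assuming a simple in-memory table; the field names, the example paths, and the helper `resolve` are hypothetical and do not appear in the publication.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class FileEntry:
    """One row of a management table (field names are illustrative)."""
    virtual_path: str          # logical address presented to users (the link file)
    real_path: str             # storage location of the file entity on a file server
    duplicate_count: int       # number of virtual paths sharing this file entity
    hash_value: str            # hash of the file entity's contents
    created_at: datetime
    last_accessed_at: datetime


def resolve(table: dict[str, FileEntry], virtual_path: str) -> str:
    """Map a virtual path to the real path of its file entity."""
    return table[virtual_path].real_path


# Two logically different files that share one file entity (hypothetical paths).
_stamp = datetime(2008, 3, 27)
_entity = "//fileserver1/store/0001.dat"
table = {
    "/projects/a/spec.doc": FileEntry(
        "/projects/a/spec.doc", _entity, 2, "ab12", _stamp, _stamp),
    "/projects/b/spec_copy.doc": FileEntry(
        "/projects/b/spec_copy.doc", _entity, 2, "ab12", _stamp, _stamp),
}
```

In this sketch both virtual paths resolve to the same real path, so the entity is stored once while users see two separate files.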

The information processing apparatus, wherein the file access management means provides a file open function, a read function, a write function, a save function, and a delete function in response to file operation requests from the user terminal.

The information processing apparatus according to claim 1, wherein the file access management means, upon a file save request from the user terminal, obtains a hash value of the file requested to be saved and checks, based on the hash value, whether an identical file exists, and
the file management means manages only the registration information of the file requested to be saved if a file identical to it exists, and manages both the registration information and the file data of the file requested to be saved if no identical file exists.
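
The save behavior in this claim amounts to content-based deduplication. The sketch below is one possible reading, not the implementation described in the publication: the class `DedupStore` and its attributes are hypothetical, SHA-256 is an assumed choice of hash function, and an in-memory dictionary stands in for storage on the file servers.

```python
import hashlib


class DedupStore:
    """Toy store: many virtual paths may link to one stored file entity."""

    def __init__(self):
        self.links = {}      # virtual path -> content hash
        self.entities = {}   # content hash -> file data (stand-in for file servers)
        self.dup_count = {}  # content hash -> number of links to that entity

    def save(self, virtual_path: str, data: bytes) -> bool:
        """Save data under virtual_path; return True if a new entity was stored."""
        digest = hashlib.sha256(data).hexdigest()  # assumed hash function
        old = self.links.get(virtual_path)
        if old == digest:
            return False  # same content already saved under this path
        if old is not None:
            self._release(old)  # the path no longer links to its old entity
        is_new = digest not in self.entities
        if is_new:
            # No identical file exists: keep the data as well as the registration.
            self.entities[digest] = data
            self.dup_count[digest] = 0
        # Registration information is recorded in either case.
        self.links[virtual_path] = digest
        self.dup_count[digest] += 1
        return is_new

    def _release(self, digest: str) -> None:
        """Drop one link; delete the entity when no link remains."""
        self.dup_count[digest] -= 1
        if self.dup_count[digest] == 0:
            del self.dup_count[digest]
            del self.entities[digest]

    def read(self, virtual_path: str) -> bytes:
        return self.entities[self.links[virtual_path]]
```

This sketch treats equal hash values as identical content. A system that must rule out hash collisions could additionally compare the bytes before sharing an entity.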

The information processing apparatus according to claim 1, wherein the file management unit registers new file information corresponding to a copy destination file when a file copy request is made from the user terminal.

The information processing apparatus according to claim 1, wherein the file management means periodically monitors the usage frequency of each file based on its last access date and time and, for a file whose usage frequency is lower than a predetermined usage count, notifies the user terminal of the user who created the file that the usage frequency is low.
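
The periodic check in this claim could be approximated as follows. This is a simplified sketch under stated assumptions: low usage is approximated by the time elapsed since the last access exceeding a cutoff, and the function name, the `(virtual_path, owner, last_accessed)` tuple layout, and the cutoff are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def stale_files_by_owner(entries, now, max_idle):
    """Group files idle for longer than max_idle by the user who created them.

    entries: iterable of (virtual_path, owner, last_accessed) tuples.
    Returns a dict mapping each owner to the list of paths to notify about.
    """
    notices = defaultdict(list)
    for virtual_path, owner, last_accessed in entries:
        if now - last_accessed > max_idle:
            notices[owner].append(virtual_path)
    return dict(notices)


# Example run with made-up data: files idle for over 180 days are reported.
example = stale_files_by_owner(
    [("/a/spec.doc", "alice", datetime(2008, 3, 27)),
     ("/b/plan.doc", "bob", datetime(2009, 10, 1))],
    now=datetime(2009, 10, 15),
    max_idle=timedelta(days=180),
)
```

A scheduler would call such a function at a fixed interval and send each owner the resulting list as a prompt to tidy up the files.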

An information processing apparatus that manages a plurality of network-connected file servers, comprising:
means for determining, in response to a file save request from a user terminal, whether a file identical to the file to be saved is already stored in any of the plurality of file servers; and
means for managing, when a file identical to the file to be saved exists, a second file link that differs from a first file link to the identical file from another user terminal and that is a file path to the identical file.

An information processing method for managing a plurality of network-connected file servers, wherein
file management means manages a plurality of link files that have different file names, each being a link file for accessing a single file entity stored in the file servers, and
file access management means acquires the one file entity from the file server and provides it to a user terminal when any of the plurality of link files is accessed from the user terminal.

An information processing method for managing a plurality of network-connected file servers, wherein,
in response to a file save request from a user terminal, file management means determines whether a file identical to the file to be saved is already stored in any of the plurality of file servers and, if an identical file exists, manages information on a second file link that differs from a first file link to the identical file from another user terminal and that is a file path to the identical file.

A program for causing a computer to execute the information processing method according to claim 8 or 9.