JP2006350470A

JP2006350470A - Data management device and method

Info

Publication number: JP2006350470A
Application number: JP2005172873A
Authority: JP
Inventors: Osami Takebe (建部修見); Tomohiro Kudo (工藤知宏); Yuetsu Kodama (児玉祐悦); Tomotsugu Sekiguchi (関口智嗣)
Original assignee: National Institute of Advanced Industrial Science and Technology AIST
Current assignee: National Institute of Advanced Industrial Science and Technology AIST
Priority date: 2005-06-13
Filing date: 2005-06-13
Publication date: 2006-12-28
Anticipated expiration: 2025-06-13
Also published as: JP4784854B2

Abstract

PROBLEM TO BE SOLVED: To provide a data management device that can distribute resources efficiently, ensure the safety and reliability of data files, and improve access performance.

SOLUTION: A data processing system accesses, via a network, a plurality of data files recorded on a plurality of recording media, each included in one of a plurality of nodes. The system comprises a replicating part 106 that generates replica files of the data files, a storage destination selecting part 128 that selects, according to a prescribed condition, the node in which a replica file is to be stored, and a storage part 120 that stores the replica files in the selected destinations. When a data file is replicated and its replica is stored in the destination, a management file containing the path information of the replica is generated. Data files recorded on the recording media are then accessed using the path information contained in the management files, which are organized in a hierarchical structure of directories.

COPYRIGHT: (C)2007,JPO&INPIT

Description

The present invention relates to a data management apparatus and method, and more particularly to a data management apparatus and method for managing data recorded in a distributed manner on a plurality of nodes connected via a network.

In recent years, advances in computer technology have dramatically improved computer performance. Network technology also continues to develop, making high-speed, high-capacity communication available over wide areas. Against this background, grid technology has attracted attention as a way to virtualize and integrate multiple resources on a network and use them efficiently. Grid technology lets users use resources distributed over a wide area network simply by connecting to the network, without being aware of where those resources are located or who owns them. With grid technology, data distributed across many storage systems connected to the network can be used in a uniform way.

Patent Document 1, for example, discloses a data management apparatus that uses such grid technology to allow access to the recording media of a plurality of nodes on a network. The apparatus described in that document lets users work with a plurality of data files uniformly, without being aware of the different storage systems in which the files are stored.

Patent Document 1: JP-A-2005-63214

However, if a data file stored on the recording medium of a node on such a network exists in only one place, there is a risk that important information will be lost entirely through a disaster or failure. In addition, when accesses concentrate on particular data files, processing and communication speeds drop. Concentrated access to a particular recording medium or data file also risks shortening the service life of that recording medium.

The present invention has been made in view of the above circumstances, and its object is to provide a data management apparatus that can distribute resources efficiently, ensure the safety and reliability of data files, and improve access performance.

According to the present invention, there is provided a data management apparatus comprising:
an access unit that accesses, via a network, a plurality of data files recorded respectively on a plurality of recording media included respectively in a plurality of nodes;
a generation unit that generates a plurality of management files, each including path information for accessing the plurality of data files;
a management unit that, in order to manage the plurality of data files, places the management file corresponding to each of the data files in a predetermined directory and logically manages a plurality of directories, including that directory, in a hierarchical structure;
a replication unit that generates a replica file of a data file;
a storage destination selection unit that selects, according to a predetermined condition, a storage destination node in which the replica file is to be stored; and
a storage unit that stores the replica file in the storage destination selected by the storage destination selection unit;
wherein, when the replication unit replicates the data file and the storage unit stores the replica file in the storage destination, the generation unit generates the management file including the path information of the replica file, and
the access unit accesses the data file recorded on the recording medium on the basis of the path information included in the management file within the hierarchical structure of the plurality of directories.
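The ordering the claim requires (replicate, select a destination, store, then record the replica's path in the management file, with reads resolved through that file) can be sketched as a minimal in-memory Python model. Everything in it is an assumption for illustration: the `node://host/path` URL form, the `Node`, `ManagementFile` and `Namespace` types, and the `most_free_space` selection rule are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List
from urllib.parse import urlsplit


@dataclass
class Node:
    """A processing device with one recording medium (hypothetical model)."""
    host: str
    free_bytes: int
    files: Dict[str, bytes] = field(default_factory=dict)


@dataclass
class ManagementFile:
    """Named after its data file; holds one path entry per stored copy."""
    name: str
    paths: List[str] = field(default_factory=list)


# Logical hierarchy: directory path -> {management-file name -> ManagementFile}
Namespace = Dict[str, Dict[str, ManagementFile]]


def read(ns: Namespace, directory: str, name: str, nodes: Dict[str, Node]) -> bytes:
    """Access unit: resolve the management file, then read the first reachable copy."""
    for url in ns[directory][name].paths:
        parts = urlsplit(url)
        node = nodes.get(parts.netloc)
        if node is not None and parts.path in node.files:
            return node.files[parts.path]
    raise FileNotFoundError(f"{directory}/{name}")


def replicate(ns: Namespace, directory: str, name: str, nodes: Dict[str, Node],
              select: Callable[[List[Node], int], Node]) -> str:
    """Replicate a data file, store the copy, then record its path."""
    mf = ns[directory][name]
    data = read(ns, directory, name, nodes)                   # replication unit
    holders = {urlsplit(u).netloc for u in mf.paths}
    candidates = [n for n in nodes.values() if n.host not in holders]
    dest = select(candidates, len(data))                      # storage destination selection unit
    path = f"{directory.rstrip('/')}/{name}"
    dest.files[path] = data                                   # storage unit
    dest.free_bytes -= len(data)
    url = f"node://{dest.host}{path}"
    mf.paths.append(url)                                      # generation unit records the new path
    return url


def most_free_space(candidates: List[Node], size: int) -> Node:
    """Example condition: the node with the most free space that can hold the copy."""
    fits = [n for n in candidates if n.free_bytes >= size]
    if not fits:
        raise RuntimeError("no node satisfies the storage condition")
    return max(fits, key=lambda n: n.free_bytes)
```

With nodes `n1` (holding the original), `n2` (500 bytes free) and `n3` (300 bytes free), `replicate(ns, "/proj", "DATAA", nodes, most_free_space)` skips `n1` because it already holds a copy and returns `node://n2/proj/DATAA`. Passing `select` as a parameter mirrors the claim's "predetermined condition", which the patent leaves open.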

Here, a data file includes information that a plurality of resources use in computation in response to a computation request from a client, as well as the results of that computation. The recording medium includes removable media such as a floppy (registered trademark) disk, CD, DVD, MO, magnetic tape, USB memory, and various memory cards, as well as storage devices such as hard disk drives.

The path information can be specified by a network address such as an IP (Internet Protocol) address, an address within the recording medium, and an access method for the data file, for example a protocol such as HTTP (HyperText Transfer Protocol) or FTP (File Transfer Protocol). It is not limited to these; any information that identifies the location of the data file is sufficient. For example, the path information can be expressed as a URL (Uniform Resource Locator) or an EPR (WS-Addressing Endpoint Reference).
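When path information is expressed as a URL, its parts map directly onto the items listed above. This Python sketch uses the standard `urllib.parse.urlsplit` to separate the access method, network address and in-medium address; the dictionary keys and the example address (from the 192.0.2.0/24 documentation range) are illustrative assumptions, and EPRs, which are XML documents, are not handled.

```python
from urllib.parse import urlsplit


def parse_path_info(url: str) -> dict:
    """Split URL-style path information into its locating components."""
    parts = urlsplit(url)
    if not parts.scheme or not parts.hostname:
        raise ValueError(f"not a locatable path: {url!r}")
    return {
        "access_method": parts.scheme,      # protocol, e.g. http or ftp
        "network_address": parts.hostname,  # IP address or host name
        "port": parts.port,                 # None when the default port is implied
        "medium_path": parts.path,          # location within the recording medium
    }


print(parse_path_info("ftp://192.0.2.10/disk1/DATAA"))
# {'access_method': 'ftp', 'network_address': '192.0.2.10', 'port': None, 'medium_path': '/disk1/DATAA'}
```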

The hierarchical structure may be managed by a separate file. Alternatively, it may be formed purely logically, by adding information such as the parent and child directories to part of each management file or directory.
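In the purely logical variant, each entry needs little more than a link to its parent. The Python sketch below is a hypothetical illustration that assumes entry names are unique and that `root` is the top directory; it rebuilds an absolute path and derives child lists from parent links alone, without any separate file describing the tree.

```python
from typing import Dict, List, Optional

# Hypothetical entries: each directory or management file records only its parent.
PARENTS: Dict[str, Optional[str]] = {
    "root": None,
    "proj": "root",
    "run1": "proj",
    "DATAA": "run1",    # a management file
    "DATAB-1": "run1",  # another management file
}


def full_path(name: str, parents: Dict[str, Optional[str]]) -> str:
    """Walk parent links up to the root and return an absolute logical path."""
    chain: List[str] = []
    seen = set()
    node: Optional[str] = name
    while node is not None:
        if node in seen:
            raise ValueError(f"cycle in parent links at {node!r}")
        seen.add(node)
        chain.append(node)
        node = parents[node]
    return "/" + "/".join(reversed(chain[:-1]))  # omit the root's own label


def children(name: str, parents: Dict[str, Optional[str]]) -> List[str]:
    """Derive the child list on demand instead of storing it."""
    return sorted(child for child, parent in parents.items() if parent == name)
```

For example, `full_path("DATAA", PARENTS)` gives `/proj/run1/DATAA`, and `children("run1", PARENTS)` lists both management files.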

According to this invention, replicas of a data file can be distributed according to a predetermined condition and stored on the recording media of a plurality of nodes on the network, so the safety and reliability of resources can be ensured and access performance improved efficiently.

The data management apparatus may further include an attribute information acquisition unit that acquires attribute information of the data file to be replicated by the replication unit, and a replica count determination unit that determines the number of replica files on the basis of the attribute information according to a predetermined replica count determination condition.

Here, the attribute information of a data file includes attributes such as file size, creation time, update time, and owner. It can also include the type of information stored. That is, the stored information can be classified and ranked by type according to, for example, the importance of the file (whether it contains information whose loss would be a problem), the level of protection required (e.g., for personal information, copyrighted material, or confidential information), and popularity. The attribute information may be acquired each time a file is replicated, or information acquired periodically or as needed and then stored may be used.

According to this configuration, the number of replicas can be determined from the attributes of the data file and the type of information it stores, so the safety and reliability of resources can be ensured and access performance improved efficiently.
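A replica count determination condition could, for instance, fold the attributes listed above into a single number. This Python sketch is hypothetical: the `FileAttributes` fields, the 0 to 2 importance scale, and every threshold (100 accesses per day, 100 GiB, a cap of three copies for confidential files) are illustrative assumptions, since the patent does not specify the condition.

```python
from dataclasses import dataclass


@dataclass
class FileAttributes:
    size_bytes: int
    importance: int     # hypothetical rank: 0 (low) .. 2 (loss would be a problem)
    popularity: int     # hypothetical accesses per day
    confidential: bool  # needs protection (personal, copyrighted, secret data)


def replica_count(attr: FileAttributes, min_copies: int = 2, max_copies: int = 5) -> int:
    """Illustrative rule: more copies for important or popular files, fewer otherwise."""
    n = min_copies + attr.importance      # important data gets extra copies
    if attr.popularity >= 100:
        n += 1                            # spread read load over one more copy
    if attr.confidential:
        n = min(n, 3)                     # fewer locations that must be secured
    if attr.size_bytes > 100 * 2**30:
        n -= 1                            # very large files: conserve capacity
    return max(min_copies, min(n, max_copies))
```

Clamping to `[min_copies, max_copies]` keeps every file above a safety floor while bounding storage cost, whatever the individual adjustments add up to.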

In the data management apparatus, the storage destination selection unit may include a performance information acquisition unit that queries, via the network, and acquires performance information indicating the performance of the recording media of the plurality of nodes, and a selection unit that selects, on the basis of the performance information, a storage destination node that satisfies a predetermined storage condition.

Here, the performance information includes, for example, the free capacity of a node's recording medium, its access performance, the performance of the CPU that accesses it, and its location on the network relative to the user. The free capacity of a recording medium, for example, can be acquired by querying, via the network, the data processing device that uses that recording medium. Access performance can be acquired by actually accessing the node's recording medium via the network and measuring the response time. The network location relative to the user can be acquired by sending an inquiry signal from the user's device via the network and measuring the response time. The selection unit selects, as the storage destination, the recording medium of a node whose performance information satisfies the predetermined condition.

Multiple conditions on the performance information can be combined or given an order of priority. The storage destination can also be determined using the data file attributes described above as a condition. The performance information may be acquired each time a file is replicated, or information acquired periodically or as needed and then stored may be used.

According to this configuration, the storage destination is selected according to the predetermined storage condition, so replicas of a data file can be distributed according to that condition and stored in appropriate destinations. As a result, the safety and reliability of resources can be ensured and access performance improved efficiently.
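Combining conditions with priorities might look like the Python sketch below: free capacity and latency act as hard filters, and the survivors are ranked by response time, then by free space. The `PerformanceInfo` type, the 50 ms limit and the ranking order are assumptions for illustration; `time.perf_counter` is the standard-library timer used to measure a probe access.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PerformanceInfo:
    host: str
    free_bytes: int      # free capacity reported by the node
    response_ms: float   # measured access or round-trip time


def measure_response_ms(probe: Callable[[], object]) -> float:
    """Time a single access, e.g. a small read issued over the network."""
    start = time.perf_counter()
    probe()
    return (time.perf_counter() - start) * 1000.0


def select_destinations(infos: List[PerformanceInfo], size: int, count: int,
                        max_response_ms: float = 50.0) -> List[str]:
    """Filter by hard conditions, then rank by priority: latency, then free space."""
    eligible = [i for i in infos
                if i.free_bytes >= size and i.response_ms <= max_response_ms]
    eligible.sort(key=lambda i: (i.response_ms, -i.free_bytes))
    if len(eligible) < count:
        raise RuntimeError(f"only {len(eligible)} node(s) satisfy the storage condition")
    return [i.host for i in eligible[:count]]
```

Raising when too few nodes qualify, rather than silently storing fewer replicas, leaves the decision to relax a condition with the caller.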

The data management apparatus may further include a registration unit that registers characteristic information about the characteristics of the recording media of the plurality of nodes. In that case, the storage destination selection unit may include a characteristic information acquisition unit that queries the registration unit and acquires the characteristic information, and a selection unit that selects, on the basis of the characteristic information, a storage destination node that satisfies a predetermined storage condition.

The characteristic information includes, for example, the reliability of the node's recording medium, the MTBF (Mean Time Between Failures) and MTTR (Mean Time To Repair) of its file system, and whether a UPS (Uninterruptible Power Supply) is present. It can further include the geographical and network distance between installations, the safety of the installation site (e.g., installation conditions, building structure, and disaster frequency), physical security (e.g., guard arrangements), and the cost per unit of storage capacity.

For example, conditions can require that nodes be at least a predetermined geographical distance apart, be installed at highly secure sites, have high structural strength, be low in cost, or be conveniently located on the network relative to the user.

Multiple conditions on the characteristic information can be combined or given an order of priority. The storage destination can also be determined using the data file attributes described above as a condition. The characteristic information may be acquired each time a file is replicated, or information acquired periodically or as needed and then stored may be used.

According to this configuration, the storage destination is selected according to the predetermined storage condition, so replicas of a data file can be distributed according to that condition and stored in appropriate destinations. As a result, the safety and reliability of resources can be ensured and access performance improved efficiently.
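One way to combine characteristic conditions is sketched below in Python: nodes without a UPS are filtered out, the rest are ranked by steady-state availability MTBF / (MTBF + MTTR), and nodes are accepted greedily only if they lie at least a minimum great-circle (haversine) distance from every node already chosen. The availability formula is the standard reliability-engineering definition rather than something the patent states; the `NodeCharacteristics` type, the greedy strategy and the thresholds are assumptions.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class NodeCharacteristics:
    host: str
    mtbf_h: float   # mean time between failures, hours
    mttr_h: float   # mean time to repair, hours
    has_ups: bool
    lat: float      # installation site, degrees
    lon: float

    @property
    def availability(self) -> float:
        """Steady-state availability: MTBF / (MTBF + MTTR)."""
        return self.mtbf_h / (self.mtbf_h + self.mttr_h)


def distance_km(a: NodeCharacteristics, b: NodeCharacteristics) -> float:
    """Great-circle distance between two sites (haversine, Earth radius 6371 km)."""
    p1, p2 = math.radians(a.lat), math.radians(b.lat)
    dp, dl = p2 - p1, math.radians(b.lon - a.lon)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(min(1.0, h)))


def select_separated(nodes: List[NodeCharacteristics], count: int,
                     min_km: float, require_ups: bool = True) -> List[str]:
    """Greedy pick by availability, keeping every pair at least `min_km` apart.

    May return fewer than `count` hosts if the conditions cannot all be met.
    """
    ranked = sorted((n for n in nodes if n.has_ups or not require_ups),
                    key=lambda n: n.availability, reverse=True)
    chosen: List[NodeCharacteristics] = []
    for n in ranked:
        if all(distance_km(n, c) >= min_km for c in chosen):
            chosen.append(n)
            if len(chosen) == count:
                break
    return [c.host for c in chosen]
```

With hypothetical sites near Tsukuba, Tokyo and Osaka (all with UPS) and a 300 km separation requirement, the Tokyo node wins on availability, the nearby Tsukuba node is skipped, and Osaka is taken as the second destination.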

The data management apparatus may include a determination unit that determines whether a predetermined trigger condition is satisfied, and a control unit that causes the replication unit to generate a replica of the data file when the trigger condition is determined to be satisfied.

Here, the trigger condition used by the determination unit specifies when replica files are generated, for example when a data file is stored or updated, periodically, when maintenance work occurs, or when an event occurs such as a disaster, an alarm, or a network or storage failure. The usage status of storage and the network can also serve as a condition, for example the performance load; the access frequency, whether cumulative or over a short period, and whether per file, per disk, or per organization or region; and the amount of increase or decrease in free storage capacity.

According to this configuration, a data file can be replicated at the moment a predetermined condition is met, which helps avoid the risk of the data file being damaged or lost and thus improves reliability.
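A determination unit covering several of these trigger types could be as simple as the Python predicate below. The event names, the `UsageSnapshot` fields and all thresholds (24-hour period, 1000 accesses per hour, a 10-point swing in the free-capacity ratio) are hypothetical choices made only to illustrate event, periodic and usage-based triggers side by side.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical names for the event-type triggers.
EVENT_TRIGGERS = {"stored", "updated", "maintenance", "disaster", "alarm",
                  "network_failure", "storage_failure"}


@dataclass
class UsageSnapshot:
    accesses_last_hour: int    # short-term access frequency for the file
    free_ratio_before: float   # free-capacity ratio at the previous check
    free_ratio_now: float      # free-capacity ratio now


def should_replicate(event: Optional[str], usage: UsageSnapshot,
                     hours_since_last: float, period_h: float = 24.0,
                     hot_accesses: int = 1000, capacity_change: float = 0.10) -> bool:
    """Return True if any trigger condition holds (thresholds are illustrative)."""
    if event in EVENT_TRIGGERS:
        return True                                   # event-driven trigger
    if hours_since_last >= period_h:
        return True                                   # periodic trigger
    if usage.accesses_last_hour >= hot_accesses:
        return True                                   # access concentration
    change = abs(usage.free_ratio_now - usage.free_ratio_before)
    return change >= capacity_change                  # free capacity moved sharply
```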

The data management apparatus may include an attribute information presentation unit that presents the attribute information of data files, a file reception unit that receives the designation of a data file to be replicated, and a replica count reception unit that receives the number of replicas of the data file received by the file reception unit. The replication unit may then generate as many replica files of the data file received by the file reception unit as the number received by the replica count reception unit.

According to this configuration, a data management apparatus is provided that supports the management of replica files by allowing the user to specify the number of replicas manually on the basis of the presented attribute information of the data file.

The data management apparatus may include an information presentation unit that presents the performance information acquired by the performance information acquisition unit or the characteristic information acquired by the characteristic information acquisition unit, and a storage destination reception unit that receives the storage destination of the replica file. The storage unit may then store the replica file in the storage destination received by the storage destination reception unit.

According to this configuration, a data management apparatus is provided that supports the management of replica files by allowing the user to specify the storage destination of replica files manually on the basis of the presented performance information.

The data management apparatus may include a notification unit that, when the determination unit determines that the trigger condition is satisfied, notifies the user with a message prompting the creation of a replica of the data file, and an instruction reception unit that receives a replication instruction. The replication unit may then replicate the data file in response to the replication instruction, and the storage unit may store the replica in the storage destination.

According to this configuration, the user can be informed when a data file should be replicated, so the safety and reliability of resources can be ensured and access performance improved efficiently at the appropriate time.
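The semi-automatic variants (notify the user, present attribute and performance information, then accept a replication instruction with a replica count and destinations) could be wired together as in this Python sketch. The function name, prompts and plan dictionary are hypothetical; `ask` and `notify` are injected so that `input` and `print` can be passed for interactive use, or scripted functions for testing.

```python
from typing import Callable, Dict, List


def assisted_replication(name: str, attributes: Dict[str, object],
                         candidates: List[Dict[str, object]],
                         ask: Callable[[str], str],
                         notify: Callable[[str], None]) -> Dict[str, object]:
    """Prompt the user, show supporting information, and collect a replication plan."""
    notify(f"Trigger condition met: replicating {name} is recommended.")  # notification unit
    if ask("Replicate now? [y/N] ").strip().lower() != "y":               # instruction reception
        return {}
    notify(f"Attributes of {name}: {attributes}")                         # attribute presentation
    for info in candidates:
        notify(f"Candidate {info['host']}: {info}")                       # performance presentation
    count = int(ask("Number of replicas: "))                              # replica count reception
    hosts = [h.strip() for h in ask("Destinations (comma-separated): ").split(",")
             if h.strip()]                                                # destination reception
    known = {str(info["host"]) for info in candidates}
    unknown = [h for h in hosts if h not in known]
    if unknown:
        raise ValueError(f"unknown destinations: {unknown}")
    if len(hosts) != count or len(set(hosts)) != len(hosts):
        raise ValueError("destinations must be distinct and match the replica count")
    return {"file": name, "replicas": count, "destinations": hosts}
```

Called as `assisted_replication("DATAA", attrs, candidates, input, print)`, it runs interactively; the returned plan would then be handed to the replication and storage units.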

According to the present invention, there is also provided a data management method comprising:
accessing, via a network, a plurality of data files recorded respectively on a plurality of recording media included respectively in a plurality of nodes;
generating a plurality of management files, each including path information for accessing the plurality of data files;
in order to manage the plurality of data files, placing the management file corresponding to each of the data files in a predetermined directory and logically managing a plurality of directories, including that directory, in a hierarchical structure;
generating a replica file of a data file;
selecting, according to a predetermined condition, a storage destination node in which the replica file is to be stored;
storing the replica file in the storage destination selected in the selecting step;
generating the management file including the path information of the replica file when the data file has been replicated in the replicating step and the replica file has been stored in the storage destination in the storing step; and
accessing the data file recorded on the recording medium on the basis of the path information included in the management file within the hierarchical structure of the plurality of directories.

The data management method may include determining whether a predetermined trigger condition is satisfied, and generating a replica of the data file when the trigger condition is determined to be satisfied.

According to the present invention, there is also provided a program that causes a data management computer, which accesses via a network a plurality of data files recorded respectively on a plurality of recording media included respectively in a plurality of nodes, to function as:
means for accessing the plurality of data files via the network;
means for generating a plurality of management files, each including path information for accessing the plurality of data files;
means for, in order to manage the plurality of data files, placing the management file corresponding to each of the data files in a predetermined directory and logically managing a plurality of directories, including that directory, in a hierarchical structure;
means for generating a replica file of a data file;
means for selecting, according to a predetermined condition, a storage destination node in which the replica file is to be stored;
means for storing the replica file in the storage destination selected by the means for selecting;
means for generating the management file including the path information of the replica file when the means for replicating has replicated the data file and the means for storing has stored the replica file in the storage destination; and
means for accessing the data file recorded on the recording medium on the basis of the path information included in the management file within the hierarchical structure of the plurality of directories.

Any combination of the above components, and any conversion of the expression of the present invention among a method, an apparatus, a system, a recording medium, a computer program, and so on, are also valid as aspects of the present invention.

According to the present invention, a data management apparatus is provided that can distribute resources efficiently, ensure the safety and reliability of data files, and improve access performance.

Embodiments of the present invention are described below with reference to the drawings. In all the drawings, like components are denoted by like reference numerals, and their description is omitted where appropriate.

FIG. 1 shows the configuration of a data processing system according to an embodiment of the present invention. This embodiment relates to a parallel computing system that uses grid technology, in which a plurality of processing devices (nodes), each including a recording medium and a CPU, are connected via a network and cooperate to perform parallel computation. In particular, it relates to a data management system that manages a plurality of data files recorded on the recording media of those processing devices.

Here, a data file includes information that a plurality of resources use in computation in response to a computation request from a client, as well as the results of that computation. It may also include digital data used in business and other fields. In this embodiment, the recording medium is a storage device such as a hard disk drive.

In this embodiment, the computation performed by the processing devices includes, for example, archiving of digital data including e-mail and backup processing, as well as design, calculation, and analysis in science and engineering. Examples include CAE (Computer Aided Engineering) and crash analysis in automotive design, drug screening analysis at pharmaceutical companies, fluid and structural analysis in aircraft design, credit risk analysis in the financial industry, molecular and atomic simulation in materials development, data analysis in particle physics and astronomy, protein function prediction in bioinformatics, and the creation of elevation data from satellite and ground observation data in earth and planetary science. Processing of digital data stored in business and other fields may also be included.

As shown in FIG. 1, the data processing system 100 includes a client device 10, a server device 12, a recording unit 14, a first processing device 16a, a second processing device 16b, a third processing device 16c, a fourth processing device 16d, a fifth processing device 16e, and a sixth processing device 16f (collectively, processing devices 16), and a network 18. The server device 12 includes an access unit 20, a reception unit 22, a management unit 24, a generation unit 26, a determination unit 28, and an instruction unit 30. The processing devices 16 respectively include a first processing unit 40a, a second processing unit 40b, a third processing unit 40c, a fourth processing unit 40d, a fifth processing unit 40e, and a sixth processing unit 40f (collectively, processing units 40), and a first recording unit 42a, a second recording unit 42b, a third recording unit 42c, a fourth recording unit 42d, a fifth recording unit 42e, and a sixth recording unit 42f (collectively, recording units 42). In this embodiment, each processing device 16 includes a single recording unit 42, but this is not a limitation; each processing device 16 may include a plurality of recording units 42.

The plurality of processing devices 16 are connected via the network 18. In response to a computation request from the client device 10, the server device 12 causes the processing devices 16 to cooperate in performing the requested computation and returns the result to the client device 10.

The network 18 carries information signals according to a predetermined protocol. It consists mainly of signal cables, such as coaxial, twisted-pair, and optical cables, interconnected by devices that control signal paths, such as routers and switches. Although FIG. 1 shows only one network 18, the network may instead be formed by combining a plurality of networks, which may have different bandwidths.

The client device 10 is a personal computer or similar device operated by a user. Via the network 18, it sends instructions such as data file searches and computation requests to the server device 12 described below, receives the results from the server device 12, and shows them on a display (not shown).

In this embodiment, the recording unit 42 of each processing device 16 is a storage device, such as a hard disk, that records data files. The data file "DATAA" is replicated and recorded in the first recording unit 42a and the sixth recording unit 42f. Because the data file "DATAB" is large, it is divided into the data files "DATAB-1", "DATAB-2", and "DATAB-3": "DATAB-1" is replicated and recorded in the first recording unit 42a and the fourth recording unit 42d, "DATAB-2" in the second recording unit 42b and the fifth recording unit 42e, and "DATAB-3" in the third recording unit 42c and the sixth recording unit 42f. The recording units 42 record not only data files stored in advance but also data files generated by the processing of the processing units 40 described below.
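The DATAB layout described above (three fragments, two copies each, over six recording units) is reproduced by a simple strided placement, sketched here in Python. The patent shows only the resulting layout, not a placement algorithm; splitting into nearly equal fragments and using a stride of (number of units // copies) are assumptions that happen to yield that layout.

```python
from typing import Dict, List


def split(data: bytes, parts: int) -> List[bytes]:
    """Cut a large data file into `parts` fragments of nearly equal size."""
    size = -(-len(data) // parts)  # ceiling division
    return [data[i * size:(i + 1) * size] for i in range(parts)]


def place(name: str, data: bytes, units: List[str], parts: int,
          copies: int) -> Dict[str, Dict[str, object]]:
    """Assign each fragment's copies to distinct units, `stride` positions apart."""
    if not 1 <= copies <= len(units):
        raise ValueError("copies must be between 1 and the number of units")
    stride = len(units) // copies
    layout: Dict[str, Dict[str, object]] = {}
    for i, chunk in enumerate(split(data, parts)):
        layout[f"{name}-{i + 1}"] = {
            "bytes": len(chunk),
            "units": [units[(i + k * stride) % len(units)] for k in range(copies)],
        }
    return layout
```

With `units = ["42a", "42b", "42c", "42d", "42e", "42f"]`, `place("DATAB", data, units, 3, 2)` puts DATAB-1 on 42a and 42d, DATAB-2 on 42b and 42e, and DATAB-3 on 42c and 42f. Because the stride spreads copies half the ring apart, losing any one unit never removes both copies of a fragment.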

The processing unit 40 of each processing device 16 includes a CPU and RAM (Random Access Memory), among other components. The CPU executes processing such as computations and data file searches, and the RAM temporarily holds data files. The server device 12 described later specifies which data files to process and what processing to perform on them.

In the server device 12, the access unit 20 connects to the network 18 and inputs and outputs data and signals carrying predetermined instructions. The receiving unit 22 receives data output from the client device 10 or the processing devices 16 via the network 18 and the access unit 20. From the client device 10, it receives instructions such as data file searches, computations, and requests to display the managed data files. From each processing device 16, it receives the path information of the data files recorded in that device's recording unit 42, as well as the results of instructions the server device 12 issued earlier. Concrete examples of path information are given later.

The generation unit 26 uses the data file path information received by the receiving unit 22 to generate management files. Each management file contains the path information of a data file recorded in the recording units 42. For example, each management file is given the same name as its corresponding data file. When a data file is recorded in more than one recording unit 42, as "DATAA" is, the management file lists every one of those paths. A management file associated with a data file may also be generated in advance, either before the data file is generated or while it is being generated. Here, the generation stage covers any point at which the data file is not yet complete, including the time before its generation begins.
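The paragraph above describes what a management file holds: the data file's name plus one path entry per stored copy, possibly created before the data file exists. The following is a minimal sketch of that record, assuming an in-memory representation; the class and function names, and the file name "DATAC", are hypothetical and not taken from the text.

```python
from dataclasses import dataclass, field


@dataclass
class ManagementFile:
    """Hypothetical in-memory model of a management file."""
    name: str                                        # same name as the data file it stands for
    paths: list[str] = field(default_factory=list)   # one path entry per stored copy

    def add_path(self, path: str) -> None:
        # Record another location of the same data file, ignoring duplicates.
        if path not in self.paths:
            self.paths.append(path)


def generate_management_file(data_file_name: str, reported_paths: list[str]) -> ManagementFile:
    """Build a management file from the path information reported by the processing devices."""
    mf = ManagementFile(name=data_file_name)
    for path in reported_paths:
        mf.add_path(path)
    return mf


# A management file may be created before its data file exists, with no paths yet
# ("DATAC" is an illustrative name).
placeholder = generate_management_file("DATAC", [])

# "DATAA" is stored on two processing devices, so its management file lists both paths.
mf = generate_management_file(
    "DATAA",
    ["ftp://p1.co.jp/a/DATAA", "http://p6.co.jp/a/DATAA"],
)
print(mf.name, len(mf.paths))
```

Keeping the name identical to the data file's name lets a user address the management file exactly as they would address the data file itself.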

The management unit 24 places management files in predetermined directories and manages those directories in a hierarchical structure. The information on the directories that form this hierarchy is recorded in the recording unit 14, together with the management files. When the client device 10 requests a search for a given data file, the management unit 24 accesses the recording unit 14. It then uses the directory information of the hierarchy to find the management file that corresponds to the target data file.

When a data file is replicated, as described later, the management unit 24 instructs the generation unit 26 to generate a management file for the replica. It then places that management file in a predetermined directory and manages the directories in the hierarchical structure. The directory information and management files for this hierarchy are recorded in the recording unit 14.

Processing requested by the client device 10 may be executable by any of several processing units 40. In that case, the determination unit 28 decides which processing unit 40 actually executes it. Setting aside processing that is already running, the processing is preferentially assigned to a processing unit 40 in the same processing device 16 as a recording unit 42 that holds the target data file. For example, suppose the client device 10 requests a search of the data file "DATAA". The determination unit 28 assigns the search to the first processing unit 40a, because it is in the same first processing device 16a as the first recording unit 42a, which holds "DATAA". Because "DATAA" is also recorded in the sixth recording unit 42f, the search may instead be assigned to the sixth processing unit 40f.
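The placement rule above means a job runs where a copy of its data already lives, so the data does not have to cross the network. Below is a minimal sketch, assuming the replica table is a plain dictionary built from the layout in FIG. 1. The optional `load` mapping used to choose between co-located devices is an illustrative assumption. The text itself only says that running processing is set aside.

```python
# Replica layout from FIG. 1: data file -> processing devices whose recording unit holds a copy.
REPLICAS = {
    "DATAA": ["16a", "16f"],
    "DATAB-1": ["16a", "16d"],
    "DATAB-2": ["16b", "16e"],
    "DATAB-3": ["16c", "16f"],
}


def choose_processing_unit(data_file, load=None):
    """Return the processing device that should run a job on data_file.

    Only devices that store a copy of the file are candidates. The optional
    `load` mapping (device -> number of running jobs) is a hypothetical
    tie-breaker between those co-located devices.
    """
    candidates = REPLICAS.get(data_file)
    if not candidates:
        raise KeyError(f"no replica recorded for {data_file}")
    if load is None:
        # Ignoring running processing: the first listed copy wins.
        return candidates[0]
    # min() keeps the first candidate on ties, so the listing order still matters.
    return min(candidates, key=lambda dev: load.get(dev, 0))


print(choose_processing_unit("DATAA"))                        # 16a
print(choose_processing_unit("DATAA", {"16a": 3, "16f": 1}))  # 16f
```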

The instruction unit 30 either instructs the processing device 16 selected by the determination unit 28 to execute the processing, or outputs the result of processing that the management unit 24 executed directly.

In hardware, the data processing system 100 can be realized by the CPU, memory, and other LSIs of any computer. In software, it is realized by a program with a reservation management function loaded into memory, among others. The figures depict the functional blocks that result from this cooperation. Those skilled in the art will therefore understand that these functional blocks can be realized in various forms: by hardware alone, by software alone, or by a combination of both. Each figure described below shows blocks in functional units rather than hardware units. Each figure also omits the parts of the configuration that are unrelated to the essence of the present invention.

In the data processing system 100 configured as above, the server device 12 is added to the network 18 that links the plurality of processing devices 16. For each data file recorded in those processing devices 16, the server device 12 generates a management file containing that file's path information. For example, the management file "A" corresponds to the data file "A". It records the network address of the processing device 16 that holds "A" and the address of the recording area within that device's recording unit 42. The server device 12 does not manage the data files directly. Instead, it places the management files in predetermined directories and manages those directories in a hierarchical structure. A user or program specifies a data file by specifying its management file within the directory hierarchy. Using the path information in that management file, the server device 12 accesses the processing device 16 on which the data file is recorded and thereby reaches the data file.

FIG. 2 shows the configuration of the file system in the server device 12. The upper part of FIG. 2 shows the hierarchy formed by the directories recorded in the recording unit 14. The directory "/grid" is at the top of the hierarchy, with the directories "ggf" and "jp" below it. The management file 50a (labeled "DATAA" in the figure) is placed under the directory "jp". The directory "aist" sits below "ggf". Under "aist" are the management files 50b, 50c, and 50d, labeled "DATAB-1", "DATAB-2", and "DATAB-3", respectively. The management files 50a, 50b, 50c, and 50d correspond to the data files "DATAA", "DATAB-1", "DATAB-2", and "DATAB-3" of FIG. 1, respectively.

The management file 50a contains "ftp://p1.co.jp/a/DATAA" and "http://p6.co.jp/a/DATAA" as path information. Here, "p1.co.jp" corresponds to the first processing device 16a of FIG. 1, "p6.co.jp" corresponds to the sixth processing device 16f, and "DATAA" corresponds to the data file "DATAA". The same applies to the management files 50b, 50c, and 50d.
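The two paragraphs above explain how a logical name in the server's hierarchy leads to physical copies. Each path entry combines a protocol, the network address of a processing device, and a recording-area address. The following is a minimal sketch of that lookup, assuming nested dictionaries stand in for directories and lists stand in for management files. Only the "DATAA" entries are taken from FIG. 2, and the function names are hypothetical.

```python
from urllib.parse import urlparse

# Server-side namespace modelled on FIG. 2: dicts are directories,
# lists are management files holding path entries.
NAMESPACE = {
    "grid": {
        "jp": {
            "DATAA": ["ftp://p1.co.jp/a/DATAA", "http://p6.co.jp/a/DATAA"],
        },
    },
}


def lookup(logical_path):
    """Walk the directory hierarchy and return the node at logical_path."""
    node = NAMESPACE
    for part in logical_path.strip("/").split("/"):
        node = node[part]  # raises KeyError if the logical path does not exist
    return node


def split_path_entry(entry):
    """Split one path entry into (protocol, network address, recording-area address)."""
    parsed = urlparse(entry)
    return parsed.scheme, parsed.netloc, parsed.path


for entry in lookup("/grid/jp/DATAA"):
    print(split_path_entry(entry))
# ('ftp', 'p1.co.jp', '/a/DATAA')
# ('http', 'p6.co.jp', '/a/DATAA')
```

Because each copy carries its own protocol, copies of one data file can be served by different transfer mechanisms, as with FTP and HTTP for "DATAA".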

As in FIG. 1, the lower part of FIG. 2 shows the data files recorded in the recording units 42. Above each recording unit 42, its protocol and network address are shown in the form "ftp://p1.co.jp". Inside each recording unit 42, the addresses of its recording areas are shown, such as "/a/DATAA" and "/b/DATAB-1". The data files are distributed across the recording units 42. The recording unit 14, however, does not manage them directly. Instead, it manages the corresponding management files in a hierarchical structure. As the figure shows, data files can therefore be managed without regard to which recording unit 42 holds them.

Suppose a user or program requests processing of the data file "DATAB-1". The management unit 24 then follows the path information of the management file 50b recorded in the recording unit 14 and accesses "/b/DATAB-1" in either the first recording unit 42a or the fourth recording unit 42d. Now suppose processing of the entire data file "DATAB" is requested. The directory "aist" contains all of the management files 50b, 50c, and 50d for "DATAB". The management unit 24 therefore treats "aist" as the management file 50e corresponding to "DATAB". It then accesses the appropriate recording units 42 using the path information in those management files.
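The directory-as-file rule above can be sketched as a small resolver. A single fragment resolves to its own management file, while the whole data set resolves to every management file in the directory. In this sketch, the host names p2 to p5 and all protocols other than the FTP and HTTP entries shown in FIG. 2 are assumptions, extrapolated from the p1/p6 naming. Using the first listed copy of each fragment is also an illustrative choice.

```python
# Hypothetical contents of the "aist" directory of FIG. 2 (management files 50b to 50d).
# Host names follow the recording-unit placement of FIG. 1; p2 to p5 are assumed names.
AIST = {
    "DATAB-1": ["ftp://p1.co.jp/b/DATAB-1", "ftp://p4.co.jp/b/DATAB-1"],
    "DATAB-2": ["ftp://p2.co.jp/b/DATAB-2", "ftp://p5.co.jp/b/DATAB-2"],
    "DATAB-3": ["ftp://p3.co.jp/b/DATAB-3", "http://p6.co.jp/b/DATAB-3"],
}


def plan_access(directory):
    """Treat a directory as one management file for the whole data set.

    Returns one path entry per fragment, in fragment-name order. Picking the
    first listed copy of each fragment is an assumption made for this sketch.
    """
    return [directory[name][0] for name in sorted(directory)]


# Processing a single fragment uses its own management file.
print(AIST["DATAB-1"])
# Processing the whole of "DATAB" uses every management file under "aist".
print(plan_access(AIST))
```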

FIG. 3 is a functional block diagram of the backup unit 60, a main component of the data processing system 100 according to the embodiment of the present invention. The data processing system 100 of FIG. 1 may further include the backup unit 60 of FIG. 3. In the present embodiment, the backup unit 60 is included in the server device 12, although it is not limited to this placement.

The data processing system 100 according to the embodiment of the present invention comprises the following units:

- an access unit (access unit 20 in FIG. 1) that accesses, via a network (network 18 in FIG. 1), a plurality of data files, which are recorded on a plurality of recording media (recording units 42 in FIG. 1) included in a plurality of nodes;
- a generation unit (generation unit 26 in FIG. 1) that generates a plurality of management files, each containing path information for accessing one of the data files;
- a management unit (management unit 24 in FIG. 1) that places the management file for each data file in a predetermined directory and logically manages a plurality of directories, including that directory, in a hierarchical structure;
- a replication unit (replication unit 106 in FIG. 3) that generates replicas of data files;
- a storage destination selection unit (storage destination selection unit 128 in FIG. 3) that selects, according to a predetermined condition, the node at which a replica is to be stored; and
- a storage unit (storage unit 120 in FIG. 3) that stores the replica at the storage destination selected by the storage destination selection unit.

When the replication unit replicates a data file and the storage unit stores the replica at its storage destination, the generation unit generates a management file containing the path information of that replica. The access unit accesses a data file recorded on a recording medium using the path information contained in the management file within the hierarchy of directories.

As shown in FIG. 3, the backup unit 60 includes the following units:

- a trigger condition storage unit 102 (labeled "trigger condition" in the figure);
- a control unit 104;
- a replication unit 106;
- a file information acquisition unit 108;
- a file information storage unit 110 (labeled "file information");
- a count determination condition storage unit 112 (labeled "count determination condition");
- a copy count determination unit 114;
- a storage unit 120;
- a medium information acquisition unit 122;
- a medium information storage unit 124 (labeled "medium information");
- a storage destination selection condition storage unit 126 (labeled "storage destination selection condition"); and
- a storage destination selection unit 128.

The trigger condition storage unit 102 stores the triggers that determine when the replication unit 106 generates replicas of a data file. A trigger condition specifies the moment at which a replica is generated. Examples include when a data file is stored or updated, at regular intervals, when maintenance work takes place, and when an event occurs, such as a disaster, an alarm, or a network or storage failure. Conditions can also be based on storage and network usage. Usage measures include performance load, access frequency, and the rate at which free storage capacity increases or decreases. Access frequency can be measured cumulatively or over a short period, and per file, per disk, or per organization or region. Alternatively, the system can check, periodically or on demand, whether a data file is accessible, and use inaccessibility as a trigger. Multiple trigger conditions can be combined or assigned priorities.

The control unit 104 accesses the trigger condition storage unit 102 and determines whether a trigger condition is satisfied. When it determines that a condition is satisfied, it causes the replication unit 106 to generate a replica of the data file. Alternatively, a data file can be replicated on request. In that case, the instruction receiving unit described later receives an instruction from the user of the client device 10 or the administrator of the server device 12, and notifies the control unit 104.
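The control unit's check can be pictured as evaluating prioritized predicates over a snapshot of system status. The text lists several trigger sources and says conditions may be combined and ranked. The sketch below is a minimal, hypothetical version of that idea. The status keys, the thresholds, and the rule that a lower number means higher priority are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Trigger:
    name: str
    priority: int                   # lower number is checked first (assumed convention)
    is_met: Callable[[dict], bool]  # predicate over a status snapshot


def first_fired(triggers, status):
    """Return the name of the highest-priority trigger whose condition holds, or None."""
    for trig in sorted(triggers, key=lambda t: t.priority):
        if trig.is_met(status):
            return trig.name
    return None


# Hypothetical trigger set; status keys and thresholds are illustrative only.
TRIGGERS = [
    Trigger("data file unreachable", 0, lambda s: not s.get("reachable", True)),
    Trigger("file stored or updated", 1, lambda s: s.get("file_event") in ("stored", "updated")),
    Trigger("free space dropped", 2, lambda s: s.get("free_space_drop_gb", 0) >= 100),
    # A combined condition: maintenance is under way and the load is low.
    Trigger("maintenance at low load", 3,
            lambda s: s.get("maintenance", False) and s.get("load", 1.0) < 0.5),
]

print(first_fired(TRIGGERS, {"file_event": "updated", "reachable": False}))
```

When several triggers hold at once, the sketch reports only the highest-priority one. This is one possible reading of "assigned priorities".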

The replication unit 106 generates replicas of data files. It first obtains the management file of the source data file from the management unit 24. From the path information in that management file, it obtains the data file's storage location. It then accesses the data file through the access unit 20 and generates the replica.

The file information acquisition unit 108 obtains the storage location of the source data file from the replication unit 106. It accesses that location through the access unit 20 and acquires information about the source data file (hereinafter "file information"). File information includes the attribute information of the data file, such as file size, creation time, update time, and owner. It can also include the type of information stored. The stored information can be classified and ranked by several criteria:

- the importance of the file, that is, whether it contains information that must not be lost;
- the level of protection it requires, for example for personal information, copyrighted material, or confidential information; and
- its popularity.

File information may be acquired each time a replica is made. Alternatively, information acquired periodically or on demand and stored in advance may be used.

The file information acquisition unit 108 acquires this file information by accessing, through the access unit 20, the data file recorded in the recording unit 42 of the processing device 16.

The file information storage unit 110 stores the information on the source data file acquired by the file information acquisition unit 108. A registration unit (not shown) for registering file information in advance may also be provided. The administrator of the server device 12, for example, can use it to register file information in the file information storage unit 110 beforehand.

The count determination condition storage unit 112 stores the copy count determination conditions, which the copy count determination unit 114 uses to decide how many replicas to make. These conditions can include tables such as the following:

- a table mapping file size ranges to copy counts, for example three replicas if the file size is at or above a predetermined value and two if it is below; and
- a table mapping importance ranges to copy counts, for example three replicas if the file's importance is at or above a predetermined level and two if it is below.

Multiple copy count determination conditions can be combined or assigned priorities.

The copy count determination unit 114 determines the number of replicas. It applies the copy count determination conditions stored in the count determination condition storage unit 112 to the attribute information of the source data file, which is stored in the file information storage unit 110. As described later, the user of the client device 10, the administrator of the server device 12, or others can also specify which conditions the copy count determination unit 114 uses. The replication unit 106 then generates as many replicas as the copy count determination unit 114 decides.
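The two example tables above (file size and importance, each mapped to three or two copies) can be written as simple threshold functions. The sketch assumes a threshold of 1 GiB for the "predetermined value" and a level of 3 on a 1 to 5 scale for the "predetermined level". Both values are assumptions. When the two tables disagree, the sketch combines them by taking the larger count, which is also an assumed rule. The `override` parameter stands in for a count specified by a user or administrator.

```python
SIZE_THRESHOLD = 1 << 30   # assumed "predetermined value": 1 GiB
IMPORTANCE_THRESHOLD = 3   # assumed "predetermined level" on a 1-5 scale


def copies_by_size(size_bytes):
    """File-size table: 3 copies at or above the threshold, otherwise 2."""
    return 3 if size_bytes >= SIZE_THRESHOLD else 2


def copies_by_importance(importance):
    """Importance table: 3 copies at or above the level, otherwise 2."""
    return 3 if importance >= IMPORTANCE_THRESHOLD else 2


def decide_copy_count(file_info, override=None):
    """Return the number of replicas for one source data file.

    An explicitly specified count wins. Otherwise the two tables are
    combined by taking the larger result (an assumed combination rule).
    """
    if override is not None:
        return override
    return max(copies_by_size(file_info["size"]),
               copies_by_importance(file_info["importance"]))


print(decide_copy_count({"size": 4096, "importance": 5}))  # 3
```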

The storage unit 120 uses the access unit 20 to store each replica generated by the replication unit 106 at the storage destination selected by the storage destination selection unit 128. After storing a replica, the storage unit 120 notifies the management unit 24. On receiving this notification, the management unit 24 obtains the path information of the replica's storage location. The generation unit 26 then generates a management file and records it in the recording unit 14.
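The ordering in the paragraph above matters: the replica's path is registered only after the store has succeeded, so a management file never points at a copy that does not exist. Below is a minimal sketch of that ordering, assuming management files are kept as lists of path entries in a dictionary. The `write` callable is a hypothetical stand-in for a transfer through the access unit 20. The text leaves open whether the replica's path goes into a new management file or extends the existing one, and this sketch assumes the latter.

```python
def store_replica(management_files, name, destination_url, write):
    """Store one replica, then register its path in the management file.

    `write` is a hypothetical transfer function that raises on failure.
    If it raises, the management file is left unchanged.
    """
    write(destination_url)                          # storage unit 120 writes the copy
    paths = management_files.setdefault(name, [])   # management unit 24 is notified
    if destination_url not in paths:
        paths.append(destination_url)               # generation unit 26 records the path
    return paths


written = []
mfs = {"DATAA": ["ftp://p1.co.jp/a/DATAA"]}
store_replica(mfs, "DATAA", "http://p6.co.jp/a/DATAA", written.append)
print(mfs["DATAA"])
```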

The medium information acquisition unit 122 acquires information on the performance or characteristics of the recording units 42 that serve as storage destinations for data files. The medium information storage unit 124 stores this information.

Performance information includes, for example:

- the free capacity of a recording unit 42;
- its access performance;
- the performance of the CPU that accesses it; and
- its location on the network relative to users.

Free capacity can be obtained by querying, over the network, the data processing device that uses the recording unit 42 for its processing. Access performance can be obtained by actually accessing the recording unit 42 over the network 18 and measuring the response time. The network location relative to a user can be obtained by sending an inquiry signal from the user's client device 10 over the network 18 and measuring the response time. The storage destination selection unit 128 selects a recording unit 42 as a storage destination when its performance information satisfies a predetermined condition.
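Both access performance and network location are obtained above by timing a round trip. A minimal sketch of such a measurement follows. The `probe` argument is a hypothetical callable that performs one small access to a recording unit; how that access is made (FTP, HTTP, or something else) is outside the text. Repeating the probe and taking the median is an assumption, meant to damp one-off network delays.

```python
import time
from statistics import median


def measure_response_time(probe, attempts=3):
    """Return the median wall-clock time, in seconds, of `attempts` calls to probe()."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    samples = []
    for _ in range(attempts):
        start = time.perf_counter()
        probe()  # one small access to the recording unit (hypothetical)
        samples.append(time.perf_counter() - start)
    return median(samples)


# Stand-in probe: a 10 ms sleep instead of a real network access.
latency = measure_response_time(lambda: time.sleep(0.01))
print(f"{latency * 1000:.1f} ms")
```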

Characteristic information includes the reliability of the recording unit 42, the MTBF and MTTR of its file system, and whether it has a UPS. It can also include the following:

- the geographic and network distance between installations;
- the safety of the installation site, such as installation conditions, building structure, and disaster frequency;
- physical security, such as guarding arrangements; and
- cost performance per unit of storage capacity.

Conditions can include, for example, nodes separated by at least a predetermined geographic distance, a highly secure installation site, high structural strength, low cost, or a network location convenient for users. This information can also be ranked attribute by attribute for each recording unit 42. The storage destination selection unit 128 can then use these rankings to select storage destinations. A user or administrator can also specify the conditions that the storage destination selection unit 128 uses.

The storage destination selection condition storage unit 126 stores predetermined storage destination selection conditions. Conditions on performance information can, for example, be combined or assigned priorities. Storage destinations can also be chosen using the data file attributes described above as conditions. Performance information may be acquired each time a replica is made. Alternatively, information acquired periodically or on demand and stored in advance may be used.

Likewise, conditions on characteristic information can be combined or assigned priorities, and storage destinations can also be chosen using the data file attributes described above as conditions. Characteristic information may be acquired each time a replica is made. Alternatively, information acquired periodically or on demand and stored in advance may be used.

The storage destination selection unit 128 uses the information stored in the medium information storage unit 124 to select storage destinations. It selects recording units 42 that satisfy the predetermined storage destination selection conditions stored in the storage destination selection condition storage unit 126.
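Selection, as described above, combines hard conditions with a ranking over the recorded medium information. The sketch below is one hypothetical instance of that scheme. The hard conditions are: enough free space, not already holding the file, at least a minimum distance from the source, and a UPS present. The remaining candidates are ranked by measured response time. The particular fields, the 100 km default, and the ranking key are all assumptions. The function returns fewer than `copies` nodes when not enough qualify, and the caller decides how to handle that.

```python
from dataclasses import dataclass


@dataclass
class MediumInfo:
    node: str
    free_bytes: int
    response_ms: float
    distance_km: float  # distance from the replication source
    has_ups: bool


def select_destinations(media, file_size, copies, holders, min_distance_km=100.0):
    """Pick up to `copies` recording units for new replicas (illustrative conditions)."""
    eligible = [
        m for m in media
        if m.node not in holders
        and m.free_bytes >= file_size
        and m.distance_km >= min_distance_km
        and m.has_ups
    ]
    eligible.sort(key=lambda m: m.response_ms)  # fastest responders first
    return [m.node for m in eligible[:copies]]


MEDIA = [
    MediumInfo("p1", 900, 1.0, 0.0, True),    # already holds the file
    MediumInfo("p2", 500, 12.0, 300.0, True),
    MediumInfo("p3", 50, 5.0, 400.0, True),   # not enough free space
    MediumInfo("p4", 900, 8.0, 250.0, True),
    MediumInfo("p5", 900, 3.0, 20.0, True),   # too close to the source
    MediumInfo("p6", 900, 1.0, 500.0, False), # no UPS
]

print(select_destinations(MEDIA, file_size=100, copies=2, holders={"p1"}))  # ['p4', 'p2']
```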

FIG. 4 is a functional block diagram of the configuration used, in the data processing system 100 of the present embodiment, to specify which data files to replicate and where to store the replicas.

The server device 12 further includes a display unit 130, an operation unit 132, a presentation unit 134, and an instruction receiving unit 136. The presentation unit 134 presents the data file attribute information stored in the file information storage unit 110. It also presents the performance and/or characteristic information stored in the medium information storage unit 124. This information can be shown on the screen of the display unit 130. It can also be shown, via the access unit 20 of FIG. 1, on the display unit of a client device 10 on the network 18. The presentation unit 134 can be implemented as a portal site provided by the administrator of the data processing system 100.

The instruction receiving unit 136 receives instructions and notifies the control unit 104. Instructions arrive either through the operation unit 132, or from the operation unit of a client device 10 on the network 18 through the access unit 20. The instructions it accepts include the storage location of the source data file, the number of replicas, and the storage destinations of the replicas. An instruction can also specify the storage destination selection conditions used to decide where replicas are stored.

In other words, the administrator of the server device 12 or the user of the client device 10 consults the information shown by the presentation unit 134. Based on the data file attributes and on the performance and characteristics of the candidate recording units 42, they decide the number of replicas and their storage destinations, and then issue an instruction through the operation unit. The instruction receiving unit 136 receives the instruction and notifies the control unit 104, and the data file is then replicated and stored.

The operation of the data processing system 100 configured as above is described below. FIG. 5 is a flowchart showing an example of the operation of the data processing system 100 of the present embodiment. The description refers to FIGS. 1 to 5.

The control unit 104 uses the trigger conditions stored in the trigger condition storage unit 102 to determine whether it is time to replicate (step S11). If it is (YES in step S11), the control unit 104 notifies the replication unit 106. The replication unit 106 then instructs the file information acquisition unit 108 to acquire file information. The control unit 104 can also pass a copy count and storage destinations to the replication unit 106, based on an instruction received by the instruction receiving unit 136 of FIG. 4.

The file information acquisition unit 108 obtains from the replication unit 106 the location of the source data file, accesses that location via the access unit 20, and acquires information about the source data file (step S13). The acquired information is stored in the file information storage unit 110. Next, on notification from the file information acquisition unit 108, the replica count determination unit 114 determines the number of replicas from the attribute information of the source data file stored in the file information storage unit 110, according to the replica count determination condition stored in the count determination condition storage unit 112 (step S15). The determined number of replicas is reported to the replication unit 106.
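As a rough illustration of step S15, the following Python sketch treats the count determination condition storage unit 112 as an ordered list of rules, each pairing a predicate over the file's attributes with a replica count, and lets the first matching rule decide. The `FileAttributes` fields, the rule format and the default count are hypothetical; the text does not specify how conditions are encoded.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class FileAttributes:
    # Hypothetical attribute record, standing in for an entry in the file information storage unit 110
    path: str
    size_bytes: int
    access_count: int = 0


# A rule pairs a predicate over the attributes with the replica count it implies.
Rule = tuple[Callable[[FileAttributes], bool], int]


@dataclass
class ReplicaCountDeterminer:
    # Ordered rules, standing in for the count determination condition storage unit 112
    rules: list[Rule] = field(default_factory=list)
    default_count: int = 2  # assumed fallback when no rule matches

    def decide(self, attrs: FileAttributes) -> int:
        # Rules are checked in order; the first predicate that holds decides the count
        for predicate, count in self.rules:
            if predicate(attrs):
                return count
        return self.default_count
```

For example, `ReplicaCountDeterminer(rules=[(lambda a: a.access_count >= 100, 3)])` would give frequently accessed files three replicas and every other file the default of two.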

Acquiring file information through the file information acquisition unit 108 is optional; information already stored in the file information storage unit 110 may be used instead.

Next, on instruction from the replication unit 106, the storage unit 120 instructs the medium information acquisition unit 122 to acquire medium information. The medium information acquisition unit 122 acquires, via the access unit 20, information on the performance or characteristics of the recording units 42 that are candidate storage destinations for the data file (step S17); alternatively, it obtains this information from a registration unit (not shown). The acquired medium information is stored in the medium information storage unit 124. On notification from the storage destination selection condition storage unit 126, the storage destination selection unit 128 accesses the storage destination selection condition storage unit 126 and selects the recording units 42 that satisfy a predetermined storage destination selection condition (step S19).
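Step S19 can be pictured as filtering and ordering the medium information gathered in step S17. In this sketch the medium information storage unit 124 is a list of `MediumInfo` records, and the selection condition is assumed to be "enough free space and at least a given write throughput", with faster media preferred; both the fields and the condition are illustrative assumptions, not taken from the text.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class MediumInfo:
    # Hypothetical per-node record, standing in for an entry in the medium information storage unit 124
    node: str
    free_bytes: int
    write_mbps: float


def select_destinations(
    media: Iterable[MediumInfo],
    file_size: int,
    min_write_mbps: float,
    count: int,
    exclude: Iterable[str] = (),
) -> list[str]:
    """Return up to `count` node names whose recording media satisfy the assumed
    condition: room for the file and at least `min_write_mbps` of write throughput."""
    excluded = set(exclude)  # e.g. the node that already holds the source file
    eligible = [
        m
        for m in media
        if m.node not in excluded
        and m.free_bytes >= file_size
        and m.write_mbps >= min_write_mbps
    ]
    # Prefer faster media; break ties by preferring more free space
    eligible.sort(key=lambda m: (m.write_mbps, m.free_bytes), reverse=True)
    return [m.node for m in eligible[:count]]
```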

Next, the replication unit 106 generates as many replica files as the number determined by the replica count determination unit 114 (step S21), and the storage unit 120 stores the replica files in the storage destinations selected by the storage destination selection unit 128 (step S23).
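Steps S21 and S23 reduce to copying the source file once per selected destination. The sketch below models each node's recording unit 42 as a local directory, which is a simplification: a real deployment would transfer the data over the network 18. The function name and signature are hypothetical.

```python
import shutil
from pathlib import Path


def replicate_and_store(source: Path, destinations: list[Path], count: int) -> list[Path]:
    """Copy `source` into the first `count` destination directories and return the replica paths.
    Each destination directory stands in for one node's recording unit (an assumption)."""
    if count > len(destinations):
        raise ValueError("not enough storage destinations for the requested replica count")
    replicas = []
    for dest_dir in destinations[:count]:
        dest_dir.mkdir(parents=True, exist_ok=True)
        target = dest_dir / source.name
        shutil.copy2(source, target)  # copies file contents and metadata such as mtime
        replicas.append(target)
    return replicas
```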

Next, on notification from the replication unit 106 and the storage unit 120, the management unit 24 acquires path information for the locations where the replica files are stored (step S25). The generation unit 26 then generates a management file containing the path information of the replica files recorded in the recording units 42. The management unit 24 places this management file in a predetermined directory of the recording unit 14 and records in the recording unit 14, together with the management file, information on the plurality of directories that form the hierarchical structure to be managed (step S27).
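The management file of step S27 can be thought of as a small metadata record placed at the logical path of the data file inside a directory tree on the recording unit 14. The sketch below assumes a JSON file named `<file>.mgmt.json` and replica locations written as `node:/path` strings; neither the file format nor the naming is specified in the text. The `resolve` function shows how the access unit 20 could look up replica locations for a logical path.

```python
import json
from pathlib import Path


def _mgmt_path(meta_root: Path, logical_path: str) -> Path:
    # Mirror the logical directory hierarchy under meta_root (e.g. on the recording unit 14)
    rel = Path(logical_path.lstrip("/"))
    return meta_root / rel.parent / (rel.name + ".mgmt.json")


def write_management_file(meta_root: Path, logical_path: str, replica_locations: list[str]) -> Path:
    """Create or update the management file for `logical_path`, recording replica path information."""
    mgmt = _mgmt_path(meta_root, logical_path)
    mgmt.parent.mkdir(parents=True, exist_ok=True)  # create intermediate directories as needed
    known: list[str] = []
    if mgmt.exists():
        known = json.loads(mgmt.read_text())["replicas"]
    # Append new locations, keeping the original order and dropping duplicates
    merged = list(dict.fromkeys(known + replica_locations))
    mgmt.write_text(json.dumps({"logical_path": logical_path, "replicas": merged}, indent=2))
    return mgmt


def resolve(meta_root: Path, logical_path: str) -> list[str]:
    """Return the replica locations recorded for `logical_path`."""
    return json.loads(_mgmt_path(meta_root, logical_path).read_text())["replicas"]
```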

In this way, replicas of data files can be distributed across the recording units 42 of multiple nodes on the network 18 according to predetermined conditions, and because the management file is updated as well, the replica files can also be managed by the server device 12. As described below, such replica files are created to back up data files, to improve reliability, and to improve access performance.

An example of operation when the data processing system 100 of the present embodiment creates a backup of a data file is described below.

For example, on instruction from the user of the client device 10 or the administrator of the server device 12, the instruction receiving unit 136 receives an instruction to create replicas of a given data file, together with the location of the source data file and the number of replicas. In this example, the instruction receiving unit 136 also receives the storage destinations of the replica files; alternatively, the storage destinations can be determined by having the storage destination selection unit 128 automatically select the optimal ones. The replication unit 106 replicates the data file the number of times received by the instruction receiving unit 136, and the storage unit 120 stores the replicas at the storage destinations received by the instruction receiving unit 136. The generation unit 26 then generates management files for the replica files, which the management unit 24 manages.
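The backup example distinguishes destinations given by the user from destinations chosen automatically by the storage destination selection unit 128. A minimal sketch of that decision follows; `ReplicationRequest` and the `auto_select` callback are hypothetical names, not part of the described system.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ReplicationRequest:
    # Hypothetical shape of what the instruction receiving unit 136 accepts
    source: str
    count: int
    destinations: Optional[list[str]] = None  # None: let the storage destination selection unit choose


def plan_backup(req: ReplicationRequest, auto_select: Callable[[int], list[str]]) -> list[str]:
    """Return the destinations for the replicas: the user's choice if given, otherwise automatic selection."""
    if req.count < 1:
        raise ValueError("replica count must be at least 1")
    if req.destinations is not None:
        if len(req.destinations) < req.count:
            raise ValueError("fewer destinations than requested replicas")
        return req.destinations[: req.count]
    return auto_select(req.count)
```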

In another example, the instruction receiving unit 136 receives the source data file, and the replica count determination unit 114, based on the file's attribute information, sets the number of replicas to two if the file size is at or above a predetermined value and to three if it is below. The file size can also be obtained at replication time by querying the processing device 16 whose recording unit 42 holds the data file.
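The size rule in this example is a single threshold comparison. The sketch also reflects the option of reusing stored file information: the size is read from a local cache when available and otherwise obtained by asking the node that holds the file. The 1 GiB threshold and the `query_node` callback are placeholders; the text only speaks of "a predetermined value".

```python
from typing import Callable

SIZE_THRESHOLD = 1 << 30  # 1 GiB; placeholder for the "predetermined value"


def replicas_for_size(size_bytes: int, threshold: int = SIZE_THRESHOLD) -> int:
    # Large files get two replicas and small files three, as in the example
    return 2 if size_bytes >= threshold else 3


def file_size(path: str, cache: dict[str, int], query_node: Callable[[str], int]) -> int:
    """Return the size of `path`, using cached attribute information when present
    and otherwise querying the node that holds the file (`query_node` is a hypothetical stub)."""
    if path not in cache:
        cache[path] = query_node(path)
    return cache[path]
```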

In yet another example, the instruction receiving unit 136 receives a specification of where the replica files should be stored, such as the node, domain name or network address, or the attributes or state of the destination node, and the storage destination selection unit 128 selects storage destinations according to that specification. That is, the storage destination selection unit 128 selects storage destinations according to the storage destination selection condition, among those stored in the storage destination selection condition storage unit 126, that corresponds to the specification received by the instruction receiving unit 136.
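Selection by specification is essentially a filter in which every criterion that was given must hold and omitted criteria are ignored. The sketch below uses Python's standard `ipaddress` module for the network-address test; the `Node` record, the suffix match for domain names and the attribute dictionary are assumptions for illustration.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    # Hypothetical node description; names and fields are illustrative
    name: str
    address: str
    attributes: dict[str, str] = field(default_factory=dict)
    state: str = "up"


def matches(
    node: Node,
    domain: Optional[str] = None,
    network: Optional[str] = None,
    attributes: Optional[dict[str, str]] = None,
    state: Optional[str] = None,
) -> bool:
    """True if the node meets every criterion that was specified; None means 'no constraint'."""
    if domain is not None and not (node.name == domain or node.name.endswith("." + domain)):
        return False
    if network is not None and ipaddress.ip_address(node.address) not in ipaddress.ip_network(network):
        return False
    if attributes and any(node.attributes.get(k) != v for k, v in attributes.items()):
        return False
    if state is not None and node.state != state:
        return False
    return True


def select_by_spec(nodes: list[Node], **spec) -> list[str]:
    # Keep the nodes that satisfy the specification, in their original order
    return [n.name for n in nodes if matches(n, **spec)]
```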

In another example, the storage destination selection unit 128 automatically selects storage destinations according to the storage destination selection conditions stored in the storage destination selection condition storage unit 126, using the information on the characteristics and performance of the recording units 42 stored in the medium information storage unit 124. The medium information storage unit 124 may, for example, hold this information as a ranking of the nodes according to the conditions.
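One way to precompute such a node ranking is to normalise each metric across nodes and combine the results with weights. The metrics, the min-max normalisation and the weights in this sketch are assumptions; the text does not say how nodes are ranked.

```python
from dataclasses import dataclass


@dataclass
class NodeMetrics:
    # Hypothetical per-node figures drawn from the medium information storage unit 124
    node: str
    free_gb: float
    throughput_mbps: float
    mtbf_hours: float


def rank_nodes(metrics: list[NodeMetrics], weights: dict[str, float]) -> list[str]:
    """Rank nodes by a weighted sum of min-max normalised metrics, best first.
    Every metric is treated as 'higher is better'."""
    if not metrics:
        return []
    lo = {f: min(getattr(m, f) for m in metrics) for f in weights}
    hi = {f: max(getattr(m, f) for m in metrics) for f in weights}

    def norm(m: NodeMetrics, f: str) -> float:
        span = hi[f] - lo[f]
        # If all nodes share the same value the metric cannot discriminate; score it as 1
        return 1.0 if span == 0 else (getattr(m, f) - lo[f]) / span

    scored = [(sum(w * norm(m, f) for f, w in weights.items()), m.node) for m in metrics]
    # Highest score first; node name breaks ties deterministically
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [node for _, node in scored]
```

The storage destination selection unit 128 would then take nodes from the front of the ranked list; changing the weights shifts the preference between, say, capacity and throughput.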

Next, an example is described in which the data processing system 100 of this embodiment creates replicas to improve the reliability of resources.

For example, when the instruction receiving unit 136 receives a source data file, the replica count determination unit 114 determines the number of replicas according to, for instance, the MTBF (mean time between failures) of the destination file system. That is, the replica count determination unit 114 sets the number of replicas to two if the MTBF of the file system is at or above a predetermined value and to three if it is below. Increasing the number of replicas when the MTBF is below the predetermined value reduces the risk of losing the data file.
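The MTBF rule is again a threshold comparison. To make the claim that more replicas reduce the risk of loss concrete, the sketch adds an estimate of the probability that all replicas fail within a given window. That estimate assumes independent failures with exponentially distributed lifetimes (a constant failure rate of 1/MTBF), which the text does not state; real failures are often correlated, so the figure is at best a rough guide.

```python
import math


def replicas_for_mtbf(mtbf_hours: float, threshold_hours: float) -> int:
    # Rule from the example: reliable file systems get two replicas, less reliable ones three
    return 2 if mtbf_hours >= threshold_hours else 3


def prob_all_lost(mtbf_hours: float, window_hours: float, replicas: int) -> float:
    """Probability that every replica fails within `window_hours`, ASSUMING independent
    failures with exponentially distributed lifetimes (constant failure rate 1/MTBF)."""
    p_one = 1.0 - math.exp(-window_hours / mtbf_hours)
    return p_one ** replicas
```

With an MTBF of 1,000 hours and a 24-hour window, each replica fails with a probability of about 2.4%, so going from two replicas to three lowers the estimated chance of losing all of them by roughly a factor of forty.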

In another example, the control unit 104 periodically checks whether each replica file is accessible and, if one cannot be accessed, instructs that a new replica file be generated on another node. This makes it possible to detect a corrupted or inaccessible backup file before it is needed and take countermeasures.
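One pass of the periodic accessibility check might look like the sketch below: it keeps the replicas that still respond and, for each replica that was lost, creates a new one on a node that does not already hold a copy. `is_accessible` and `make_replica` are hypothetical hooks standing in for the access unit 20 and the replication unit 106; running the pass at regular intervals (for example from a timer or scheduler) is left out.

```python
from typing import Callable


def check_and_repair(
    replicas: dict[str, str],
    candidate_nodes: list[str],
    is_accessible: Callable[[str, str], bool],
    make_replica: Callable[[str], str],
) -> dict[str, str]:
    """One pass of the periodic check over `replicas` (node -> path).

    Replicas that cannot be accessed are dropped, and for each one a new replica is
    created on a candidate node that does not already appear in `replicas`. If there
    are not enough such nodes, fewer replacements are made.
    """
    healthy = {node: path for node, path in replicas.items() if is_accessible(node, path)}
    lost = len(replicas) - len(healthy)
    # Skip every node already listed, including the ones that just failed the check
    spare = [n for n in candidate_nodes if n not in replicas]
    for node in spare[:lost]:
        healthy[node] = make_replica(node)
    return healthy
```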

Next, an example of operation is described in which the data processing system 100 of the present embodiment creates replica files to improve access performance.

For example, the replica count determination unit 114 determines the number of replicas based on the rank of file attributes of the source data file, such as access frequency and file size. For instance, it sets the number of replicas to three if the access frequency is at or above a predetermined rank and to two if it is below. Frequently accessed data files thus get more replicas, so accesses to them are spread across copies instead of being concentrated on one. Because accesses are not concentrated on a particular recording medium or data file, processing speed improves and the service life of the recording media can be extended.
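The text does not say how access-frequency ranks are defined. The sketch below assumes equal-sized buckets over files sorted by access count, with rank 0 for the most accessed files, and then applies the three-or-two rule; `n_ranks` and `hot_rank` are hypothetical parameters.

```python
def access_ranks(access_counts: dict[str, int], n_ranks: int = 4) -> dict[str, int]:
    """Bucket files into `n_ranks` equal-sized ranks by access count (rank 0 = most accessed).
    The bucketing scheme is an assumption; the text only mentions 'a predetermined rank'."""
    # Most accessed first; file name breaks ties so the result is deterministic
    ordered = sorted(access_counts, key=lambda f: (-access_counts[f], f))
    n = len(ordered)
    return {f: i * n_ranks // n for i, f in enumerate(ordered)}


def replicas_for_rank(rank: int, hot_rank: int = 0) -> int:
    # Files at or above the predetermined rank (numerically <= hot_rank) get three replicas, others two
    return 3 if rank <= hot_rank else 2
```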

In another example, the trigger conditions in the trigger condition storage unit 102 may cause a replication instruction to be issued periodically, or when the access frequency exceeds a threshold. The storage destinations of the replica files can then be selected automatically by the storage destination selection unit 128. Once the replica files generated in this way are stored at their destinations, the management unit 24 has the generation unit 26 generate management files for them and manages those files.
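The two trigger conditions named here, a fixed period and an access-frequency threshold, can be combined into one check for step S11. The sketch counts accesses in a sliding time window; the class name, its parameters and the choice to reset the count after each replication are assumptions.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ReplicationTrigger:
    """Fire when `period_s` seconds have passed since the last replication, or when more
    than `max_accesses` accesses fall within the last `window_s` seconds. All parameters
    are hypothetical stand-ins for the trigger condition storage unit 102."""
    period_s: float
    window_s: float
    max_accesses: int
    last_fired: float = 0.0
    _accesses: deque = field(default_factory=deque)

    def record_access(self, t: float) -> None:
        self._accesses.append(t)

    def check(self, now: float) -> bool:
        # Forget accesses that have left the sliding window
        while self._accesses and self._accesses[0] <= now - self.window_s:
            self._accesses.popleft()
        due = now - self.last_fired >= self.period_s
        hot = len(self._accesses) > self.max_accesses
        if due or hot:
            self.last_fired = now
            self._accesses.clear()  # start counting afresh after a replication
            return True
        return False
```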

In another example, the medium information acquisition unit 122 accesses each node via the network 18 and measures its response time, and the storage destination selection unit 128 selects nodes whose response time is at or below a threshold as storage destinations. Replicas are thus held on nodes with good communication efficiency, so that users experience as little delay from network traffic as possible when accessing data files.
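Response-time-based selection can be sketched as "probe every node, keep those at or under the threshold". The probe below measures TCP connection time with the standard `socket` module, which is only one possible definition of response time; the port and threshold a caller would use are hypothetical. The selection function takes the probe as a parameter so that a different measurement can be substituted.

```python
import socket
import time
from typing import Callable, Optional


def tcp_connect_time(host: str, port: int, timeout: float = 2.0) -> Optional[float]:
    """Seconds taken to open a TCP connection to (host, port), or None if it fails."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            pass
    except OSError:  # includes refused connections and timeouts
        return None
    return time.perf_counter() - start


def select_responsive(
    nodes: list[str],
    probe: Callable[[str], Optional[float]],
    threshold_s: float,
) -> list[str]:
    """Nodes whose measured response time is at or below `threshold_s`, fastest first."""
    timed = []
    for node in nodes:
        t = probe(node)
        if t is not None and t <= threshold_s:
            timed.append((t, node))
    timed.sort()
    return [node for _, node in timed]
```

For instance, `select_responsive(hosts, lambda h: tcp_connect_time(h, 22), 0.05)` would keep the hosts that accept a connection on an assumed service port within 50 ms.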

As described above, the data processing system 100 according to the embodiment of the present invention can distribute replicas of data files across multiple recording units 42 on the network according to predetermined conditions, and can therefore efficiently ensure the safety and reliability of resources and improve access performance.

Although embodiments of the present invention have been described above with reference to the drawings, they are illustrative of the present invention, and various configurations other than those described above may also be adopted.

For example, the system may include a notification unit (not shown) that, when the control unit 104 determines that a trigger condition stored in the trigger condition storage unit 102 is satisfied, notifies the user of the client device 10 or the administrator of the server device 12 with a message prompting the creation of replicas of the data file. The notification may take the form of a message or mark displayed on a portal site of the data processing system 100, an e-mail containing the message sent to a pre-registered user account, audio output, or an LED or lamp indication. On receiving the notification, the user or administrator can issue a replication instruction using the operation unit 132; in response to the replication instruction received by the instruction receiving unit 136, the replication unit 106 replicates the data file and the storage unit 120 stores the replica file at the storage destination.

FIG. 1 is a functional block diagram showing a data processing system according to an embodiment of the present invention.
FIG. 2 is a diagram showing the configuration of the file system in the server device of FIG. 1.
FIG. 3 is a functional block diagram showing a backup unit, which is a main component of the data processing system of the present embodiment.
FIG. 4 is a functional block diagram showing a configuration for designating the data files to be replicated and their storage destinations in the data processing system of the present embodiment.
FIG. 5 is a flowchart showing an example of the operation of the data processing system of the present embodiment.

Explanation of symbols

10 Client device
12 Server device
14 Recording unit
16 Processing device
18 Network
20 Access unit
22 Reception unit
24 Management unit
26 Generation unit
28 Determination unit
30 Instruction unit
40 Processing unit
42 Recording unit
50 Management file
60 Backup unit
100 Data processing system
102 Trigger condition storage unit
104 Control unit
106 Replication unit
108 File information acquisition unit
110 File information storage unit
112 Count determination condition storage unit
114 Replica count determination unit
120 Storage unit
122 Medium information acquisition unit
124 Medium information storage unit
126 Storage destination selection condition storage unit
128 Storage destination selection unit
130 Display unit
132 Operation unit
134 Presentation unit
136 Instruction receiving unit

Claims

A data management device comprising:
an access unit that accesses, via a network, a plurality of data files respectively recorded on a plurality of recording media included in a plurality of nodes;
a generation unit that generates a plurality of management files each including path information for accessing the plurality of data files;
a management unit that, in order to manage the plurality of data files, places the management file corresponding to each of the plurality of data files in a predetermined directory and logically manages a plurality of directories including that directory in a hierarchical structure;
a duplicating unit that creates a duplicate file of the data file;
a storage destination selection unit that selects, according to a predetermined condition, a storage destination node for storing the duplicate file; and
a storage unit that stores the duplicate file in the storage destination selected by the storage destination selection unit,
wherein, when the duplicating unit duplicates the data file and the storage unit stores the duplicate file in the storage destination, the generation unit generates the management file including path information of the duplicate file, and
the access unit accesses the data file recorded on the recording medium based on the path information included in the management file in the hierarchical structure comprising the plurality of directories.

The data management device according to claim 1, further comprising:
an attribute information acquisition unit that acquires attribute information of the data file to be duplicated by the duplicating unit; and
a copy number determination unit that determines the number of duplicate files according to a predetermined copy number determination condition based on the attribute information.

The data management device according to claim 1 or 2, wherein the storage destination selection unit comprises:
a performance information acquisition unit that queries the plurality of nodes via the network and acquires performance information indicating the performance of their recording media; and
a selection unit that selects the storage destination node satisfying a predetermined storage condition based on the performance information.

The data management device according to any one of claims 1 to 3, further comprising a registration unit that registers characteristic information relating to characteristics of the recording media of the plurality of nodes,
wherein the storage destination selection unit comprises:
a characteristic information acquisition unit that acquires the characteristic information by querying the registration unit; and
a selection unit that selects the storage destination node satisfying a predetermined storage condition based on the characteristic information.

The data management device according to any one of claims 1 to 4, further comprising:
a determination unit that determines whether a predetermined trigger condition is satisfied; and
a control unit that, when it is determined that the trigger condition is satisfied, causes the duplicating unit to generate a copy of the data file.

The data management device according to any one of claims 1 to 5, further comprising:
an attribute information presentation unit that presents attribute information of the data file;
a file reception unit that receives designation of a data file to be duplicated; and
a copy number reception unit that receives the number of copies of the data file received by the file reception unit,
wherein the duplicating unit generates, for the data file received by the file reception unit, as many duplicate files as the number of copies received by the copy number reception unit.

The data management device according to any one of claims 3 to 6, further comprising:
an information presentation unit that presents the performance information acquired by the performance information acquisition unit or the characteristic information acquired by the characteristic information acquisition unit; and
a storage destination reception unit that receives the storage destination of the duplicate file,
wherein the storage unit stores the duplicate file in the storage destination received by the storage destination reception unit.

The data management device according to any one of claims 5 to 7, further comprising:
a notification unit that, when the determination unit determines that the trigger condition is satisfied, notifies a user of a message prompting generation of a copy of the data file; and
an instruction reception unit that receives a copy instruction,
wherein the duplicating unit duplicates the data file in response to the copy instruction, and the storage unit stores the duplicate file in the storage destination.

A data management method comprising the steps of:
accessing, via a network, a plurality of data files respectively recorded on a plurality of recording media included in a plurality of nodes;
generating a plurality of management files each including path information for accessing the plurality of data files;
in order to manage the plurality of data files, placing the management file corresponding to each of the plurality of data files in a predetermined directory and logically managing a plurality of directories including that directory in a hierarchical structure;
generating a duplicate file of the data file;
selecting, according to a predetermined condition, a storage destination node for storing the duplicate file;
storing the duplicate file in the storage destination selected in the selecting step;
generating the management file including path information of the duplicate file when the data file is duplicated in the duplicating step and the duplicate file is stored in the storage destination in the storing step; and
accessing the data file recorded on the recording medium based on the path information included in the management file in the hierarchical structure comprising the plurality of directories.

The data management method according to claim 9, further comprising the steps of:
determining whether a predetermined trigger condition is satisfied; and
generating a copy of the data file when it is determined that the trigger condition is satisfied.

A program causing a data management computer, which accesses via a network a plurality of data files respectively recorded on a plurality of recording media included in a plurality of nodes, to function as:
means for accessing the plurality of data files via the network;
means for generating a plurality of management files each including path information for accessing the plurality of data files;
means for placing, in order to manage the plurality of data files, the management file corresponding to each of the plurality of data files in a predetermined directory and logically managing a plurality of directories including that directory in a hierarchical structure;
means for generating a duplicate file of the data file;
means for selecting, according to a predetermined condition, a storage destination node for storing the duplicate file;
means for storing the duplicate file in the storage destination selected by the means for selecting;
means for generating the management file including path information of the duplicate file when the means for generating a duplicate file duplicates the data file and the means for storing stores the duplicate file in the storage destination; and
means for accessing the data file recorded on the recording medium based on the path information included in the management file in the hierarchical structure comprising the plurality of directories.