JPWO2012117658A1

JPWO2012117658A1 - Storage system

Info

Publication number: JPWO2012117658A1
Application number: JP2013502163A
Authority: JP
Inventors: 悠永田
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2011-02-28
Filing date: 2012-01-19
Publication date: 2014-07-07
Anticipated expiration: 2032-01-19
Also published as: WO2012117658A1; JP5561425B2

Abstract

The storage system of the present invention includes: a plurality of storage processing devices that store data in a distributed manner across a plurality of storage devices and perform deduplication processing; a switch unit that assigns a data flow, consisting of a group of data, to one of the storage processing devices and configures the data flow to flow to the assigned storage processing device; a flow characteristic detection unit that detects predetermined characteristics of each data flow; and a device characteristic detection unit that detects predetermined characteristics of each storage processing device. The switch unit determines the storage processing device to which a data flow is assigned on the basis of the data flow characteristics detected by the flow characteristic detection unit and the storage processing device characteristics detected by the device characteristic detection unit.

Description

The present invention relates to a storage system, and in particular to a storage system including a plurality of storage devices.

In recent years, with the development and spread of computers, many kinds of information have been converted into digital data. Storage devices such as magnetic tapes and magnetic disks are used to store such digital data. Because the amount of data to be stored grows day by day and becomes enormous, large-capacity storage systems are required. Reliability is also required while the cost spent on storage devices is reduced, and data must be easy to retrieve later. As a result, there is demand for a storage system that can automatically increase its storage capacity and performance, eliminate duplicate storage to reduce storage costs, and provide high redundancy.

In response to this situation, content address storage systems have been developed in recent years. A content address storage system stores data in a distributed manner across a plurality of storage devices and identifies the storage location of the data by a unique content address determined from the content of the data. Specifically, a content address storage system divides given data into a plurality of fragments, adds further fragments containing redundant data, and stores these fragments in a plurality of storage devices, respectively.

Later, by specifying the content address, the data stored at the storage location identified by that content address, that is, the fragments, can be read out, and the original data before division can be restored from the plurality of fragments.

The content address is generated so as to be unique to the content of the data; for example, a hash value of the data is used. Consequently, for duplicate data, data with the same content can be obtained by referring to the data at the same storage location. Duplicate data therefore need not be stored separately, so duplicate recording is eliminated and the amount of stored data is reduced.

A storage system that stores large amounts of data, such as the content address storage system described above, includes a plurality of information processing apparatuses. A system with a plurality of information processing apparatuses requires load distribution among them. A common load-balancing technique is the round-robin method. An example of a system that performs load distribution is disclosed in Patent Document 1.
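For contrast with the characteristic-based assignment proposed later, the round-robin baseline mentioned here can be sketched in a few lines of Python. This is a generic illustration, not code from Patent Document 1, and the function name `round_robin_assign` and the node identifiers are hypothetical. The rotation ignores both what a flow contains and what each node is equipped to do.

```python
from itertools import cycle

def round_robin_assign(flow_ids, node_ids):
    """Assign each flow to the next node in a fixed rotation (hypothetical helper)."""
    rotation = cycle(node_ids)
    # One next() per flow, in arrival order; flow content and node capability are ignored.
    return {flow: next(rotation) for flow in flow_ids}

print(round_robin_assign(["f1", "f2", "f3", "f4"], [100, 110, 120]))
# {'f1': 100, 'f2': 110, 'f3': 120, 'f4': 100}
```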

In the storage system disclosed in Patent Document 1, primary data and secondary data having the same content are stored, together with management information that manages how the primary data and the secondary data are placed on the storage devices. The latest load information of each storage device, such as CPU load, the number of received access requests, and network utilization, is also collected continuously. Based on the management information and the collected load information, the roles of primary data and secondary data are exchanged within a pair of data. In other words, by changing which storage device is accessed for the primary data and for the secondary data, the load on the storage devices that hold the data is distributed without moving the data.

Patent Document 1: Japanese Translation of PCT International Application Publication No. 2008-136075

However, neither the round-robin load distribution method nor the load distribution method disclosed in Patent Document 1 achieves efficient load distribution. The reason is that, in a storage system that performs deduplication, the performance and functions required for data storage processing vary with the characteristics of the data being stored. For example, when duplicate data is written, the data is not actually stored, so no compression is needed and performance improves accordingly. On the other hand, when data that has already been compressed or encrypted, for example by backup software, is written, deduplication and compression do not work effectively and performance degrades. Furthermore, improving the duplication rate may require separating marker information inserted by backup software; if a device lacks the function to do so, deduplication efficiency may drop.

In addition, because data storage processing is performed by a plurality of information processing apparatuses, efficient load distribution is also difficult when the apparatuses differ in performance and functions. For example, there are expansion cards that perform compression or hash calculation in place of the CPU, and expansion cards carrying an SSD that can process small I/O at high speed. These cards are expensive, however, so installing them in many information processing apparatuses is costly. Moreover, the number of expansion cards that one apparatus can hold is limited by its maximum number of slots, so a wide variety of cards cannot all be installed in one apparatus.

Furthermore, when a storage device is used as a backup device over a long period, storage devices of multiple generations end up coexisting. Because the CPU, memory, and other components of older-generation devices perform worse than those of newer-generation devices, the performance of the system as a whole decreases.

Accordingly, an object of the present invention is to solve the problem described above, that is, to achieve efficient load distribution in a storage system.

To achieve the above object, a storage system according to one aspect of the present invention includes:
a plurality of storage devices;
a plurality of storage processing devices that store data in a distributed manner across the plurality of storage devices and that, when storing other data having the same content as data already stored in the storage devices, perform deduplication processing that causes the already stored data to be referenced as the other data;
a switch unit that assigns a data flow consisting of a group of data to one of the storage processing devices and configures the data flow to flow to the assigned storage processing device;
a flow characteristic detection unit that detects a predetermined characteristic of each data flow; and
a device characteristic detection unit that detects a predetermined characteristic of each storage processing device,
wherein the switch unit determines the storage processing device to which the data flow is assigned, based on the characteristic of the data flow detected by the flow characteristic detection unit and the characteristic of the storage processing device detected by the device characteristic detection unit.
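The claim leaves open how the two sets of characteristics are combined. The sketch below assumes a simple additive score. Every class name, field, weight, and threshold in it (including the 0.5 duplication cut-off and the 64 KB small-file boundary) is a hypothetical choice made for illustration, loosely modeled on the dedicated cards described later in the embodiment. None of these is a rule stated in this document.

```python
from dataclasses import dataclass, field

SMALL_FILE_BYTES = 64 * 1024   # hypothetical "small file" boundary
HIGH_DUPLICATION = 0.5         # hypothetical duplication-rate threshold

@dataclass
class FlowCharacteristics:
    duplication_rate: float    # fraction of the flow already stored (0..1)
    already_compressed: bool   # e.g. flagged by backup-software markers
    encrypted: bool
    mean_file_size: int        # bytes

@dataclass
class NodeCharacteristics:
    node_id: int
    cards: set = field(default_factory=set)  # e.g. {"compression"}, {"ssd"}, {"hash"}
    cpu_score: float = 1.0                   # relative CPU capability
    load: float = 0.0                        # current utilization, 0..1

def score(flow: FlowCharacteristics, node: NodeCharacteristics) -> float:
    """Higher is better: spare CPU capacity plus bonuses for cards that
    match the work this flow is expected to generate."""
    s = node.cpu_score * (1.0 - node.load)
    # Only new, still-compressible data actually goes through compression.
    needs_compression = (not (flow.already_compressed or flow.encrypted)
                         and flow.duplication_rate < HIGH_DUPLICATION)
    if needs_compression and "compression" in node.cards:
        s += 1.0
    if flow.mean_file_size < SMALL_FILE_BYTES and "ssd" in node.cards:
        s += 1.0
    # Highly duplicated flows are dominated by hashing rather than compression.
    if flow.duplication_rate >= HIGH_DUPLICATION and "hash" in node.cards:
        s += 1.0
    return s

def choose_node(flow: FlowCharacteristics, nodes: list) -> int:
    """Return the id of the best-scoring node (the first one wins on ties)."""
    return max(nodes, key=lambda n: score(flow, n)).node_id

nodes = [NodeCharacteristics(100, {"compression"}),
         NodeCharacteristics(110, {"ssd"}),
         NodeCharacteristics(120, {"hash"})]
print(choose_node(FlowCharacteristics(0.1, False, False, 1_000_000), nodes))  # 100
print(choose_node(FlowCharacteristics(0.1, True, False, 4096), nodes))        # 110
print(choose_node(FlowCharacteristics(0.9, False, False, 1_000_000), nodes))  # 120
```

The additive form keeps each characteristic's contribution visible. A real controller would more likely weight these terms from measured throughput than use fixed bonuses.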

A switch control device according to another aspect of the present invention is connected to a switch unit that configures a data flow consisting of a group of data to flow to one of a plurality of storage processing devices. The storage processing devices store data in a distributed manner across a plurality of storage devices and, when storing other data having the same content as data already stored in the storage devices, perform deduplication processing that causes the already stored data to be referenced as the other data. The switch control device includes:
a flow setting unit that assigns a storage processing device to the data flow, based on a predetermined characteristic of the data flow detected for each data flow and a predetermined characteristic of the storage processing device detected for each storage processing device, and sets the switch unit so that the data flow flows to the assigned storage processing device.

A program according to another aspect of the present invention causes a switch control device to implement a flow setting unit. The switch control device is connected to a switch unit that configures a data flow consisting of a group of data to flow to one of a plurality of storage processing devices, the storage processing devices storing data in a distributed manner across a plurality of storage devices and, when storing other data having the same content as data already stored in the storage devices, performing deduplication processing that causes the already stored data to be referenced as the other data. The program implements:
a flow setting unit that assigns a storage processing device to the data flow, based on a predetermined characteristic of the data flow detected for each data flow and a predetermined characteristic of the storage processing device detected for each storage processing device, and sets the switch unit so that the data flow flows to the assigned storage processing device.

A flow control method according to another aspect of the present invention is performed by a storage system that includes:
a plurality of storage devices;
a plurality of storage processing devices that store data in a distributed manner across the plurality of storage devices and that, when storing other data having the same content as data already stored in the storage devices, perform deduplication processing that causes the already stored data to be referenced as the other data; and
a switch unit that assigns a data flow consisting of a group of data to one of the storage processing devices and configures the data flow to flow to the assigned storage processing device.
The method includes:
detecting a predetermined characteristic of each data flow, and detecting a predetermined characteristic of each storage processing device; and
determining, by the switch unit, the storage processing device to which the data flow is assigned, based on the detected characteristic of the data flow and the detected characteristic of the storage processing device.

With the configuration described above, the present invention achieves efficient load distribution in a storage system.

FIG. 1 is a block diagram showing an outline of the configuration of the storage system according to the present invention.
FIG. 2 is a functional block diagram showing the configuration of the accelerator node and the PFS control device disclosed in FIG. 1.
FIG. 3 is a diagram showing how data is stored in the storage system disclosed in FIG. 1.
FIG. 4 is a diagram showing how data is stored in the storage system disclosed in FIG. 1.
FIG. 5 is a flowchart showing an operation of the storage system disclosed in FIG. 1.
FIG. 6 is a flowchart showing an operation of the storage system disclosed in FIG. 1.
FIG. 7 is a flowchart showing an operation of the storage system disclosed in FIG. 1.
FIG. 8 is a block diagram showing the configuration of the storage system according to Supplementary Note 1 of the present invention.

A first embodiment of the present invention will be described with reference to FIGS. 1 to 7. FIGS. 1 and 2 illustrate the configuration of the storage system, and FIGS. 3 to 7 illustrate its operation.

[Configuration]
As shown in FIG. 1, the storage system includes a plurality of accelerator nodes 100, 110, and 120, a PFS control device 300, a CAS (Content-Addressable Storage) 400, and a PFS (Programmable Flow Switch) 500. A plurality of clients 200, 210, 220, and 230 that store data in the storage system are connected to it. Each component is described in detail below.

The client 200 and the other clients are information processing apparatuses equipped with backup software. Through processing by the backup software, or through an operator's operation, a client transmits to the storage system a data flow consisting of a group of data stored in that client, so that the data flow is stored in the storage system.

The PFS 500 has a function of routing or redirecting data transmitted from the client 200 and the other clients on a per-flow basis. That is, the PFS 500 can set the route along which a data flow, a group of data transmitted from a client, reaches the accelerator node 100 or another accelerator node, and can change the destination accelerator node. The setting and changing of data flow routes by the PFS 500 is controlled by the PFS control device 300, as described later. The PFS 500 and the PFS control device 300 therefore together function as a switch unit that assigns a data flow to one of the accelerator nodes and configures the data flow to flow to the assigned accelerator node.
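Per-flow routing of this kind is commonly modeled as a match table keyed by flow identity. The sketch below assumes an OpenFlow-style table keyed by the TCP/IP 5-tuple and collapses the PFS 500 and the PFS control device 300 into one object. The class and method names are hypothetical. A real switch matches packet header fields in hardware rather than Python tuples.

```python
from typing import Dict, Optional, Tuple

# Hypothetical flow key: the usual 5-tuple (src IP, src port, dst IP, dst port, protocol).
FlowKey = Tuple[str, int, str, int, str]

class FlowTable:
    """Toy per-flow forwarding table. The controller installs or rewrites
    entries, and the switch looks up each packet's flow key."""

    def __init__(self) -> None:
        self.entries: Dict[FlowKey, int] = {}

    def set_route(self, key: FlowKey, node_id: int) -> None:
        # Controller side: route a new flow, or redirect an existing one.
        self.entries[key] = node_id

    def forward(self, key: FlowKey) -> Optional[int]:
        # Switch side: None models a table miss, i.e. "ask the controller".
        return self.entries.get(key)

table = FlowTable()
flow = ("192.0.2.10", 40000, "192.0.2.100", 2049, "tcp")
table.set_route(flow, 100)   # initial assignment to accelerator node 100
table.set_route(flow, 120)   # later redirect of the same flow to node 120
print(table.forward(flow))   # 120
```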

The accelerator nodes 100, 110, and 120 and the CAS 400 together constitute a content address storage system. This system divides data, adds redundancy, stores it in a distributed manner, and identifies the storage location of the data by a unique content address set according to the content of the stored data. When other data having the same content as data already stored in the CAS 400 is to be stored, the already stored data is referenced as the other data through its content address, so the other data itself need not be stored. When the other data is later read, data with the same content is obtained by reading the stored data referenced by the content address. In this way, the storage system implements deduplication processing that eliminates duplicate storage of data.

The CAS 400 is responsible for storing data and includes a plurality of storage nodes (storage devices). The accelerator node 100 and the others are responsible for performing data storage processing on the CAS 400, that is, deduplication processing.

Next, the accelerator node 100 and the others are described in more detail. Although a plurality of accelerator nodes 100, 110, and 120 are provided, every accelerator node is assumed to have the configuration of the accelerator node 100 described below. The number of accelerator nodes is not limited to the number shown in FIG. 2.

Because the accelerator node 100 and the others are information processing apparatuses that serve as the entrance to the CAS 400, they appear to the client 200 and the other clients as NAS (Network Attached Storage). The storage system including the accelerator nodes and the CAS 400 has a GNS (Global Name Space) function, so the same file system can be accessed from any of the accelerator nodes 100, 110, and 120.

As shown in FIG. 2, the accelerator node 100 and the others each include a CAS processing unit 620. The CAS processing unit 620 cooperates with the CAS 400 to manage files using content addresses. It thereby performs deduplication processing that prevents data with the same content from being stored in the CAS 400 more than once. The deduplication processing performed by the accelerator nodes and the CAS 400 when data is stored is described below with reference to FIGS. 3 and 4.

First, when the storage system receives input of data A (a data flow) to be stored (see arrow Y1 in FIG. 4), it divides the data A into block data D of a predetermined size (for example, 64 KB), as indicated by arrow Y2 in FIGS. 3 and 4. Based on the content of the block data D, it then calculates a unique hash value H representing that content (arrow Y3 in FIG. 4). For example, the hash value H is calculated from the content of the block data D using a preset hash function.
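The division and hashing steps can be sketched as fixed-size chunking followed by a cryptographic hash. The text names only "a preset hash function", so SHA-256 from Python's `hashlib` is an assumption here, as are the helper names. The 64 KB block size is the example value given above.

```python
import hashlib

BLOCK_SIZE = 64 * 1024  # example block size from the text (64 KB)

def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Fixed-size chunking: every block is block_size bytes except possibly the last."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def block_hash(block: bytes) -> bytes:
    """Hash value H of one block (SHA-256 is an assumed choice)."""
    return hashlib.sha256(block).digest()

data_a = b"x" * (150 * 1024)
blocks = split_into_blocks(data_a)
print([len(b) for b in blocks])                        # [65536, 65536, 22528]
print(block_hash(blocks[0]) == block_hash(blocks[1]))  # True: identical content, identical H
```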

Next, the storage system uses the hash value H of the block data D to check whether the block data D is already stored in the CAS 400. Specifically, for block data D that is already stored, a content address CA that includes its hash value H and represents its storage location is registered in the content address management table MFI, as described later. Therefore, if the hash value H calculated for the block data D before storage does not exist in the content address management table MFI, it can be determined that block data D with the same content has not yet been stored. If the hash value H does exist in the content address management table MFI, it can be determined that block data D with the same content is already stored (arrow Y4 in FIG. 4).

Next, the storage system compresses each block data D for which no identical block data has yet been stored, according to a preset compression rule. It then divides the compressed block into a plurality of fragment data of a predetermined size, as indicated by arrow Y5 in FIG. 4. For example, the data is divided into nine fragment data (divided data 11), as indicated by D1 to D9 in FIG. 3. The storage system also generates redundant data so that the original block data can be restored even if some of the fragments are lost, and adds it to the divided fragment data 11. For example, three fragment data (redundant data 12) are added, as indicated by D10 to D12 in FIG. 3. This produces a data set 10 of twelve fragment data, consisting of the nine divided data 11 and the three redundant data 12.
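The compress-then-fragment step can be sketched as follows, under these assumptions:

- `zlib` stands in for the unspecified "preset compression rule".
- Fragments are made equal in length by zero-padding the tail.
- The compressed length is kept so the padding can be stripped later.

Generating the three redundant fragments D10 to D12 requires an erasure code, for example a Reed-Solomon-type code. The document does not name one, so that step is omitted here. The function names are hypothetical.

```python
import zlib

DATA_FRAGMENTS = 9  # D1..D9 in FIG. 3

def compress_and_split(block: bytes, k: int = DATA_FRAGMENTS):
    """Compress a block, then cut it into k equal-length fragments.
    Returns the fragments and the compressed length (needed to strip padding)."""
    compressed = zlib.compress(block)
    frag_len = -(-len(compressed) // k)               # ceiling division
    padded = compressed.ljust(frag_len * k, b"\0")    # zero-pad the tail
    fragments = [padded[i * frag_len:(i + 1) * frag_len] for i in range(k)]
    return fragments, len(compressed)

def reassemble(fragments, compressed_len: int) -> bytes:
    """Join the data fragments, drop the padding, and decompress."""
    return zlib.decompress(b"".join(fragments)[:compressed_len])

block_d = b"abc" * 10000
fragments, clen = compress_and_split(block_d)
print(len(fragments), reassemble(fragments, clen) == block_d)  # 9 True
```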

Next, the storage system distributes the fragment data constituting the generated data set across the storage areas formed in the storage devices of the CAS 400. For example, when twelve fragment data D1 to D12 are generated as shown in FIG. 3, each of D1 to D12 is stored, one per area, in a data storage file formed in each of twelve storage areas (see arrow Y6 in FIG. 4).

The storage system also generates and manages a content address CA that represents the storage location of the fragment data D1 to D12 stored as described above, that is, the storage location of the block data D that is restored from the fragment data D1 to D12. Specifically, the content address CA is generated by combining part of the hash value H calculated from the content of the stored block data D (a short hash, for example the first 8 bytes of the hash value H) with information representing the logical storage location (arrow Y7 in FIG. 4). The storage system then associates identification information of the data to be stored, such as its file name, with the content address CA and manages this association in the file system. It also adds a new entry for the generated content address CA to the content address management table MFI.
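A content address of the shape described here, a short hash followed by a logical storage position, can be built as below. The 8-byte short hash follows the example in the text. The use of SHA-256, the encoding of the logical position as an unsigned 64-bit big-endian integer, and the function names are all assumptions made for illustration.

```python
import hashlib
import struct

SHORT_HASH_LEN = 8  # "the first 8 bytes of the hash value H"

def make_content_address(block: bytes, logical_position: int) -> bytes:
    """Content address CA = short hash of the block + logical storage position."""
    short_hash = hashlib.sha256(block).digest()[:SHORT_HASH_LEN]
    return short_hash + struct.pack(">Q", logical_position)  # ">Q": big-endian unsigned 64-bit

def parse_content_address(ca: bytes):
    """Split a CA back into (short_hash, logical_position)."""
    short_hash = ca[:SHORT_HASH_LEN]
    (position,) = struct.unpack(">Q", ca[SHORT_HASH_LEN:])
    return short_hash, position

ca = make_content_address(b"hello", 42)
print(len(ca), parse_content_address(ca)[1])  # 16 42
```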

When the hash value H of the block data D of the data to be stored already exists in the content address management table MFI, that is, when block data D with the same content is already stored, the storage system obtains from the MFI the content address CA that contains the hash value matching the hash value H of the block data D to be stored. It then uses this content address CA as the content address representing the storage destination of the block data D. As a result, the already stored data referenced by the content address CA serves as the block data D for which storage was requested, so that block data D need not be stored again. In other words, data with the same content is never stored more than once.
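Together, the lookup against the content address management table and the reuse of an existing content address form a simple write path. In the toy model below:

- `mfi` is a dictionary keyed by the full SHA-256 digest. This is an assumption; the text only says the table is searched by hash value H.
- `storage` stands in for the CAS 400.
- Compression, fragmentation, and distribution are left out.

All class and attribute names are hypothetical.

```python
import hashlib

class DedupStore:
    """Toy deduplicating write path (hypothetical names throughout)."""

    def __init__(self):
        self.mfi = {}          # block hash -> content address (stand-in for table MFI)
        self.storage = {}      # content address -> stored block bytes (stand-in for CAS 400)
        self.next_position = 0 # next free logical storage position

    def write_block(self, block: bytes) -> bytes:
        h = hashlib.sha256(block).digest()
        ca = self.mfi.get(h)
        if ca is not None:
            return ca                          # duplicate: reuse the existing address, store nothing
        ca = h[:8] + self.next_position.to_bytes(8, "big")
        self.next_position += 1
        self.storage[ca] = block               # unique block: actually store it
        self.mfi[h] = ca                       # register the new entry
        return ca

    def read_block(self, ca: bytes) -> bytes:
        return self.storage[ca]

store = DedupStore()
a1 = store.write_block(b"A" * 10)
b1 = store.write_block(b"B" * 10)
a2 = store.write_block(b"A" * 10)              # same content as a1
print(a1 == a2, len(store.storage))            # True 2
```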

As described above, the storage system composed of the accelerator node 100 and the others together with the CAS 400 divides the data to be stored into small blocks. It compares each block with the blocks already stored and compresses and stores only the unique blocks that duplicate no other block. When it determines that a block with the same content already exists, it records information representing where that block is stored (the content address CA). This eliminates duplicate recording and reduces the amount of data actually stored. This deduplication processing, which compresses and stores data while removing duplicates, is executed mainly by the CAS processing unit 620 (see FIG. 2), which is implemented by a program running on the CPU of the accelerator node 100.

As shown in FIG. 2, the accelerator node 100 and the others are equipped with various dedicated cards 630 (dedicated devices). Each card executes a specific process that forms part of the processing required to store data. The accelerator nodes can thus have the dedicated hardware cards 630 execute part of the data storage processing without using their own CPUs. As a result, the load on the accelerator nodes can be reduced and processing can be accelerated.

For example, the accelerator node 100 in FIG. 1 is equipped with a "compression card" that compresses data before it is stored in the CAS 400. The accelerator node 110 is equipped with an "SSD card" carrying an SSD (Solid State Drive) whose data input/output is faster than that of auxiliary storage such as the hard disk drives in the CAS 400. The SSD card speeds up processing when many small files are handled. Furthermore, the accelerator node 120 includes a "hash calculation card". This device calculates the hash value H from the content of the block data D, which is part of the deduplication processing described above.

However, not every accelerator node 100 or the like needs to carry a dedicated card 630 as described above. In addition, device performance, such as CPU processing performance and memory transfer performance, is not necessarily identical across the accelerator nodes 100 and the like.

Further, as shown in FIG. 2, each accelerator node 100 or the like includes a flow reception unit 600, a session movement processing unit 610, a flow characteristic determination unit 640, an AN characteristic determination unit 650, and a flow characteristic detection unit 660, which are constructed by incorporating a program into the CPU with which the node is equipped.

The flow reception unit 600 receives a data flow sent from the client 200 or the like and passes the same data flow content to both the flow characteristic detection unit 660 and the CAS processing unit 620. At the request of the session movement processing unit 610, the flow reception unit 600 also stops or resumes receiving a flow.

The session movement processing unit 610 requests the flow reception unit 600 to stop or start a data flow, and moves information about a session with a client to another accelerator node (AN).

The flow characteristic detection unit 660 includes units 661 to 665, which detect various predetermined characteristics of each data flow and pass them to the flow characteristic determination unit 640. For example, based on the deduplication processing by the CAS processing unit 620, the duplication rate calculation unit 661 detects, as a characteristic of the data flow, a duplication rate indicating the degree to which the data in the data flow duplicates data already stored in the CAS 400. Also based on that deduplication processing, the compression rate calculation unit 662 detects, as a characteristic of the data flow, a compression rate indicating how much the data in the data flow is compressed between before and after it is stored in the CAS 400. The compression detection unit 663 refers to marker information inserted into the data flow by backup software or the like installed on the client 200, and detects, as a characteristic of the data flow, whether the data flow was already compressed before being input to the accelerator node 100 or the like. The encryption detection unit 664 likewise refers to marker information inserted into the data flow by backup software or the like installed on the client 200, and detects, as a characteristic of the data flow, whether the data flow is encrypted. The file size detection unit 665 detects, as a characteristic of the data flow, the size of a file contained in the data flow. The data flow characteristics detected by the flow characteristic detection unit 660 are not limited to those described above.
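As a rough illustration of units 661 and 662, the sketch below estimates both rates from a sample of a flow. It assumes the duplication rate is the fraction of blocks whose hash is already known, and that the compression rate is expressed as the fraction of bytes saved by zlib, so that a higher value means more compressible data. Neither definition is given in the text, and `flow_characteristics` is a hypothetical name.

```python
from __future__ import annotations

import hashlib
import zlib


def flow_characteristics(data: bytes, stored_hashes: set[str],
                         block_size: int = 4096) -> tuple[float, float]:
    """Return (duplication_rate, compression_rate) for a sample of one data flow.

    duplication_rate: fraction of blocks whose SHA-256 hash is already stored.
    compression_rate: 1 - compressed_size / original_size, clamped at 0.0
                      (higher means the data compresses better).
    """
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    if not blocks:
        return 0.0, 0.0
    duplicates = sum(1 for b in blocks if hashlib.sha256(b).hexdigest() in stored_hashes)
    duplication_rate = duplicates / len(blocks)
    compressed_size = len(zlib.compress(data))
    compression_rate = max(0.0, 1.0 - compressed_size / len(data))
    return duplication_rate, compression_rate
```

Incompressible input (for example, already compressed or encrypted data) makes zlib's output slightly larger than the input, which is why the rate is clamped at zero.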

The flow characteristic determination unit 640 sends information representing the characteristics of each data flow detected by the flow characteristic detection unit 660 to the PFS control device 300. Specifically, it sends the characteristic information described above (the duplication rate, compression rate, presence or absence of compression, presence or absence of encryption, and file size of the data flow) together with a "client IP address", which identifies the client 200 or the like that transmitted the data flow and thereby specifies the data flow.
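The report described above could be modeled as a simple record. The field names and the JSON encoding are assumptions for illustration only; the text does not specify a message format between the accelerator nodes and the PFS control device 300.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class FlowCharacteristics:
    """Hypothetical per-flow report sent to the PFS control device 300."""
    client_ip: str           # identifies the data flow (the sending client)
    duplication_rate: float  # 0.0-1.0
    compression_rate: float  # 0.0-1.0, higher means more compressible
    precompressed: bool      # compressed by client-side backup software
    encrypted: bool          # encrypted by client-side backup software
    file_size: int           # bytes, size of the file currently being written


def encode_report(report: FlowCharacteristics) -> str:
    """Serialize a report; JSON is an assumed wire format."""
    return json.dumps(asdict(report), sort_keys=True)
```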

The AN characteristic determination unit 650 (device characteristic detection unit) collects information representing the characteristics of each accelerator node 100 or the like and sends it to the PFS control device 300. For example, the AN characteristic determination unit 650 collects, as characteristics of the accelerator node, information such as the performance of its CPU and memory, its load status, and the types of dedicated cards installed in it. It then sends this information, together with information identifying the accelerator node, to the PFS control device 300. The characteristics collected by the AN characteristic determination unit 650 are not limited to those described above.

The detection of data flow characteristics and accelerator node characteristics described above, and their transmission to the PFS control device 300, are performed continuously at regular intervals by every accelerator node 100 or the like.

Next, the PFS control device 300 (switch control device), which functions as the switch unit in cooperation with the PFS 500 as described above, will be described. As shown in FIG. 2, the PFS control device 300 includes a flow setting unit 330 constructed by incorporating a program into its arithmetic device. The PFS control device 300 also includes an AN characteristic information database (DB) 310 and a flow characteristic information database (DB) 320 formed in its storage device.

The flow setting unit 330 stores the data flow characteristic information sent from each accelerator node 100 or the like in the flow characteristic information DB 320, and stores the characteristic information of each accelerator node 100 or the like in the AN characteristic information DB 310. Each item of characteristic information is updated every time it is received from an accelerator node.
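A minimal sketch of the two databases, assuming each is a mapping keyed by node identifier (AN characteristic information DB 310) or client IP address (flow characteristic information DB 320), and that a new report simply overwrites the previous one. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CharacteristicDBs:
    """In-memory stand-ins for the AN characteristic DB 310 and flow characteristic DB 320."""
    an_db: dict = field(default_factory=dict)    # node ID -> latest node report
    flow_db: dict = field(default_factory=dict)  # client IP -> latest flow report

    def store_node(self, node_id: str, report: dict) -> None:
        # A newer report for the same node replaces the older one.
        self.an_db[node_id] = report

    def store_flow(self, client_ip: str, report: dict) -> None:
        # A newer report for the same flow replaces the older one.
        self.flow_db[client_ip] = report
```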

Then, based on the data flow characteristic information stored in the flow characteristic information DB 320 and the accelerator node characteristic information stored in the AN characteristic information DB 310, the flow setting unit 330 determines which accelerator node 100 or the like to assign to each data flow. It then controls the PFS 500 to set up a new path, or to switch the existing path, so that each data flow is delivered to the accelerator node assigned to it.

Specifically, the flow setting unit 330 first assigns to a data flow an accelerator node 100 or the like equipped with the dedicated card 630 that matches the flow's characteristics. For example, when the duplication rate of a data flow is higher than a predetermined value, an accelerator node equipped with a Hash calculation card is assigned to it. When the compression rate of a data flow is higher than a predetermined value, an accelerator node equipped with a compression card is assigned to it. When the size of the files in a data flow is smaller than a predetermined value, an accelerator node equipped with an SSD card is assigned to it. However, if the data flow has been encrypted or already compressed, for example by backup software on the client 200 side, an accelerator node equipped with a Hash calculation card or a compression card is not assigned to it.

If no accelerator node 100 or the like carries the dedicated card 630 to be assigned to the data flow as described above, the flow setting unit 330 assigns a lightly loaded accelerator node to the data flow, based on the load status of each accelerator node 100 or the like stored in the AN characteristic information DB 310.
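The assignment rules of the flow setting unit 330 might be sketched as follows. The thresholds, the card labels, and the order in which matches are tried (hash, then compression, then SSD) are assumptions: the text gives no values and does not say which rule wins when several apply. Choosing the least-loaded node among equally equipped ones is also an assumption.

```python
from __future__ import annotations


def assign_node(flow: dict, nodes: dict[str, dict],
                dup_threshold: float = 0.5, comp_threshold: float = 0.3,
                small_file_bytes: int = 64 * 1024) -> str:
    """Choose a node ID for one flow; the thresholds are placeholder values."""
    wanted = []
    # Encrypted or precompressed flows are not matched to hash/compression cards.
    if not (flow["encrypted"] or flow["precompressed"]):
        if flow["duplication_rate"] > dup_threshold:
            wanted.append("hash")
        if flow["compression_rate"] > comp_threshold:
            wanted.append("compression")
    if flow["file_size"] < small_file_bytes:
        wanted.append("ssd")
    for card in wanted:  # the first card type with an equipped node wins
        equipped = [n for n, info in nodes.items() if card in info["cards"]]
        if equipped:
            return min(equipped, key=lambda n: nodes[n]["load"])
    # No suitable card anywhere: fall back to the least-loaded node.
    return min(nodes, key=lambda n: nodes[n]["load"])
```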

As described above, the flow setting unit 330 assigns accelerator nodes to data flows and configures routing and redirection in the PFS 500 either at regular intervals or whenever any item of characteristic information is updated.

[Operation]
Next, the operation of the storage system described above will be explained with reference to the flowcharts of FIGS. 5 to 7.

The same IP address is set in advance on the accelerator nodes 100, 110, and 120. As a result, the clients 200, 210, 220, and 230 see, at the IP level, what appears to be a single device. However, as described later, the accelerator node 100 or the like that is actually accessed is determined by the routing/redirection of the PFS 500.

First, the PFS control device 300 detects that a client 200 or the like has connected to the PFS 500 (step A1). Next, the PFS control device 300 acquires the load status of each accelerator node 100 or the like from the AN characteristic information DB 310 (step A2) and selects the node with the lowest load (step A3). The flow setting unit 330 then configures the PFS 500 so that the flow from the client is sent to the selected accelerator node (step A4). In this way, the accelerator node that the client accesses first is determined.
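Steps A1 to A4 reduce to picking the least-loaded node and recording a route for the client. In this sketch the switch's forwarding state is modeled as a plain dictionary, an assumption standing in for the actual PFS 500 configuration; the function name is hypothetical.

```python
from __future__ import annotations


def on_client_connect(client_ip: str, an_loads: dict[str, float],
                      routes: dict[str, str]) -> str:
    """Steps A1 to A4 for a newly connected client (A1 is this call itself)."""
    # A2: an_loads holds each node's load, as read from the AN characteristic DB.
    # A3: choose the node with the lowest load.
    target = min(an_loads, key=an_loads.get)
    # A4: record a route so that this client's flow is delivered to that node.
    routes[client_ip] = target
    return target
```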

The accelerator node receives the flow from the client at its flow reception unit 600. The flow reception unit 600 sends the same data flow down two paths: a path for detecting flow characteristics, and the normal processing path that stores the data with deduplication. In other words, the flow reception unit 600 delivers the data flow to the flow characteristic detection unit 660 over the detection path and to the CAS processing unit 620 over the data storage path.
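The split of one incoming flow onto two paths can be sketched as a loop that hands each chunk to two consumers. The function and callback names are hypothetical, and this synchronous version is a simplification: a real implementation would likely decouple the two paths (for example with queues) so that detection cannot stall storage.

```python
from __future__ import annotations

from collections.abc import Callable, Iterable


def receive_flow(chunks: Iterable[bytes],
                 detect: Callable[[bytes], None],
                 store: Callable[[bytes], None]) -> None:
    """Hand every received chunk to both the detection path and the storage path."""
    for chunk in chunks:
        detect(chunk)  # flow characteristic detection path (unit 660)
        store(chunk)   # deduplicating storage path (CAS processing unit 620)
```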

On receiving the data flow from the flow reception unit 600, the flow characteristic detection unit 660 of the accelerator node detects each characteristic of the data flow, such as its duplication rate, its compression rate, whether it has been compressed or encrypted by backup software, and its file size. The flow characteristic detection unit 660 passes the detected characteristics to the flow characteristic determination unit 640, which uses them to create flow characteristic information associated with the IP address of the client that transmitted the data flow, and sends that information to the PFS control device 300.

Meanwhile, the CAS processing unit 620, which receives the data flow over the normal data storage path, stores the data in the CAS 400 while eliminating duplicate storage, as described with reference to FIGS. 3 and 4. If the accelerator node carries a dedicated card 630 that can perform processing in place of the CPU, the CAS processing unit 620 delegates to that dedicated card 630 the part of the processing the card can handle. In this way, the data in the data flow is finally stored in the CAS 400.
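The delegation to a dedicated card 630 can be pictured as choosing, per processing stage, between a card-backed implementation and a CPU fallback. The stage names and the `build_pipeline` helper are hypothetical, and the CPU versions use SHA-256 and zlib only as stand-ins.

```python
from __future__ import annotations

import hashlib
import zlib
from collections.abc import Callable


def cpu_hash(block: bytes) -> str:
    """CPU fallback for hash calculation (SHA-256 assumed)."""
    return hashlib.sha256(block).hexdigest()


def cpu_compress(block: bytes) -> bytes:
    """CPU fallback for compression (zlib assumed)."""
    return zlib.compress(block)


def build_pipeline(card_stages: dict[str, Callable]) -> dict[str, Callable]:
    """For each stage, use the dedicated-card implementation if one is installed,
    otherwise keep the CPU implementation."""
    cpu_stages = {"hash": cpu_hash, "compress": cpu_compress}
    return {stage: card_stages.get(stage, impl) for stage, impl in cpu_stages.items()}
```

On a node like the one denoted 120, `card_stages` would contain only a `"hash"` entry, so compression would still run on the CPU.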

The AN characteristic determination unit 650 of each accelerator node 100 or the like collects the node's processing performance, its load status, and information on the dedicated cards installed in it, and sends this information to the PFS control device 300.

Thereafter, when the characteristics of a data flow change or the load on an accelerator node increases, the PFS control device 300 uses the flow setting unit 330 to redirect the data flow and redistribute the load.

First, the flow setting unit 330 of the PFS control device 300 uses the information in the AN characteristic information DB 310 and the flow characteristic information DB 320 to detect changes in data flow characteristics and in accelerator node load (step B1). The flow setting unit 330 then determines which accelerator node to assign according to the characteristics of the data flow, and arranges for the data flow to go to that node. Specifically, the flow setting unit 330 notifies the session movement processing unit 610 of the accelerator node the data flow is leaving of the destination accelerator node, and instructs it to redirect the flow (step B2).

The session movement processing unit 610 requests the flow reception unit 600 to stop the data flow (step B3). It also transmits the session information of the data flow to the session movement processing unit 610 of the destination accelerator node (step B4). On receiving the session information, the session movement processing unit 610 of the destination accelerator node reconstructs the session from it (step B5). When the session has been reconstructed, the PFS control device 300 uses the flow setting unit 330 to set up the redirection of the data flow (step B6).
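Steps B2 to B6 can be sketched with in-memory objects standing in for the accelerator nodes and the switch. The class, the session dictionary, and the routing table are hypothetical simplifications that omit the network-level contents of a session, which the text does not describe.

```python
from __future__ import annotations


class AcceleratorNode:
    """Hypothetical stand-in for one node's flow reception and session movement units."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.sessions: dict[str, dict] = {}  # client IP -> session state
        self.receiving: set[str] = set()     # clients whose flow is being accepted

    def stop_flow(self, client_ip: str) -> None:
        self.receiving.discard(client_ip)

    def export_session(self, client_ip: str) -> dict:
        return self.sessions.pop(client_ip)

    def import_session(self, client_ip: str, state: dict) -> None:
        self.sessions[client_ip] = state
        self.receiving.add(client_ip)


def migrate(client_ip: str, src: AcceleratorNode, dst: AcceleratorNode,
            routes: dict[str, str]) -> None:
    """Steps B2 to B6: move a client's session, then repoint the switch."""
    src.stop_flow(client_ip)               # B3: stop intake at the source node
    state = src.export_session(client_ip)  # B4: send the session information
    dst.import_session(client_ip, state)   # B5: rebuild the session at the destination
    routes[client_ip] = dst.name           # B6: redirect the flow in the switch
```

Updating the route only after the session has been rebuilt mirrors the ordering in the text, so no data reaches the destination before it can accept it.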

Next, an example of how the PFS control device 300 assigns data flows to accelerator nodes will be described with reference to FIG. 7.

First, the data flow is checked for marker information to determine whether the client 200 or the like is sending the data flow for storage by means of backup software (step C1). If the client is using backup software (Yes in step C1), the marker information added to the data flow by the backup software is examined to check whether the backup software has already compressed the data flow (step C2). If it has (Yes in step C2), the data is unlikely to compress any further, so compression is judged ineffective, and the next check is whether the data is duplicated (step C4).

In step C4, if the duplication rate of the data flow is higher than a predetermined value (Yes in step C4), the data of the flow is likely to keep being duplicated, so the data flow is preferentially assigned to an accelerator node equipped with a Hash calculation card (step C8). If the duplication rate is equal to or less than the predetermined value (No in step C4), the data flow is preferentially assigned according to the load status of the accelerator nodes (step C9).

If it is determined in step C2 that the data flow has not been compressed by the backup software (No in step C2), the marker information included in the data flow is examined to determine whether the backup software has encrypted the data flow (step C5).

If the data flow is encrypted (Yes in step C5), neither compression nor deduplication is expected to be effective, so the assignment is decided according to the load status of the accelerator nodes (step C10). If the data flow is not encrypted (No in step C5), the duplication rate of the data flow is used, as above, to check whether its data is duplicated (step C6). If it is (Yes in step C6), the data flow is preferentially assigned to an accelerator node equipped with a Hash calculation card (step C11). If it is not (No in step C6), the next check is whether compression is effective for the data flow (step C7).

In step C7, if the compression rate of the data flow is higher than a predetermined value, compression is judged effective (Yes in step C7), so the data flow is preferentially assigned to an accelerator node equipped with a compression card (step C12). If the compression rate is equal to or less than the predetermined value, so that compression is ineffective (No in step C7), the data flow is preferentially assigned according to the load status of the accelerator nodes (step C13).

If it is determined in step C1 that the client 200 or the like is not using backup software (No in step C1), the size of the next file being written in the data flow is checked (step C3). If the file size is equal to or less than a predetermined value (No in step C3), small files are evidently being written, so the data flow is preferentially assigned to an accelerator node equipped with an SSD card, which speeds up small-file processing (step C14). If the file size is larger than the predetermined value (Yes in step C3), the duplication rate of the data flow is used, as above, to check whether its data is duplicated (step C6). The subsequent processing is the same as described above.
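The decision procedure of steps C1 to C14 can be condensed into one function that returns the kind of node to prefer. The threshold defaults are placeholders, and the function name is hypothetical. The C2 branch follows the reading that "No in step C2" means the flow was not precompressed; as in the description, the precompressed path goes straight to the duplication check (C4) without testing for encryption.

```python
def classify_flow(uses_backup_sw: bool, precompressed: bool, encrypted: bool,
                  dup_rate: float, comp_rate: float, file_size: int,
                  dup_th: float = 0.5, comp_th: float = 0.3,
                  size_th: int = 64 * 1024) -> str:
    """Return the kind of node to prefer, following steps C1 to C14.

    Return values: "hash", "compression", "ssd", or "load" (choose by load).
    """
    if uses_backup_sw:                        # C1: Yes
        if precompressed:                     # C2: Yes, compression judged ineffective
            return "hash" if dup_rate > dup_th else "load"   # C4 -> C8 / C9
        if encrypted:                         # C2: No, then C5: Yes
            return "load"                                    # C10
    elif file_size <= size_th:                # C1: No, then C3: No (small file)
        return "ssd"                                         # C14
    # C6, reached from C5: No or from C3: Yes (large file)
    if dup_rate > dup_th:
        return "hash"                                        # C11
    return "compression" if comp_rate > comp_th else "load"  # C7 -> C12 / C13
```

The returned label would then be mapped to a concrete node, for example by the card-matching and least-load fallback logic of the flow setting unit 330.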

When the accelerator node to assign to a data flow is being determined, several accelerator nodes may qualify as candidates. In that case, the load status and processing performance of the candidates can be consulted to select the one with the most spare capacity in terms of load and processing performance. If no accelerator node carries the specific dedicated card to be assigned to the data flow, the assignment is made using the load status of the accelerator nodes as the criterion.
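One possible tie-breaking rule is to score each candidate by its relative processing performance multiplied by its unused load fraction. This scoring formula and the function name are assumptions; the text only says that load status and processing performance are consulted.

```python
def pick_candidate(candidates: list, load: dict, perf: dict) -> str:
    """Among several eligible nodes, prefer the one with the most spare capacity.

    Spare capacity is modeled (hypothetically) as relative performance x (1 - load),
    where load is a 0.0-1.0 utilization figure.
    """
    return max(candidates, key=lambda n: perf[n] * (1.0 - load[n]))
```

With this rule, a node that is twice as fast but half loaded (score 1.0) beats a baseline node at 20% load (score 0.8), which is one way newer and older node generations can share work.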

As described above, the storage system of the present invention combines the duplication rate and compression rate of each data flow, the information added by backup software, and the processing performance and load status of the accelerator nodes, and uses the PFS to route each flow on that basis. The load can therefore be distributed to the accelerator nodes best suited to each data flow, achieving efficient load distribution while speeding up data storage processing across the system as a whole.

Moreover, even if accelerator nodes of newer and older generations coexist in the storage system, the load can be distributed according to the performance of each node, so effective load distribution is achieved in a storage apparatus intended for long-term use.

Furthermore, because load distribution relies on information that the accelerator nodes themselves can detect, together with the PFS, the load can be distributed on a per-flow basis without the client needing to be aware of the characteristics of its data.

Alternatively, the clients 200, 210, 220, and 230 may send information about the files they are about to back up (such as the name of the backup software, whether the backup software applies compression or encryption, and the size of the files to be backed up) to the PFS control device 300, which then decides the load distribution based on that information. That is, the PFS control device 300 itself may detect the data flow information and assign an accelerator node according to the characteristics of the data flow.

<Appendix>
Part or all of the above embodiment can also be described as in the following appendices. An outline of the storage system of the present invention is given below with reference to FIG. 8. However, the present invention is not limited to the following configurations.

(Appendix 1)
A storage system comprising:
a plurality of storage devices 1000;
a plurality of storage processing devices 1100 that store data distributed across the plurality of storage devices and perform deduplication processing which, when other data having the same content as data already stored in the storage devices is to be stored, causes the data already stored in the storage devices to be referenced as the other data;
a switch unit 1200 that assigns a data flow consisting of a group of data to one of the storage processing devices and sets the data flow to flow to the assigned storage processing device;
a flow characteristic detection unit 1110 that detects a predetermined characteristic of each data flow; and
a device characteristic detection unit 1120 that detects a predetermined characteristic of each storage processing device,
wherein the switch unit 1200 determines the storage processing device to which the data flow is assigned, based on the characteristic of the data flow detected by the flow characteristic detection unit and the characteristic of the storage processing device detected by the device characteristic detection unit.

(Appendix 2)
The storage system according to Appendix 1, wherein
the device characteristic detection unit detects, as the characteristic of the storage processing device, a dedicated device mounted in the storage processing device that executes a specific process for storing the data flow in the storage devices, and
the switch unit assigns, to the data flow, the storage processing device equipped with the dedicated device corresponding to the characteristic of the data flow detected by the flow characteristic detection unit.

(Appendix 3)
The storage system according to Appendix 2, wherein
the flow characteristic detection unit detects, as the characteristic of the data flow, a degree of duplication indicating the degree to which data in the data flow duplicates data stored in the storage devices,
the device characteristic detection unit detects, as the characteristic of the storage processing device, a dedicated device mounted in the storage processing device that executes the deduplication processing, and
the switch unit assigns, to the data flow, the storage processing device equipped with the dedicated device that executes the deduplication processing when the degree of duplication of the data flow detected by the flow characteristic detection unit is higher than a predetermined value.

(Appendix 4)
The storage system according to Appendix 3, wherein
the flow characteristic detection unit detects, as the characteristic of the data flow, whether the data flow is encrypted, and
the switch unit assigns, to the data flow, the storage processing device equipped with the dedicated device that executes the deduplication processing when the flow characteristic detection unit detects that the data flow is not encrypted and the degree of duplication of the data flow detected by the flow characteristic detection unit is higher than a predetermined value.

(Appendix 5)
The storage system according to any one of Appendices 2 to 4, wherein
the storage processing devices compress data and store it in the storage devices,
the flow characteristic detection unit detects, as the characteristic of the data flow, a degree of compression of data in the data flow between before and after it is stored in the storage devices,
the device characteristic detection unit detects, as the characteristic of the storage processing device, a dedicated device mounted in the storage processing device that executes compression processing, and
the switch unit assigns, to the data flow, the storage processing device equipped with the dedicated device that executes the compression processing when the degree of compression of the data flow detected by the flow characteristic detection unit is higher than a predetermined value.

（付記６）
付記５に記載のストレージシステムであって、
前記フロー特性検出部は、前記データフローの特性として、当該データフローが前記記憶処理装置に入力される前に圧縮されているか否かを検出し、
前記スイッチ部は、前記フロー特性検出部にて前記データフローが前記記憶処理装置に入力される前に圧縮されていない場合であって、前記フロー特性検出部にて検出した前記データフローの圧縮度合いが所定値よりも高い場合に、当該データフローに前記圧縮処理を実行する専用機器を搭載した前記記憶処理装置を割り当てる、
ストレージシステム。(Appendix 6)
The storage system according to appendix 5,
The flow characteristic detection unit detects, as the data flow characteristic, whether or not the data flow is compressed before being input to the storage processing device,
The switch unit is a case where the data flow is not compressed by the flow characteristic detection unit before being input to the storage processing device, and the degree of compression of the data flow detected by the flow characteristic detection unit Assigning the storage processing device equipped with a dedicated device for executing the compression processing to the data flow when the value is higher than a predetermined value,
Storage system.
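A sketch of the additional Appendix 6 condition follows. One plausible motivation, not stated in the text, is that data compressed by the client before input gains little from being compressed again. Detecting prior compression from leading "magic" bytes is an assumed heuristic, and `COMPRESSED_MAGIC`, `is_precompressed`, `wants_compression_card` and the default threshold are hypothetical.

```python
# Leading "magic" bytes of common compressed formats. This is a heuristic
# assumption; a real detector might also inspect MIME types or file names.
COMPRESSED_MAGIC = (
    b"\x1f\x8b",          # gzip
    b"PK\x03\x04",        # zip
    b"BZh",               # bzip2
    b"\xfd7zXZ\x00",      # xz
    b"\x28\xb5\x2f\xfd",  # zstd
)


def is_precompressed(head: bytes) -> bool:
    """True if the first bytes of the flow match a known compressed format."""
    return head.startswith(COMPRESSED_MAGIC)


def wants_compression_card(head: bytes, degree: float, threshold: float = 2.0) -> bool:
    """Appendix 6 condition: not compressed on input AND degree above threshold.

    `degree` is the measured degree of compression (original / compressed size).
    """
    return not is_precompressed(head) and degree > threshold
```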

(Appendix 7)
The storage system according to any one of appendices 2 to 6,
The flow characteristic detection unit detects the size of a file in the data flow as the characteristic of the data flow,
The device characteristic detection unit detects, as the characteristic of the storage processing device, a dedicated device mounted in the storage processing device and equipped with a high-speed storage device whose data input/output is faster than that of the auxiliary storage device installed in the storage device,
When the file size of the data flow detected by the flow characteristic detection unit is smaller than a predetermined value, the switch unit assigns, to the data flow, the storage processing device equipped with the dedicated device having the high-speed storage device,
Storage system.
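The Appendix 7 rule is sketched below under stated assumptions: the "file size of the data flow" is taken to be the mean size of the files observed in the flow, the 64 KiB threshold stands in for the unspecified predetermined value, and `Node`, `has_fast_storage` and `pick_node_for_small_files` are hypothetical names. Flash memory is mentioned in a comment only as an example of a high-speed storage device.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical "predetermined value" for a small file, in bytes.
SMALL_FILE_THRESHOLD = 64 * 1024


@dataclass
class Node:
    name: str
    # True if a dedicated device with storage faster than the disks (e.g. flash) is mounted.
    has_fast_storage: bool


def pick_node_for_small_files(
    file_sizes: List[int], nodes: List[Node], threshold: int = SMALL_FILE_THRESHOLD
) -> Optional[Node]:
    """Return a node with fast storage if the flow's files are small on average."""
    if not file_sizes:
        return None
    mean_size = sum(file_sizes) / len(file_sizes)
    if mean_size < threshold:
        return next((n for n in nodes if n.has_fast_storage), None)
    return None
```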

(Appendix 8)
The storage system according to any one of appendices 2 to 7,
The device characteristic detection unit detects a load status of the storage processing device as a characteristic of the storage processing device,
When there is no storage processing device equipped with a dedicated device corresponding to the characteristic of the data flow detected by the flow characteristic detection unit, the switch unit assigns another storage processing device to the data flow according to the load status detected by the flow characteristic detection unit,
Storage system.
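The Appendix 8 fallback can be sketched as follows. Device labels such as `"dedup"`, the normalized `load` value and the function `assign` are hypothetical; breaking ties among matching nodes by load is a design choice of this sketch rather than something the text requires.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Node:
    name: str
    devices: Set[str] = field(default_factory=set)  # hypothetical labels, e.g. {"dedup"}
    load: float = 0.0  # hypothetical normalized load, 0.0 (idle) to 1.0 (busy)


def assign(required_device: str, nodes: List[Node]) -> Node:
    """Prefer a node carrying the required dedicated device; otherwise fall back on load.

    Raises ValueError if `nodes` is empty.
    """
    matching = [n for n in nodes if required_device in n.devices]
    if matching:
        # Design choice (not from the source): among matching nodes, take the least loaded.
        return min(matching, key=lambda n: n.load)
    # Appendix 8 fallback: no node has the matching device, so pick the least loaded one.
    return min(nodes, key=lambda n: n.load)
```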

(Appendix 9)
A switch control device connected to a switch unit that sets a data flow consisting of a group of data to flow to any one of a plurality of storage processing devices, the storage processing devices storing data in a distributed manner across a plurality of storage devices and, when storing other data having the same content as data already stored in the storage devices, performing deduplication processing that causes the already-stored data to be referenced as the other data, the switch control device comprising:
A flow setting unit that, based on a predetermined characteristic of the data flow detected for each data flow and a predetermined characteristic of the storage processing device detected for each storage processing device, assigns a storage processing device to the data flow and sets the switch unit so that the data flow flows to the assigned storage processing device,
Switch control device.
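The role of the flow setting unit in Appendix 9 can be illustrated with a toy model in which the switch unit is an in-memory flow table keyed by match fields. This is only a sketch: a real switch would be programmed through its own control protocol, and `FlowKey`, `SwitchUnit`, `FlowSettingUnit`, `set_flow` and the choice of match fields are all hypothetical. The assignment policy is passed in as a function so that any of the rules in these appendices could be plugged in.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple

# Hypothetical match fields that identify one data flow (one client session).
FlowKey = Tuple[str, str, int]  # (client_ip, service_ip, tcp_port)


@dataclass
class SwitchUnit:
    """Toy switch: a flow table mapping a flow's match fields to an output port."""
    flow_table: Dict[FlowKey, int] = field(default_factory=dict)

    def forward(self, key: FlowKey) -> Optional[int]:
        # Output port for packets of this flow, or None if no entry is installed.
        return self.flow_table.get(key)


@dataclass
class FlowSettingUnit:
    """Assigns a flow to a storage processing device and programs the switch to match."""
    switch: SwitchUnit
    port_of_node: Dict[str, int]  # storage processing device name -> switch port
    # Assignment policy: (flow characteristics, per-node characteristics) -> node name.
    choose: Callable[[dict, Dict[str, dict]], str]

    def set_flow(self, key: FlowKey, flow_chars: dict, node_chars: Dict[str, dict]) -> str:
        node = self.choose(flow_chars, node_chars)
        self.switch.flow_table[key] = self.port_of_node[node]
        return node
```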

(Appendix 10)
The switch control device according to appendix 9, wherein
When a dedicated device, mounted in the storage processing device, that executes a specific process for storing the data flow in the storage device is detected as a characteristic of the storage processing device,
The flow setting unit assigns, to the data flow, the storage processing device equipped with the dedicated device corresponding to the detected characteristic of the data flow,
Switch control device.

(Appendix 11)
A program for causing a switch control device, connected to a switch unit that sets a data flow consisting of a group of data to flow to any one of a plurality of storage processing devices, the storage processing devices storing data in a distributed manner across a plurality of storage devices and, when storing other data having the same content as data already stored in the storage devices, performing deduplication processing that causes the already-stored data to be referenced as the other data, to implement:
A flow setting unit that, based on a predetermined characteristic of the data flow detected for each data flow and a predetermined characteristic of the storage processing device detected for each storage processing device, assigns a storage processing device to the data flow and sets the switch unit so that the data flow flows to the assigned storage processing device,
Program.

(Appendix 12)
The program according to appendix 11, wherein
When a dedicated device, mounted in the storage processing device, that executes a specific process for storing the data flow in the storage device is detected as a characteristic of the storage processing device,
The flow setting unit assigns, to the data flow, the storage processing device equipped with the dedicated device corresponding to the detected characteristic of the data flow,
Program.

(Appendix 13)
A flow control method performed by a storage system comprising:
A plurality of storage devices,
A plurality of storage processing devices that store data in a distributed manner across the plurality of storage devices and, when storing other data having the same content as data already stored in the storage devices, perform deduplication processing that causes the already-stored data to be referenced as the other data, and
A switch unit that assigns a data flow consisting of a group of data to any one of the storage processing devices and sets the data flow to flow to the assigned storage processing device,
The flow control method comprising:
Detecting a predetermined characteristic of the data flow for each data flow, and detecting a predetermined characteristic of the storage processing device for each storage processing device,
The switch unit determines the storage processing device to which the data flow is assigned based on the detected characteristic of the data flow and the detected characteristic of the storage processing device;
Flow control method.

(Appendix 14)
The flow control method according to appendix 13, wherein
A dedicated device, mounted in the storage processing device, that executes a specific process for storing the data flow in the storage device is detected as a characteristic of the storage processing device,
The switch unit assigns, to the data flow, the storage processing device equipped with the dedicated device corresponding to the detected characteristic of the data flow,
Flow control method.
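Appendices 10, 12 and 14 leave open how a detected flow characteristic "corresponds" to a dedicated device. One way to express the correspondences described in these appendices and the claims is an ordered rule table, sketched below. The evaluation order, the thresholds, the characteristic keys and the device labels are assumptions made for illustration; the text does not say which rule wins when several apply.

```python
from typing import Any, Callable, Dict, List, Optional, Set, Tuple

FlowChars = Dict[str, Any]

# Ordered (condition on flow characteristics, required dedicated device) pairs.
# The order, thresholds, and device labels are hypothetical.
RULES: List[Tuple[Callable[[FlowChars], bool], str]] = [
    (lambda f: not f["encrypted"] and f["dup_degree"] > 0.5, "dedup"),
    (lambda f: not f["precompressed"] and f["comp_degree"] > 2.0, "compress"),
    (lambda f: f["file_size"] < 64 * 1024, "fast_storage"),
]


def required_device(flow: FlowChars) -> Optional[str]:
    """First dedicated device whose condition the flow satisfies, or None."""
    for condition, device in RULES:
        if condition(flow):
            return device
    return None


def candidate_nodes(flow: FlowChars, node_devices: Dict[str, Set[str]]) -> List[str]:
    """Names of storage processing devices carrying the device this flow needs."""
    device = required_device(flow)
    if device is None:
        return []
    return [name for name, devices in node_devices.items() if device in devices]
```

An empty result from `candidate_nodes` corresponds to the situation Appendix 8 addresses, in which another storage processing device is chosen according to load.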

Note that the program described above is stored in a storage device or recorded on a computer-readable recording medium. For example, the recording medium is a portable medium such as a flexible disk, an optical disk, a magneto-optical disk, or a semiconductor memory.

Although the present invention has been described with reference to the above embodiments, the present invention is not limited to those embodiments. Various changes that can be understood by those skilled in the art may be made to the configuration and details of the present invention within the scope of the present invention.

This application claims the benefit of priority based on Japanese Patent Application No. 2011-41864, filed in Japan on February 28, 2011, the entire contents of which are incorporated herein.

100, 110, 120 Accelerator node (AN)
200, 210, 220, 230 Client
300 PFS control device
310 AN characteristic information DB
320 Flow characteristic information DB
330 Flow setting unit
400 CAS
500 PFS
600 Flow reception unit
610 Session migration processing unit
620 CAS processing unit
630 Dedicated card
640 Flow characteristic determination unit
650 AN characteristic determination unit
660 Flow characteristic detection unit
661 Duplication rate calculation unit
662 Compression rate calculation unit
663 Compression detection unit
664 Encryption detection unit
665 File size detection unit
1000 Storage device
1100 Storage processing device
1110 Flow characteristic detection unit
1120 Device characteristic detection unit
1200 Switch unit

Claims

A plurality of storage devices,
A plurality of storage processing devices that store data in a distributed manner across the plurality of storage devices and, when storing other data having the same content as data already stored in the storage devices, perform deduplication processing that causes the already-stored data to be referenced as the other data,
A switch unit that assigns a data flow composed of a group of data to any one of the storage processing devices and sets the data flow to flow to the assigned storage processing device;
A flow characteristic detection unit for detecting a predetermined characteristic of the data flow for each data flow;
A device characteristic detection unit for detecting a predetermined characteristic of the storage processing device for each storage processing device;
The switch unit determines the storage processing device to which the data flow is assigned, based on the characteristic of the data flow detected by the flow characteristic detection unit and the characteristic of the storage processing device detected by the device characteristic detection unit,
Storage system.

The storage system according to claim 1,
The device characteristic detection unit detects, as the characteristic of the storage processing device, a dedicated device mounted in the storage processing device that executes a specific process for storing the data flow in the storage device,
The switch unit assigns, to the data flow, the storage processing device equipped with the dedicated device corresponding to the characteristic of the data flow detected by the flow characteristic detection unit,
Storage system.

The storage system according to claim 2,
The flow characteristic detection unit detects, as the characteristic of the data flow, a degree of duplication representing the extent to which data in the data flow duplicates data already stored in the storage device,
The device characteristic detection unit detects, as the characteristic of the storage processing device, a dedicated device mounted on the storage processing device that executes the deduplication processing,
When the degree of duplication of the data flow detected by the flow characteristic detection unit is higher than a predetermined value, the switch unit assigns, to the data flow, the storage processing device equipped with the dedicated device that executes the deduplication processing,
Storage system.

The storage system according to claim 3,
The flow characteristic detection unit detects, as the characteristic of the data flow, whether the data flow is encrypted,
When the flow characteristic detection unit detects that the data flow is not encrypted and the degree of duplication of the data flow detected by the flow characteristic detection unit is higher than a predetermined value, the switch unit assigns, to the data flow, the storage processing device equipped with the dedicated device that executes the deduplication processing,
Storage system.
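Claims 3 and 4 can be illustrated with the following sketch, under assumptions the claims do not fix: the degree of duplication is measured as the fraction of fixed-size 4 KiB chunks whose SHA-256 digest already appears in an index of stored chunks, and encryption is guessed from a byte entropy close to 8 bits per byte. The entropy test cannot tell ciphertext from compressed data and can be fooled by structured data with an even byte histogram, so it is only a heuristic. All function names and thresholds are hypothetical.

```python
import hashlib
import math
from collections import Counter
from typing import List, Set


def chunk_hashes(data: bytes, chunk_size: int = 4096) -> List[bytes]:
    """SHA-256 digests of fixed-size chunks (the last chunk may be shorter)."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).digest()
        for i in range(0, len(data), chunk_size)
    ]


def duplication_degree(data: bytes, stored: Set[bytes], chunk_size: int = 4096) -> float:
    """Fraction of the flow's chunks whose digest is already in the stored index."""
    hashes = chunk_hashes(data, chunk_size)
    if not hashes:
        return 0.0
    return sum(h in stored for h in hashes) / len(hashes)


def byte_entropy(sample: bytes) -> float:
    """Shannon entropy in bits per byte (8.0 means uniformly distributed bytes)."""
    if not sample:
        return 0.0
    n = len(sample)
    return -sum(c / n * math.log2(c / n) for c in Counter(sample).values())


def looks_encrypted(sample: bytes, min_entropy: float = 7.5) -> bool:
    """Heuristic: near-maximal byte entropy suggests ciphertext (or compressed data)."""
    return len(sample) >= 256 and byte_entropy(sample) >= min_entropy


def wants_dedup_card(sample: bytes, dup_degree: float, threshold: float = 0.5) -> bool:
    """Claim 4 condition: flow not encrypted AND degree of duplication above threshold."""
    return not looks_encrypted(sample) and dup_degree > threshold
```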

The storage system according to any one of claims 2 to 4,
The storage processing device compresses and stores data in the storage device,
The flow characteristic detection unit detects the degree of compression of data in the data flow before and after storage in the storage device as the characteristic of the data flow,
The device characteristic detection unit detects, as the characteristic of the storage processing device, a dedicated device mounted on the storage processing device that executes compression processing,
When the degree of compression of the data flow detected by the flow characteristic detection unit is higher than a predetermined value, the switch unit assigns, to the data flow, the storage processing device equipped with the dedicated device that executes the compression processing,
Storage system.

The storage system according to claim 5,
The flow characteristic detection unit detects, as the data flow characteristic, whether or not the data flow is compressed before being input to the storage processing device,
When the flow characteristic detection unit detects that the data flow was not compressed before being input to the storage processing device, and the degree of compression of the data flow detected by the flow characteristic detection unit is higher than a predetermined value, the switch unit assigns, to the data flow, the storage processing device equipped with the dedicated device that executes the compression processing,
Storage system.

The storage system according to any one of claims 2 to 6,
The flow characteristic detection unit detects the size of a file in the data flow as the characteristic of the data flow,
The device characteristic detection unit detects, as the characteristic of the storage processing device, a dedicated device mounted in the storage processing device and equipped with a high-speed storage device whose data input/output is faster than that of the auxiliary storage device installed in the storage device,
When the file size of the data flow detected by the flow characteristic detection unit is smaller than a predetermined value, the switch unit assigns, to the data flow, the storage processing device equipped with the dedicated device having the high-speed storage device,
Storage system.

The storage system according to any one of claims 2 to 7,
The device characteristic detection unit detects a load status of the storage processing device as a characteristic of the storage processing device,
When there is no storage processing device equipped with a dedicated device corresponding to the characteristic of the data flow detected by the flow characteristic detection unit, the switch unit assigns another storage processing device to the data flow according to the load status detected by the flow characteristic detection unit,
Storage system.

A switch control device connected to a switch unit that sets a data flow consisting of a group of data to flow to any one of a plurality of storage processing devices, the storage processing devices storing data in a distributed manner across a plurality of storage devices and, when storing other data having the same content as data already stored in the storage devices, performing deduplication processing that causes the already-stored data to be referenced as the other data, the switch control device comprising:
A flow setting unit that, based on a predetermined characteristic of the data flow detected for each data flow and a predetermined characteristic of the storage processing device detected for each storage processing device, assigns a storage processing device to the data flow and sets the switch unit so that the data flow flows to the assigned storage processing device,
Switch control device.

A flow control method performed by a storage system comprising:
A plurality of storage devices,
A plurality of storage processing devices that store data in a distributed manner across the plurality of storage devices and, when storing other data having the same content as data already stored in the storage devices, perform deduplication processing that causes the already-stored data to be referenced as the other data, and
A switch unit that assigns a data flow consisting of a group of data to any one of the storage processing devices and sets the data flow to flow to the assigned storage processing device,
The flow control method comprising:
Detecting a predetermined characteristic of the data flow for each data flow, and detecting a predetermined characteristic of the storage processing device for each storage processing device,
The switch unit determines the storage processing device to which the data flow is assigned based on the detected characteristic of the data flow and the detected characteristic of the storage processing device;
Flow control method.