JP6229577B2

JP6229577B2 - Cache storage program, information processing apparatus, and cache storage method

Info

Publication number: JP6229577B2
Application number: JP2014079678A
Authority: JP
Inventors: 真一佐沢
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2014-04-08
Filing date: 2014-04-08
Publication date: 2017-11-15
Anticipated expiration: 2034-04-08
Also published as: JP2015201050A

Description

The present invention relates to a cache storage program and the like.

As data becomes global and the use of cloud and mobile services spreads, demand for faster communication is growing. To meet this demand, a communication device reduces the amount of data it transfers by optimizing the data before transfer. Specifically, the communication device treats data it has transferred in the past as duplicate data, removes it from the data to be transferred, and thereby reduces the amount of data transferred.

A conventional duplicate data removal technique is described here with reference to FIG. 11. FIG. 11 is a diagram showing the conventional duplicate data removal technique. As shown in FIG. 11, the transmitting communication device divides input data into variable-length blocks called chunks and, chunk by chunk, checks for duplication against past input data stored in a cache memory. A chunk is a variable-length block obtained by dividing data at boundaries. If the duplication check finds no data corresponding to the chunk in the cache memory, the transmitting communication device saves the data (raw data) in the cache memory and transfers the raw data to the receiving communication device via the network. If the check finds data corresponding to the chunk in the cache memory, the transmitting communication device transfers only the index of the data (ID: identification) to the receiving communication device via the network.

On receiving raw data, the receiving communication device stores the raw data in its cache memory and treats it as the received data. On receiving a data index, the receiving communication device restores the data from its cache memory and treats the restored data as the received data.
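The exchange described above can be sketched as follows. This is a minimal illustration, not the method of FIG. 11 itself: the function names `send` and `receive`, the message tuples, and the use of a SHA-1 digest as the data index are all assumptions made for the example.

```python
import hashlib

def send(chunks, sender_cache):
    """Yield ("raw", data) for new chunks and ("id", key) for duplicates."""
    for data in chunks:
        key = hashlib.sha1(data).hexdigest()  # assumed form of the data index
        if key in sender_cache:
            yield ("id", key)                 # duplicate: transfer the index only
        else:
            sender_cache[key] = data          # new: cache it and transfer raw data
            yield ("raw", data)

def receive(messages, receiver_cache):
    """Rebuild the original chunk sequence from raw chunks and indexes."""
    restored = []
    for kind, payload in messages:
        if kind == "raw":
            receiver_cache[hashlib.sha1(payload).hexdigest()] = payload
            restored.append(payload)
        else:
            restored.append(receiver_cache[payload])  # restore from the cache
    return restored

chunks = [b"AAAA", b"BBBB", b"AAAA"]
messages = list(send(chunks, {}))
restored = receive(messages, {})
```

In this sketch the third chunk travels as an index only, and the receiver rebuilds it from its own cache.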

When the cache memory of the transmitting communication device becomes full, data stored in the cache memory is deleted in order, starting from the least recently used (LRU) data, that is, the data that has gone unused the longest.
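A minimal sketch of LRU deletion, assuming the cache capacity is counted in entries. The class name `LRUCache` and its methods are hypothetical and chosen only for illustration.

```python
from collections import OrderedDict

class LRUCache:
    """Cache with a fixed number of entries; the oldest unused entry goes first."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()  # front = least recently used

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        """Store a value and return the evicted (key, value) pair, or None."""
        evicted = None
        if key in self.items:
            self.items.move_to_end(key)
        elif len(self.items) >= self.capacity:
            evicted = self.items.popitem(last=False)  # drop the LRU entry
        self.items[key] = value
        return evicted

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")                 # "a" is now more recent than "b"
evicted = cache.put("c", 3)    # the cache is full, so "b" is evicted
```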

Techniques for storing data in a cache memory have also been disclosed. In one such technique, a proxy server having a plurality of caches selects the cache in which to store data according to the size of the data transmitted from a web server, and stores the data in the selected cache (see, for example, Patent Document 1).

Patent Document 1: JP 2001-256098 A

However, the communication device cannot make the cache memory hit efficiently. For example, the communication device stores data in the cache memory without considering the duplication characteristics of the data to be transferred, so cache hits are not obtained efficiently. Examples of data duplication characteristics are described here with reference to FIGS. 12A and 12B.

FIG. 12A is a diagram illustrating an example of duplicate data. FIG. 12A shows the flow of data to be transferred, where X denotes duplicate data. In the transmission data of FIG. 12A, each run of duplicate data X is short, but duplicate data X occurs frequently. This duplication characteristic of transferred data is called FSC (frequent and short chunk series).

FIG. 12B is a diagram illustrating another example of duplicate data. FIG. 12B shows the flow of data to be transferred, where X denotes duplicate data. In the transmission data of FIG. 12B, each run of duplicate data X is long, but duplicate data X occurs rarely. This duplication characteristic of transferred data is called RLC (rare and long chunk series).

Thus, although data has duplication characteristics, the communication device stores data in the cache memory without considering them, and therefore cannot make the cache memory hit efficiently.

This problem arises not only in a communication device that transfers data but equally in an information processing apparatus that processes data.

In one aspect, an object is to make a cache memory hit efficiently.

The cache storage program disclosed in the present application causes a computer to execute a process of: storing data to be processed in a first cache memory when the data to be processed is new; managing, as a duplication history for each piece of data stored in the first cache memory, a duplication count indicating how many times the data has duplicated data to be processed and a continuous duplication length indicating how long duplication continues when it occurs; extracting predetermined deletion-target data when new data is processed and the number of pieces of data stored in the first cache memory is at its upper limit; and moving the deletion-target data to either a second cache memory or a third cache memory according to the duplication count and the continuous duplication length of the deletion-target data managed in the duplication history.

According to one aspect of the cache storage program disclosed in the present application, the cache memory can be made to hit efficiently.

FIG. 1 is a functional block diagram illustrating the configuration of the communication device according to the first embodiment.
FIG. 2 is a diagram illustrating an example of the cache storage process of the communication device according to the first embodiment.
FIG. 3 is a diagram illustrating the data structure of input data and of data to be transmitted.
FIG. 4 is a diagram illustrating an example of the data structure of the duplication management information according to the first embodiment.
FIG. 5A is a diagram (1) illustrating a flowchart of the cache storage process according to the first embodiment.
FIG. 5B is a diagram (2) illustrating a flowchart of the cache storage process according to the first embodiment.
FIG. 5C is a diagram (3) illustrating a flowchart of the cache storage process according to the first embodiment.
FIG. 6 is a functional block diagram illustrating the configuration of the communication device according to the second embodiment.
FIG. 7 is a diagram illustrating an example of the cache storage process of the communication device according to the second embodiment.
FIG. 8 is a diagram illustrating an example of the data structure of the duplication management information according to the second embodiment.
FIG. 9A is a diagram (1) illustrating a flowchart of the cache storage process according to the second embodiment.
FIG. 9B is a diagram (2) illustrating a flowchart of the cache storage process according to the second embodiment.
FIG. 9C is a diagram (3) illustrating a flowchart of the cache storage process according to the second embodiment.
FIG. 9D is a diagram (4) illustrating a flowchart of the cache storage process according to the second embodiment.
FIG. 10 is a diagram illustrating an example of a computer that executes the cache storage program.
FIG. 11 is a diagram showing a conventional duplicate data removal technique.
FIG. 12A is a diagram illustrating an example of duplicate data.
FIG. 12B is a diagram illustrating another example of duplicate data.

Embodiments of the cache storage program, information processing apparatus, and cache storage method disclosed in the present application are described below in detail with reference to the drawings. The invention is not limited by the embodiments.

[Configuration of communication device]
FIG. 1 is a functional block diagram illustrating the configuration of the communication device according to the first embodiment. The communication device 1 is the device that transmits data exchanged between communication devices. The communication device 1 organizes the cache storage area that stores previously transmitted data into tiers and, when the upper cache is full, moves data to a particular lower cache according to the duplication characteristics of the data.

The communication device 1 includes a storage unit 10 and a control unit 20. The storage unit 10 corresponds to a storage device such as a nonvolatile semiconductor memory element, for example a flash memory or an FRAM (registered trademark) (Ferroelectric Random Access Memory). The storage unit 10 includes a cache storage area that serves as a cache memory. The cache storage area includes a memory 11, a disk 12, and an SSD (Solid State Drive) 13, with the disk 12 and the SSD 13 placed below the memory 11.

An example of the cache storage process of the communication device 1 according to the first embodiment is described here with reference to FIG. 2. FIG. 2 is a diagram illustrating an example of the cache storage process of the communication device according to the first embodiment. As shown in FIG. 2, the communication device 1 divides input data into variable-length blocks called chunks and, chunk by chunk, checks for duplication against past input data stored in the memory 11. If the duplication check finds data corresponding to the chunk in the memory 11, the communication device 1 removes the duplicate data by transferring only the index of the data (ID: identification) to the receiving communication device via the network.

If the duplication check finds no data corresponding to the chunk in the memory 11, the communication device 1 saves the data (raw data) in the memory 11 and transmits the raw data to the receiving communication device via the network. If the memory 11 is full when the data is to be saved, the communication device 1 extracts the least recently used (LRU) chunk, that is, the chunk that has gone unused the longest, from the chunks stored in the memory 11. The communication device 1 then sets the extracted chunk as the deletion target.

If the communication device 1 determines that the deletion-target data is duplicated rarely and that each duplication continues over a long run, it moves the deletion-target data from the memory 11 of the cache storage area to the disk 12. That is, when the duplication characteristic of the data is RLC (rare and long chunk series), the communication device 1 moves the data to the low-speed disk 12 of the tiered cache.

If the communication device 1 determines that the deletion-target data is duplicated frequently and that each duplication continues over only a short run, it moves the deletion-target data from the memory 11 of the cache storage area to the SSD 13. That is, when the duplication characteristic of the data is FSC (frequent and short chunk series), the communication device 1 moves the data to the high-speed SSD 13 of the tiered cache.

Because the communication device 1 thus stores data in the cache storage area with the duplication characteristics of the data taken into account, the cache storage area can be made to hit efficiently.

FIG. 3 is a diagram illustrating the data structure of input data and of data to be transmitted. A, B, C, and D in FIG. 3 each represent a chunk. That is, the input data is divided into the chunks A, B, C, B, and D. As shown in FIG. 3, the appearance positions of the chunks of the input data are contiguous. The "appearance position" here is a kind of address that increases each time a new chunk appears and does not increase when a chunk is a duplicate. The data amount of the input data is 5200 in total: 1200 for chunk A, 1000 for chunk B, 1100 for chunk C, 1000 for the second chunk B, and 900 for chunk D. Because chunk B is duplicated, however, the data amount to be transmitted is 4200 in total: 1200 for chunk A, 1000 for chunk B, 1100 for chunk C, and 900 for chunk D. That is, since the second chunk B is a duplicate, its data amount is 0 apart from the ID, and its appearance position is 0.
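The arithmetic of this example can be checked with a short sketch. It assumes that the appearance position of a new chunk is the running total of the lengths of the new chunks before it, so that the first chunk sits at position 0. The variable names are hypothetical.

```python
# Chunk names and lengths taken from the FIG. 3 example.
chunks = [("A", 1200), ("B", 1000), ("C", 1100), ("B", 1000), ("D", 900)]

positions = {}   # appearance position of each new chunk (assumed running offset)
offset = 0       # advances only when a new chunk appears
sent_total = 0   # data amount actually transmitted (IDs not counted)

for name, length in chunks:
    if name not in positions:
        positions[name] = offset
        offset += length
        sent_total += length
    # A duplicate chunk adds neither data amount nor offset.

input_total = sum(length for _, length in chunks)
print(input_total, sent_total)
```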

Returning to FIG. 1, the memory 11 stores duplication management information 111. The duplication management information 111 manages, for chunks that are already stored, the history of duplication with input data. That is, for each of the chunks into which already transmitted data was divided, the duplication management information 111 manages, as a duplication history, the data length at the time of duplication with input data and the number of duplications. The data structure of the duplication management information 111 is described later.

The control unit 20 has an internal memory for storing programs that define various processing procedures and for storing control data, and uses them to execute various processes. The control unit 20 corresponds, for example, to an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array), or to an electronic circuit such as a CPU (Central Processing Unit) or an MPU (Micro Processing Unit). The control unit 20 includes a chunk division unit 21, a hash calculation unit 22, a duplication determination unit 23, a new registration unit 24, a duplication management unit 25, a cache moving unit 26, and a transmission unit 27.

The chunk division unit 21 divides input data into a plurality of chunks. A chunk is a variable-length block obtained by delimiting data at boundaries. The input data is divided into chunks as follows.

For example, the chunk division unit 21 scans the input data with a window of fixed size. The chunk division unit 21 calculates a hash value of the data in the window by a method called the Rabin fingerprint (abbreviated "fp"). As an example, with window size n and window data w_0, w_1, ..., w_{n-1}, fp is given by equation (1), where p is a relatively large prime number.

Using the result of equation (1) together with equation (2), the chunk division unit 21 then calculates fp for each window position as the window advances one byte at a time.

Among the fp values obtained in this way, the chunk division unit 21 searches for an fp whose trailing (or leading) m bits match a predetermined bit pattern. Each fp found marks a data boundary. Because the m bits of fp match with probability 1/2^m, a boundary is found on average once every 2^10 bytes (1 KB) when m is 10, for example. The chunk division unit 21 splits off the data up to each fp found as one chunk. The method of dividing the input data into chunks is not limited to this, and any known technique may be used.
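Equations (1) and (2) are not reproduced in this text, so the following sketch substitutes a common polynomial rolling hash rather than the exact fingerprint of the embodiment. It assumes that the prime p acts as the modulus, and it introduces a hypothetical `base`, a default window size, and the function names `chunk_boundaries` and `split_chunks` purely for illustration.

```python
def chunk_boundaries(data, window=16, base=256, p=2**31 - 1, m=10, pattern=0):
    """Return the end offset of every chunk found in `data` (bytes)."""
    if not data:
        return []
    ends = []
    if len(data) >= window:
        mask = (1 << m) - 1
        top = pow(base, window - 1, p)   # weight of the byte leaving the window
        fp = 0
        for byte in data[:window]:       # hash of the first window
            fp = (fp * base + byte) % p
        for i in range(window, len(data)):
            if (fp & mask) == pattern:   # trailing m bits match: boundary at i
                ends.append(i)
            # Slide the window one byte: drop data[i - window], append data[i].
            fp = ((fp - data[i - window] * top) * base + data[i]) % p
    ends.append(len(data))               # the final chunk ends with the data
    return ends

def split_chunks(data, **kwargs):
    """Cut `data` at the boundaries returned by chunk_boundaries."""
    chunks, start = [], 0
    for end in chunk_boundaries(data, **kwargs):
        chunks.append(data[start:end])
        start = end
    return chunks
```

Because the boundaries depend only on the bytes inside the window, inserting data near the start of an input tends to shift later boundaries with the content instead of re-cutting every chunk, which is what makes duplicate chunks detectable.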

The hash calculation unit 22 calculates a hash value for each chunk produced by the chunk division unit 21. For example, the hash calculation unit 22 selects the chunks produced by the chunk division unit 21 one at a time and calculates a hash value for the selected chunk using SHA1 (Secure Hash Algorithm 1).
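As a small illustration, a SHA-1 hash of a chunk can be computed with Python's standard `hashlib` module. The helper name `chunk_hash` is hypothetical, and returning a hexadecimal string is a choice made for this sketch.

```python
import hashlib

def chunk_hash(chunk: bytes) -> str:
    """Return the SHA-1 digest of a chunk as a 40-character hex string."""
    return hashlib.sha1(chunk).hexdigest()

print(chunk_hash(b"abc"))
```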

The duplication determination unit 23 determines whether the selected chunk duplicates any of the chunks stored in the memory 11. For example, the duplication determination unit 23 uses the duplication management information 111 to determine whether the hash value calculated for the selected chunk matches the hash value of a chunk already stored in the memory 11.

When the selected chunk is determined not to duplicate any of the chunks stored in the memory 11, the new registration unit 24 newly registers the selected chunk in the memory 11. For example, when the duplication determination unit 23 determines that the hash value calculated for the selected chunk matches none of the hash values of the chunks already stored in the memory 11, the new registration unit 24 saves the chunk in the memory 11. The new registration unit 24 also registers information on the selected chunk in the duplication management information 111.

The data structure of the duplication management information 111 is described here with reference to FIG. 4. FIG. 4 is a diagram illustrating an example of the data structure of the duplication management information according to the first embodiment. As shown in FIG. 4, the duplication management information 111 stores an appearance position 111b, a length 111c, an appearance count 111d, and a continuation 111e in association with a hash value (SHA1) 111a. The hash value (SHA1) 111a is the hash value of the chunk calculated with SHA1. The appearance position 111b is the appearance position of the chunk. The length 111c is the length of the chunk. The appearance count 111d is the number of times the chunk has appeared so far. The continuation 111e is the number of chunks over which duplication continues among the chunks into which input data was divided. The continuation 111e is set in the entry corresponding to the first chunk of the input data. The appearance count 111d and the continuation 111e are set by the duplication management unit 25 described later. As an example, the appearance count 111d and the continuation 111e correspond to the duplication count and the continuous duplication length, respectively.

As an example, FIG. 4 holds the information on each chunk for the case where input data containing chunks a, b, and c is input, followed by input data containing chunks a and b. The entry whose hash value (SHA1) 111a is "aaa" is the information on chunk a. The entry whose hash value (SHA1) 111a is "bbb" is the information on chunk b. The entry whose hash value (SHA1) 111a is "ccc" is the information on chunk c. Chunk a is the first chunk of the input data. For the hash value (SHA1) 111a of "aaa", the entry stores "0" as the appearance position 111b, "1200" as the length 111c, and "2" as the appearance count 111d. It stores "2" as the continuation 111e, because duplication continues over two chunks, a and b.

Returning to FIG. 1, for the selected chunk the new registration unit 24 sets the hash value calculated by the hash calculation unit 22 as the hash value (SHA1) 111a, and sets as the appearance position 111b the offset value obtained by accumulating the data amounts of new chunks. The new registration unit 24 sets the length of the chunk as the length 111c, "1" as the appearance count 111d, and "0" as the continuation 111e.

When the selected chunk is determined to duplicate one of the chunks stored in the memory 11, the duplication management unit 25 manages the duplication of the selected chunk. For example, when the duplication determination unit 23 determines that the hash value calculated for the selected chunk matches the hash value of one of the chunks already stored in the memory 11, the duplication management unit 25 performs the following processing. If no first chunk of the input data has been recognized, the duplication management unit 25 recognizes the selected chunk as the first chunk of the input data and adds 1 to the continuation 111e of the selected chunk. If a first chunk of the input data has already been recognized and the selected chunk continues from the immediately preceding duplicate chunk, the duplication management unit 25 adds 1 to the continuation 111e of the first chunk. If a first chunk of the input data has already been recognized but the selected chunk does not continue from the immediately preceding duplicate chunk, the duplication management unit 25 recognizes the selected chunk as the first chunk of the input data and adds 1 to the continuation 111e of the selected chunk. The duplication management unit 25 also adds 1 to the appearance count 111d of the selected chunk, and places the information on the selected chunk at the head of the duplication management information 111.
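The registration and update rules above can be sketched as follows. Several points are assumptions rather than statements of the source: a chunk is taken to "continue" from the previous duplicate when its appearance position directly follows that chunk, a new chunk is taken to end the current run of duplicates, new entries are also placed at the head of the table, and the names `Entry` and `DupManager` are hypothetical. The lengths of chunks b and c in the usage lines are illustrative values as well.

```python
from collections import OrderedDict
from dataclasses import dataclass

@dataclass
class Entry:
    position: int          # appearance position (111b)
    length: int            # length (111c)
    count: int = 1         # appearance count (111d)
    continuation: int = 0  # continuation (111e), kept on the run's first chunk

class DupManager:
    def __init__(self):
        self.table = OrderedDict()  # hash -> Entry; front = most recently used
        self.offset = 0             # OFFSET: current appearance position
        self.head = None            # HEADCHUNK: first chunk of the current run
        self.prev = None            # PRECHUNK: previous duplicate chunk

    def process(self, key, length):
        """Record one chunk; return True if it duplicates a stored chunk."""
        entry = self.table.get(key)
        if entry is None:
            # New registration: count 1, continuation 0, position = offset.
            self.table[key] = Entry(self.offset, length)
            self.offset += length
            self.head = self.prev = None   # assumption: a new chunk ends the run
            duplicate = False
        else:
            prev = self.table.get(self.prev) if self.prev is not None else None
            continues = (self.head is not None and prev is not None
                         and prev.position + prev.length == entry.position)
            if not continues:
                self.head = key            # this chunk starts a new run
            self.table[self.head].continuation += 1
            entry.count += 1
            self.prev = key
            duplicate = True
        self.table.move_to_end(key, last=False)  # most recent entry at the head
        return duplicate

manager = DupManager()
for key, length in [("aaa", 1200), ("bbb", 1000), ("ccc", 1100)]:
    manager.process(key, length)   # first input data: chunks a, b, c
for key, length in [("aaa", 1200), ("bbb", 1000)]:
    manager.process(key, length)   # second input data: chunks a, b
```

Under these assumptions the entry for "aaa" ends with appearance position 0, length 1200, appearance count 2, and continuation 2, which agrees with the FIG. 4 example.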

When a new chunk is to be registered in the memory 11 and the number of chunks stored in the memory 11 is at its upper limit, the cache moving unit 26 extracts the LRU chunk from the chunks stored in the memory 11. The extracted chunk becomes the deletion target. According to the predicted continuation length at the time of duplication and the number of duplicate appearances of the deletion-target chunk, the cache moving unit 26 moves the data including the deletion-target chunk to either the disk 12 or the SSD 13. For example, the cache moving unit 26 takes the last chunk in the duplication management information 111 as the deletion target. When the continuation 111e of the deletion-target chunk is greater than a predetermined continuation threshold, the cache moving unit 26 moves the data of the chunks that continue from that chunk, as many as the continuation 111e indicates, to the disk 12. That is, when the duplication characteristic of the chunk is RLC (rare and long chunk series), the cache moving unit 26 moves the corresponding data to the low-speed disk 12 of the tiered cache. When the continuation 111e of the deletion-target chunk is equal to or less than the continuation threshold and the appearance count 111d of the deletion-target chunk is greater than a predetermined appearance count threshold, the cache moving unit 26 moves the data including the deletion-target chunk to the SSD 13. That is, when the duplication characteristic of the chunk is FSC (frequent and short chunk series), the cache moving unit 26 moves the corresponding data to the high-speed SSD 13 of the tiered cache. The cache moving unit 26 then deletes the moved chunk from the memory 11 and deletes the information on the moved chunk from the duplication management information 111.
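The tier decision can be sketched as a small function. The function name `choose_tier`, the string labels, and the threshold values in the usage lines are hypothetical. The passage does not say what happens when neither condition holds, so returning `None` (meaning the chunk is simply dropped from the memory) is an assumption of this sketch.

```python
def choose_tier(continuation, count, continuation_threshold, count_threshold):
    """Pick the lower cache tier for a deletion-target chunk."""
    if continuation > continuation_threshold:
        return "disk"   # RLC: rare and long chunk series -> low-speed disk
    if count > count_threshold:
        return "ssd"    # FSC: frequent and short chunk series -> high-speed SSD
    return None         # assumption: neither case, not moved to a lower tier

# Illustrative thresholds: continuation threshold 3, appearance count threshold 2.
print(choose_tier(5, 1, 3, 2))   # long run, rarely seen
print(choose_tier(1, 9, 3, 2))   # short run, frequently seen
```

The continuation is tested first, mirroring the order in the text, so a chunk that exceeds both thresholds goes to the disk.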

If the selected chunk is a duplicate, the transmission unit 27 transmits only the ID of the chunk whose content it duplicates. If the selected chunk is not a duplicate, the transmission unit 27 transmits the data corresponding to the chunk (raw data).

[Procedure of cache storage process]
Next, the procedure of the cache storage process according to the first embodiment is described with reference to FIGS. 5A to 5C. FIGS. 5A to 5C are flowcharts of the cache storage process according to the first embodiment. The first chunk of a run of duplicates is called the "duplicate head chunk" and is denoted "HEADCHUNK". The duplicate chunk immediately before the current duplicate chunk is called the "previous duplicate chunk" and is denoted "PRECHUNK". The current appearance position is denoted "OFFSET". The number of chunks stored in the memory 11 is denoted "datanum". HEADCHUNK, PRECHUNK, OFFSET, and datanum are local areas held temporarily in the storage unit 10.

As shown in FIG. 5A, the chunk division unit 21 sets HEADCHUNK and PRECHUNK to null and sets OFFSET and datanum to 0 (step S11). The chunk division unit 21 then waits for data to be received (step S12).

The chunk division unit 21 determines whether the data has ended (step S13). When the data has ended (step S13; Yes), the chunk division unit 21 ends the cache storage process.

On the other hand, when the data has not ended (step S13; No), the chunk division unit 21 receives the data (input data) (step S14). The chunk division unit 21 divides the received input data into a plurality of chunks (step S15). The chunk division unit 21 then selects one of the divided chunks (step S16).

Subsequently, the control unit 20 performs the cache storage process on the selected chunk (step S17). The flowchart of the cache storage process is described with reference to FIGS. 5B and 5C. The control unit 20 then determines whether all chunks have been processed (step S18). When not all chunks have been processed (step S18; No), the control unit 20 proceeds to step S16 to select the next chunk.

On the other hand, when all chunks have been processed (step S18; Yes), the control unit 20 proceeds to step S12 to wait for the next data to be received.

As shown in FIG. 5B, the hash calculation unit 22 calculates a hash value for the selected chunk using SHA1 (step S21). The hash calculation unit 22 sets the calculated hash value in SHAVAL, which is a local area. The duplication determination unit 23 then searches for a duplicate of the selected chunk (step S22).

The duplication determination unit 23 then determines whether the selected chunk is new (step S23). For example, the duplication determination unit 23 determines whether the hash value of the selected chunk matches none of the hash values (SHA1) 111a stored in the duplication management information 111.

When the selected chunk is new (step S23; Yes), the new registration unit 24 registers the information of the new chunk in the duplication management information 111 (step S24). For example, the value set in SHAVAL is set as the hash value (SHA1) 111a. The value set in OFFSET is set as the appearance position 111b. The length of the chunk is set as the length 111c. “1” is set as the appearance count 111d. “0” is set as the continuation 111e.

Then, the new registration unit 24 updates the local areas HEADCHUNK (duplicate head chunk), OFFSET (current appearance position), and datanum (number of chunks stored in the memory 11) (step S25). For example, HEADCHUNK is set to null. OFFSET is increased by the length of the selected chunk. datanum is increased by 1 to account for the selected chunk.

Then, the new registration unit 24 places the information of the new chunk at the head of the duplication management information 111 (step S26). This is because the new chunk is the most recently used one. The new registration unit 24 then proceeds to step S41 to perform the cache transfer process within the cache storage process. The flowchart of the cache transfer process within the cache storage process is described with reference to FIG. 5C.
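
Steps S21 to S26 can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the implementation of the embodiment: `ChunkInfo`, `State`, and `register_new` are hypothetical names, and an `OrderedDict` whose first entry is the most recently used chunk stands in for the duplication management information 111.

```python
import hashlib
from collections import OrderedDict
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChunkInfo:
    sha1: str              # hash value (SHA1) 111a
    position: int          # appearance position 111b
    length: int            # length 111c
    count: int = 1         # appearance count 111d
    continuation: int = 0  # continuation 111e


@dataclass
class State:
    head_chunk: Optional[str] = None  # HEADCHUNK
    pre_chunk: Optional[str] = None   # PRECHUNK
    offset: int = 0                   # OFFSET
    datanum: int = 0                  # datanum


def register_new(chunk: bytes, table: OrderedDict, st: State) -> ChunkInfo:
    shaval = hashlib.sha1(chunk).hexdigest()         # step S21 (SHAVAL)
    info = ChunkInfo(shaval, st.offset, len(chunk))  # step S24
    st.head_chunk = None                             # step S25
    st.offset += len(chunk)
    st.datanum += 1
    table[shaval] = info                             # step S26
    table.move_to_end(shaval, last=False)            # head = most recently used
    return info
```

Registering two chunks in a row places the second one at the head of the table and advances OFFSET by the sum of both lengths.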

When the selected chunk is not new in step S23 (step S23; No), the duplication management unit 25 acquires the information of the selected chunk that is already registered in the duplication management information 111 (hereinafter, the registered chunk) (step S27). Subsequently, the duplication management unit 25 determines whether HEADCHUNK (duplicate head chunk) is null (step S28). That is, the duplication management unit 25 determines whether the head chunk of the input data has not yet been recognized.

When HEADCHUNK (duplicate head chunk) is null (step S28; Yes), the duplication management unit 25 sets the hash value of the registered chunk in HEADCHUNK (step S29). That is, the duplication management unit 25 recognizes the registered chunk as the head chunk of the duplication. The duplication management unit 25 then proceeds to step S32.

On the other hand, when HEADCHUNK (duplicate head chunk) is not null (step S28; No), the duplication management unit 25 determines whether the value obtained by adding the appearance position and the length of PRECHUNK (immediately preceding duplicate chunk) equals the appearance position of the registered chunk (step S30). That is, the duplication management unit 25 determines whether the registered chunk continues from the chunk that was duplicated immediately before.

When the value obtained by adding the appearance position and the length of PRECHUNK (immediately preceding duplicate chunk) is not the appearance position of the registered chunk (step S30; No), the duplication management unit 25 sets the hash value of the registered chunk in HEADCHUNK (step S31). That is, because the registered chunk does not continue from the chunk duplicated immediately before, the duplication management unit 25 recognizes the registered chunk as the head chunk of a duplication. The duplication management unit 25 then proceeds to step S32.

When the value obtained by adding the appearance position and the length of PRECHUNK (immediately preceding duplicate chunk) is the appearance position of the registered chunk (step S30; Yes), the head chunk of the duplication has already been recognized, so the duplication management unit 25 proceeds to step S32.

In step S32, the duplication management unit 25 adds 1 to the continuation 111e of HEADCHUNK (step S32). That is, the duplication management unit 25 adds 1 to the continuation 111e corresponding to the head chunk of the duplication. The duplication management unit 25 then adds 1 to the appearance count 111d of the registered chunk (step S33). The duplication management unit 25 sets the hash value of the registered chunk in PRECHUNK (immediately preceding duplicate chunk) (step S34).

Then, the duplication management unit 25 places the information of the registered chunk at the head of the duplication management information 111 (step S35). This is because the registered chunk is the most recently used one. The duplication management unit 25 then returns to step S18 so that the next chunk can be selected.
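
The handling of a duplicate chunk in steps S27 to S35 can be sketched as follows. This is an illustrative sketch only: `handle_duplicate` is a hypothetical name, the table is modeled as an `OrderedDict` of plain dictionaries keyed by hash value, and the treatment of a PRECHUNK entry that is no longer in the table (regarded here as "not continuing") is an assumption that the description does not cover.

```python
from collections import OrderedDict


def handle_duplicate(sha: str, table: OrderedDict, st: dict) -> None:
    """Update run tracking and counters for an already registered chunk."""
    reg = table[sha]                                  # step S27
    if st["HEADCHUNK"] is None:                       # step S28
        st["HEADCHUNK"] = sha                         # step S29
    else:
        # Assumption: a missing PRECHUNK entry means the run is broken.
        pre = table.get(st["PRECHUNK"])
        continues = (
            pre is not None
            and pre["position"] + pre["length"] == reg["position"]  # step S30
        )
        if not continues:
            st["HEADCHUNK"] = sha                     # step S31
    table[st["HEADCHUNK"]]["continuation"] += 1       # step S32
    reg["count"] += 1                                 # step S33
    st["PRECHUNK"] = sha                              # step S34
    table.move_to_end(sha, last=False)                # step S35: most recently used
```

With three registered chunks at positions 0, 4, and 20 (each of length 4) arriving again in that order, the first two form one run whose head accumulates a continuation of 2, and the third starts a new run.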

As shown in FIG. 5C, the cache transfer unit 26 executes the cache transfer process as follows. The cache transfer unit 26 determines whether datanum (the number of chunks stored in the memory 11) has exceeded an upper limit (step S41). When datanum has not exceeded the upper limit (step S41; No), the cache transfer unit 26 ends the cache transfer process and returns to step S18.

On the other hand, when datanum has exceeded the upper limit (step S41; Yes), the cache transfer unit 26 takes the chunk at the tail of the duplication management information 111 as the deletion chunk (step S42). That is, the cache transfer unit 26 makes the chunk at the tail of the duplication management information 111 the deletion target.

Then, the cache transfer unit 26 determines whether the continuation 111e of the deletion chunk is larger than the continuation threshold (step S43). When the continuation 111e of the deletion chunk is larger than the continuation threshold (step S43; Yes), the cache transfer unit 26 saves the data including the deletion chunk to the disk 12 (step S44). For example, the cache transfer unit 26 moves to the disk 12 the data of the chunks that continue from the deletion chunk, as many as indicated by the continuation 111e. That is, when the duplication characteristic of the chunks is RLC (rare and long chunk series), the cache transfer unit 26 moves the corresponding data to the low-speed disk 12 of the hierarchical cache. The cache transfer unit 26 then proceeds to step S48.

When the continuation 111e of the deletion chunk is equal to or smaller than the continuation threshold (step S43; No), the cache transfer unit 26 determines whether the appearance count 111d of the deletion chunk is larger than the appearance count threshold (step S45). When the appearance count 111d of the deletion chunk is larger than the appearance count threshold (step S45; Yes), the cache transfer unit 26 saves the data including the deletion chunk to the SSD 13 (step S46). For example, the cache transfer unit 26 moves to the SSD 13 the data of the chunks that continue from the deletion chunk, as many as indicated by the continuation 111e. That is, when the duplication characteristic of the chunks is FSC (frequent and short chunk series), the cache transfer unit 26 moves the corresponding data to the high-speed SSD 13 of the hierarchical cache. The cache transfer unit 26 then proceeds to step S48.

On the other hand, when the appearance count 111d of the deletion chunk is equal to or smaller than the appearance count threshold (step S45; No), the cache transfer unit 26 discards the deletion chunk (step S47). The cache transfer unit 26 then proceeds to step S48.

In step S48, the cache transfer unit 26 subtracts 1 from datanum (the number of chunks stored in the memory 11) (step S48). The cache transfer unit 26 then ends the cache transfer process and returns to step S18.
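
The decision made in steps S41 to S48 can be summarized in the following Python sketch. It is a simplified illustration: the names `choose_tier` and `cache_transfer` and the threshold values used in any call are hypothetical, the table is an `OrderedDict` whose last entry is the least recently used chunk, and the actual movement of the continuing chunks to the disk 12 or the SSD 13 is omitted.

```python
from collections import OrderedDict
from typing import Optional, Tuple


def choose_tier(continuation: int, count: int,
                cont_threshold: int, count_threshold: int) -> str:
    """Pick the destination of a deletion chunk from its duplication history."""
    if continuation > cont_threshold:  # step S43: RLC (rare and long)
        return "disk"                  # step S44: low-speed disk 12
    if count > count_threshold:        # step S45: FSC (frequent and short)
        return "ssd"                   # step S46: high-speed SSD 13
    return "discard"                   # step S47


def cache_transfer(table: OrderedDict, st: dict, upper_limit: int,
                   cont_threshold: int,
                   count_threshold: int) -> Optional[Tuple[str, str]]:
    """Evict the tail chunk when datanum exceeds the upper limit."""
    if st["datanum"] <= upper_limit:   # step S41; No
        return None
    sha, info = table.popitem(last=True)  # step S42: tail = deletion chunk
    tier = choose_tier(info["continuation"], info["count"],
                       cont_threshold, count_threshold)
    st["datanum"] -= 1                 # step S48
    return sha, tier
```

Both comparisons are strict, matching the description: a chunk whose continuation equals the continuation threshold and whose appearance count equals the appearance count threshold is discarded.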

[Effects of the first embodiment]
According to the first embodiment, the communication device 1 stores the data to be processed in the memory 11 when that data is new. For each piece of data stored in the memory 11, the communication device 1 manages, as a duplication history, the number of times the stored data has duplicated data to be processed (the duplication count) and a continuous duplication length indicating how long the duplication continues when the stored data duplicates data to be processed. When processing new data, the communication device 1 extracts predetermined data as a deletion target if the number of pieces of data stored in the memory 11 is at the upper limit. The communication device 1 then moves the deletion target data to one of the caches, either the disk 12 or the SSD 13. With this configuration, because the deletion target data is moved to the disk 12 or the SSD 13 when the number of pieces of data in the memory 11 is at the upper limit, cache hits can be obtained efficiently for data processed subsequently.

Further, according to the first embodiment, when the continuous duplication length of the deletion target data is larger than a first threshold, the communication device 1 moves the deletion target data to the disk 12, which is the slower of the disk 12 and the SSD 13. With this configuration, the communication device 1 can store the deletion target data in the cache best suited to its duplication characteristic (RLC).

Further, according to the first embodiment, when the continuous duplication length of the deletion target data is equal to or smaller than the first threshold and the duplication count of the deletion target data is larger than a second threshold, the communication device 1 moves the deletion target data to the SSD 13, which is the faster of the disk 12 and the SSD 13. With this configuration, the communication device 1 can store the deletion target data in the cache best suited to its duplication characteristic (FSC).

Further, according to the first embodiment, when the continuous duplication length of the deletion target data is equal to or smaller than the first threshold and the duplication count of the deletion target data is equal to or smaller than the second threshold, the communication device 1 discards the deletion target data. With this configuration, the communication device 1 can increase the capacity available for storing important data by discarding unnecessary data from the cache.

The first embodiment described the case where the communication device 1 hierarchizes the cache storage area that stores previously transmitted data. That is, when the memory 11 becomes full, the communication device 1 moves predetermined data stored in the memory 11 to the hierarchical cache according to the duplication characteristic of that data. However, the communication device 1 is not limited to this. When the capacity of the memory 11 usable as a cache is small, the communication device 1 may additionally store previously transmitted data in the memory 11 while thinning it out.

The second embodiment therefore describes the case where, when the capacity of the memory 11 usable as a cache is small, the communication device 1 additionally stores previously transmitted data in the memory 11 while thinning it out.

[Configuration of communication device]
FIG. 6 is a functional block diagram illustrating the configuration of the communication device according to the second embodiment. Components identical to those of the communication device 1 shown in FIG. 1 are denoted by the same reference numerals, and descriptions of the overlapping configuration and operations are omitted. The second embodiment differs from the first embodiment in that the duplication determination unit 23 is replaced with a duplication determination unit 31, the new registration unit 24 is replaced with a sampling chunk registration unit 32, and a disk 14 is added. The disk 14 stores duplication management information 211.

Here, an example of the cache storage process of the communication device 1 according to the second embodiment will be described with reference to FIG. 7. FIG. 7 is a diagram illustrating an example of the cache storage process of the communication device according to the second embodiment. As shown in FIG. 7, the communication device 1 divides the input data A into variable-length blocks called chunks. The communication device 1 then stores each of the divided chunks in the memory 11 or the disk 14 according to the usage ratio of the memory 11. For example, when the usage ratio of the memory 11 is 1/2, the communication device 1 increments a counter by 1 each time a new chunk appears and switches the storage destination of the chunk according to the remainder obtained by dividing this counter by 2, which is the denominator of the usage ratio of the memory 11. As an example, a chunk whose remainder is 0 is stored in the memory 11 (or the disk 14), and a chunk whose remainder is 1 is stored in the disk 14 (or the memory 11).

In FIG. 7, on the first caching of the input data A, the communication device 1 stores the even-numbered chunks among the divided chunks of the input data A in the memory 11, which is the primary cache, and stores the odd-numbered chunks in the disk 14, which is the secondary cache. Although the case where the usage ratio of the memory 11 is 1/2 has been described, the usage ratio is not limited to this and may be determined according to the capacity of the memory 11. As an example, when the usage ratio of the memory 11 is 3/10, chunks whose remainder after division by 10 is 0 to 2 are stored in the memory 11, and chunks whose remainder after division by 10 is 3 to 9 are stored in the disk 14.
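
The remainder-based choice of storage destination can be written as a one-line rule, shown here as a small Python sketch. The function name `storage_destination` and the string labels are hypothetical, and the sketch assumes that the usage ratio is given as a numerator `n` and a denominator `m` and that remainders below `n` map to the memory 11, as in the 3/10 example.

```python
def storage_destination(counter: int, n: int, m: int) -> str:
    """Return where a new chunk goes for a memory usage ratio of n/m."""
    # Remainders 0 .. n-1 go to the memory 11; n .. m-1 go to the disk 14.
    return "memory" if counter % m < n else "disk"
```

For a usage ratio of 1/2 the destinations alternate, and for 3/10 exactly three of every ten consecutive counter values select the memory 11.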

When the memory 11 becomes full while data is being stored in it, the communication device 1 extracts, from the chunks stored in the memory 11, the chunk that has gone unused the longest (LRU: Least Recently Used). The communication device 1 then makes the extracted chunk the deletion target.

Here, assume that the deletion target chunk is the chunk c1 of the input data A. The communication device 1 then moves the data of the chunks c1 to c7, which continue from the deletion target chunk c1, to the disk 12 or the SSD 13. When the number of continuing chunks is larger than the continuation threshold, the communication device 1 moves the data to the low-speed disk 12 of the hierarchical cache. The communication device 1 deletes the data of the deletion target chunk c1 from the memory 11.

Thereafter, on the second caching of the input data A, all of the chunks c1 to c7 of the input data A are stored in the cache storage area, so the communication device 1 obtains cache hits. The communication device 1 can then remove the duplicated chunks c1 to c7 of the duplicated input data A before transmission.

Thus, even when the capacity of the memory 11 serving as the primary cache is small, the communication device 1 stores data using the disk 14 serving as the secondary cache as well. Therefore, the communication device 1 can deduplicate transmitted data without losing part of data whose duplication characteristic is RLC (rare and long chunk series).

Returning to FIG. 6, the duplication determination unit 31 determines whether the selected chunk duplicates any of the chunks stored in the memory 11. For example, the duplication determination unit 31 uses the duplication management information 111 to determine whether the hash value calculated for the selected chunk matches the hash value of a chunk already stored in the memory 11.

When the selected chunk duplicates any of the chunks stored in the memory 11, the duplication determination unit 31 determines that the selected chunk has been found. The duplication determination unit 31 then uses the selected chunk as the chunk for thinned chunk search.

When the selected chunk duplicates none of the chunks stored in the memory 11, the duplication determination unit 31 determines whether the selected chunk duplicates any of the chunks stored in the disk 14. For example, the duplication determination unit 31 uses the thinned chunk position of the chunk for thinned chunk search, which is stored in the duplication management information 111, together with the duplication management information 211 to determine whether the hash value calculated for the selected chunk matches the hash value of a chunk stored in the disk 14.

When the selected chunk duplicates any of the chunks stored in the disk 14, the duplication determination unit 31 determines that the selected chunk has been found. When the selected chunk duplicates none of the chunks stored in the disk 14, the duplication determination unit 31 determines that the selected chunk has not been found.

Here, the data structures of the duplication management information 111 stored in the memory 11 and the duplication management information 211 stored in the disk 14 will be described with reference to FIG. 8. FIG. 8 is a diagram illustrating an example of the data structure of the duplication management information according to the second embodiment.

The upper part of FIG. 8 shows an example of the data structure of the duplication management information 111 stored in the memory 11. Components of the duplication management information 111 stored in the memory 11 that are identical to those of the duplication management information 111 shown in FIG. 4 are denoted by the same reference numerals, and their description is omitted. The duplication management information 111 shown in the upper part of FIG. 8 differs from that shown in FIG. 4 in that a thinned chunk position 111f is added. The lower part of FIG. 8 shows an example of the data structure of the duplication management information 211 stored in the disk 14. The duplication management information 211 stored in the disk 14, shown in the lower part of FIG. 8, has the same configuration as the duplication management information 111 shown in FIG. 4, so its description is omitted.

The thinned chunk position 111f indicates where the information of a thinned-out chunk is stored in the duplication management information 211 on the disk 14. That is, the thinned chunk position 111f points to the information of the thinned-out chunk. FIG. 8 shows the stored information of chunks x and y included in the input data. The information whose hash value (SHA1) 111a is “xxx” is the information of chunk x. The information whose hash value (SHA1) 211a is “yyy” is the information of chunk y. The position of the information of chunk y is set in the thinned chunk position 111f of chunk x. Because the hash value of chunk x is found in the duplication management information 111 in the memory 11, the duplication determination unit 31 uses chunk x as the chunk for thinned chunk search. Because the hash value of chunk y is not found in the duplication management information 111 in the memory 11, the duplication determination unit 31 follows the thinned chunk position 111f of chunk x, the chunk for thinned chunk search, to the chunk information stored in the disk 14. The duplication determination unit 31 then determines whether the hash value calculated for chunk y matches the hash value in the duplication management information 211 on the disk 14, and thereby finds the information of chunk y.
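
The two-level search using the thinned chunk position 111f can be illustrated with the following Python sketch. It is a sketch under several assumptions that the description does not fix: `find_chunk` and the key `search_chunk` are hypothetical names, the duplication management information 211 is modeled as a list so that the thinned chunk position is a list index, and only the single record at that position is compared.

```python
from typing import Optional


def find_chunk(sha: str, mem_table: dict, disk_table: list,
               st: dict) -> Optional[str]:
    """Return "memory", "disk", or None when the chunk is not found."""
    if sha in mem_table:
        # Found in memory: remember it as the chunk for thinned chunk search.
        st["search_chunk"] = sha
        return "memory"
    anchor = mem_table.get(st.get("search_chunk"))
    if anchor is None:
        return None  # no search chunk yet, so there is no position to follow
    pos = anchor.get("thinned_pos")  # thinned chunk position 111f
    # Assumption: only the record at that position is compared.
    if pos is not None and 0 <= pos < len(disk_table):
        if disk_table[pos]["sha1"] == sha:
            return "disk"
    return None
```

Using the example of FIG. 8, looking up "xxx" returns "memory" and makes chunk x the search chunk, after which looking up "yyy" follows the position stored for chunk x and returns "disk".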

Returning to FIG. 6, when the selected chunk is not found by the duplication determination unit 31, the sampling chunk registration unit 32 determines the storage destination of the selected chunk. For example, the sampling chunk registration unit 32 calculates the remainder obtained by dividing the value of a counter, which is incremented by 1 each time a new chunk appears, by the denominator of the predetermined usage ratio of the memory 11. When the remainder is 0 or more and smaller than the numerator of the predetermined usage ratio of the memory 11, the sampling chunk registration unit 32 stores the selected chunk in the memory 11. When the remainder is equal to or larger than the numerator of the predetermined usage ratio of the memory 11, the sampling chunk registration unit 32 stores the selected chunk in the disk 14.

[Procedure of the cache storage process]
Next, the procedure of the cache storage process according to the second embodiment will be described with reference to FIGS. 9A to 9D. FIGS. 9A to 9D are flowcharts of the cache storage process according to the second embodiment. In the flowcharts shown in FIGS. 9A and 9B, operations identical to those in the flowcharts shown in FIGS. 5A and 5B are denoted by the same reference numerals, and the description of the overlapping operations is abbreviated. The first chunk of a run of duplicates is referred to as the “duplicate head chunk” and is represented by “HEADCHUNK”. The duplicate chunk immediately preceding a duplicate chunk is referred to as the “immediately preceding duplicate chunk” and is represented by “PRECHUNK”. The current appearance position is referred to as the “current appearance position” and is represented by “OFFSET”. The number of chunks stored in the memory 11 is represented by “datanum”. The number of new chunks is referred to as the “chunk counter” and is represented by “counter”. HEADCHUNK, PRECHUNK, OFFSET, datanum, and counter are local areas temporarily stored in the storage unit 10. The usage ratio of the memory 11 is assumed to be n/m (n, m: positive integers).

As shown in FIG. 9A, the chunk division unit 21 sets HEADCHUNK and PRECHUNK to null and sets OFFSET, datanum, and counter to 0 (step S11A). The chunk division unit 21 then waits to receive data (step S12).

Subsequently, the control unit 20 performs the cache storage process on the selected chunk (step S17). The flowcharts of the cache storage process are described with reference to FIGS. 9B to 9D. The control unit 20 then determines whether all chunks have been processed (step S18). If not (step S18; No), the control unit 20 proceeds to step S16 to select the next chunk.

As shown in FIG. 9B, the hash calculation unit 22 calculates a hash value for the selected chunk using SHA1 (step S21) and sets the calculated hash value in the local area SHAVAL.
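As an illustration of step S21, the following sketch computes a SHA-1 digest with Python's standard `hashlib` module. The function name `chunk_hash` and the sample chunk bytes are assumptions made for this example; the source only states that SHA1 is used and that the result is held in SHAVAL.

```python
import hashlib


def chunk_hash(chunk: bytes) -> str:
    """Return the SHA-1 digest of a chunk as a 40-character hex string."""
    return hashlib.sha1(chunk).hexdigest()


# SHAVAL mirrors the local area named in the description.
SHAVAL = chunk_hash(b"example chunk")
```

Identical chunks always yield the same digest, so the digest can serve as the lookup key during the duplication search.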

The duplication determination unit 31 then searches for a duplicate of the selected chunk (step S22A). The flowchart of this chunk duplication search process is described with reference to FIG. 9D.

Based on the chunk duplication search process, the duplication determination unit 31 determines whether the selected chunk is new (step S23). For example, it determines whether the search failed to find the selected chunk. If the selected chunk was not found, it is new; if it was found, it is not new.

If the selected chunk is new (step S23; Yes), the sampling chunk registration unit 32 executes the sampling chunk registration process (step S24A). That is, the process samples the selected chunk and, according to the sampling result, registers it in the memory 11 or the disk 14. The flowchart of the sampling chunk registration process is described with reference to FIG. 9C.

The sampling chunk registration unit 32 then adjusts the local areas HEADCHUNK (duplicate head chunk), OFFSET (current appearance position), datanum (number of chunks stored in the memory 11), and counter (chunk counter) (step S25A). For example, HEADCHUNK is set to null. OFFSET is set to its value plus the length of the selected chunk. datanum is incremented by 1 for the selected chunk. counter is likewise incremented by 1 for the selected chunk.

The new registration unit 24 then places the information on the new chunk at the head of the duplication management information 111 (step S26), because this new chunk is the most recently used. The new registration unit 24 then causes the cache movement process of the cache storage process to be performed. The flowchart of that cache movement process was described with reference to FIG. 5C, so the description is not repeated.

If the selected chunk is not new in step S23 (step S23; No), the duplication management unit 25 executes the duplication management process. The flowchart of the duplication management process (steps S27 to S35) was described with reference to FIG. 5B, so the description is not repeated.

As shown in FIG. 9C, the sampling chunk registration unit 32 executes the sampling chunk registration process as follows. To determine where to store the selected chunk, it calculates the remainder of dividing counter (the chunk counter) by m, the denominator of the usage ratio of the memory 11 (step S51).

The sampling chunk registration unit 32 then determines whether the remainder is 0 or more and less than n, the numerator of the usage ratio of the memory 11 (step S52). If so (step S52; Yes), it stores the selected chunk in the memory 11 (step S53).

The sampling chunk registration unit 32 then registers the information on the selected chunk in the duplication management information 111 on the memory 11 (step S54). For example, the hash value (SHA1) 111a is set to the value held in SHAVAL. The appearance position 111b is set to the value held in OFFSET. The length 111c is set to the chunk length. The appearance count 111d is set to "1". The continuation 111e is set to "0".

The sampling chunk registration unit 32 then determines whether the remainder equals n - 1, that is, the numerator n of the usage ratio of the memory 11 minus 1 (step S55). In other words, it determines whether the selected chunk is the last of the chunks of the currently received input data to be stored in the memory 11. As an example, suppose the usage ratio of the memory 11 is 2/5 and the remainders for the divided chunks a, b, and c are 0, 1, and 2. The remainder for chunk b is 1, which equals the numerator 2 minus 1. Chunk b is therefore the last of the chunks of the currently received input data to be stored in the memory 11.

If the remainder equals n - 1 (step S55; Yes), the sampling chunk registration unit 32 designates the selected chunk as the chunk for storing the thinned chunk position (step S56). That is, the selected chunk becomes the chunk whose thinned chunk position is updated for the chunks that follow. The sampling chunk registration unit 32 then ends the sampling chunk registration process and returns to step S25A.

If the remainder does not equal n - 1 (step S55; No), the sampling chunk registration unit 32 does not designate the selected chunk as the chunk for storing the thinned chunk position. It then ends the sampling chunk registration process and returns to step S25A.

If, in step S52, the remainder is not in the range from 0 up to but excluding n, the numerator of the usage ratio of the memory 11 (step S52; No), the sampling chunk registration unit 32 stores the selected chunk in the disk 14 (step S57).

The sampling chunk registration unit 32 then registers the information on the selected chunk in the duplication management information 211 on the disk 14 (step S58). For example, the hash value (SHA1) 211a is set to the value held in SHAVAL. The appearance position 211b is set to the value held in OFFSET. The length 211c is set to the chunk length. The appearance count 211d is set to "1". The continuation 211e is set to "0".

The sampling chunk registration unit 32 then updates the thinned chunk position held by the chunk for storing the thinned chunk position (step S59). For example, in the duplication management information 111, the thinned chunk position 111f of that chunk is set to the position of the selected chunk in the duplication management information 211. The sampling chunk registration unit 32 then ends the sampling chunk registration process and returns to step S25A.
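The registration flow of FIG. 9C (steps S51 to S59) can be sketched as follows. The class names `Entry` and `SamplingRegistrar`, the use of Python lists to stand in for the duplication management information 111 and 211, and the use of a list index as the "position" are all assumptions made for illustration. The source does not say whether a later disk chunk overwrites an earlier thinned chunk position; this sketch simply overwrites, following the text literally.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    sha1: str                          # hash value (111a / 211a)
    offset: int                        # appearance position (111b / 211b)
    length: int                        # chunk length (111c / 211c)
    count: int = 1                     # appearance count (111d / 211d)
    continuation: int = 0              # continuation (111e / 211e)
    thinned_pos: Optional[int] = None  # thinned chunk position (111f)


class SamplingRegistrar:
    """Hypothetical stand-in for the sampling chunk registration unit."""

    def __init__(self, n: int, m: int) -> None:
        self.n, self.m = n, m
        self.memory_info = []  # stands in for management information 111
        self.disk_info = []    # stands in for management information 211
        self.holder = None     # chunk that stores the thinned chunk position

    def register(self, counter: int, sha1: str, offset: int, length: int) -> str:
        remainder = counter % self.m            # S51
        entry = Entry(sha1, offset, length)
        if remainder < self.n:                  # S52; Yes
            self.memory_info.append(entry)      # S53, S54
            if remainder == self.n - 1:         # S55
                self.holder = entry             # S56
            return "memory"
        self.disk_info.append(entry)            # S57, S58
        if self.holder is not None:             # S59 (overwrite: assumption)
            self.holder.thinned_pos = len(self.disk_info) - 1
        return "disk"


# Five new chunks with a usage ratio of 2/5.
reg = SamplingRegistrar(2, 5)
results = [reg.register(c, f"hash{c}", c * 100, 100) for c in range(5)]
```

In this run the second chunk (remainder 1) becomes the holder, and its `thinned_pos` ends up referring to the disk-side entries that followed it.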

As shown in FIG. 9D, the duplication determination unit 31 executes the chunk duplication search process as follows. It first searches the memory 11 for a duplicate of the selected chunk (step S61). For example, it searches using the duplication management information 111 on the memory 11 and the hash value of the selected chunk.

The duplication determination unit 31 then determines whether the selected chunk was found in the memory 11 (step S62). If it was found (step S62; Yes), the duplication determination unit 31 designates the selected chunk as the chunk for thinned chunk search (step S63) and returns to step S23 with a parameter indicating that the selected chunk was found.

If the selected chunk was not found (step S62; No), the duplication determination unit 31 searches at the thinned chunk position of the chunk for thinned chunk search (step S64). That is, it searches the disk 14 for a duplicate of the selected chunk. For example, it searches using the thinned chunk position 111f that the duplication management information 111 stores for the chunk for thinned chunk search, together with the duplication management information 211 on the disk 14.

The duplication determination unit 31 then determines whether the selected chunk was found on the disk 14 (step S65). If it was not found (step S65; No), the duplication determination unit 31 returns to step S23 with a parameter indicating that the selected chunk was not found. If it was found (step S65; Yes), it returns to step S23 with a parameter indicating that the selected chunk was found.
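The two-stage search of FIG. 9D (steps S61 to S65) might look like the sketch below. The function name `search_duplicate`, the dictionary and list used for the management information, and the `state` dictionary are hypothetical. The source does not specify how many disk entries around the thinned chunk position are examined; as a simplifying assumption, this sketch checks only the single entry the position points to.

```python
def search_duplicate(sha1: str, memory_info: dict, disk_info: list, state: dict) -> bool:
    """Return True when the chunk identified by sha1 is already cached.

    memory_info maps a hash to its thinned chunk position (or None);
    disk_info is a list of hashes in registration order;
    state["search_chunk"] holds the chunk for thinned chunk search.
    """
    # S61, S62: look in the management information held in memory.
    if sha1 in memory_info:
        state["search_chunk"] = sha1            # S63
        return True
    # S64: follow the thinned chunk position of the search chunk to the disk.
    search_chunk = state.get("search_chunk")
    if search_chunk is None:
        return False
    pos = memory_info.get(search_chunk)
    if pos is None or not 0 <= pos < len(disk_info):
        return False
    # S65: compare against the disk-side entry (single entry: assumption).
    return disk_info[pos] == sha1


memory_info = {"a": None, "b": 0}   # chunk b points at disk position 0
disk_info = ["c"]
state = {}
found = [search_duplicate(h, memory_info, disk_info, state) for h in ["a", "b", "c", "z"]]
```

Here chunks a and b are found in memory, chunk c is found on disk through the position stored for b, and chunk z is reported as new.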

[Effects of the second embodiment]
According to the second embodiment, the communication device 1 has the memory 11 and the disk 14. According to the usage ratio of the memory 11, the communication device 1 stores the data to be processed in either the memory 11 or the disk 14 as a cache. With this configuration, even when the capacity of the memory 11 is small, the communication device 1 stores data by using the disk 14 as well. The communication device 1 can therefore perform deduplication without losing part of the data whose duplication characteristic is RLC (rare and long chunk series).

[Others]
The communication device 1 can be realized by installing the functions described above, such as the duplication determination unit 23, the new registration unit 24, the duplication management unit 25, and the cache movement unit 26, in an information processing apparatus such as a known personal computer or workstation.

In the first and second embodiments, the communication device 1 hierarchizes the cache storage area that stores previously transmitted data and, when the upper cache (for example, the memory 11) is full, moves data to a predetermined lower cache (for example, the disk 12 or the SSD 13) according to the duplication characteristic of the data. The technique is not limited to the communication device 1, however. An information processing apparatus may hierarchize a cache storage area that stores previously processed data and, when the upper cache (for example, the memory 11) is full, move data to a predetermined lower cache (for example, the disk 12 or the SSD 13) according to the duplication characteristic of the data. For example, the technique may be applied to a cache memory located between the CPU and the main storage device in an information processing apparatus, with that cache memory serving as the cache storage area.

The components of the illustrated apparatus do not necessarily have to be physically configured as illustrated. That is, the specific form of distribution and integration of the apparatus is not limited to the illustrated one, and all or part of it can be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like. For example, the chunk division unit 21 and the hash calculation unit 22 may be integrated into a single unit. Conversely, the cache movement unit 26 may be split into a determination unit that determines whether the cache is full and a movement unit that moves the relevant data when the cache is full. The storage unit 10 may be held in a device external to the communication device 1, or an external device holding the storage unit 10 may be connected to the communication device 1 via a network.

The various processes described in the above embodiments can be realized by executing a program prepared in advance on a computer such as a personal computer or a workstation. An example of a computer that executes a cache storage program realizing the same functions as the communication device 1 illustrated in FIG. 1 is described below. FIG. 10 is a diagram illustrating an example of a computer that executes the cache storage program.

As illustrated in FIG. 10, the computer 200 includes a CPU 203 that executes various arithmetic processes, an input device 215 that receives data input from a user, and a display control unit 207 that controls a display device 209. The computer 200 also includes a drive device 213 that reads programs and the like from a storage medium, and a communication control unit 217 that exchanges data with other computers via a network. The computer 200 further includes a memory 201 that temporarily stores various types of information, and an HDD 205. The memory 201, the CPU 203, the HDD 205, the display control unit 207, the drive device 213, the input device 215, and the communication control unit 217 are connected by a bus 219.

The drive device 213 is, for example, a device for a removable disk 141. The HDD 205 stores a cache storage program 205a and cache storage related information 205b.

The CPU 203 reads the cache storage program 205a, loads it into the memory 201, and executes it as a process. This process corresponds to the functional units of the communication device 1. The cache storage related information 205b corresponds to the duplication management information 111. The removable disk 141, for example, stores information such as the duplication management information 111.

The cache storage program 205a does not necessarily have to be stored in the HDD 205 from the beginning. For example, the program may be stored on a "portable physical medium" inserted into the computer 200, such as a flexible disk (FD), a CD-ROM, a DVD, a magneto-optical disk, or an IC card, and the computer 200 may read the cache storage program 205a from such a medium and execute it.

The following supplementary notes are further disclosed regarding the embodiments, including the examples above.

(Supplementary Note 1) A cache storage program that causes a computer to execute a process comprising:
storing data to be processed in a first cache memory when the data to be processed is new;
managing, as a duplication history for each piece of data stored in the first cache memory, a duplication count indicating the number of times the data duplicates data to be processed and a continuous duplication length indicating the length over which duplication continues when the data duplicates data to be processed;
extracting predetermined deletion target data when, at the time new data is processed, the number of pieces of data stored in the first cache memory has reached an upper limit; and
moving the deletion target data to either a second cache memory or a third cache memory according to the duplication count and the continuous duplication length of the deletion target data managed in the duplication history.

(Supplementary Note 2) The cache storage program according to Supplementary Note 1, wherein, when the continuous duplication length of the deletion target data is greater than a first threshold, the moving moves the deletion target data to the second cache memory, which is the slower of the second cache memory and the third cache memory.

(Supplementary Note 3) The cache storage program according to Supplementary Note 1, wherein, when the continuous duplication length of the deletion target data is equal to or less than a first threshold and the duplication count of the deletion target data is greater than a second threshold, the moving moves the deletion target data to the third cache memory, which is the faster of the second cache memory and the third cache memory.

(Supplementary Note 4) The cache storage program according to Supplementary Note 1, wherein, when the continuous duplication length of the deletion target data is equal to or less than a first threshold and the duplication count of the deletion target data is equal to or less than a second threshold, the moving discards the deletion target data.

(Supplementary Note 5) The cache storage program according to Supplementary Note 1, wherein, when the first cache memory includes a primary cache memory and a secondary cache memory, the storing of the data to be processed in the first cache memory stores the data in either the primary cache memory or the secondary cache memory according to a usage ratio of the primary cache memory.

(Supplementary Note 6) A cache storage method in which a computer executes processes of:
storing data to be processed in a first cache memory when the data to be processed is new;
managing, as a duplication history for each piece of data stored in the first cache memory, a duplication count indicating the number of times the data duplicates data to be processed and a continuous duplication length indicating the length over which duplication continues when the data duplicates data to be processed;
extracting predetermined deletion target data when, at the time new data is processed, the number of pieces of data stored in the first cache memory has reached an upper limit; and
moving the deletion target data to either a second cache memory or a third cache memory according to the duplication count and the continuous duplication length of the deletion target data managed in the duplication history.

(Supplementary Note 7) An information processing apparatus comprising:
a registration unit that registers data to be processed in a first cache memory when the data to be processed is new;
a management unit that manages, as a duplication history for each piece of data stored in the first cache memory, a duplication count indicating the number of times the data duplicates data to be processed and a continuous duplication length indicating the length over which duplication continues when the data duplicates data to be processed;
an extraction unit that extracts predetermined deletion target data when, at the time new data is processed, the number of pieces of data stored in the first cache memory has reached an upper limit; and
a movement unit that moves the deletion target data to either a second cache memory or a third cache memory according to the duplication count and the continuous duplication length of the deletion target data managed in the duplication history.

(Supplementary Note 8) A communication device comprising:
a registration unit that registers data to be transmitted in a first cache memory when the data to be transmitted is new;
a management unit that manages, as a duplication history for each piece of data stored in the first cache memory, a duplication count indicating the number of times the data duplicates data to be transmitted and a continuous duplication length indicating the length over which duplication continues when the data duplicates data to be transmitted;
an extraction unit that extracts predetermined deletion target data when, at the time new data is processed, the number of pieces of data stored in the first cache memory has reached an upper limit; and
a movement unit that moves the deletion target data to either a second cache memory or a third cache memory according to the duplication count and the continuous duplication length of the deletion target data managed in the duplication history.

Description of symbols
1 Communication device
10 Storage unit
11 Memory
111, 211 Duplication management information
12 Disk
13 SSD
20 Control unit
21 Chunk division unit
22 Hash calculation unit
23, 31 Duplication determination unit
24 New registration unit
25 Duplication management unit
26 Cache movement unit
27 Transmission unit
32 Sampling chunk registration unit

Claims

A cache storage program that causes a computer to execute a process comprising:
storing data to be processed in a first cache memory when the data to be processed is new;
managing, as a duplication history for each piece of data stored in the first cache memory, a duplication count indicating the number of times the data duplicates data to be processed and a continuous duplication length indicating the length over which duplication continues when the data duplicates data to be processed;
extracting predetermined deletion target data when, at the time new data is processed, the number of pieces of data stored in the first cache memory has reached an upper limit; and
moving the deletion target data to either a second cache memory or a third cache memory according to the duplication count and the continuous duplication length of the deletion target data managed in the duplication history.

The cache storage program according to claim 1, wherein, when the continuous duplication length of the deletion target data is greater than a first threshold, the moving moves the deletion target data to the second cache memory, which is the slower of the second cache memory and the third cache memory.

The cache storage program according to claim 1, wherein, when the continuous duplication length of the deletion target data is equal to or less than a first threshold and the duplication count of the deletion target data is greater than a second threshold, the moving moves the deletion target data to the third cache memory, which is the faster of the second cache memory and the third cache memory.

The cache storage program according to claim 1, wherein, when the continuous duplication length of the deletion target data is equal to or less than a first threshold and the duplication count of the deletion target data is equal to or less than a second threshold, the moving discards the deletion target data.

The cache storage program according to claim 1, wherein, when the first cache memory includes a primary cache memory and a secondary cache memory, the storing of the data to be processed in the first cache memory stores the data in either the primary cache memory or the secondary cache memory according to a usage ratio of the primary cache memory.

A cache storage method in which a computer executes processes of:
storing data to be processed in a first cache memory when the data to be processed is new;
managing, as a duplication history for each piece of data stored in the first cache memory, a duplication count indicating the number of times the data duplicates data to be processed and a continuous duplication length indicating the length over which duplication continues when the data duplicates data to be processed;
extracting predetermined deletion target data when, at the time new data is processed, the number of pieces of data stored in the first cache memory has reached an upper limit; and
moving the deletion target data to either a second cache memory or a third cache memory according to the duplication count and the continuous duplication length of the deletion target data managed in the duplication history.

An information processing apparatus comprising:
a registration unit that registers data to be processed in a first cache memory when the data to be processed is new;
a management unit that manages, as a duplication history for each piece of data stored in the first cache memory, a duplication count indicating the number of times the data duplicates data to be processed and a continuous duplication length indicating the length over which duplication continues when the data duplicates data to be processed;
an extraction unit that extracts predetermined deletion target data when, at the time new data is processed, the number of pieces of data stored in the first cache memory has reached an upper limit; and
a movement unit that moves the deletion target data to either a second cache memory or a third cache memory according to the duplication count and the continuous duplication length of the deletion target data managed in the duplication history.