WO2018092288A1

WO2018092288A1 - Storage device and control method therefor

Info

Publication number: WO2018092288A1
Application number: PCT/JP2016/084371
Authority: WO
Inventors: 伊織米川; 啓池田; 竹内　久治
Original assignee: Hitachi, Ltd. (株式会社日立製作所)
Priority date: 2016-11-18
Filing date: 2016-11-18
Publication date: 2018-05-24

Abstract

[Problem] To provide a storage device and control method therefor that can reduce movement processing costs for source data of de-duplicated data. [Solution] A storage device that executes de-duplication processing is configured so as to manage copy attribute information, as to whether a logical volume has served as a copy source for a copy, on a logical volume unit basis, and to execute de-duplication processing so as to leave the source data on the logical volume that has served as a copy source, on the basis of the copy attribute information; and as a result of the foregoing, the likelihood of source data movement in conjunction with source data updates or logical volume deletions can be reduced. Consequently, it is possible to realize a storage device and control method which can reduce movement processing costs for source data of de-duplicated data.

Description

Storage apparatus and control method thereof

The present invention relates to a storage apparatus and its control method, and is suitable for application to a storage apparatus equipped with a deduplication function, for example.

Conventionally, storage devices have been required to store large amounts of data at low cost. A deduplication function is widely used in storage apparatuses to meet this requirement (see, for example, Patent Document 1 and Patent Document 2). When the storage apparatus detects that multiple pieces of data with the same content exist in a storage device, the deduplication function leaves only one of them in the storage device and deletes all the others.

In the following, the processing that the storage apparatus executes based on the deduplication function (leaving only one piece of data with the same content in the storage device and deleting all the rest) is called deduplication processing. The data left in the storage device by this deduplication processing is called original data.

In recent years, various techniques have been proposed in connection with such a deduplication function. For example, Patent Document 1 proposes that, in order to avoid further concentration of load on a high-load volume when deduplicating data, files stored redundantly across a plurality of volumes are determined as aggregation target files, the plurality of volumes storing the aggregation target files are identified, one or more of the identified volumes are selected as aggregation volumes based on the load of the identified volumes, and the aggregation target files stored in the unselected volumes are deleted.

Patent Document 2 discloses, in order to suppress performance degradation of a storage device equipped with a deduplication function, a configuration comprising a duplication determination unit that determines whether or not storage target data is already stored in a storage device, a storage destination determination unit that determines the storage destination of non-duplicate data (storage target data that is not duplicated), and a data storage control unit that stores the non-duplicate data in the storage device determined as the storage destination. In this configuration, the storage destination determination unit determines the storage location of duplicate data judged to be related to the non-duplicate data according to a predetermined criterion, and determines the storage destination of the non-duplicate data based on that result.

Patent Document 1: JP 2009-80671 A
Patent Document 2: JP 2015-170345 A

In a storage device equipped with a deduplication function, when the volume storing the original data of deduplicated data is deleted, or when that original data is updated, the original data must first be moved to another volume that holds a file or the like whose data was deduplicated against it.

However, such movement of the original data has a problem of requiring a lot of time and resources. For this reason, it is required to reduce the cost of the original data transfer process in the actual operation of the deduplication function.

The present invention has been made in consideration of the above points, and intends to propose a storage apparatus and a control method thereof that can reduce the migration processing cost of the original data of the deduplicated data.

To solve this problem, one aspect of the present invention is a storage apparatus that provides a logical volume to a host device as a storage area and executes, on the data stored in the logical volume, deduplication processing that leaves one piece of data among data having the same content as original data and deletes the others. The storage apparatus comprises a management unit that manages, in units of logical volumes, copy attribute information indicating whether or not the logical volume has been the copy source of a copy, and a deduplication processing execution unit that, based on the copy attribute information, executes the deduplication processing so as to leave the original data in a logical volume that has been a copy source.

Another aspect of the present invention is a control method for a storage apparatus that provides a logical volume to a host device as a storage area and executes, on the data stored in the logical volume, deduplication processing that leaves one piece of data among data having the same content as original data and deletes the others. The method comprises a first step in which the storage apparatus manages, in units of logical volumes, copy attribute information indicating whether or not the logical volume has been the copy source of a copy, and a second step in which the storage apparatus, based on the copy attribute information, executes the deduplication processing so as to leave the original data in a logical volume that has been a copy source.

According to the present invention, it is possible to reduce the resource consumption caused by the movement of the original data of the deduplicated data, and to reduce the movement processing cost of the original data.

FIG. 1 is a block diagram showing the overall configuration of the storage apparatus according to this embodiment.
FIG. 2 is a block diagram showing the logical configuration of the storage apparatus.
FIG. 3 is a conceptual diagram used to describe the first use case.
FIG. 4 is a conceptual diagram used to describe the second use case.
FIG. 5 is a conceptual diagram used to describe the third use case.
FIG. 6 is a conceptual diagram showing the schematic structure of the address conversion table.
FIG. 7 is a conceptual diagram showing the schematic structure of the virtual volume information.
FIG. 8 is a conceptual diagram showing the schematic structure of the local copy pair information.
FIG. 9 is a conceptual diagram showing the schematic structure of the FPT.
FIG. 10 is a flowchart showing the procedure of the deduplication processing according to this embodiment.
FIG. 11 is a flowchart showing the procedure of the deduplication execution processing.
FIG. 12 is a flowchart showing the procedure of the original data movement processing.

Hereinafter, an embodiment of the present invention will be described in detail with reference to the drawings.

(1) Configuration of Storage Device According to this Embodiment

In FIG. 1, reference numeral 1 denotes the storage device according to this embodiment as a whole. The storage device 1 includes a channel adapter package 3, a microprocessor board 4 and a cache memory package 5 that form a storage controller 2, and a hard disk unit 6 that provides a storage area to the storage controller 2.

The channel adapter package 3 includes one or a plurality of channel adapters (not shown). Each channel adapter is an interface that performs protocol control during communication with the host device 8 via the network 7 and includes a port. A unique WWN (World Wide Name) for identifying the port on the network 7 is assigned to the port.

The microprocessor board 4 is a board on which a CPU 11 having one or a plurality of microprocessors 10 each composed of a CPU (Central Processing Unit) core and a processor memory 12 composed of a semiconductor memory are mounted. Each microprocessor 10 of the CPU 11 has a local memory 10A. In the local memory 10A, a microprogram that is a program for the microprocessor 10 to execute various processes and virtual volume information 23 described later are loaded and stored from a shared memory area 22 described later of the cache memory package 5. The processor memory 12 is a memory that is shared and used by the microprocessors 10. Page update frequency information 13 and an address conversion table 14 to be described later are stored and held in the processor memory 12.

The cache memory package 5 includes a plurality of DIMMs (Dual In-line Memory Modules) 20. The DIMM 20 is a memory module in which a plurality of semiconductor memories such as DRAM (Dynamic Random Access Memory) are mounted on a printed circuit board. A part of the storage area provided by the semiconductor memories that constitute these DIMMs 20 is used as a cache memory area 21 that temporarily holds data to be read from or written to the storage devices 30 (described later) that constitute the hard disk unit 6, and the remaining area is used as a shared memory area 22 for storing control information and the like shared by the microprocessors 10 of the CPU 11. Virtual volume information 23 and local copy pair information 24 described later are stored and held in this shared memory area 22.

The hard disk unit 6 includes a plurality of storage devices 30. The storage device 30 is an expensive, high-performance disk device such as an FC (Fibre Channel) disk or a SAS (Serial Attached SCSI) disk, an inexpensive, low-performance disk device such as a SATA (Serial ATA) disk, an SSD (Solid State Drive), or the like.

FIG. 2 shows the logical configuration of the storage apparatus 1. As shown in FIG. 2, in the storage apparatus 1, one or more storage devices 30 constituting the hard disk unit 6 are managed as a RAID (Redundant Arrays of Inexpensive Disks) group RG, and the storage area provided by the storage devices 30 constituting one or more RAID groups RG is managed as a pool PL. The storage area in the pool PL is managed in units of partial areas of a predetermined size (for example, 42 MB). Hereinafter, such a partial area is referred to as a “page” or “physical page”.

Each pool PL is associated with one or more virtual logical volumes (hereinafter referred to as “virtual volumes”) VVOL formed using thin provisioning technology, and each virtual volume VVOL is provided to the host device 8 as a storage area for reading and writing data. Hereinafter, the virtual volume VVOL (the storage space provided to the host device 8) may be referred to as the “overwrite space”.

A unique identifier (hereinafter referred to as “LUN (Logical Unit Number)”) is assigned to each virtual volume VVOL. The storage area of the virtual volume VVOL is managed in units of partial areas called logical blocks, each having a predetermined size (for example, 512 bytes). Each logical block is given a unique identifier (hereinafter referred to as “LBA (Logical Block Address)”). Furthermore, the storage area of the virtual volume VVOL is managed by being divided into partial areas that have the same size as the physical page and are each composed of a plurality of logical blocks. Hereinafter, such a partial area is referred to as a “virtual page”.

The host device 8 reads data from or writes data to a virtual volume VVOL by issuing to the storage apparatus 1 a read request or a write request that specifies the LUN of the virtual volume VVOL, the LBA of the first logical block of the area in the virtual volume VVOL where the data is to be read or written, and the data length.

When the storage device 1 receives such a read request or write request, the microprocessor 10 with the lowest load at that time in the CPU 11 of the storage controller 2 is assigned to process the request.

If the request from the host device 8 is a write request and no physical page is allocated to the virtual page to which the data specified in the write request is to be written, the assigned microprocessor 10 allocates an unused physical page to that virtual page from the pool PL associated with the virtual volume VVOL. The microprocessor 10 then writes the data from the host device 8 to the physical page allocated to the virtual page.

If the request from the host device 8 is a read request or a write request and a physical page is already allocated to the read or write destination area specified in the request, the microprocessor 10 reads the data from that physical page and transfers it to the host device 8 that issued the read request (in the case of a read request), or writes the data given from the host device 8 to that physical page (in the case of a write request).
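As a non-limiting illustration, the allocate-on-first-write behavior of a thin-provisioned virtual volume described above might be sketched in Python as follows. The class name ThinVolume, its methods, the zero-fill behavior for unallocated reads, and the restriction to accesses within a single page are assumptions made for this sketch and are not taken from the embodiment.

```python
PAGE_SIZE = 42 * 1024 * 1024  # 42 MB, the example page size given in the text


class ThinVolume:
    """Virtual volume whose virtual pages get a physical page only when first written."""

    def __init__(self, free_pages, page_size=PAGE_SIZE):
        self.free_pages = list(free_pages)  # unused physical page numbers in the pool
        self.page_size = page_size
        self.page_map = {}                  # virtual page number -> physical page number
        self.physical = {}                  # physical page number -> page contents

    def write(self, offset, data):
        vpage, start = divmod(offset, self.page_size)
        if start + len(data) > self.page_size:
            raise ValueError("this sketch only handles writes within one page")
        if vpage not in self.page_map:
            # No physical page yet: allocate an unused one from the pool.
            ppage = self.free_pages.pop(0)
            self.page_map[vpage] = ppage
            self.physical[ppage] = bytearray(self.page_size)
        page = self.physical[self.page_map[vpage]]
        page[start:start + len(data)] = data

    def read(self, offset, length):
        vpage, start = divmod(offset, self.page_size)
        if vpage not in self.page_map:
            return bytes(length)  # assumption: unallocated areas read back as zeros
        page = self.physical[self.page_map[vpage]]
        return bytes(page[start:start + length])
```

A small page size can be passed to the constructor to try the sketch without allocating 42 MB buffers.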

In this embodiment, the user can configure a virtual volume VVOL so that data deduplication is applied to it. In the following, a virtual volume VVOL configured in this way is referred to as a deduplication-compatible volume.

In the deduplication-compatible volume, the area in the virtual page is managed by being divided into partial areas called “chunks” having a predetermined size (for example, 8 KB) that is an integral multiple of the logical block in order from the top of the virtual page. Each chunk is given a unique address (hereinafter referred to as LA (Logical Address)).

In the storage controller 2 (FIG. 1), asynchronously with the I/O processing for read requests and write requests from the host device 8, the microprocessor 10 of the CPU 11 with the lowest load at that time executes deduplication processing on each deduplication-compatible volume at a predetermined period (for example, every 50 msec). This processing determines in units of chunks whether the contents are the same and, for chunks with the same content, leaves the data of only one chunk and deletes the data of the other chunks.

If two pieces of data were compared bit by bit or byte by byte during duplication determination, the determination would take a long time. The microprocessor 10 therefore calculates a check code, a small feature value (for example, about 8 bytes) derived from the data to be compared, such as a hash value calculated using a hash function, and performs duplication determination between chunks using the calculated check codes. In the following, a check code generated from the data of one chunk is referred to as an “FPK (FingerPrint Key)”.
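As a non-limiting illustration, fingerprint-based duplication determination of this kind might be sketched in Python as follows. The embodiment does not name a specific hash function; BLAKE2b truncated to 8 bytes is an assumption chosen for this sketch, and the function names compute_fpk and find_duplicate_groups are hypothetical.

```python
from __future__ import annotations

import hashlib

CHUNK_SIZE = 8 * 1024  # 8 KB, the example chunk size given in the text
FPK_SIZE = 8           # about 8 bytes, per the text


def compute_fpk(chunk: bytes) -> bytes:
    """Return a short fingerprint of one chunk (BLAKE2b is an assumption)."""
    return hashlib.blake2b(chunk, digest_size=FPK_SIZE).digest()


def find_duplicate_groups(chunks: dict[int, bytes]) -> dict[bytes, list[int]]:
    """Group chunk addresses (LA) whose fingerprints match.

    Only groups with two or more members are returned, since a lone
    chunk has nothing to be deduplicated against.
    """
    groups: dict[bytes, list[int]] = {}
    for la, data in chunks.items():
        groups.setdefault(compute_fpk(data), []).append(la)
    return {fpk: las for fpk, las in groups.items() if len(las) > 1}
```

Because a fingerprint this short can collide, a practical design would presumably confirm a match by comparing the chunk contents before deleting anything; the source text does not describe such a step.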

When the microprocessor 10 detects duplication of certain data for the first time as a result of the duplication determination, it leaves the data of only one of the chunks having the same content as the original data and deletes the data of the other chunks having the same content. At this time, the microprocessor 10 compresses the data to be left as the original data using a lossless compression algorithm such as the LZW algorithm. For the data to be deleted, it registers the LA of the chunk in which the data was stored, in association with the FPK, in a table stored in a dedicated virtual volume VVOL, and manages it there. Hereinafter, this table is referred to as the “FPT (FingerPrint Table)” 31 (FIG. 2).

Note that the compressed data of the original data generated by the compression processing is stored in a location different from the physical page in which the uncompressed data was stored (hereinafter referred to as the “write-once space”). The write-once space is not a storage space accessible by the host device 8, but a storage space (virtual volume VVOL) that can be used only by the storage controller 2, which uses it to store the compressed data in the storage devices 30. Compressed data is stored in the write-once space by appending. The correspondence between the LA of the original data stored in the FPT 31 and the address in the write-once space where the compressed data of that original data is stored is managed in a table (hereinafter referred to as the address conversion table) 14 (FIG. 1).

When all the data stored in each chunk of a certain virtual page is compressed and written to the write-once space, the physical page assigned to that virtual page is released. As a result, the storage capacity can be effectively used.

Note that when the host device 8 issues a write request for update data (that is, an update request) to a virtual page whose data has been moved to the write-once space, the update data is compressed and appended to the write-once space. As another embodiment, however, the storage apparatus 1 may again allocate a physical page to the virtual page in the overwrite space, decompress the data that was moved to the write-once space, write the decompressed data to the physical page allocated to the virtual page, and then update (overwrite) the data on that physical page.

(2) Deduplication function according to the present embodiment

In addition to executing deduplication processing as described above, the storage apparatus 1 of this embodiment has, as part of its deduplication function, a function that determines the placement of the original data, as well as the destination virtual volume VVOL when the original data later needs to be moved, to be the virtual volume VVOL estimated to carry the least risk of the original data being moved.

Specifically, when the storage apparatus 1 executes deduplication processing on certain data for the first time, and the data belongs to a use case in which master data is copied to generate multiple pieces of data having the same content, as shown in FIG. 3 (hereinafter referred to as the first use case), it executes the deduplication processing so as to leave the master image data (hereinafter referred to as master data) as the original data and delete the remaining data generated by copying the master data (hereinafter referred to as copy data).

An example of such a first use case is VDI (Virtual Desktop Infrastructure). In VDI operation, most data other than user data is duplicated between users, and duplicate data is updated infrequently. It is also usually hard to imagine the virtual volume storing the master data (hereinafter referred to as the master volume) being deleted before the copy destination virtual volumes. Therefore, by concentrating the original data on the master volume and deduplicating the copy data at the time of the first deduplication processing, movement of the original data becomes unlikely, and the processing time when a virtual volume storing copy data is deleted can be expected to be shorter.

Note that whether or not the data to be deduplicated belongs to the first use case described above can be determined based on the issuance information of the XCOPY command, which is a data replication command. That is, in the first use case, the virtual volume VVOL designated as the copy source in the XCOPY command can be judged to be the master volume. When an XCOPY command is issued, the information “XCOPY copy source” is attached to the virtual volume VVOL that is the copy source, so that the master volume can be easily identified during deduplication processing.

On the other hand, when the storage apparatus 1 can determine that the deduplicated data is backup data obtained by regular backup (hereinafter, this use case is referred to as the second use case), and it becomes necessary to move the original data to another backup volume because the original data is updated or the virtual volume storing the original data is deleted, the storage apparatus 1 moves the original data to the most recently updated backup volume among the other backup volumes that are candidates for the destination.

For example, as shown in FIG. 4, consider a case where a backup volume (virtual volume VVOL) is prepared for each day of the week and certain data (hereinafter referred to as backup target data) is backed up daily to these backup volumes. In this case, the backup destination of the backup target data is naturally the backup volume corresponding to that day of the week.

FIG. 4 shows an example in which deduplication processing is performed on the data stored in a total of seven backup volumes for one week, the data “A” stored in the backup volume for Wednesday is left as the original data, and the data with the same content (data “A”) in the backup volumes for the other days of the week is deduplicated.

Suppose that the backup target data is updated on Wednesday. If this update changes data “A” to data “B”, the original data of data “A” must be moved to another backup volume before data “B” is backed up over the data “A” stored in the Wednesday backup volume. This is because, once data “A” is overwritten with data “B”, the data “A” included in the backup data of the preceding week can no longer be restored.

At this time, the appropriate destination for the original data of data “A” is not the backup volume for Thursday, which will be updated the next day so that the original data would have to be moved again immediately, but the backup volume for Tuesday, whose next update lies furthest in the future. In other words, in the second use case, the preferable migration destination of the original data is the backup volume that was updated last.

In this case, the backup operation is often realized by the local copy function. Therefore, the determination of the migration destination of the original data in the second use case can be performed using the pair information (latest operation time) of the local copy function.
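As a non-limiting illustration, choosing the migration destination in the second use case from the latest pair operation times might be sketched in Python as follows. The function name pick_backup_destination and the representation of the pair information as a plain mapping from volume name to time are assumptions made for this sketch.

```python
from datetime import datetime


def pick_backup_destination(candidates, last_operation):
    """Return the candidate backup volume whose pair was operated on most recently.

    candidates:     backup volumes that hold a deduplicated copy of the data
    last_operation: volume -> time of the latest pair operation (e.g. resync)
    """
    return max(candidates, key=lambda volume: last_operation[volume])


# Weekly rotation just before the Wednesday volume is overwritten.
last_operation = {
    "Thu": datetime(2016, 11, 10),
    "Fri": datetime(2016, 11, 11),
    "Sat": datetime(2016, 11, 12),
    "Sun": datetime(2016, 11, 13),
    "Mon": datetime(2016, 11, 14),
    "Tue": datetime(2016, 11, 15),
}
destination = pick_backup_destination(list(last_operation), last_operation)
```

In this rotation the most recently operated volume is also the one whose next overwrite is furthest away, which is why it is the preferred destination.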

On the other hand, when the storage device 1 needs to move the original data because the original data is overwritten or the virtual volume storing the original data is deleted, and the use case of the original data cannot be determined to be a regular backup operation (hereinafter referred to as the third use case), the storage device 1 moves the original data to the virtual page with the lowest update frequency among the migration destination candidates, as shown in the drawing for the third use case.

In this way, by choosing the virtual page with the lowest update frequency among the migration destination candidates as the destination of the original data, the risk that the original data must be moved again because of a data update can be reduced. This method can be applied not only when determining the destination of the original data in use cases other than the second use case, but also when determining the placement of the original data in the first deduplication processing (first use case).

As means for realizing the above functions, in the storage apparatus 1 of this embodiment, as shown in FIG. 1, the page update frequency information 13 and the address conversion table 14 are stored in the processor memory 12 of the microprocessor board 4, and the virtual volume information 23 and the local copy pair information 24 are stored in the shared memory area 22 of the cache memory package 5. Further, as described above with reference to FIG. 2, a virtual volume VVOL that can be used only by the storage controller 2 (hereinafter referred to as the FPT volume) is defined in the storage apparatus 1, and the FPT 31 is stored in this FPT volume.

The page update frequency information 13 has a table structure in which the number of updates (update frequency) within a predetermined time (for example, several seconds to several hours) for each chunk of each virtual volume VVOL is stored. The page update frequency information 13 is updated so that the value of the update frequency is incremented by 1 every time the data written in the virtual volume VVOL is updated by the microprocessor 10 (FIG. 1) in charge of the processing.
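As a non-limiting illustration, the update counter and the selection of the least frequently updated migration destination might be sketched in Python as follows. The class name PageUpdateFrequency and its methods are hypothetical, and the sketch omits the resetting of counts at the end of each predetermined time window.

```python
from collections import Counter


class PageUpdateFrequency:
    """Per-chunk update counts, keyed by (volume number, LA)."""

    def __init__(self):
        self.counts = Counter()

    def record_update(self, volume, la):
        # Incremented by 1 every time data written at this location is updated.
        self.counts[(volume, la)] += 1

    def least_updated(self, candidates):
        """Return the candidate location with the lowest update count.

        Locations never updated count as 0; ties go to the first candidate.
        """
        return min(candidates, key=lambda location: self.counts[location])
```

For example, after three updates to ("1", "LA1031") and one update to ("2", "LA2050"), a candidate that has never been updated would be chosen ahead of both.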

The address conversion table 14 is used to manage the destination of each chunk when chunk data in the overwrite space is moved to the write-once space. When data stored in the overwrite space is compressed and stored in the write-once space, the microprocessor 10 in charge of the processing stores in the address conversion table 14, in association with each other, the address in the overwrite space where the data was stored and the address in the write-once space where the compressed data is stored.

In practice, as shown in FIG. 6, the address conversion table 14 comprises an overwrite space address column 14A and a write-once space address column 14B. The overwrite space address column 14A stores the address (LA) in the overwrite space of a chunk that has been compressed and moved to the write-once space, and the write-once space address column 14B stores the destination address (PA) of that chunk in the write-once space.
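As a non-limiting illustration, an append-only write-once space with its LA-to-PA address conversion table might be sketched in Python as follows. The class name WriteOnceSpace is hypothetical. Python's zlib module (DEFLATE) is used here in place of the LZW algorithm mentioned in the text, and keeping the compressed length next to the PA is an assumption of this sketch, since the table in the embodiment is described as holding only the LA and the PA.

```python
import zlib


class WriteOnceSpace:
    """Append-only store for compressed chunks plus an LA -> PA table."""

    def __init__(self):
        self.log = bytearray()   # the write-once space itself
        self.address_table = {}  # LA -> (PA, compressed length)

    def store(self, la, data):
        """Compress a chunk, append it, and record where it landed."""
        compressed = zlib.compress(data)
        pa = len(self.log)       # new data always goes at the end
        self.log += compressed
        self.address_table[la] = (pa, len(compressed))
        return pa

    def load(self, la):
        """Look up the PA for an LA and return the decompressed chunk."""
        pa, length = self.address_table[la]
        return zlib.decompress(bytes(self.log[pa:pa + length]))
```

Storing update data for an LA that is already in the table appends the new compressed data and repoints the table entry; the old bytes remain in the log, and reclaiming them is outside the scope of this sketch.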

The virtual volume information 23 is used to manage each virtual volume VVOL defined in the storage apparatus 1 and, as shown in FIG. 7, has a table structure comprising a volume number column 23A, a capacity column 23B, a deduplication setting column 23C, and an XCOPY attribute column 23D.

The volume number column 23A stores the identification numbers (volume numbers) assigned to all the virtual volumes VVOL defined in the storage apparatus 1, and the capacity column 23B stores the capacity set for the corresponding virtual volume VVOL.

The deduplication setting column 23C stores information indicating whether or not the corresponding virtual volume VVOL is set as a deduplication-compatible volume (in FIG. 7, “present” when it is set and “none” when it is not).

Further, the XCOPY attribute column 23D stores an attribute indicating whether or not the corresponding virtual volume VVOL is a copy source of a copy based on the XCOPY command (hereinafter referred to as an XCOPY attribute). FIG. 7 shows an example in which the character string “XCOPY copy source” is stored when the corresponding virtual volume VVOL is the copy source based on the XCOPY command. This information is registered by the microprocessor 10 (FIG. 1) in charge of controlling the copy processing when the virtual volume becomes the XCOPY copy source.
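As a non-limiting illustration, the virtual volume information and the registration of the XCOPY attribute might be sketched in Python as follows. The field names, the function names on_xcopy and master_volumes, and the use of booleans in place of the character strings shown in FIG. 7 are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class VirtualVolumeInfo:
    """One row of the virtual volume information (columns 23A to 23D)."""
    volume_number: int
    capacity_gb: int
    dedup_enabled: bool = False   # deduplication setting column
    xcopy_source: bool = False    # XCOPY attribute column


def on_xcopy(table, source_volume):
    """Record that a volume has served as the copy source of an XCOPY command."""
    table[source_volume].xcopy_source = True


def master_volumes(table):
    """Volume numbers of deduplication-compatible volumes that were copy sources."""
    return [
        number
        for number, info in sorted(table.items())
        if info.dedup_enabled and info.xcopy_source
    ]
```

Once set, the attribute is never cleared in this sketch, which matches the wording that the volume "has been" a copy source at some point up to the time of deduplication.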

The local copy pair information 24 is used to manage each local copy pair defined in the storage apparatus 1 and, as shown in FIG. 8, has a table structure comprising a volume number column 24A, a pair attribute column 24B, a pair operation time column 24C, a pair number column 24D, and a partner volume number column 24E.

The volume number column 24A stores the volume numbers of all virtual volumes VVOL defined in the storage apparatus 1. When the corresponding virtual volume VVOL forms a local copy pair with another virtual volume VVOL, the pair attribute column 24B stores information indicating whether it is the primary volume (primary VOL), which is the copy source of the copy pair, or the secondary volume (secondary VOL), which is the copy destination. If the corresponding virtual volume VVOL does not form a copy pair with any virtual volume VVOL, nothing is stored in the pair attribute column 24B.

When the corresponding virtual volume VVOL forms a copy pair with another virtual volume VVOL, the pair operation time column 24C stores the time at which a predetermined operation, such as formation or resync (resynchronization) of the copy pair, was last performed.

The pair number column 24D stores the number of partner virtual volumes VVOL with which the corresponding virtual volume VVOL forms copy pairs, and the partner volume number column 24E stores the volume numbers of all these partner virtual volumes VVOL.

On the other hand, the FPT 31 is a table for managing the FPK of each chunk in each deduplication-compatible volume calculated during deduplication processing. As shown in FIG. 9, it consists of columns 31A, one for each distinct FPK calculated in the deduplication processing.

The uppermost row of each column 31A (hereinafter referred to as the FPK row) 31B stores the corresponding FPK value, and each row below the FPK row 31B in the column 31A (hereinafter referred to as an LA row) 31C stores the LA of a chunk whose stored data has an FPK value that matches the FPK value in the FPK row 31B.

Therefore, in the FPT 31, each column 31A stores the LAs of all chunks, detected by the deduplication processing, that hold data whose FPK has the same value as the FPK stored in the FPK row 31B of that column 31A. The deduplication processing leaves the data of only one of these chunks as the original data (compressed and stored in the write-once space) and deletes the data of the other chunks in the same column 31A.

In the case of this embodiment, in each column 31A of the FPT 31, the LA of the chunk whose stored data is left as the original data is stored in the top LA row 31C. The example of FIG. 9 therefore shows that the data whose FPK is “FPK1” has been deduplicated and that the original data is stored in the virtual page to which the LA “LA1” is assigned.

In the case of this embodiment, when the original data is updated or when the virtual volume VVOL storing the original data is deleted, the original data is moved to the LA stored in the next LA row 31C of the same column 31A. In the example of FIG. 9, therefore, when the data (original data) stored in the virtual page with LA “LA1” is updated, or when the virtual volume VVOL containing the virtual page with LA “LA1” is deleted, the original data stored in the virtual page with LA “LA1” is moved to the virtual page with LA “LA1031”.
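As a non-limiting illustration, the column layout of the FPT and the promotion of the next LA row when the original data has to move might be sketched in Python as follows. The class name FingerprintTable and its methods are hypothetical, and the third LA used in the usage note is an invented example value.

```python
class FingerprintTable:
    """FPT sketch: one column per FPK, holding the LAs that share that FPK.

    The first LA in each column is the one holding the original data.
    """

    def __init__(self):
        self.columns = {}  # FPK -> list of LAs (index 0 = original data)

    def register(self, fpk, la):
        """Add an LA row to the column for this FPK."""
        self.columns.setdefault(fpk, []).append(la)

    def original(self, fpk):
        """LA of the chunk currently holding the original data."""
        return self.columns[fpk][0]

    def move_original(self, fpk):
        """Drop the current original's LA and promote the next LA row.

        Returns the new original LA, or None if no other chunk shared the
        data (the column is then removed).
        """
        las = self.columns[fpk]
        las.pop(0)
        if not las:
            del self.columns[fpk]
            return None
        return las[0]
```

With a column for “FPK1” containing “LA1”, “LA1031” and a further hypothetical entry, calling move_original("FPK1") makes “LA1031” the new location of the original data, mirroring the FIG. 9 example.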

(3) Various Processes Related to Deduplication Function

Next, the specific contents of the various processes related to the deduplication function of this embodiment are described. Needless to say, these processes are executed by the microprocessor 10 of the CPU 11 with the smallest load, based on the microprogram stored in the local memory 10A.

(3-1) Deduplication Processing

FIG. 10 shows the procedure of the deduplication processing executed by one of the microprocessors 10 (FIG. 1) of the CPU 11 (FIG. 1) based on an activation command given periodically (for example, every 50 msec) from a scheduler (not shown).

When the CPU 11 receives the activation command, one of the microprocessors 10 in the CPU 11 starts the deduplication processing shown in FIG. 10 and first determines, from among the deduplication-compatible volumes defined in the storage apparatus 1, one deduplication-compatible volume (hereinafter referred to as the target volume) to be processed in step S2 and subsequent steps (S1).

Note that the method of determining the target volume may be either a method of determining at random from the deduplication-compatible volume or a method of determining in a predetermined order from the deduplication-compatible volume. As the former method, a predetermined prime number is added to the volume number of the deduplication-compatible volume that was the last target volume in the deduplication process performed earlier or the volume number of the previous target volume in this deduplication process. A method is conceivable in which the added value is obtained and the deduplication-compatible volume to which the volume number of that value is assigned is used as the target volume. As the latter method, a method of determining a deduplication-compatible volume as a target volume in ascending or descending order of volume numbers is conceivable.
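As a non-limiting illustration, the prime-number stepping method for picking the next target volume might be sketched in Python as follows. The text only says that a predetermined prime number is added to the previous volume number; the wrap-around by modulo, the skipping of volume numbers that are not deduplication-compatible, the prime value 7, and the function name next_target are all assumptions of this sketch.

```python
def next_target(last_volume, volume_count, dedup_volumes, prime=7):
    """Pick the next target volume by stepping through volume numbers.

    last_volume:   volume number of the previous target volume
    volume_count:  total number of volume numbers in use (0 .. volume_count - 1)
    dedup_volumes: set of volume numbers that are deduplication-compatible
    prime:         step size; it should not divide volume_count, so that
                   repeated stepping visits every volume number
    """
    candidate = last_volume
    for _ in range(volume_count):
        candidate = (candidate + prime) % volume_count  # assumed wrap-around
        if candidate in dedup_volumes:
            return candidate
    return None  # no deduplication-compatible volume exists
```

With 10 volume numbers, deduplication-compatible volumes {1, 4, 8} and a step of 7, starting from volume 1 the targets come out in the order 8, 4, 1, which scatters the visiting order without needing a random number generator.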

Subsequently, the microprocessor 10 acquires the XCOPY attribute stored in the XCOPY attribute column 23D (FIG. 7) corresponding to the target volume in the virtual volume information 23 described above with reference to FIG. 7 (S2), and determines, based on the acquired XCOPY attribute, whether or not the target volume has so far been the copy source of a copy executed according to the XCOPY command (S3). If the microprocessor 10 obtains a negative result in this determination, it proceeds to step S5.

On the other hand, when the microprocessor 10 obtains a positive result in the determination at step S3, it executes the deduplication execution processing on the data stored in the target volume (S4), and thereafter determines whether or not the processing of steps S1 to S4 has been executed for all the deduplication-compatible volumes in the storage apparatus 1 (S5).

If the microprocessor 10 obtains a negative result in this determination, it returns to step S1, and thereafter repeats the processing from step S1 to step S5 while sequentially switching the deduplication-compatible volume determined as the target volume in step S1 to another unprocessed deduplication-compatible volume.

By repeating steps S1 to S5 in this way, deduplication is performed on the data stored in each deduplication-compatible volume that has been the copy source of a copy executed in accordance with the XCOPY command.

Then, when the microprocessor 10 eventually obtains a positive result in step S5 by completing the deduplication of the data stored in all the deduplication-compatible volumes that have been the copy source of a copy executed in accordance with the XCOPY command, it determines, from among the deduplication-compatible volumes for which the deduplication execution processing has not been performed (the deduplication-compatible volumes that have so far never been the copy source of a copy executed according to the XCOPY command), one deduplication-compatible volume (target volume) to be processed in step S6 and subsequent steps (S6).

The target volume in step S6 may also be determined either at random from among the deduplication-compatible volumes for which the deduplication execution processing has not yet been performed, or in a predetermined order from among those volumes.

Next, the microprocessor 10 executes the deduplication execution processing on the data stored in the target volume determined in step S6 (S7), and thereafter determines whether or not deduplication has been performed on the data stored in all the deduplication-compatible volumes defined in the storage apparatus 1 (S8).

If the microprocessor 10 obtains a negative result in this determination, it returns to step S6, and thereafter repeats the processing from step S6 to step S8 while sequentially switching the deduplication-compatible volume determined as the target volume in step S6 to another unprocessed deduplication-compatible volume.

By such repeated processing of step S6 to step S8, deduplication is executed on data stored in each deduplication-compatible volume that has never become a copy source of a copy executed in accordance with the XCOPY command.

Then, when the microprocessor 10 eventually obtains a positive result in step S8 by completing the deduplication of the data stored in all the deduplication-compatible volumes that have never been the copy source of a copy executed in accordance with the XCOPY command, it ends this deduplication process.
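The two-pass ordering of steps S1 to S8 can be illustrated with a short sketch. The Volume class and its field names are hypothetical stand-ins for a row of the virtual volume information 23; only the ordering (XCOPY copy sources first, all other deduplication-compatible volumes afterwards) is taken from the description above.

```python
from dataclasses import dataclass


@dataclass
class Volume:
    number: int
    xcopy_source: bool  # XCOPY attribute (column 23D): has been an XCOPY copy source


def deduplication_order(volumes):
    """Return volume numbers in the order the deduplication process visits them."""
    # First pass (S1 to S5): volumes that have been an XCOPY copy source.
    first_pass = [v.number for v in volumes if v.xcopy_source]
    # Second pass (S6 to S8): volumes that have never been an XCOPY copy source.
    second_pass = [v.number for v in volumes if not v.xcopy_source]
    return first_pass + second_pass


volumes = [Volume(0, False), Volume(1, True), Volume(2, False), Volume(3, True)]
order = deduplication_order(volumes)
```

In this example volumes 1 and 3 are processed before volumes 0 and 2, so their pages are the first to be registered in the FPT 31.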

In addition, FIG. 11 shows specific processing contents of the deduplication execution process executed by the microprocessor 10 in step S4 and step S7 of the deduplication process.

When the microprocessor 10 proceeds to step S4 or step S7 of the deduplication process, it starts the deduplication execution process shown in FIG. 11 and first determines, from among the virtual pages in the target volume, one virtual page (hereinafter referred to as a target page) to be processed in step S11 and subsequent steps (S10). The target page may be determined either at random from among the virtual pages of the target volume or in a predetermined order from among the virtual pages of the target volume.

Subsequently, the microprocessor 10 determines whether or not deduplication is necessary for the data of the target page determined in step S10 (hereinafter referred to as target data) (S11). In the case of the present embodiment, in order to avoid frequent updating of the original data compressed and stored in the write-once space, deduplication is not performed when the target data has been updated within a predetermined time (for example, 1 hour). Therefore, in step S11, the microprocessor 10 refers to the page update frequency information 13 (FIG. 1) and determines whether or not to deduplicate the target data based on whether or not the target page has been updated within the predetermined time.
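The check in step S11 can be sketched as below. The layout of the page update frequency information 13 is not described here, so this sketch assumes that it can supply the time of the last update of a page in seconds; the function name and the treatment of the exact boundary are likewise assumptions.

```python
RECENT_UPDATE_WINDOW_SEC = 3600.0  # "for example, 1 hour"


def needs_deduplication(last_update_time: float, now: float,
                        window: float = RECENT_UPDATE_WINDOW_SEC) -> bool:
    """Return True when the page has not been updated within the window (S11)."""
    return (now - last_update_time) >= window


# A page last written two hours ago is a candidate; one written 200 s ago is not.
old_page = needs_deduplication(last_update_time=0.0, now=7200.0)
hot_page = needs_deduplication(last_update_time=7000.0, now=7200.0)
```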

If the microprocessor 10 obtains a negative result in this determination, it proceeds to step S21. On the other hand, when the microprocessor 10 obtains a positive result in the determination at step S11, the microprocessor 10 calculates a hash value of the target data using a predetermined hash function as the FPK of the target data (S12).

Next, the microprocessor 10 sequentially compares the hash value (FPK) of the target data calculated in step S12 with each FPK stored in the FPT 31 (FIG. 9) (S13), and determines whether or not the hash value (FPK) of the target data matches any FPK already registered in the FPT 31 (S14).

Obtaining a negative result in this determination means that the hash value (FPK) of the target data has not yet been registered in the FPT 31. Thus, at this time, the microprocessor 10 newly registers the hash value calculated in step S12 in the FPT 31 as the FPK of the target page, and stores the LA of the target page in the top LA row 31C of the column 31A (FIG. 9) corresponding to that FPK in the FPT 31 (S18).

The microprocessor 10 then compresses the target data, writes the compressed data thus obtained to the write-once space, and registers in the address conversion table 14 the correspondence between the LA of the target page and the address (PA) at which the compressed data is stored in the write-once space (S19), after which it proceeds to step S20.

On the other hand, obtaining a positive result in the determination in step S14 means that the hash value (FPK) of the target data is already registered in the FPT 31. Even in this case, however, the target data is not necessarily identical to the data whose FPK registered in the FPT 31 equals the hash value of the target data. Therefore, at this time, the microprocessor 10 compares the target data with the data whose FPK, having the same value as the hash value of the target data, is registered in the FPT 31 (S15).

Specifically, the microprocessor 10 acquires from the FPT 31 the LA of the chunk storing the original data of the data whose FPK, registered in the FPT 31, has the same value as the hash value of the target data (the LA registered in the top LA row 31C of the column 31A of that FPK in the FPT 31), refers to the address conversion table 14 (FIG. 6) to acquire the address in the write-once space corresponding to that LA, and reads the compressed data of the original data from that address in the write-once space. The microprocessor 10 then decompresses the read compressed data to restore the original data before compression, and compares the restored original data with the data of the target page.

Thereafter, based on the comparison result of step S15, the microprocessor 10 determines whether or not the target data matches the original data of the data whose FPK, having the same value as the hash value of the target data, is registered in the FPT 31 (S16).

If the microprocessor 10 obtains a negative result in this determination, it compresses the target data, writes the compressed data thus obtained to the write-once space, and registers in the address conversion table 14 the correspondence between the LA of the target page and the address (PA) at which the compressed data is stored in the write-once space (S19), after which it proceeds to step S20.

On the other hand, when the microprocessor 10 obtains a positive result in the determination at step S16, it additionally registers the LA of the target page in the last LA row 31C of the column 31A of the FPK having the same value as the hash value of the target data in the FPT 31. Thereafter, it discards the page in the overwrite space in which the target data is stored (deletes the data on that page) (S20).

Subsequently, the microprocessor 10 determines whether or not the processing of step S11 to step S20 has been executed for all virtual pages in the target volume (S21). If the microprocessor 10 obtains a negative result in this determination, it returns to step S10, and thereafter repeats the processing from step S10 to step S21 while sequentially switching the target page determined in step S10 to another unprocessed virtual page in the target volume.

Then, when the microprocessor 10 eventually obtains a positive result in step S21 by completing the processing of steps S11 to S20 for all virtual pages in the target volume, it ends this deduplication execution process and returns to the deduplication process.
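The per-page flow of steps S12 to S20 can be sketched as follows. The embodiment only speaks of a predetermined hash function and of compression, so SHA-256 and zlib are assumptions chosen for illustration. The FPT 31 is modelled as a dictionary from FPK to a list of LAs (first entry is the page holding the original data), the address conversion table 14 as a dictionary from LA to PA, and the write-once space as an append-only list. Pointing the address entry of a deduplicated page at the original data is also an assumption, inferred from the later step S38.

```python
import hashlib
import zlib


def deduplicate_page(la, data, fpt, address_table, write_once_space):
    """Process one target page; returns "stored" or "deduplicated"."""
    fpk = hashlib.sha256(data).hexdigest()                 # S12: fingerprint (FPK)
    column = fpt.get(fpk)                                  # S13/S14: look up the FPK
    if column is None:
        fpt[fpk] = [la]                                    # S18: new FPK, LA in top row
    else:
        original_pa = address_table[column[0]]
        original = zlib.decompress(write_once_space[original_pa])  # S15: restore original
        if original == data:                               # S16: contents really match
            column.append(la)                              # register LA in the last row
            address_table[la] = original_pa                # assumption: share the PA
            return "deduplicated"                          # S20: overwrite page discarded
    write_once_space.append(zlib.compress(data))           # S19: append compressed data
    address_table[la] = len(write_once_space) - 1          # S19: record LA -> PA
    return "stored"


fpt, table, space = {}, {}, []
first = deduplicate_page("LA1", b"same content", fpt, table, space)
second = deduplicate_page("LA1031", b"same content", fpt, table, space)
```

The byte-for-byte comparison in step S16 is what protects against two different pages that happen to share a fingerprint; in that case the page is simply stored as new data.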

As described above, in the storage apparatus 1 according to the present embodiment, the deduplication execution processing is executed for the deduplication-compatible volumes that have been the copy source of a copy made according to the XCOPY command before it is executed for the deduplication-compatible volumes that have never been the copy source of such a copy. Consequently, in the FPT 31, the LA of a virtual page of a deduplication-compatible volume that has been the copy source of a copy made according to the XCOPY command is registered at a higher position than the LA of a virtual page of a deduplication-compatible volume that has never been the copy source of such a copy.

In the storage apparatus 1 of the present embodiment, the data of the virtual page whose LA is stored in the uppermost LA row 31C of the FPT 31 is left as the original data, as described above. Therefore, because the LA of the virtual page of the deduplication-compatible volume that has been the copy source of a copy made according to the XCOPY command is stored in the uppermost LA row 31C of the FPT 31, the data stored in that copy source volume is left as the original data, and the data stored in the deduplication-compatible volume that is the copy destination of the copy is deduplicated.

(3-2) Original Data Migration Processing FIG. 12 shows the processing procedure of the original data movement process executed by one of the microprocessors 10 of the CPU 11 when, due to deletion of a virtual volume VVOL or overwriting of the original data, a situation arises in which the original data should be moved from the current virtual volume VVOL to another virtual volume VVOL that is a migration destination candidate. When such a situation arises, the microprocessor 10 moves the original data to another virtual volume VVOL that is a migration destination candidate according to the processing procedure shown in FIG. 12.

In practice, when such a situation arises, the microprocessor 10 starts the original data movement process shown in FIG. 12 and first acquires, from the local copy pair information 24 (FIG. 8), information about the local copy pairs in which the virtual volume VVOL storing the original data to be moved (hereinafter referred to as the original data storage volume) is the data copy source or copy destination (S30).

Specifically, the microprocessor 10 acquires the information of the record (row) corresponding to the original data storage volume in the local copy pair information 24, and the information of all records (rows) in which the volume number of the original data storage volume is stored in the partner volume number column 24E (FIG. 8).

Subsequently, based on the information acquired in step S30, the microprocessor 10 determines whether or not the original data storage volume is set as the secondary volume (copy destination virtual volume VVOL) of a copy pair with another virtual volume VVOL (S31). This determination is made by checking whether or not the pair attribute stored in the pair attribute column 24B (FIG. 8) of the record of the original data storage volume acquired in step S30 is “secondary volume”.

Obtaining a negative result in this determination means either that the original data storage volume is not set in a local copy pair with any virtual volume VVOL, or that the original data storage volume is set in a local copy pair with another virtual volume VVOL as the primary volume. Thus, at this time, the microprocessor 10 determines that the use case of the original data is the third use case described above, and proceeds to step S35.

On the other hand, if the microprocessor 10 obtains a positive result in the determination at step S31, it acquires from the local copy pair information 24 the information on the volume set as the primary volume in the copy pair in which the original data storage volume is set as the secondary volume (hereinafter referred to as the specific volume) (S32). Specifically, it acquires the information of the record (row) corresponding to the specific volume from the local copy pair information 24.

Then, based on the information acquired in step S32, the microprocessor 10 determines whether or not, among the migration destination candidates of the original data, there is a virtual volume VVOL (secondary volume) other than the original data storage volume that is set in a copy pair having the specific volume as the primary volume (S33). This determination is made based on whether or not a volume number other than the volume number of the original data storage volume is stored in the partner volume number column 24E (FIG. 8) of the information acquired in step S32.

Obtaining a positive result in this determination means that there are a plurality of copy pairs whose copy source is the specific volume, and that the original data storage volume is the secondary volume of one of those copy pairs.

Thus, at this time, the microprocessor 10 determines that the use case of the original data is the second use case and, referring to the local copy pair information 24 (FIG. 8), determines as the migration destination of the original data the secondary volume that is other than the original data storage volume and has the latest update time among the secondary volumes of the plurality of copy pairs whose copy source is the specific volume (S34).

Specifically, in step S34, the microprocessor 10 refers to the records in the local copy pair information 24 of the virtual volumes VVOL detected in step S33, that is, those assigned a volume number other than the volume number of the original data storage volume, and determines the virtual volume VVOL with the latest time stored in the operation time column 24C (FIG. 8) as the migration destination of the original data. The microprocessor 10 then proceeds to step S37.

On the other hand, obtaining a negative result in the determination in step S33 means that the specific volume is not set in a local copy pair with any virtual volume VVOL other than the original data storage volume. Thus, at this time, the microprocessor 10 determines that the use case of the original data is the third use case, acquires from the page update frequency information 13 (FIG. 1) the update frequency of each virtual page that can be the migration destination of the original data in each virtual volume VVOL that can be the migration destination of the original data, and compares these update frequencies (S35).

Here, a “virtual volume VVOL that can be the migration destination of the original data” is a deduplication-compatible volume in which data having the same content as the original data has been deduplicated, and a “virtual page that can be the migration destination of the original data” is a virtual page in such a deduplication-compatible volume in which data having the same content as the original data has been deduplicated.

Then, based on the comparison result of step S35, the microprocessor 10 determines as the migration destination the virtual page with the lowest update frequency among the virtual pages that can be the migration destination of the original data in each virtual volume VVOL that can be the migration destination of the original data (S36).
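The destination decision of steps S30 to S36 can be sketched as follows. The PairRecord class is a hypothetical, simplified model of one row of the local copy pair information 24 (pair attribute column 24B, operation time column 24C, partner volume number column 24E), and candidate_pages is assumed to map the LA of each candidate virtual page to its update frequency. The sketch also assumes that every sibling secondary volume is a valid migration destination candidate.

```python
from dataclasses import dataclass


@dataclass
class PairRecord:
    volume: int            # volume number of this record
    attribute: str         # "primary" or "secondary" (column 24B)
    operation_time: float  # time of the last operation (column 24C)
    partner: int           # partner volume number (column 24E)


def choose_destination(source_volume, pairs, candidate_pages):
    """Return ("volume", number) in the second use case, else ("page", LA)."""
    # S31/S32: is the source a secondary volume, and if so, of which primary?
    primary = next((p.partner for p in pairs
                    if p.volume == source_volume and p.attribute == "secondary"), None)
    if primary is not None:
        # S33: other secondary volumes paired with the same primary volume.
        siblings = [p for p in pairs
                    if p.attribute == "secondary" and p.partner == primary
                    and p.volume != source_volume]
        if siblings:
            # S34: second use case, pick the most recently updated sibling.
            return ("volume", max(siblings, key=lambda p: p.operation_time).volume)
    # S35/S36: third use case, pick the candidate page updated least often.
    return ("page", min(candidate_pages, key=candidate_pages.get))


pairs = [PairRecord(2, "secondary", 100.0, 1),
         PairRecord(3, "secondary", 200.0, 1),
         PairRecord(4, "secondary", 150.0, 1)]
pages = {"LA7": 5, "LA8": 2}
second_case = choose_destination(2, pairs, pages)
third_case = choose_destination(9, pairs, pages)
```

Here volume 2 shares primary volume 1 with volumes 3 and 4, so the most recently operated sibling (volume 3) is chosen, while volume 9 belongs to no pair and falls through to the page with the lowest update frequency.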

Subsequently, the microprocessor 10 copies the original data to the migration destination of the original data determined in step S34 or S36 (S37). Note that this copying is performed by newly writing the compressed data of the original data stored in the write-once space to the write-once space. At this time, the microprocessor 10 rewrites the address (PA) in the write-once space associated with the source LA of the original data in the address conversion table 14 (FIG. 6) to the address (PA) of the copy destination of the original data in the write-once space.

Further, the microprocessor 10 rewrites the address (PA) in the write-once space registered in the address conversion table 14 for each piece of deduplicated data having the same FPK as the original data to the address (PA) of the copy destination of the original data in step S37 (S38).

Next, the microprocessor 10 deletes the LA of the virtual page in which the original data had been stored, which is stored in the first LA row 31C of the column 31A of the FPK corresponding to the original data among the columns 31A of the FPT 31 (S39).

Further, the microprocessor 10 stores the LA of the virtual page to which the original data has been moved in the first LA row 31C of that column 31A in the FPT 31 and, as necessary, moves the other LAs stored in that column 31A so that they are packed toward the front (S40).
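The bookkeeping of steps S37 to S40 can be sketched as follows, using the same simplified model as an illustration only: one column 31A of the FPT 31 is a list of LAs whose first entry is the page holding the original data, the address conversion table 14 is a dictionary from LA to PA, and the write-once space is an append-only list. The function name is hypothetical.

```python
def move_original_data(column, address_table, write_once_space, dest_la):
    """Move the original data of one FPT column to the page dest_la."""
    source_la = column[0]
    # S37: copy the compressed data by writing it anew to the write-once space.
    write_once_space.append(write_once_space[address_table[source_la]])
    new_pa = len(write_once_space) - 1
    # S37/S38: point the source LA and every deduplicated LA at the new PA.
    for la in column:
        address_table[la] = new_pa
    # S39: delete the LA of the page that used to hold the original data.
    column.remove(source_la)
    # S40: put the destination LA in the first row and pack the rest forward.
    column.remove(dest_la)
    column.insert(0, dest_la)
    return new_pa


space = [b"compressed"]
table = {"LA1": 0, "LA1031": 0, "LA2050": 0}
column = ["LA1", "LA1031", "LA2050"]
new_pa = move_original_data(column, table, space, "LA2050")
```

After the call the column reads LA2050, LA1031, so LA2050 now identifies the page regarded as holding the original data, and every address entry refers to the newly written copy.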

Then, the microprocessor 10 thereafter ends this original data movement process.

(4) Effects of the present embodiment As described above, when it can be determined that the data to be deduplicated was duplicated by copying (first use case), the storage apparatus 1 of the present embodiment executes the deduplication processing so that the original data remains in the copy source virtual volume.

In the case where the original data of the deduplicated data exists in one of a plurality of backup destination virtual volumes VVOL (second use case), when the original data is updated or that virtual volume VVOL is deleted, the storage apparatus 1 moves the original data to the most recently updated virtual volume VVOL among the other backup destination virtual volumes VVOL. Further, when the use case of the deduplicated data is other than the second use case, the storage apparatus 1 moves the original data to the virtual page with the lowest update frequency.

Therefore, according to the present storage apparatus 1, movement of the original data due to an update of the original data or deletion of a virtual volume VVOL is less likely to occur. As a result, the consumption of resources caused by moving the original data of deduplicated data can be reduced, and the movement processing cost of the original data can be reduced.

Further, according to the present storage apparatus 1, when the first use case described above is estimated, the deduplication processing is executed so that the original data is left in the copy source virtual volume, so the probability that the original data remains in the deduplication-compatible volume to be deleted is low. Therefore, the probability that the original data must be moved when a deduplication-compatible volume is deleted can be reduced as far as possible, and the average processing time required to delete a deduplication-compatible volume can thus be shortened.

(5) Other Embodiments In the above-described embodiment, the case where the present invention is applied to the storage apparatus 1 configured as shown in FIG. 1 has been described. However, the present invention is not limited to this, and can be widely applied to storage apparatuses of various other configurations equipped with a deduplication function.

Further, in the above-described embodiment, the case has been described where the following units are embodied by the microprocessor 10 of the CPU 11 executing the microprogram: the management unit that manages, in units of logical volumes, the copy attribute information indicating whether or not a logical volume has been a copy source of a copy; the deduplication processing execution unit that executes the deduplication processing, based on the copy attribute information, so as to leave the original data in the logical volume that has been a copy source; and the original data moving unit that, when a situation arises in which the original data should be moved to another logical volume, determines the logical volume estimated to have the least risk of the original data being moved next as the migration destination of the original data and moves the original data to the determined logical volume. However, the present invention is not limited to this, and some or all of the management unit, the deduplication processing execution unit, and the original data moving unit may be configured by dedicated hardware.

The present invention can be widely applied to storage apparatuses equipped with a deduplication function.

DESCRIPTION OF SYMBOLS 1 ... Storage device, 2 ... Storage controller, 8 ... Host device, 10 ... Microprocessor, 11 ... CPU, 12 ... Processor memory, 13 ... Page update frequency information, 14 ... Address conversion table, 22 ... Shared memory area, 23 ... Virtual volume information, 24 ... Local copy pair information, 30 ... Storage device, 31 ... FPT, VVOL ... Virtual volume.

Claims

A storage apparatus that provides a logical volume as a storage area to a host device and executes deduplication processing in which, among data stored in the logical volume, one piece of data having the same content is left as original data and the other data is deleted, the storage apparatus comprising:
a management unit that manages, in units of logical volumes, copy attribute information indicating whether or not the logical volume has been a copy source of a copy; and
a deduplication processing execution unit that executes the deduplication processing, based on the copy attribute information, so as to leave the original data in the logical volume that has been the copy source.
The storage apparatus according to claim 1, further comprising an original data moving unit that, when a situation arises in which the original data should be moved to another logical volume, determines the logical volume estimated to have the least risk of the original data being moved next as the movement destination of the original data, and moves the original data to the determined logical volume.
The storage apparatus according to claim 2, wherein, when the logical volume that is the movement source of the original data is the data copy destination logical volume in a first copy pair, and there exists a logical volume that is set in a second copy pair with the data copy source logical volume of the first copy pair and is the data copy destination in the second copy pair, the original data moving unit determines the most recently updated such logical volume as the logical volume estimated to have the least risk of the original data being moved next.
The storage apparatus according to claim 3, wherein, except when the logical volume that is the movement source of the original data is the data copy destination logical volume in the first copy pair and there exists a logical volume that is set in the second copy pair with the data copy source logical volume of the first copy pair and is the data copy destination in the second copy pair, the original data moving unit determines, as the movement destination of the original data, the partial area having the lowest update frequency among the partial areas of all the logical volumes that can be candidates for the movement destination of the original data.
A control method for a storage apparatus that provides a logical volume as a storage area to a host device and executes deduplication processing in which, among data stored in the logical volume, one piece of data having the same content is left as original data and the other data is deleted, the control method comprising:
a first step in which the storage apparatus manages, in units of logical volumes, copy attribute information indicating whether or not the logical volume has been a copy source of a copy; and
a second step in which the storage apparatus executes the deduplication processing, based on the copy attribute information, so as to leave the original data in the logical volume that has been the copy source.
The storage apparatus control method according to claim 5, further comprising a third step in which, when a situation arises in which the original data should be moved to another logical volume, the storage apparatus determines the logical volume estimated to have the least risk of the original data being moved next as the movement destination of the original data, and moves the original data to the determined logical volume.
The storage apparatus control method according to claim 6, wherein, in the third step, when the logical volume that is the movement source of the original data is the data copy destination logical volume in a first copy pair, and there exists a logical volume that is set in a second copy pair with the data copy source logical volume of the first copy pair and is the data copy destination in the second copy pair, the storage apparatus determines the most recently updated such logical volume as the logical volume estimated to have the least risk of the original data being moved next.
The storage apparatus control method according to claim 7, wherein, in the third step, except when the logical volume that is the movement source of the original data is the data copy destination logical volume in the first copy pair and there exists a logical volume that is set in the second copy pair with the data copy source logical volume of the first copy pair and is the data copy destination in the second copy pair, the storage apparatus determines, as the movement destination of the original data, the partial area having the lowest update frequency among the partial areas of all the logical volumes that can be candidates for the movement destination of the original data.