WO2015129035A1 - Storage system - Google Patents
- Publication number
- WO2015129035A1 (PCT/JP2014/055111)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- replication
- unit
- update
- update frequency
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/0614—Improving the reliability of storage systems
- G06F3/0619—Improving the reliability of storage systems in relation to data integrity, e.g. data losses, bit errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1446—Point-in-time backing up or restoration of persistent data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/20—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
- G06F11/2053—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant
- G06F11/2094—Redundant storage or storage space
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/20—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
- G06F11/2097—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements maintaining the standby controller/processing unit updated
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0646—Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
- G06F3/065—Replication mechanisms
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/067—Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
Definitions
- Embodiments of the present invention relate to a storage system.
- the storage system has a problem that when data replication is performed, the network communication bandwidth becomes a bottleneck, and congestion is induced in communication between the primary site and the backup site.
- a problem to be solved by the present invention is to provide a storage system capable of preventing congestion in communication (data transfer) between a primary site and a backup site when performing data replication.
- the storage system includes a storage unit and a control unit.
- the storage unit stores data.
- the control unit performs replication of the updated data based on the data update frequency.
- FIG. 1 is a block diagram showing a configuration example of a storage system.
- the storage system 1000a includes a host computer 400 and a storage device 100a at the primary site.
- the storage system 1000a includes a host computer 500 and a storage device 200 at a backup site.
- the storage device 100a at the primary site and the storage device 200 at the backup site are connected via a communication line (network) 600.
- the data stored in the storage device 100a at the primary site is replicated (copied) to the storage device 200 at the backup site by performing replication via the communication line 600.
- the backup site may be located at a remote location away from the primary site for disaster prevention purposes. For example, when a disaster occurs in the primary site, the backup site can execute the site switching process and take over the primary site process.
- the host computer 400 reads or writes data by outputting an I/O (Input/Output) command to the storage apparatus 100a.
- a unit of data read from or written to the storage apparatus 100a by an I/O command from the host computer 400 is a data block of a predetermined size (4 KB to 64 KB).
- a data block that is a unit of data read from or written to the storage device 100a is simply referred to as data.
- the storage device 100a includes a control device 110a and a disk array device 180.
- the control device 110a includes a host interface 120, a control unit 130a, a program memory 140, a buffer memory 150, a network interface 160, and a disk interface 170.
- Each unit of the storage apparatus 100a is connected to each other via an internal bus.
- the disk array device 180 is composed of N hard disk drives 190-1 to 190-N.
- a RAID (Redundant Array of Inexpensive Disks) is constructed by the N hard disk drives 190-1 to 190-N.
- In order to simplify the description, it is assumed that one RAID is constructed in the disk array device 180 and that one logical disk (logical volume) is provided in this RAID. Of course, a plurality of RAIDs may be constructed in the disk array device 180. Similarly, a plurality of logical disks may be provided in one RAID.
- a logical disk is an area that the host computer logically recognizes as a disk drive. Strictly speaking, when the host computer 400 reads or writes data from or to the storage device 100a, it reads or writes data from or to a logical disk. In the following description, this is uniformly expressed as the host computer 400 reading or writing data from or to the storage device 100a.
- items common to the hard disk drives 190-1 to 190-N are denoted by “hard disk drive 190” by omitting the reference numerals 1 to N.
- All or a part of the disk array device 180 may be configured by SSDs (Solid State Drives).
- the disk interface 170 is connected to the hard disk drive 190.
- the host interface 120 is connected to the host computer 400 via a network.
- the program memory 140 stores a program (firmware) for operating the control unit 130a.
- the buffer memory 150 is a storage unit (work memory) used for various operations.
- the buffer memory 150 temporarily stores data stored in the disk array device 180 (primary storage).
- the buffer memory 150 temporarily stores data read from the disk array device 180.
- the buffer memory 150 stores metadata for managing statistical information.
- the statistical information is, for example, information indicating the update frequency of data recorded in the disk array device 180. Details of the metadata according to the first embodiment stored in the buffer memory 150 will be described later with reference to FIGS.
- the control unit 130a is constituted by a CPU (Central Processing Unit).
- the frequency detection unit 131a, the policy management unit 132a, and the replication execution unit 133 of the control unit 130a are realized when the CPU executes a program stored in the program memory 140.
- the frequency detection unit 131a, the policy management unit 132a, and the replication execution unit 133 may be implemented (implemented) by hardware.
- the frequency detection unit 131a acquires the data update frequency by detecting the number of data updates in the storage device 100a.
- the frequency detection unit 131a registers the detected number of updates in the counter table as an update frequency (see FIG. 2).
- For example, the count value of the data written to the address “0” of the storage apparatus 100a is registered as 2, which indicates that the data has been updated twice.
- entries corresponding to all addresses of the storage apparatus 100a are provided.
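- The counter table described above can be pictured as a simple map from address to update count. The following is a minimal sketch, not the embodiment's implementation; the `CounterTable` class and its method names are hypothetical illustration only.

```python
# Hypothetical sketch of the counter table of FIG. 2: one update
# counter per address of the primary-site storage apparatus 100a.
class CounterTable:
    def __init__(self, addresses):
        # One entry per address, initialized to 0.
        self.counts = {addr: 0 for addr in addresses}

    def on_write(self, addr):
        # The frequency detection unit increments the count on every update.
        self.counts[addr] += 1

    def updated_addresses(self):
        # Addresses whose count value is 1 or more, i.e. updated data.
        return [a for a, c in self.counts.items() if c >= 1]

    def clear(self):
        # All counts are cleared to 0 at each recovery point target.
        for a in self.counts:
            self.counts[a] = 0

table = CounterTable(addresses=["0", "A", "2A", "3A"])
table.on_write("0")
table.on_write("0")  # address "0" updated twice, as in FIG. 2
```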
- the policy management unit 132a determines a policy related to replication (hereinafter referred to as a replication policy) for data based on the data update frequency. More specifically, a replication policy that performs replication every time data is updated is referred to as “transmission for each write”. On the other hand, a replication policy that performs replication at a predetermined cycle (for example, a 24-hour cycle) is referred to as “transmission every cycle”.
- the policy management unit 132a stores a replication policy determined for each data in a replication management table (see FIG. 3).
- the predetermined cycle may be a cycle predetermined as a recovery point target (RPO: Recovery Point Objective).
- the recovery point target is a target recovery time point, and means a time (timing) at which data backup is acquired or a cycle at which data backup is acquired. This period is arbitrarily set based on the frequency at which data is updated in the system and how far the data is recovered when a disaster, accident or failure occurs in the system.
- At the time set as the recovery point target, all data whose count value in the counter table is other than “0” (that is, whose counter value is 1 or more), i.e., all updated data, is transmitted to the backup site.
- As a result, the primary site and the backup site are synchronized.
- FIG. 3 shows an example of a replication policy management table (frequency journal table) that is metadata.
- the replication policy “transmission for each write” is defined for the data written to the address “0”.
- the replication policy “transmission every cycle” is defined for the data written to the address “A”.
- the replication policy “transmission for each write” is defined for the data written to the address “2A”.
- the replication policy “transmission every cycle” is defined for the data written to the address “3A”.
- entries corresponding to all addresses of the storage apparatus 100a are provided.
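- The derivation of the replication policy management table from the counter table can be sketched as follows. This is a hypothetical illustration: the function name and dictionary layout are assumptions, and the example counts are chosen only so that the result matches FIG. 3; the threshold of 100 updates is the one used in the embodiment.

```python
PER_WRITE = "transmission for each write"
PER_CYCLE = "transmission every cycle"

# Hypothetical sketch of how the policy management unit 132a could
# derive the replication policy management table of FIG. 3 from the
# counter table of FIG. 2.
def determine_policies(counts, threshold=100):
    # Data updated `threshold` times or more per cycle is demoted from
    # per-write replication to per-cycle replication.
    return {addr: (PER_CYCLE if c >= threshold else PER_WRITE)
            for addr, c in counts.items()}

# Example counts (assumed values) yielding the policies shown in FIG. 3.
policies = determine_policies({"0": 2, "A": 150, "2A": 0, "3A": 120})
```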
- the replication execution unit 133 executes replication of written (updated) data based on a replication policy set for each data. More specifically, the replication execution unit 133 determines whether or not the replication policy of the written (updated) data is transmission for each write based on the replication policy management table.
- When the replication policy of the written (updated) data is transmission for each write, at times other than the time set as the recovery point target, the replication execution unit 133 transmits the written (updated) data and the address (primary site address) of the storage apparatus 100a to which the data was written, together with a replication command, to the backup site via the network interface 160. As a result, data replication is executed.
- At the time set as the recovery point target, the replication execution unit 133 transmits all the data whose count value in the counter table is 1 or more, together with the addresses (primary site addresses) of the storage apparatus 100a to which this data was written and a replication command, to the backup site via the network interface 160.
- the time set as the recovery point target is a periodic time (for example, a 5-minute cycle, a 24-hour cycle).
- the host computer 500 is connected to the host interface 220 of the storage apparatus 200.
- the host computer 500 reads or writes data by outputting an I/O command to the storage apparatus 200.
- the storage device 200 includes a control device 210 and a disk array device 280.
- the control device 210 includes a host interface 220, a control unit 230, a program memory 240, a buffer memory 250, a network interface 260, and a disk interface 270.
- Each unit of the storage apparatus 200 is connected to each other via an internal bus.
- the program memory 240 stores a program (firmware) for operating the control unit 230.
- the buffer memory 250 is a storage unit (work memory) used for various operations.
- the buffer memory 250 temporarily stores data stored in the disk array device 280 (primary storage).
- the buffer memory 250 temporarily stores data read from the disk array device 280.
- the buffer memory 250 stores an address table. Details of this address table will be described later with reference to FIG.
- the control unit 230 is composed of a CPU.
- the control unit 230 is realized by the CPU executing a program stored in the program memory 240.
- the control unit 230 may be implemented (implemented) by hardware.
- the disk array device 280 includes M hard disk drives 290-1 to 290-M and a temporary storage area unit 300.
- a RAID is constructed by the hard disk drives 290-1 to 290-M.
- one RAID is constructed in the disk array device 280, and there is one logical disk (logical volume) provided in this RAID.
- a plurality of RAIDs may be constructed in the disk array device 280.
- a plurality of logical disks may be provided in one RAID.
- the hard disk drives 290-1 to 290-M are connected to the disk interface 270 of the control device 210.
- items common to the hard disk drives 290-1 to 290-M are denoted by “hard disk drive 290” by omitting the reference numerals 1 to M.
- All or a part of the disk array device 280 may be configured by SSDs (Solid State Drives).
- Strictly speaking, when the host computer 500 reads or writes data from or to the storage device 200, it reads or writes data from or to a logical disk. In the following description, this is uniformly expressed as the host computer 500 reading or writing data from or to the storage device 200.
- the disk interface 270 is connected to the hard disk drive 290.
- the temporary storage area unit 300 is composed of, for example, one hard disk drive and is connected to the disk interface 270 of the control device 210.
- the temporary storage area unit 300 may be composed of one logical disk provided in a RAID constructed in the disk array device 280.
- the control unit 230 temporarily stores the data to be replicated received by the network interface 260 via the communication line 600 in the buffer memory 250 together with the address (primary site address) of the data in the storage device 100a.
- the control unit 230 then transfers the data to the temporary storage area unit 300 and stores it there.
- the data temporarily stored in the temporary storage area unit 300 is finally stored in the disk array device 280 by the control unit 230 (replication).
- the control unit 230 associates the primary site address of the data stored in the temporary storage area unit 300 with the address indicating its storage position in the temporary storage area unit 300, and registers the pair in the address table provided in the buffer memory 250.
- Fig. 4 shows an example of an address table.
- the address table includes a primary site address and a temporary storage area unit address indicating a storage position in the temporary storage area unit 300.
- the primary site address “0” is associated with the temporary storage area unit address “B1”. Similarly, the primary site address “3A” is associated with the temporary storage area unit address “B2”, and the primary site address “A” is associated with the temporary storage area unit address “B3”.
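- The address table of FIG. 4 can be sketched as a mapping from primary site address to temporary storage area unit address. The class below is a hypothetical illustration, not the embodiment's data structure.

```python
# Hypothetical sketch of the address table of FIG. 4 kept in the
# backup-site buffer memory 250: it maps the primary site address of
# each staged piece of data to its temporary storage area unit address.
class AddressTable:
    def __init__(self):
        self.entries = {}  # primary site address -> temp area unit address

    def register(self, primary_addr, temp_addr):
        self.entries[primary_addr] = temp_addr

addr_table = AddressTable()
addr_table.register("0", "B1")   # associations shown in FIG. 4
addr_table.register("3A", "B2")
addr_table.register("A", "B3")
```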
- the control unit 230 stores the data in the disk array device 280 at the same address as its primary site address.
- FIG. 5 is a time chart showing an example of replication processing at the primary site. The horizontal axis indicates time.
- FIG. 5 shows operation procedures in the counter table (see FIG. 2) and the replication policy management table (see FIG. 3).
- Time T0 is an initial time for starting the replication operation.
- Time T1 is the time taken as the recovery point target (RPO) when a predetermined time has elapsed from time T0.
- Time T2 is the next time taken as the recovery point target (RPO).
- the frequency detection unit 131a sets the count values of data stored in all addresses registered in the counter table to the value 0 and clears them. (Step S1)
- the policy management unit 132a initializes the replication policy of the data stored in all addresses registered in the replication policy management table to “transmission for each write”. (Step S2)
- When data is written, the frequency detection unit 131a adds 1 to the count value (see FIG. 2) of the counter table entry corresponding to the written data, incrementing the count. (Step S3)
- the replication execution unit 133 transmits the written data and the address (primary site address) where the data was written, together with a replication command, to the backup site via the network interface 160 (step S4). Thereafter, whenever data is written to the storage apparatus 100a before time T1, the next recovery point target, the processes of steps S3 and S4 are performed in the same way.
- the replication execution unit 133 transmits all data whose count value in the counter table is 1 or more to the backup site via the network interface 160 for replication along with the replication command. (Step S5)
- the policy management unit 132a initializes the replication policy associated with the data written to all addresses registered in the replication policy management table to “transmission for each write”. Thereafter, the policy management unit 132a changes the replication policy associated with data written to addresses whose count value in the counter table is 100 or more to “transmission every cycle”. (Step S6)
- the frequency detection unit 131a clears the count values of the data written in all the addresses registered in the counter table to the value 0. (Step S7)
- FIG. 6 is a flowchart showing an example of replication processing at the primary site.
- the frequency detection unit 131a clears the count value of the data written in all addresses registered in the counter table (see FIG. 2) to the value 0. (Step Sa1)
- the policy management unit 132a sets the replication policy of data written in all addresses registered in the replication policy management table (see FIG. 3) to “transmission for each write” and initializes it. (Step Sa2)
- the frequency detection unit 131a waits for data writing to the disk array device 180, that is, data update. (Step Sa3)
- the frequency detection unit 131a adds the value 1 to the count value (see FIG. 2) of the counter table corresponding to the written (updated) data, and increments the count value. (Step Sa4)
- the replication execution unit 133 determines whether or not the replication policy of the written (updated) data is “transmission for each write” based on the replication policy management table. When the replication policy of the written data is “transmission for each write”, the replication execution unit 133 sends the written data and the primary site address of this data together with the replication command via the network interface 160 to the backup site. Send to. (Step Sa5)
- the replication execution unit 133 determines whether or not the current time has reached the recovery point target (RPO) time at which a predetermined time has elapsed from the time T0 (step Sa6). When the current time has reached the time of the recovery point target (RPO) (step Sa6: Yes), the replication execution unit 133 advances the process to step Sa7.
- If the current time has not reached the time of the recovery point target (RPO) (step Sa6: No), the replication execution unit 133 returns the process to step Sa3. (Step Sa6)
- the replication execution unit 133 sends the data written to all addresses whose count value (see FIG. 2) in the counter table is 1 or more and the primary site address of this data through the network interface 160 together with the replication command. Send to backup site. (Step Sa7)
- the policy management unit 132a initializes the replication policy of data written to all addresses registered in the replication policy management table (see FIG. 3) to “transmission for each write”. Furthermore, the policy management unit 132a changes the replication policy of data written to addresses whose count value (see FIG. 2) in the counter table is 100 or more to “transmission every cycle”.
- Here, the data whose replication policy is changed to “transmission every cycle” is data with a count value of 100 or more.
- However, the target is not limited to data with a count value of 100 or more.
- The system administrator may arbitrarily determine the count value threshold at which the replication policy is changed to “transmission every cycle”, depending on the system configuration, the system operation method, and the like. In other words, what matters is that the replication policy of data determined to have a high update frequency is changed to “transmission every cycle”. (Step Sa8)
- the frequency detection unit 131a clears the count value of the data written in all the addresses registered in the counter table to the value 0. (Step Sa9) After the processing of step Sa9 is executed, the processing returns to step Sa3.
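- Steps Sa3 to Sa9 can be summarized in the following sketch of one recovery-point cycle. This is a hypothetical simplification: `send_to_backup` stands in for transmission over the network interface 160, the data structures are assumptions, and the threshold is lowered from the embodiment's 100 purely for illustration.

```python
PER_WRITE = "transmission for each write"
PER_CYCLE = "transmission every cycle"

def run_cycle(writes, policies, send_to_backup, threshold=100):
    """One recovery-point cycle of the primary-site loop (steps Sa3-Sa9).

    writes: (address, data) updates observed during the cycle.
    policies: replication policy per address, carried over from the
    previous cycle (every address defaults to per-write, as in Sa2).
    send_to_backup: hypothetical callback standing in for the network
    interface 160, called as send_to_backup(addr, data).
    Returns the policies to use in the next cycle.
    """
    counts = {}
    latest = {}
    for addr, data in writes:                    # Sa3: wait for a write
        counts[addr] = counts.get(addr, 0) + 1   # Sa4: increment counter
        latest[addr] = data
        if policies.get(addr, PER_WRITE) == PER_WRITE:
            send_to_backup(addr, data)           # Sa5: per-write replication
    # Sa7: at the RPO, replicate all data with a count of 1 or more.
    for addr, c in counts.items():
        if c >= 1:
            send_to_backup(addr, latest[addr])
    # Sa8: reinitialize to per-write, then demote high-frequency data.
    # Sa9: counters are cleared implicitly (counts is per-cycle here).
    return {addr: (PER_CYCLE if c >= threshold else PER_WRITE)
            for addr, c in counts.items()}

sent = []
new_policies = run_cycle(
    writes=[("0", "v1"), ("0", "v2"), ("A", "v3")],
    policies={},
    send_to_backup=lambda addr, data: sent.append((addr, data)),
    threshold=2,  # lowered from the embodiment's 100 for illustration
)
```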
- FIG. 7 is a flowchart showing an example of replication processing at the backup site.
- the control unit 230 waits for a replication command transmitted from the primary site. (Step Sb1)
- the control unit 230 determines whether or not the replication command transmitted from the primary site is for replicating data in which the replication policy is set to transmit every write (see FIG. 3). (Step Sb2)
- When the replication command is for replicating data whose replication policy is set to transmission for each write (step Sb2: Yes), the control unit 230 advances the process to step Sb6. On the other hand, if it is not, that is, if it is for replicating data whose replication policy is set to transmission every cycle (step Sb2: No), the control unit 230 advances the process to step Sb3. (Step Sb2)
- In step Sb3, the control unit 230 reads all the data temporarily stored in the temporary storage area unit 300 based on the temporary storage area unit addresses registered in the address table (see FIG. 4).
- the control unit 230 writes the data read from the temporary storage area unit 300 to the disk array device 280.
- the address at which the control unit 230 writes data to the disk array device 280 is the same address as the primary site address associated with the temporary storage area unit address from which the data was read in the address table. Replication is performed by writing this data to the disk array device 280.
- the control unit 230 receives the data attached to the replication command (replication command determined in step Sb2) transmitted from the primary site and the primary site address of the data.
- the control unit 230 writes the received data to the disk array device 280 at the same address as the received primary site address. By this data writing, the received data is replicated. (Step Sb4)
- the control unit 230 determines whether all the data attached to the replication command and the primary site address of the data have been received. When all the data and the primary site address of the data are received (step Sb5: Yes), the control unit 230 returns the process to step Sb1. On the other hand, when the data not received and the primary site address of the data remain (step Sb5: No), the control unit 230 returns the process to step Sb4. (Step Sb5)
- In step Sb6, the control unit 230 receives the data attached to the replication command transmitted from the primary site and the primary site address of the data.
- the control unit 230 stores the received data in the temporary storage area unit 300.
- the control unit 230 registers the address at which the received data was stored in the temporary storage area unit 300 (the temporary storage area unit address) and the received primary site address in association with each other in the address table (see FIG. 4).
- the control unit 230 determines whether all the data attached to the replication command and the primary site address of the data have been received. When all the data and the primary site address of the data have been received (step Sb7: Yes), the control unit 230 returns the process to step Sb1. On the other hand, when the data not received and the primary site address of the data remain (step Sb7: No), the control unit 230 returns the process to step Sb6. (Step Sb7)
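- The backup-site handling of steps Sb1 to Sb7 can be sketched as follows: per-write data is staged in the temporary storage area unit 300, while a per-cycle command first flushes the staged data to the disk array device 280 and then writes its batch directly. The class, slot-naming scheme, and command encoding below are hypothetical illustration only.

```python
# Hypothetical sketch of the backup-site handling of steps Sb1-Sb7.
class BackupSite:
    def __init__(self):
        self.disk_array = {}   # disk array device 280: primary addr -> data
        self.temp_area = {}    # temporary storage area unit 300
        self.addr_table = {}   # address table: primary addr -> temp addr
        self._slot = 0

    def _flush_temp_area(self):
        # Sb3: write all staged data back at its primary site address.
        for primary_addr, temp_addr in self.addr_table.items():
            self.disk_array[primary_addr] = self.temp_area[temp_addr]
        self.temp_area.clear()
        self.addr_table.clear()

    def on_replication_command(self, per_write, items):
        # items: (primary site address, data) pairs attached to the command.
        if per_write:                                  # Sb2: Yes -> Sb6, Sb7
            for addr, data in items:
                self._slot += 1
                temp_addr = "B%d" % self._slot         # assumed slot naming
                self.temp_area[temp_addr] = data
                self.addr_table[addr] = temp_addr
        else:                                          # Sb2: No -> Sb3, Sb4
            self._flush_temp_area()
            for addr, data in items:
                self.disk_array[addr] = data

site = BackupSite()
site.on_replication_command(True, [("0", "d0")])       # per-write staging
site.on_replication_command(False, [("A", "dA")])      # per-cycle batch
```

The same `_flush_temp_area` step also corresponds to step Sc2 of the site switching process, where all staged data must reach the disk array before the backup site takes over.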
- FIG. 8 is a flowchart showing an example of site switching processing at the backup site. For example, when a disaster occurs in the primary site, the backup site can execute the site switching process and take over the primary site process.
- the backup site storage apparatus 200 stops the replication process in order to perform the site switching process. (Step Sc1)
- the control unit 230 reads all the data temporarily stored in the temporary storage area unit 300 based on the temporary storage area unit address registered in the address table (see FIG. 4).
- the control unit 230 writes the data read from the temporary storage area unit 300 to the disk array device 280.
- the address at which the control unit 230 writes data to the disk array device 280 is the same address as the primary site address associated with the temporary storage area unit address from which the data was read in the address table. Replication is performed by writing this data to the disk array device 280. (Step Sc2)
- the host computer 500 takes over the processing of the host computer 400 at the primary site and accesses the storage apparatus 200. (Step Sc3)
- a data replication policy is determined based on the update frequency, and data replication is performed based on the replication policy.
- As replication policies, the following two are set to distinguish when data is replicated.
- One policy is “transmission for each write”, which replicates data every time it is updated.
- The other policy is “transmission every cycle”, which replicates data at a predetermined cycle.
- FIG. 9 is a block diagram showing a configuration example of the storage system.
- the storage system 1000b includes a host computer 400 and a storage device 100b at the primary site.
- the storage system 1000b includes a host computer 500 and a storage device 200 at a backup site.
- the storage device 100b at the primary site and the storage device 200 at the backup site are connected via a communication line (network) 600.
- the data stored in the storage device 100b at the primary site is replicated (copied) to the storage device 200 at the backup site by performing replication via the communication line 600.
- the storage device 100b includes a control device 110b and a disk array device 180.
- the control device 110b includes a host interface 120, a control unit 130b, a program memory 140, a buffer memory 150, a network interface 160, and a disk interface 170.
- Each unit of the storage device 100b is connected to each other via an internal bus.
- the host interface 120 is connected to the host computer 400 via a network.
- the program memory 140 stores a program (firmware) for operating the control unit 130b.
- the buffer memory 150 is a storage unit (work memory) used for various operations.
- the buffer memory 150 temporarily stores data stored in the disk array device 180 (primary storage).
- the buffer memory 150 temporarily stores data blocks read from the disk array device 180.
- the buffer memory 150 stores metadata for managing statistical information.
- the statistical information is, for example, information indicating the update frequency of data recorded in the disk array device 180. Details of the metadata according to the second embodiment will be described later with reference to FIG.
- the control unit 130b is constituted by a CPU (Central Processing Unit).
- the frequency detection unit 131b, the policy management unit 132b, and the replication execution unit 133 of the control unit 130b are realized when the CPU executes a program stored in the program memory 140.
- the frequency detection unit 131b, the policy management unit 132b, and the replication execution unit 133 may be implemented (realized) by hardware.
- the frequency detection unit 131b detects the time zone when the data is updated in the storage device 100b and registers it in the 2-bit map.
- Fig. 10 shows an example of a 2-bit map.
- the update time zone of all data written to the storage device 100b is registered in association with the address where the data is written.
- the update time zone registered in the 2-bit map is expressed by 2 bits. More specifically, it is expressed as follows.
- 00b: Not updated. 01b: First time zone (indicating that the data was updated in minutes 00 to 19 of the hour). 10b: Second time zone (indicating that the data was updated in minutes 20 to 39 of the hour). 11b: Third time zone (indicating that the data was updated in minutes 40 to 59 of the hour).
- the update time zone of the data written at address 0 is recorded as the value 2 in the leftmost square of the top row, in association with address 0. That is, it is registered that the data written at address 0 was updated in the second time zone.
- the update time zone of the data written at address A is recorded as the value 3 in the rightmost square of the top row, in association with address A. That is, it is registered that the data written at address A was updated in the third time zone.
- the update time zone of the data written at address 3A is recorded as the value 0 in the rightmost square of the third row, in association with address 3A; that is, the data at address 3A has not been updated.
- because the update time zone of every piece of data written to the storage device 100b is expressed and registered in 2 bits, the map can be realized with a small capacity, without the metadata becoming large.
- entries corresponding to all addresses of the storage apparatus 100b are provided.
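Since each entry is only 2 bits, four entries fit in one byte, so entries for all addresses can be held in a packed buffer. One hedged way to do this (the class and method names are illustrative, not from the patent):

```python
class TwoBitMap:
    """Packed map with one 2-bit entry per address (four entries per byte)."""

    def __init__(self, num_addresses):
        self.num_addresses = num_addresses
        # all entries start at 00b ("not updated")
        self._buf = bytearray((num_addresses + 3) // 4)

    def get(self, addr):
        byte, slot = divmod(addr, 4)
        return (self._buf[byte] >> (slot * 2)) & 0b11

    def set(self, addr, code):
        byte, slot = divmod(addr, 4)
        mask = 0b11 << (slot * 2)
        self._buf[byte] = (self._buf[byte] & ~mask & 0xFF) | ((code & 0b11) << (slot * 2))
```

A map for N addresses costs about N/4 bytes, which is the small-capacity property the text claims.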
- the metadata may be stored in a storage unit that uses ternary logic.
- FIG. 11 is a flowchart showing replication processing at the primary site.
- the flowchart shown in FIG. 11 is repeatedly executed.
- RPO: recovery point objective
- the control unit 130b refers to clock information, including date information, held by the system (not shown) and waits until the current time reaches 00, 20, or 40 minutes past the hour.
- when the current time is 00, 20, or 40 minutes past the hour, the control unit 130b advances the process to step Sd2. (Step Sd1)
- the control unit 130b determines whether the current time is 00, 20, or 40 minutes past the hour. At 00 minutes past the hour (step Sd2: 00 minutes), the control unit 130b advances the process to step Sd3. At 20 minutes past the hour (step Sd2: 20 minutes), it advances the process to step Sd5. At 40 minutes past the hour (step Sd2: 40 minutes), it advances the process to step Sd7. (Step Sd2)
- the replication execution unit 133 scans the 2-bit map from its head, searches for data updated in the 20-minute time zone that ended 40 minutes or more before the current time of 00 minutes past the hour, and reads the found data from the disk array device 180. That is, referring to the 2-bit map, the replication execution unit 133 reads from the disk array device 180 the data written at each address registered with the value 1 (01b in the 2-bit representation). The replication execution unit 133 transmits to the backup site a replication command accompanied by the read data and the address (primary site address) at which the read data is recorded. By transmitting this replication command to the backup site, the data updated in that 20-minute time zone, 40 minutes or more before the current time, is treated as data with a low update frequency and is backed up. The replication command transmitted to the backup site here is a short-cycle replication command. Thereafter, the replication execution unit 133 initializes the 2-bit map by setting the values registered for the read data to 0. (Step Sd3)
- the replication execution unit 133 determines whether the entire 2-bit map has been scanned (step Sd4). When the entire 2-bit map has been scanned (step Sd4: Yes), the process proceeds to step Sd9. When the end of the 2-bit map has not yet been reached (step Sd4: No), the process returns to step Sd3.
- the replication execution unit 133 refers to the 2-bit map, searches for data updated in the 20-minute time zone that ended 40 minutes or more before the current time of 20 minutes past the hour, and reads the found data from the disk array device 180. That is, referring to the 2-bit map, the replication execution unit 133 reads from the disk array device 180 the data written at each address registered with the value 2 (10b in the 2-bit representation).
- the replication execution unit 133 transmits to the backup site a replication command accompanied by the read data and the address (primary site address) at which the read data is recorded.
- the data updated in the 20-minute time zone that ended 40 minutes or more before the current time of 20 minutes past the hour is treated as data with a low update frequency and is backed up.
- the replication command transmitted to the backup site here is a short-cycle replication command.
- the replication execution unit 133 then initializes the 2-bit map by setting the values registered for the read data to 0. (Step Sd5)
- the replication execution unit 133 determines whether the entire 2-bit map has been scanned (step Sd6). When the entire 2-bit map has been scanned (step Sd6: Yes), the process proceeds to step Sd9. When the end of the 2-bit map has not yet been reached (step Sd6: No), the process returns to step Sd5.
- the replication execution unit 133 refers to the 2-bit map, searches for data updated in the 20-minute time zone that ended 40 minutes or more before the current time of 40 minutes past the hour, and reads the found data from the disk array device 180. That is, referring to the 2-bit map, the replication execution unit 133 reads from the disk array device 180 the data written at each address registered with the value 3 (11b in the 2-bit representation).
- the replication execution unit 133 transmits to the backup site a replication command accompanied by the read data and the address (primary site address) at which the read data is recorded.
- the replication command transmitted to the backup site here is a short-cycle replication command.
- the replication execution unit 133 then initializes the 2-bit map by setting the values registered for the read data to 0. (Step Sd7)
- the replication execution unit 133 determines whether the entire 2-bit map has been scanned (step Sd8). When the entire 2-bit map has been scanned (step Sd8: Yes), the process proceeds to step Sd9. When the end of the 2-bit map has not yet been reached (step Sd8: No), the process returns to step Sd7.
- the control unit 130b refers to clock information, including date information, held by the system (not shown), and determines whether the current time has passed midnight, the predetermined recovery point objective (RPO).
- by recording the date information when the previous midnight passed, the control unit 130b can determine, by referring to the clock information, that a new midnight has passed. (Step Sd9)
- when the determination is Yes (step Sd9: Yes), the policy management unit 132b advances the process to step Sd10.
- the current time having passed midnight means that the recovery point objective (RPO) time has been reached.
- when the determination is No (step Sd9: No), the policy management unit 132b returns the process to step Sd1.
- the replication execution unit 133 refers to the 2-bit map and reads from the disk array device 180 the data written at every address associated with a value other than 0 (00b in the 2-bit representation). That is, the replication execution unit 133 reads from the disk array device 180 the data written at each address registered in the 2-bit map with the value 1 (01b in the 2-bit representation), 2 (10b in the 2-bit representation), or 3 (11b in the 2-bit representation).
- the replication execution unit 133 transmits to the backup site a replication command accompanied by the read data and the address (primary site address) at which the read data is recorded.
- by transmitting such a replication command to the backup site, the data with a high update frequency that was not backed up in steps Sd3, Sd5, and Sd7 is backed up.
- the replication command transmitted to the backup site here is a long-cycle replication command.
- the replication execution unit 133 initializes the 2-bit map by setting the values registered for the read data to 0. (Step Sd10)
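Steps Sd1 to Sd10 reduce to a selection rule: at 00, 20, and 40 minutes past the hour, the zone whose window closed 40 minutes or more earlier is flushed (codes 1, 2, and 3 respectively), and at the RPO every non-zero entry is flushed. A sketch under those assumptions (the function is illustrative, and the 2-bit map is modeled as a simple address-to-code dict):

```python
def addresses_to_replicate(bit_map, minute, at_rpo):
    """Pick the addresses whose data goes into the next replication command.

    bit_map: dict mapping address -> 2-bit code (0 = not updated).
    minute:  current minute of the hour.
    at_rpo:  True at the recovery point objective (long-cycle flush).
    """
    if at_rpo:
        # long-cycle (step Sd10): everything updated since the last flush
        return [a for a, code in bit_map.items() if code != 0]
    # short-cycle (steps Sd3/Sd5/Sd7): flush the zone that ended
    # 40 minutes or more before the current time
    target = {0: 1, 20: 2, 40: 3}.get(minute)
    if target is None:
        return []                     # replication runs only at :00, :20, :40
    return [a for a, code in bit_map.items() if code == target]
```

After transmitting, the caller would reset the selected entries to 0, mirroring the map initialization at the end of each step.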
- FIG. 12 is a flowchart showing replication processing at the backup site.
- the control unit 230 waits for reception of a replication command transmitted from the primary site. (Step Se1)
- the control unit 230 determines whether the replication command transmitted from the primary site is a short-cycle replication command. That is, the control unit 230 determines that the command is a short-cycle replication command when the current time is 00, 20, or 40 minutes past the hour. However, when the current time is 0:00 a.m., the control unit 230 determines that the command is a long-cycle replication command.
- when the replication command is for short-cycle processing (step Se2: Yes), the control unit 230 advances the process to step Se6. On the other hand, when it is not a short-cycle replication command (step Se2: No), the control unit 230 advances the process to step Se3; in this case, the control unit 230 determines that the received replication command is a long-cycle replication command. (Step Se2)
- in step Se6, the control unit 230 receives the data attached to the replication command transmitted from the primary site, together with the primary site address of the data.
- the control unit 230 stores the received data in the temporary storage area unit 300.
- the control unit 230 registers, in the address table (see FIG. 4), the address in the temporary storage area unit 300 at which the received data was stored (the temporary storage area unit address), in association with the received primary site address.
- the control unit 230 determines whether all the data attached to the replication command, and the primary site addresses of that data, have been received. When everything has been received (step Se7: Yes), the control unit 230 returns the process to step Se1. On the other hand, when unreceived data and primary site addresses remain (step Se7: No), the control unit 230 returns the process to step Se6. (Step Se7)
- in step Se3, the control unit 230 reads all the data temporarily stored in the temporary storage area unit 300, based on the temporary storage area unit addresses registered in the address table (see FIG. 4).
- the control unit 230 writes the data read from the temporary storage area unit 300 to the disk array device 280.
- the address at which the control unit 230 writes the data to the disk array device 280 is the same as the primary site address associated, in the address table, with the temporary storage area unit address from which the data was read. Writing this data to the disk array device 280 completes the replication.
- the control unit 230 receives the data attached to the long-cycle replication command transmitted from the primary site (the case determined as No in step Se2), together with the primary site address of the data.
- the control unit 230 writes the received data to the disk array device 280 at the same address as the received primary site address. This data write replicates the received data. (Step Se4)
- the control unit 230 determines whether all the data attached to the replication command, and the primary site addresses of that data, have been received. When everything has been received (step Se5: Yes), the control unit 230 returns the process to step Se1. On the other hand, when unreceived data and primary site addresses remain (step Se5: No), the control unit 230 returns the process to step Se4. (Step Se5)
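The backup-site behavior of FIG. 12 splits into staging and applying: short-cycle payloads are held in the temporary storage area, and a long-cycle command first flushes the staged data to the disk array and then writes its own payload directly. A minimal sketch (the class name and dict-based storage are illustrative simplifications of the temporary storage area unit 300 and disk array device 280):

```python
class BackupSite:
    """Toy model of the backup-site replication handling in FIG. 12."""

    def __init__(self):
        self.temp = {}   # temporary storage area: primary site address -> data
        self.disk = {}   # replicated disk-array contents

    def on_short_cycle(self, payload):
        # steps Se6-Se7: stage the data; the address-table association is
        # modeled here by keying directly on the primary site address
        self.temp.update(payload)

    def on_long_cycle(self, payload):
        # step Se3: write everything staged in the temporary area to disk
        self.disk.update(self.temp)
        self.temp.clear()
        # steps Se4-Se5: write the long-cycle data straight to disk
        self.disk.update(payload)
```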
- as described above, the time zone in which data is updated is registered in the 2-bit map, and two types of data replication are realized based on the update time zones registered in the 2-bit map.
- one type replicates, at predetermined times every hour (for example, 00, 20, and 40 minutes past the hour), the data updated in the 20-minute time zone that ended 40 minutes or more before the current time. This is short-cycle replication.
- the other replicates all the updated data at a predetermined time, for example, the recovery point objective (RPO) time. This is long-cycle replication.
- the metadata for managing replication is realized as a 2-bit map, so it can be realized with a small storage capacity, without the metadata becoming large.
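The small-capacity claim is easy to quantify: the map costs 2 bits per block, regardless of block contents. For instance, under the assumption of 4 KiB blocks (a block size within the 4 KB to 64 KB range mentioned for the first embodiment):

```python
def bitmap_bytes(volume_bytes, block_bytes=4096):
    """Bytes needed for a 2-bit-per-block update map (rounded up)."""
    blocks = volume_bytes // block_bytes
    return (blocks * 2 + 7) // 8

# a 1 TiB volume in 4 KiB blocks has 2**28 blocks,
# so the map costs 2**26 bytes = 64 MiB
assert bitmap_bytes(2**40) == 64 * 2**20
```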
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Quality & Reliability (AREA)
- Computer Security & Cryptography (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Techniques For Improving Reliability Of Storages (AREA)
Abstract
Description
Hereinafter, a storage system according to a first embodiment will be described with reference to the drawings. FIG. 1 is a block diagram showing a configuration example of the storage system.
The host computer 400 outputs I/O (Input/Output) commands to the storage apparatus 100a to read or write data. The unit of data read from or written to the storage apparatus 100a by an I/O command from the host computer 400 is a data block of a predetermined size (4 KB to 64 KB). Hereinafter, a data block, the unit of data read from or written to the storage apparatus 100a, is simply called data.
The host computer 500 is connected to the host interface 220 of the storage apparatus 210. The host computer 500 outputs I/O commands to the storage apparatus 210 to read or write data.
Hereinafter, a storage system according to a second embodiment will be described with reference to the drawings. The second embodiment differs from the first embodiment in the metadata used to execute data replication. Only the differences from the first embodiment are described below.
The storage apparatus 100b includes a control device 110b and a disk array device 180. The control device 110b includes a host interface 120, a control unit 130b, a program memory 140, a buffer memory 150, a network interface 160, and a disk interface 170. The units of the storage apparatus 100b are connected to one another via an internal bus.
01b: first time zone (indicates that data was updated in the time zone from minute 00 to minute 19 of each hour)
10b: second time zone (indicates that data was updated in the time zone from minute 20 to minute 39 of each hour)
11b: third time zone (indicates that data was updated in the time zone from minute 40 to minute 59 of each hour)
FIG. 11 is a flowchart showing the replication processing at the primary site. The flowchart shown in FIG. 11 is executed repeatedly. In FIG. 11, as an example, it is assumed that midnight each day has been predetermined as the recovery point objective (RPO).
Claims (6)
- In a storage system having a first storage apparatus for recording data, provided at a primary site, and a second storage apparatus for recording data, provided at a backup site,
the first storage apparatus comprises:
a recording unit for storing data; and
a control unit that transfers the data recorded in the recording unit to the backup site for replication, based on the update frequency of the data recorded in the recording unit,
the storage system comprising the above. - The control unit comprises:
an update frequency detection unit that detects the update frequency of the data recorded in the recording unit;
an update frequency storage unit that stores the update frequency detected by the update frequency detection unit in association with each piece of data stored in the storage unit;
a policy determination unit that determines a replication policy for each piece of data recorded in the recording unit, based on the update frequencies stored in the update frequency storage unit;
a replication policy storage unit that stores the replication policy determined by the policy determination unit in association with each piece of data stored in the storage unit; and
a replication execution unit that transfers the data recorded in the recording unit to the backup site for replication, based on the replication policies stored in the replication policy storage unit,
these being comprised in
the storage system according to claim 1. - The replication execution unit,
when data is updated for which a policy of replicating upon every update is stored in association in the replication policy storage unit, transfers the updated data to the backup site for replication,
in the storage system according to claim 2. - The replication execution unit,
at a time arriving every elapse of a predetermined period, transfers data recorded in the recording unit whose update frequency is 1 or more to the backup site for replication, based on the update frequencies stored in the update frequency storage unit,
and the update frequency detection unit
sets all the update frequencies stored in the update frequency storage unit to zero after the replication execution unit has transferred the data recorded in the recording unit whose update frequency is 1 or more to the backup site,
in the storage system according to claim 2. - The replication execution unit,
at a time arriving every elapse of a predetermined period, transfers data recorded in the recording unit whose update frequency is 1 or more to the backup site for replication, based on the update frequencies stored in the update frequency storage unit,
and the policy determination unit,
after the replication execution unit has transferred the data recorded in the recording unit whose update frequency is 1 or more to the backup site,
determines, for data recorded in the recording unit whose associated update frequency stored in the update frequency storage unit is equal to or less than a predetermined value, a policy of replicating upon every update,
in the storage system according to claim 2. - The control unit comprises:
an update time zone detection unit that detects in which of a plurality of time zones, obtained by dividing one hour at first predetermined time intervals, data was updated;
an update time zone information storage unit that stores update time zone information, indicating the update time zone detected by the update time zone detection unit, in association with each piece of data stored in the storage unit; and
a replication execution unit that refers to the update time zone information storage unit and transfers the data stored in the storage unit to the backup site for replication, based on the update time zone information stored in the update time zone information storage unit,
and
the replication execution unit,
at each time obtained by dividing the one hour at the first predetermined time interval, transfers the data updated in the time zone starting a predetermined number of the time intervals before that time and ending the first predetermined time interval before that time, to the backup site for replication as data with a low update frequency, and
at a time arriving every elapse of a second predetermined period, transfers all data whose update time zone information is set to indicate that the data has been updated, to the backup site for replication as data with a high update frequency,
and the update time zone detection unit
sets the update time zone information, stored in the update time zone information storage unit in association with the data transferred to the backup site by the replication execution unit, to indicate no update,
in the storage system according to claim 1.
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201480035760.4A CN105339886B (zh) | 2014-02-28 | 2014-02-28 | 储存系统 |
US14/900,409 US9910608B2 (en) | 2014-02-28 | 2014-02-28 | Storage system with update frequency based replication |
JP2016504975A JP6030269B2 (ja) | 2014-02-28 | 2014-02-28 | ストレージシステム |
PCT/JP2014/055111 WO2015129035A1 (ja) | 2014-02-28 | 2014-02-28 | ストレージシステム |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2014/055111 WO2015129035A1 (ja) | 2014-02-28 | 2014-02-28 | ストレージシステム |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2015129035A1 true WO2015129035A1 (ja) | 2015-09-03 |
Family
ID=54008400
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2014/055111 WO2015129035A1 (ja) | 2014-02-28 | 2014-02-28 | ストレージシステム |
Country Status (4)
Country | Link |
---|---|
US (1) | US9910608B2 (ja) |
JP (1) | JP6030269B2 (ja) |
CN (1) | CN105339886B (ja) |
WO (1) | WO2015129035A1 (ja) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10802747B2 (en) * | 2017-11-30 | 2020-10-13 | Veritas Technologies Llc | Performing backup operations using replicas |
US11609775B2 (en) | 2019-04-30 | 2023-03-21 | Rubrik, Inc. | Systems and methods for continuous data protection comprising storage of completed I/O requests intercepted from an I/O stream using touch points |
US11663089B2 (en) | 2019-04-30 | 2023-05-30 | Rubrik, Inc. | Systems and methods for continuous data protection |
US11663092B2 (en) | 2019-04-30 | 2023-05-30 | Rubrik, Inc. | Systems and methods for continuous data protection |
US11500664B2 (en) | 2019-04-30 | 2022-11-15 | Rubrik, Inc. | Systems and method for continuous data protection and recovery by implementing a set of algorithms based on the length of I/O data streams |
US11061601B2 (en) * | 2019-04-30 | 2021-07-13 | Rubrik, Inc. | Systems and methods for continuous data protection |
US11436251B2 (en) * | 2020-10-02 | 2022-09-06 | EMC IP Holding Company LLC | Data size based replication |
TWI788084B (zh) * | 2021-11-03 | 2022-12-21 | 財團法人資訊工業策進會 | 運算裝置以及資料備份方法 |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH10308755A (ja) * | 1997-04-21 | 1998-11-17 | Alcatel Alsthom Co General Electricite | ネットワークにインストールされたデータ受信ステーションを備えるシステム |
JP2006236019A (ja) * | 2005-02-25 | 2006-09-07 | Hitachi Ltd | データコピー方式の切替方法 |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6397307B2 (en) * | 1999-02-23 | 2002-05-28 | Legato Systems, Inc. | Method and system for mirroring and archiving mass storage |
US7130874B2 (en) * | 2002-03-12 | 2006-10-31 | International Business Machines Corporation | Method, system, and program for maintaining data in a distributed computing environment for processing transaction requests |
JP4806037B2 (ja) | 2009-01-26 | 2011-11-02 | 株式会社東芝 | データ記憶システム及び非同期レプリケーション方法 |
CN102932621B (zh) * | 2011-08-08 | 2015-05-20 | 杭州海康威视数字技术股份有限公司 | 一种存储数据的方法及装置 |
-
2014
- 2014-02-28 US US14/900,409 patent/US9910608B2/en active Active
- 2014-02-28 WO PCT/JP2014/055111 patent/WO2015129035A1/ja active Application Filing
- 2014-02-28 JP JP2016504975A patent/JP6030269B2/ja active Active
- 2014-02-28 CN CN201480035760.4A patent/CN105339886B/zh active Active
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH10308755A (ja) * | 1997-04-21 | 1998-11-17 | Alcatel Alsthom Co General Electricite | ネットワークにインストールされたデータ受信ステーションを備えるシステム |
JP2006236019A (ja) * | 2005-02-25 | 2006-09-07 | Hitachi Ltd | データコピー方式の切替方法 |
Also Published As
Publication number | Publication date |
---|---|
US20160357460A1 (en) | 2016-12-08 |
JP6030269B2 (ja) | 2016-11-24 |
CN105339886A (zh) | 2016-02-17 |
US9910608B2 (en) | 2018-03-06 |
CN105339886B (zh) | 2018-07-10 |
JPWO2015129035A1 (ja) | 2017-03-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP6030269B2 (ja) | ストレージシステム | |
US10152527B1 (en) | Increment resynchronization in hash-based replication | |
JP4727437B2 (ja) | データベースを有するストレージシステムの記憶制御方法 | |
US10860447B2 (en) | Database cluster architecture based on dual port solid state disk | |
JP2005025683A (ja) | 記憶システム及び記憶装置システム | |
KR20150022630A (ko) | 스토리지 디바이스 및 데이터 처리 방법 | |
US10459813B2 (en) | System and device for synchronizing data in a plurality of devices | |
US20170060450A1 (en) | Programmable memory command sequencer | |
JP2013061795A (ja) | 記憶装置、コントローラ、およびリードコマンド実行方法 | |
US10162720B2 (en) | Copy-on-read process in disaster recovery | |
JP2016212551A (ja) | ストレージ制御装置、ストレージ制御プログラム、およびストレージシステム | |
CN103198050A (zh) | 提供高完整性处理的方法 | |
CN111045865A (zh) | 一种基于块复制的实时同步方法及系统 | |
US10901868B1 (en) | Systems and methods for error recovery in NAND memory operations | |
JP2012058863A (ja) | ディスク装置、および、ディスク装置へのデータ複製方法、プログラム | |
US10698638B2 (en) | Data transmission method and host system using the same | |
JP4936088B2 (ja) | ディスクアレイ装置、ディスクアレイシステム、及びキャッシュ制御方法 | |
US20100325373A1 (en) | Duplexing Apparatus and Duplexing Control Method | |
US20180356987A1 (en) | Information processing system, information processing apparatus, and information processing method | |
JP2012003621A (ja) | ディスクアレイ装置間のリモートコピー処理システム、処理方法、及び処理用プログラム | |
US20120215966A1 (en) | Disk array unit and control method thereof | |
JP5963324B2 (ja) | 仮想シーケンシャルアクセスボリュームのデータのコピー方法、システム | |
JP6927725B2 (ja) | ストレージ装置、レプリケーションシステム及びレプリケーション方法 | |
WO2015166741A1 (ja) | インメモリ管理システムおよびインメモリ管理用プログラム | |
US20150253996A1 (en) | Access control method and data storage device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| WWE | Wipo information: entry into national phase | Ref document number: 201480035760.4; Country of ref document: CN |
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 14883882; Country of ref document: EP; Kind code of ref document: A1 |
| ENP | Entry into the national phase | Ref document number: 2016504975; Country of ref document: JP; Kind code of ref document: A |
| WWE | Wipo information: entry into national phase | Ref document number: 14900409; Country of ref document: US |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 14883882; Country of ref document: EP; Kind code of ref document: A1 |
Ref document number: 14883882 Country of ref document: EP Kind code of ref document: A1 |