WO2018055686A1

WO2018055686A1 - Information processing system

Info

Publication number: WO2018055686A1
Application number: PCT/JP2016/077790
Authority: WO
Inventors: 正義大原; 貴彦武田; 孝坂口; 陽介加藤; 憲亮成田
Original assignee: 株式会社日立製作所
Priority date: 2016-09-21
Filing date: 2016-09-21
Publication date: 2018-03-29

Abstract

An information processing system according to an embodiment of the present invention includes: a main system provided with a plurality of main storage devices; and a subsystem provided with a plurality of sub-storage devices, wherein the main storage devices and the sub-storage devices each have one or more volumes and journal volumes. When receiving a write command including a time stamp, the main system creates and transmits, to the main storage device, a journal creation command including the write data and the time stamp. The main storage device, which has received the journal creation command, stores the write data in the volume(s) and creates and stores, in the journal volume(s), a journal including the write data and the time stamp. When acquiring a plurality of journals created by the main system, the subsystem determines, on the basis of the time stamp included in each journal, a journal that is transmittable to the sub-storage device, and creates and issues, to the sub-storage device, a journal restore command on the basis of the content of the determined journal. In addition, the sub-storage device, which has received the journal restore command, stores, in the volume included in the sub-storage device, the write data included in the journal restore command.

Description

Information processing system

The present invention relates to a volume copy technology between storage apparatuses.

Currently, many storage apparatuses provide reliability exceeding that of a single HDD by adopting high-reliability technology such as RAID (Redundant Array of Independent (or Inexpensive) Disks). However, with the recent evolution of the information society, there are cases where the reliability that RAID technology can provide is insufficient.

There is a technique for storing data replicas in a plurality of storage devices as a technique for increasing the availability of the storage system. By storing data in a plurality of storage devices, data loss can be prevented even if one storage device stops due to a failure. For example, Patent Document 1 discloses a data processing system that stores data from a host computer in a plurality of storage systems. In the data processing system disclosed in Patent Document 1, a data replication process is performed by a storage system.

In the data processing system disclosed in Patent Document 1, data replication processing using a journal is performed in order to preserve the data update order. The journal is information including update data and an identifier (for example, the update time) indicating the data update order. In the data copy source storage system (primary storage system), when update data is written to the volume (primary volume), a journal is created and stored in the journal volume. In the data copy destination storage system (secondary storage system), when a journal created in the primary storage system is acquired, the update data is reflected in the volume of the secondary storage system (secondary volume) in the order given by the identifier, such as the update time, included in the journal.

Patent Document 1: JP 2005-316684 A

In the data processing system disclosed in Patent Document 1, since the storage system is in charge of data replication processing, the load on the host computer does not increase, but the storage system bears the burden of data replication processing. Particularly when the amount of data is large, the processing load for creating and managing journals adversely affects performance.

An information processing system according to an embodiment of the present invention includes a primary system having a first processor, a first memory, and a plurality of primary storage devices, and a secondary system having a second processor, a second memory, and a plurality of secondary storage devices for storing replicas of the data written to the primary storage devices. Each of the primary storage devices and the secondary storage devices has one or more volumes and a journal volume. When the first processor receives a write command including a time stamp, it creates a journal creation command including the write data and the time stamp, and sends the command to the primary storage device. The primary storage device that has received the journal creation command stores the write data in the volume, creates a journal including the write data, information about the write position of the write data, a sequence number managed by the primary storage device, and the time stamp, and stores the journal in the journal volume.

When the second processor acquires a plurality of journals stored in journal volumes of a plurality of primary storage devices, the second processor determines a journal that can be transmitted to the secondary storage device based on a time stamp included in each of the journals. A journal restore command is created based on the determined contents of the journal and issued to the secondary storage device. The secondary storage device that has received the journal restore command stores the write data included in the journal restore command in the volume of the secondary storage device.
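The time-stamp-based selection described above can be illustrated with a short sketch. The selection rule used here is an assumption made only for illustration and is not stated in the text: a journal is treated as transmittable once every primary storage device has supplied a journal with an equal or later time stamp, so that no earlier write can still be outstanding. The names `Journal` and `select_transmittable` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Journal:
    device: str     # primary storage device that created the journal
    seq: int        # sequence number managed by that device
    timestamp: int  # time stamp carried by the write command
    data: bytes     # write data


def select_transmittable(journals_by_device):
    """Return the journals considered safe to restore, in time-stamp order.

    Assumed rule (not from the source): the cut-off is the smallest
    "newest time stamp" over all devices. A device that has supplied no
    journal yet blocks the selection entirely.
    """
    groups = list(journals_by_device.values())
    if not groups or any(not group for group in groups):
        return []
    cutoff = min(max(j.timestamp for j in group) for group in groups)
    ready = [j for group in groups for j in group if j.timestamp <= cutoff]
    return sorted(ready, key=lambda j: (j.timestamp, j.device, j.seq))


acquired = {
    "ssd-1": [Journal("ssd-1", 1, 10, b"a"), Journal("ssd-1", 2, 30, b"c")],
    "ssd-2": [Journal("ssd-2", 1, 20, b"b")],
}
ready = select_transmittable(acquired)
```

In this example the journal with time stamp 30 is held back, because the second device has only reported journals up to time stamp 20 and could still deliver a write that precedes it.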

In the storage system of the present invention, the load of remote copy processing can be distributed to a plurality of CPUs, enabling performance improvement and ensuring data consistency.

FIG. 1 is a configuration diagram of an information processing system including a storage system according to an embodiment of the present invention. The remaining drawings are, in order: a block diagram of the SSD; a logical configuration diagram of the storage system; a conceptual diagram showing the relationship between a RAID group and a logical volume; an example of a pair management table; a diagram outlining the structure of a journal; an example of a journal creation command, a journal restore command, and their response information; the contents of the JVOL management table; a structural example of journal management information; the contents of the logical-physical conversion table; the contents of the physical page table; the flow of write processing; the flow of journal transmission processing from the primary storage apparatus to the secondary storage apparatus; the flow of journal restore processing; a diagram showing the concept of journal restore processing; an example of the structure of a generation management table; the flow of time stamp update processing; a part of the flow of journal transmission processing from the primary storage apparatus to the secondary storage apparatus in the storage system according to the second embodiment; a conceptual diagram illustrating a method of mapping logical pages to physical pages in the SSD according to the third embodiment; and a configuration diagram of an information processing system according to a fourth embodiment.

Hereinafter, an information processing system according to an embodiment of the present invention will be described with reference to the drawings. Note that the present invention is not limited to the embodiments described below.

In the following description, “program” may be used as the subject, but precisely, the program is executed by a CPU (processor) to perform a predetermined process. However, to prevent the explanation from becoming redundant, the program may be described as the subject. Therefore, in the following description, the process described with the program as the subject actually means that the CPU executes the process. Further, part or all of the program may be realized by dedicated hardware. Various programs may be installed in each apparatus by a program distribution server or a computer-readable storage medium. The computer-readable storage medium is a non-transitory computer-readable medium such as a non-volatile storage medium such as an IC card, an SD card, or a DVD.

First, various terms used in the embodiments described below will be described.

“Volume” means a storage area (storage space) that a target device, such as a storage apparatus or a storage device, provides to a device (referred to as an initiator or a requester) that issues requests to the target device, such as a host or a storage controller. The storage apparatus according to the embodiment described below can also create one volume using volumes provided by a plurality of storage devices and provide it to a requester such as a host. Such a volume is referred to herein as a “logical volume” or “logical device”.

“Remote copy” means a process of creating a copy of a storage device volume in a volume of another storage device. In the embodiment described below, the storage apparatus has a function of performing remote copy. When the storage apparatus receives write data for the volume from the host, the storage apparatus writes the write data to each volume of the two storage apparatuses.

Also, when data is stored in two volumes by remote copy, the volume in which data is stored from the host is called “primary volume” or “P-VOL”. A volume in which a copy of data written to the primary volume is stored is called a “secondary volume” or “S-VOL”. A pair of a primary volume and a secondary volume is called a “volume pair” or “pair”.

When data is stored in the secondary volume by remote copy, a write process that ensures data consistency may be required. In this specification, when a plurality of pieces of data are stored in a secondary volume, “data consistency is guaranteed” means that the order in which the pieces of data are written to the secondary volume is the same as the order in which they were written from the host computer to the primary volume.

Mission-critical systems in particular are designed, with recovery from failures in mind, on the expectation that the data written from the host computer to the storage device is stored in the storage medium of the storage device in the order in which the write requests were issued. For this reason, when the host computer first writes data a to the primary volume and then writes data b, data a is expected to be written to the secondary volume of the secondary storage device before data b. If, on the other hand, data b is written before data a and a failure occurs immediately after data b is written, the replicated data stored in the secondary volume of the secondary storage device becomes inconsistent, and it may become impossible to continue business operations.

Data consistency may also need to be guaranteed across multiple volumes (or volume pairs). Assume that the host computer writes data a to a first primary volume (referred to as P-VOL1) at time t1, then writes data b to a second primary volume (referred to as P-VOL2) at time t2 (t2 > t1), and then writes data c to P-VOL1 at time t3 (t3 > t2). The pair volume of P-VOL1 is called S-VOL1, and the pair volume of P-VOL2 is called S-VOL2.

In this case, if the secondary storage device first writes data a to S-VOL1, then writes data b to S-VOL2, and then writes data c to S-VOL1, data consistency is guaranteed across the pair of P-VOL1 and S-VOL1 and the pair of P-VOL2 and S-VOL2. The storage apparatus according to the embodiment described below can guarantee data consistency across a plurality of volume pairs in this way. A set of volume pairs controlled so as to guarantee data consistency is called a “consistency group”, which may be abbreviated as “CTG”.
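The ordering requirement in the example above can be expressed as a short sketch. The function names `replay_in_order` and `is_consistent` and the in-memory representation of volumes are hypothetical and serve only to illustrate the definition of consistency across volume pairs.

```python
# Writes issued by the host, as (time, primary volume, data).
host_writes = [
    (1, "P-VOL1", "a"),
    (2, "P-VOL2", "b"),
    (3, "P-VOL1", "c"),
]

# Volume pairs that belong to one consistency group.
PAIRS = {"P-VOL1": "S-VOL1", "P-VOL2": "S-VOL2"}


def replay_in_order(writes, pairs):
    """Apply the writes to the paired secondary volumes in host write order."""
    secondary = {svol: [] for svol in pairs.values()}
    applied = []
    for _time, pvol, data in sorted(writes):
        svol = pairs[pvol]
        secondary[svol].append(data)
        applied.append((svol, data))
    return secondary, applied


def is_consistent(applied, writes, pairs):
    """True if the secondary apply order matches the host write order."""
    expected = [(pairs[pvol], data) for _time, pvol, data in sorted(writes)]
    return applied == expected


secondary, applied = replay_in_order(host_writes, PAIRS)
```

An apply order that writes data b to S-VOL2 before data a reaches S-VOL1 fails the `is_consistent` check, even though each secondary volume on its own ends up with the right contents.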

FIG. 1 shows a configuration of an information processing system (computer system) including a storage system according to an embodiment of the present invention. The information processing system includes a storage device 1a, a storage device 1b connected to the storage device 1a, one or more hosts 2, and a host 2'.

The storage device 1a is connected to the host 2 via a storage area network (SAN) 3. The host 2 is a computer that executes an application program used by a user, and operates as a requester that issues I/O requests to the storage apparatus 1a. In the first embodiment, the host 2 is a general-purpose computer such as a mainframe or a personal computer. An application program or the like executed on the host 2 issues an I/O request such as a read command or a write command to a volume (logical volume) provided by the storage apparatus 1a. The host 2' is a computer similar to the host 2, and is connected to the storage apparatus 1b via the SAN 3'. The host 2' and the storage device 1b form a so-called standby computer system, which is used when the host 2 or the storage device 1a stops operating due to a failure or the like.

The copy manager 6 is a computer of the same kind as the host 2 and is connected to the storage apparatus 1a via the SAN 3. The copy manager 6 is a computer for issuing control instructions related to the remote copy function described later, but is not an essential device in the computer system according to the first embodiment. The copy manager 6 will be described in the second embodiment.

SAN 3 is a network composed of, for example, fiber channel cables and fiber channel switches. The storage device 1a is connected to the storage device 1b via a wide area network (WAN) 4. However, the storage apparatus 1a and the storage apparatus 1b may be connected via the SAN 3. In this embodiment, the combination of the storage device 1a and the storage device 1b connected by the WAN 4 is referred to as a “storage system”.

The storage device 1a includes a storage controller (hereinafter also abbreviated as “controller”) 10 and a disk unit 11 including a plurality of storage devices. The storage controller 10 includes a CPU 12 that executes control such as the I/O processing performed in the storage apparatus 1a, a memory 13, a front-end interface (FE I/F) 14 that is a data transfer interface with the host 2 and the storage apparatus 1b, a back-end interface (BE I/F) 15 that is a data transfer interface with the disk unit 11, and a management I/F 16, and these are interconnected via an internal switch 17. The number of each component, such as the CPU 12 and the FE I/F 14, is not limited to the number shown in the figure.

The disk unit 11 is equipped with a plurality of storage devices for storing write data from the host 2, and each storage device is connected to the storage controller 10 via the BE I / F 15. The storage device is, for example, an SSD 21 that uses a nonvolatile semiconductor memory such as a flash memory as a storage medium, or an HDD 22 that uses a magnetic disk as a storage medium. Hereinafter, a case where all the storage devices mounted on the disk unit 11 are SSDs 21 will be described unless otherwise specified.

The BE I/F 15 has at least an interface controller and a transfer circuit. The interface controller is a component for converting the protocol (SAS, in one example) used by the storage device 20 into the communication protocol (PCI-Express, as an example) used in the storage controller 10. The transfer circuit is used when the storage controller 10 transfers data (reads and writes) to and from the storage device.

The FE I/F 14 has at least an interface controller and a transfer circuit, similarly to the BE I/F 15. The interface controller of the FE I/F 14 converts between the communication protocol (for example, fiber channel) used in the data transfer path between the host 2 and the storage controller 10 and the communication protocol used in the storage controller 10.

The CPU 12 performs various controls of the storage device 1a. The memory 13 is used to store programs executed by the CPU 12 and various management information of the storage device 1a used by the CPU 12. The memory 13 is also used for temporarily storing I/O target data for the storage device. Hereinafter, the storage area in the memory 13 used for temporarily storing the I/O target data for the storage device is referred to as the “cache”. The memory 13 is composed of a volatile storage medium such as DRAM or SRAM, but as another embodiment, the memory 13 may be composed of a non-volatile memory.

The storage device 1b is a device having the same components as the storage device 1a (the internal configuration is not shown in FIG. 1). However, the number of components (CPU 12 and the like) does not have to be the same as that of the storage device 1a. Hereinafter, when the functions and the like shared by the storage device 1a and the storage device 1b are described, the storage device 1a and the storage device 1b are not distinguished from each other and are described as “storage device 1”.

Although not explicitly shown in FIG. 1, a management host may also be connected to the storage device 1b. This management host may be a different terminal from the management host 5 connected to the storage apparatus 1a, or the management host 5 connected to the storage apparatus 1a may also be connected to the storage device 1b.

FIG. 2 is a diagram illustrating a configuration example of the SSD 21. The SSD 21 includes an SSD controller 200 and a plurality of FM chips 206. The SSD controller 200 includes a processor (CPU) 201, an upstream I/F 202, a downstream I/F 203, and a memory 204, which are interconnected via an internal connection switch 205.

The upstream I/F 202 is an interface controller for performing communication between the SSD 21 and the storage controller 10. The upstream I/F 202 is connected to the BE I/F 15 of the storage controller 10 via a transmission line (SAS link or PCI link). The downstream I/F 203, on the other hand, is an interface controller for performing communication between the SSD controller 200 and the FM chips 206.

The CPU 201 performs processing related to the various commands received from the storage controller 10. The memory 204 stores a control program 1500, which is a program for controlling the SSD 21, and various management information such as a logical-physical conversion table 1400, a physical page management table 1450, and sequence number management information 1170. A part of the memory 204 is also used as a buffer for temporarily storing write data transmitted from the storage controller 10 together with a write command, and data read from the FM chips 206. Hereinafter, the area of the memory 204 used as a buffer is referred to as the “buffer area”. A volatile memory such as a DRAM is used as the memory 204. However, a nonvolatile memory may be used for the memory 204.

FM chip 206 is a non-volatile semiconductor memory chip such as a NAND flash memory. As is well known, reading / writing of data in the flash memory is performed for each area of a predetermined size (for example, 8 KB) called a page in a set of a plurality of cells. Data erasure is performed for each block which is a set of pages.

In general, “SSD” means a storage device that uses a semiconductor memory, particularly a nonvolatile semiconductor memory, and has the same form factor as an HDD. In the present embodiment, however, SSD is used to mean any storage device that includes a plurality of flash memories and a controller for controlling them, and its external shape and the like are not limited to the form factors of general HDDs and SSDs. In addition to flash memory, various semiconductor memories may be used as the nonvolatile semiconductor memory of the SSD, such as MRAM (Magnetoresistive Random Access Memory), which is a magnetoresistive memory, ReRAM (Resistive Random Access Memory), which is a resistance variable memory, and FeRAM (Ferroelectric Random Access Memory), which is a ferroelectric memory.

Next, the logical configuration of the storage system and the programs executed in the storage system according to the first embodiment of the present invention will be described with reference to the logical configuration diagram of the storage system. At least the I/O program 101 and the journal transmission program 102 are executed in the storage controller 10 of the storage apparatus 1a, and at least the journal read program 151 and the destage program 152 are executed in the storage controller 10 of the storage apparatus 1b. The storage device 1a also holds, as management information, a pair management table 300, primary journal management information 600, and a JVOL management table 800. The secondary storage device 1b holds a pair management table 300, secondary journal management information 650, and a JVOL management table 800. These pieces of information are stored in the memory 13.

The storage device 1a may be configured to be able to execute the programs executed by the storage device 1b (the journal read program 151 and the destage program 152) in addition to the programs described above. Conversely, the storage apparatus 1b may be configured to be able to execute the programs executed by the storage apparatus 1a (the I/O program 101 and the journal transmission program 102) in addition to the programs described above. In the following description, however, an example will be described in which the storage apparatus 1a executes the I/O program 101 and the journal transmission program 102, and the storage apparatus 1b executes the journal read program 151 and the destage program 152.

First, before explaining these programs and management information, the volumes handled by the storage apparatus 1 and the remote copy function will be explained.

First, the volume will be explained. The SSD 21 installed in the storage apparatus 1 according to the present embodiment has a function of providing one or more volumes to the storage controller 10. In this embodiment, the volume that the SSD 21 provides to the storage controller 10 is referred to as a “physical volume”.

The physical volume will be described with reference to FIG. 3. The storage apparatus 1a illustrated in FIG. 3 has at least three SSDs 21. In the following description, these three SSDs 21 are represented as SSD 21a-1, SSD 21a-2, and SSD 21a-3, respectively. Similarly, the storage apparatus 1b has at least three SSDs 21. These three SSDs 21 are represented as SSD 21b-1, SSD 21b-2, and SSD 21b-3, respectively.

The objects 210-1, 211-1, 212-1, and 213-1 shown in FIG. 3 are physical volumes formed by the SSD 21a-1, the objects 210-2, 211-2, 212-2, and 213-2 are physical volumes formed by the SSD 21a-2, and the objects 210-3, 211-3, 212-3, and 213-3 are physical volumes formed by the SSD 21a-3.

The SSD 21b-1, SSD 21b-2, and SSD 21b-3 included in the storage apparatus 1b can also define a plurality of physical volumes. The objects 210-1', 211-1', 212-1', and 213-1' are physical volumes formed by the SSD 21b-1, the objects 210-2', 211-2', 212-2', and 213-2' are physical volumes formed by the SSD 21b-2, and the objects 210-3', 211-3', 212-3', and 213-3' are physical volumes formed by the SSD 21b-3.

Note that the number of physical volumes formed by each SSD 21 is arbitrary, and is not limited to the number shown in the figure. The number and capacity of the physical volumes formed by each SSD 21 can be determined by the user (the administrator of the storage apparatus 1). In this embodiment, an example will be described in which the physical volumes 211-1, 211-2, 211-3 and the physical volumes 211-1', 211-2', 211-3' have the same capacity. The physical volumes 212-1, 212-2, 212-3, 212-1', 212-2', and 212-3' likewise have the same capacity.

The storage controller 10 according to the present embodiment, for its part, operates a plurality of physical volumes provided by the SSDs 21 as a RAID (Redundant Array of Independent (or Inexpensive) Disks). In this embodiment, a group of physical volumes operated as a RAID is called a RAID group.

An example of a RAID group will be described with reference to the drawings. The storage controller 10 of the storage device 1a forms a RAID group using one physical volume provided by each of the SSDs 21a-1, 21a-2, and 21a-3. For example, the storage apparatus 1a forms one RAID group using the physical volumes 211-1, 211-2, and 211-3, another RAID group using the physical volumes 212-1, 212-2, and 212-3, and a further RAID group using the physical volumes 213-1, 213-2, and 213-3. Here, an example is described in which one RAID group is formed from three physical volumes, but any number of physical volumes may be used to form a RAID group. Similarly, the storage apparatus 1b forms one RAID group from a plurality (three in this case) of physical volumes.

The storage controller 10 manages the storage space on the physical volume by dividing it into a plurality of fixed-size storage areas called stripe blocks (for example, 64 KB). In FIG. 4, each of a plurality of rectangular boxes in the physical volumes 212-1, 212-2, 212-3 represents a stripe block.

As is well known, a RAID group stores not only data but also redundant data (parity, etc.) generated from that data. Among the stripe blocks, a stripe block in which data is stored is called a “data stripe”, and a stripe block in which redundant data is stored is called a “parity stripe”. In FIG. 4, the stripe blocks labeled “P”, such as (P0) and (P1), represent parity stripes, and the stripe blocks labeled “D”, such as 0 (D), represent data stripes.

For example, the redundant data stored in the parity stripe (P0) located at the head of the physical volume 211-3 is generated by performing an operation such as exclusive OR (XOR) on the data stored in the data stripes (0 (D) and 1 (D)) located at the head of the other physical volumes (211-1 and 211-2) in the RAID group. A parity stripe together with the set of data stripes used to generate the redundant data stored in that parity stripe (in the above example, the parity stripe (P0) and the data stripes 0 (D) and 1 (D)) is called a “stripe line”.

In the RAID group formed by the storage apparatus 1 according to the present embodiment, the stripe blocks belonging to one stripe line are located at the same location (address) in the storage space of each physical volume (211-1 to 211-3 in the example of FIG. 4). Further, the storage controller 10 forms a RAID group using one physical volume provided by each SSD 21. In other words, each physical volume belonging to a RAID group must be provided by a different SSD 21. This is because, when one of the SSDs 21 fails and becomes inaccessible, the storage controller 10 must restore the data stored in the failed SSD 21 using the data (or redundant data) stored in the remaining SSDs 21.
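The XOR-based redundancy and the recovery it enables can be shown with a minimal sketch. The helper `xor_blocks` and the four-byte stripe contents are hypothetical; real stripe blocks are far larger (the text gives 64 KB as an example).

```python
def xor_blocks(*blocks):
    """Byte-wise exclusive OR of equally sized stripe blocks."""
    if len({len(block) for block in blocks}) != 1:
        raise ValueError("stripe blocks must all have the same size")
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


# Data stripes 0 (D) and 1 (D) of one stripe line (illustrative contents).
d0 = bytes([0x11, 0x22, 0x33, 0x44])
d1 = bytes([0xA0, 0xB0, 0xC0, 0xD0])

# Redundant data for the parity stripe (P0) of the same stripe line.
p0 = xor_blocks(d0, d1)

# If the SSD holding 0 (D) fails, its contents can be rebuilt from the
# surviving data stripe and the parity stripe.
rebuilt_d0 = xor_blocks(d1, p0)
```

Because each member of a stripe line sits on a different SSD, a single SSD failure removes at most one block per stripe line, which is exactly the case this XOR rebuild covers.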

The storage controller 10 further provides the host 2 with one or more logical storage spaces formed using the RAID group area. A logical storage space provided to the host 2 is called a “logical volume”. FIG. 4 shows the relationship between the RAID groups formed from the physical volumes 211-1, 211-2, 211-3 and logical volumes. A logical volume is a storage space that is formed using only data stripes in an area within a RAID group.

When the host 2 requests the storage apparatus 1 to write data, it issues a command (write command) including location information (address) on the logical volume to the storage apparatus 1. The storage controller 10 performs processing for converting the address included in the write command into the address of the physical volume.

For example, suppose that the areas on the logical volume and the data stripes on the RAID group have the relationship shown in FIG. 4, and the host 2 issues a write command specifying the head position on the logical volume. By performing address conversion, the storage controller 10 recognizes that the data write position designated by the host 2 corresponds to the first data stripe (0 (D)) on the physical volume 211-1, and writes the data to the recognized position. The storage controller 10 also identifies the physical volume in which the parity stripe (P0) belonging to the same stripe line as the data stripe 0 (D) exists (that is, the physical volume 211-3) and the position of the parity stripe (P0) (that is, the head of the physical volume 211-3). The storage controller 10 then generates the redundant data to be written to the parity stripe (P0) according to the RAID technology, and writes the redundant data to the parity stripe (P0).
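The address conversion can be sketched as follows. Only the placement of the first stripe line (data stripes on 211-1 and 211-2, parity stripe on 211-3) comes from the text; the way the parity position rotates on later stripe lines is an assumption made for illustration, as are the names `locate`, `VOLUMES`, and `STRIPE_SIZE`.

```python
STRIPE_SIZE = 64 * 1024                  # stripe block size used as an example in the text
VOLUMES = ["211-1", "211-2", "211-3"]    # physical volumes of one RAID group


def locate(logical_addr):
    """Map a logical-volume address to its data stripe and parity stripe.

    Returns (data volume, address in data volume,
             parity volume, address of the parity stripe).
    Assumption: the parity stripe moves one volume to the left on each
    successive stripe line, starting from the last volume.
    """
    n = len(VOLUMES)
    stripe_no, offset = divmod(logical_addr, STRIPE_SIZE)
    line, k = divmod(stripe_no, n - 1)   # n - 1 data stripes per stripe line
    parity_idx = (n - 1 - line) % n
    data_idxs = [i for i in range(n) if i != parity_idx]
    stripe_addr = line * STRIPE_SIZE     # same address on every member volume
    return VOLUMES[data_idxs[k]], stripe_addr + offset, VOLUMES[parity_idx], stripe_addr


head = locate(0)
```

For the head of the logical volume this returns the first data stripe on 211-1 and the parity stripe at the head of 211-3, matching the example in the text.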

Next, an overview of the remote copy function will be explained. The storage apparatus 1 according to the first embodiment of the present invention performs remote copy using a journal. Since remote copy using a journal is a function disclosed in, for example, Patent Document 1 (or US Patent Application Publication No. 2005/0073887), only the outline of the portion related to the present invention is described below.

The remote copy function is a function for creating a copy (mirror copy) of data in the volume of the storage apparatus 1a in the volume of the storage apparatus 1b. Conversely, a copy of data written to the volume of the storage device 1b can be created in the volume of the storage device 1a by the remote copy function. However, in this embodiment, an example will be described in which data written from the host 2 to the volume of the storage device 1a is replicated to the storage device 1b unless otherwise specified. For this reason, the storage device 1a may be referred to as “primary storage device 1a”, and the storage device 1b may be referred to as “secondary storage device 1b”.

In this embodiment, the physical volume of the primary storage apparatus 1a that is the replication source is called a primary volume (P-VOL), and the physical volume of the storage apparatus 1b that is the replication destination of the P-VOL data is called a secondary volume (S-VOL). The S-VOL in which the copy of the P-VOL data is stored is called a “paired volume” of the P-VOL, or the “pair volume” of the P-VOL. Conversely, the P-VOL that stores the original data copied to the S-VOL may also be referred to as a volume having a pair relationship with the S-VOL.

In the example of FIG. 3, the physical volumes 211-1, 212-1, 213-1, 211-2, 212-2, 213-2, 211-3, 212-3, and 213-3 of the SSDs 21a-1, 21a-2, and 21a-3 of the primary storage apparatus 1a are P-VOLs, and the physical volumes 211-1', 212-1', 213-1', 211-2', 212-2', 213-2', 211-3', 212-3', and 213-3' of the SSDs 21b-1, 21b-2, and 21b-3 of the secondary storage apparatus 1b are S-VOLs. In this embodiment, the description assumes that, among these physical volumes, physical volumes having the same reference number are in a pair relationship. For example, the physical volume 211-1 and the physical volume 211-1' are in a pair relationship.

In remote copy using a journal, a volume for temporarily storing copy data is used. This volume is called a journal volume. In the storage apparatus 1 according to the present embodiment, each SSD 21 provides a journal volume 210 to the storage controller 10 in addition to the physical volume (211-1 etc.) used as the P-VOL and S-VOL described above. In FIG. 3, objects 210-1, 210-2, 210-3, 210-1 ', 210-2', 210-3 'are journal volumes. The journal volume may be abbreviated as “JVOL”.

The flow of copy processing by remote copy in the storage apparatus 1 according to the present embodiment is as follows. When the primary storage device 1a receives write data from the host 2, the storage controller 10 of the primary storage device 1a transmits a journal creation command to the SSD 21 having the P-VOL. The contents of the journal creation command will be described later.

In the storage device 1 according to the present embodiment, the SSD 21 has a function of creating a journal. Upon receiving the journal creation command, the SSD 21 stores write data in the P-VOL, creates a journal, and stores the journal in the JVOL. Because each SSD 21 creates its own journals, an SSD 21 cannot write data to a P-VOL or JVOL that another SSD 21 has. For example, the SSD 21a-1 can store a journal only in the JVOL 210-1, and can write only to the P-VOLs 211-1, 212-1 and 213-1.

The journal stored in the JVOL of the primary storage apparatus 1a is transmitted to the secondary storage apparatus 1b via the WAN 4. The SSD 21 of the secondary storage device 1b has a function of analyzing the contents of the journal and reflecting the data contained in the journal (a copy of the data stored in the P-VOL) in the S-VOL. The secondary storage device 1b uses this function to reflect the received journal data in the S-VOL. In this specification, the process of reflecting journal data in the S-VOL is referred to as “journal restore process” or simply “restore process”.

In the storage apparatus according to the present embodiment, a journal created by one SSD 21 of the primary storage apparatus 1a is transmitted to one SSD 21 of the secondary storage apparatus 1b. The relationship between the SSDs 21 of the primary storage apparatus 1a and the secondary storage apparatus 1b will be described with reference to FIG.

The primary storage device 1a has three SSDs 21 (21a-1, 21a-2, 21a-3), and the secondary storage device 1b has three SSDs 21 (21b-1, 21b-2, 21b-3). Of the three SSDs 21 of the primary storage system 1a, for example, a journal created by the SSD 21a-1 is transmitted to the SSD 21b-1. Similarly, a journal created by the SSD 21a-2 is transmitted to the SSD 21b-2, and a journal created by the SSD 21a-3 is transmitted to the SSD 21b-3. As a result, for example, the copy of the P-VOLs of the SSD 21a-1 of the primary storage device 1a is stored in the physical volumes of the SSD 21b-1.

The SSD 21 having the P-VOL (and journal volume) is called “primary SSD”, and the SSD 21 of the secondary storage device 1b to which the journal created by the primary SSD is transmitted is called “secondary SSD”. Further, the SSD 21 of the secondary storage device 1b to which the journal created by the primary SSD is transmitted may be referred to as an SSD having a pair relationship with the primary SSD (or a pair SSD of the primary SSD). Conversely, the primary SSD that transmits the journals stored in a secondary SSD may also be referred to as an SSD that is paired with the secondary SSD (or a pair SSD of the secondary SSD). The number of primary SSDs and the number of secondary SSDs are equal.

FIG. 5 shows a configuration example of the pair management table 300 managed by the storage controller 10. The pair management table 300 is a table for managing information about volume pairs, and both the storage apparatuses 1a and 1b have this pair management table 300. One row of the pair management table 300 represents information about one volume pair. Information set in the pair management table 300 can be set from the management host 5 (or host 2) by a user (an administrator of the storage apparatus 1). For example, when the user sets information from the management host 5 connected to the primary storage apparatus 1a, the same information is set in both the pair management table 300 of the primary storage apparatus 1a and the pair management table 300 of the secondary storage apparatus 1b.

Information recorded in each column of the pair management table 300 will be described. The pair number 301 is an identification number assigned to the volume pair. The status 303 is information indicating the status of the volume pair (information indicating whether data replication is normally performed). PDKC # 304 is a serial number of the storage apparatus 1 in which the P-VOL exists in the volume pair, and P-VOL # 305 is an identifier of the P-VOL. Similarly, SDKC # 306 is the serial number of the storage apparatus 1 in which S-VOL exists in the volume pair, and S-VOL # 307 is the identifier of S-VOL.

An arbitrary rule may be adopted as the physical volume identifier assignment rule. In this embodiment, an example will be described in which the identifier of a physical volume such as a P-VOL or S-VOL includes the identification number of the SSD 21 in which the physical volume exists. As a result, by referring to the identifier of the physical volume, it is possible to identify which SSD 21 the physical volume exists in. As the identifier of the physical volume used in this embodiment, a value obtained by connecting two numerical values with a colon (:) is used. Of the two values connected by a colon, the value on the left represents the identification number of the SSD 21. On the other hand, the value on the right side is an identification number assigned to each of the plurality of physical volumes defined in the SSD 21, and it only needs to be unique within the SSD 21. In the present embodiment, the identification number assigned to each physical volume (the numerical value on the right side of the colon in the physical volume identifier) may be referred to as an “internal identification number”. In this embodiment, an example will be described in which the internal identification number assigned to the journal volume is 00 and an integer value of 01 or more is used as the internal identification number of other physical volumes.

An example will be described with reference to FIG. The P-VOL # 305 of the volume pair whose pair number 301 is “2” is “01:02”. This means that the P-VOL exists in the SSD 21 with the identification number “01” among the SSDs 21 in the primary storage (storage device 1a), and that the internal identification number assigned to this P-VOL is “02”.

However, the format of the physical volume identifier described above is merely an example, and formats other than those described above may be used for the physical volume identifier. In the following description, the SSD 21 with the identification number x is denoted as “SSD # x” (x is a numerical value such as 01). Also, the P-VOL (or S-VOL) assigned the internal identification number y is expressed as “P-VOL # y” (or “S-VOL # y”) (y is a numerical value such as 01).
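The identifier format described above can be illustrated with a short Python sketch. The names `VolumeId`, `parse_volume_id`, and `format_volume_id` are hypothetical and do not appear in the embodiment. The sketch only assumes the "SSD number:internal identification number" format and the convention that internal identification number 00 denotes the journal volume.

```python
from typing import NamedTuple

# Internal identification number 00 is reserved for the journal volume
# in this embodiment; other physical volumes use 01 or more.
JOURNAL_INTERNAL_ID = 0


class VolumeId(NamedTuple):
    """Hypothetical parsed form of an identifier such as "01:02"."""
    ssd_id: int        # value left of the colon: identification number of the SSD
    internal_id: int   # value right of the colon: internal identification number

    @property
    def is_journal_volume(self) -> bool:
        return self.internal_id == JOURNAL_INTERNAL_ID


def parse_volume_id(text: str) -> VolumeId:
    """Split "SSD:internal" into its two numeric parts."""
    ssd_part, internal_part = text.split(":")
    return VolumeId(int(ssd_part), int(internal_part))


def format_volume_id(vol: VolumeId) -> str:
    """Render the identifier back in the two-digit form used in the text."""
    return f"{vol.ssd_id:02d}:{vol.internal_id:02d}"
```

With this sketch, the identifier “01:02” from the example parses to SSD # 01 and P-VOL # 02, and an identifier whose right-hand value is 00 is recognized as a journal volume.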

In the storage apparatus 1 according to the present embodiment, among a plurality of volume pairs, a set of volume pairs for which data consistency must be guaranteed (write order must be guaranteed) is managed as a group called a consistency group (which may be abbreviated as CTG). The storage apparatus 1 can manage a plurality of CTGs, and each CTG is assigned a unique identification number. This identification number is called a CTG number, and the CTG number may be written as CTG #. The column CTG # 302 of the pair management table 300 stores the CTG # of the CTG to which the volume pair belongs. Volume pairs having the same value stored in the CTG # 302 belong to the same CTG, and the storage device 1 performs control so that data consistency is ensured among them.

In the storage apparatus 1 according to the present embodiment, as described above, one physical volume is selected from each SSD 21 to form a RAID group. In this embodiment, CTGs must be formed so that each physical volume belonging to a RAID group always belongs to the same CTG. Further, physical volumes belonging to the same SSD 21 also belong to the same CTG.

In the example of FIG. 3, the physical volumes 211-1, 211-2, 211-3 of the storage apparatus 1a belong to the same RAID group. Therefore, the physical volumes 211-1, 211-2, 211-3 and their paired volumes (physical volumes 211-1 ', 211-2', 211-3 ') belong to the same CTG. Further, since the physical volumes 211-1, 212-1 and 213-1 belong to the same SSD 21, they belong to the same CTG.

Subsequently, an outline of the structure (format) of the journal stored in the journal volume 210 will be described with reference to FIG. In the storage device 1a according to the first embodiment of the present invention, the SSD 21 creates a journal.

In order to cause the SSD 21 to create a journal, the storage controller 10 of the storage device 1a issues a command for instructing the SSD 21 to create a journal. The SSD 21 according to this embodiment supports, in addition to the commands (read command and write command) supported by known storage devices such as HDDs, a command for instructing journal creation (referred to as a “journal creation command”) and a command for instructing restore processing (referred to as a “journal restore command”). Details of the journal creation command and the journal restore command will be described later. The journal creation command includes write data to be stored in the P-VOL and information necessary for creating a journal.

Upon receiving the journal creation command, the SSD 21 creates a journal according to the contents of the command and stores it in the journal volume, and also stores write data in the P-VOL.

The format of the journal stored in the journal volume 210 will be described with reference to FIG. The journal includes “journal data” that is write data from the host 2 and management information of the journal data. The management information of the journal data is called “management information” or “JNCB”. The size of management information is a fixed length, and the size of journal data varies depending on the length of write data written from the host 2.

In the SSD 21 according to this embodiment, journal data and management information are stored in different areas. An area in which journal data is stored is called a write data area 720, and an area in which management information is stored is called a management information area 710. Also, the head address of the management information area 710 is called “management information area head address”, and the head address of the write data area 720 is called “write data area head address”.

The management information area 710 and the write data area 720 are used like a kind of ring buffer. Journal data is written sequentially from the start address of the write data area. When journal data has been written up to the end of the write data area 720 (end address of the journal volume), the storage apparatus 1 writes journal data from the beginning of the write data area 720 again. The same applies when the storage device 1 writes JNCBs: the JNCBs are written sequentially from the start address of the management information area in the order in which they were created, and when JNCBs have been written up to the end of the management information area 710 (the address immediately before the start address of the write data area), the storage apparatus 1 again writes JNCBs from the start address of the management information area.

The format of management information (JNCB) will be described. The JNCB includes SEQ # 701, Timestamp 702, VOL # 703, S-VOL # 703', LBA 704, Length 705, and pointer 706. Each journal is given a number indicating the order in which the journals were created. In this embodiment, the number assigned to this journal is called a “sequence number”. The sequence number may be abbreviated as “SEQ #”. Sequence numbers start from 0. That is, the SEQ # of the journal created first is 0, and journals created thereafter are sequentially assigned the consecutive numbers 1, 2, 3, and so on. SEQ # 701 is an area for storing a sequence number. The SEQ # assigned to each journal is determined by the SSD 21.

Timestamp 702 is the time at which the storage apparatus 1 a receives write data from the host 2, and this information is given from the host 2. Details will be described later.

VOL # 703, LBA 704, and Length 705 are information for specifying the write data write position from the host 2. Specifically, VOL # 703 and LBA 704 represent the position (address) on the P-VOL, and Length 705 represents the data length of the write data. That is, the write data from the host 2 is written in a continuous area starting from the address specified by the LBA 704 on the physical volume specified by the VOL # 703. The length of data to be written is equal to Length 705.

On the other hand, the S-VOL # 703 'stores the identification number of the P-VOL pair volume (that is, the physical volume that is paired with the physical volume specified by the VOL # 703).

The pointer 706 is information indicating a position on the journal volume where journal data is written, and is an LBA on the journal volume 210. The journal data is written in a continuous area starting from the position specified by the pointer 706, and the length of the written data is equal to Length 705. Since such information is included in the JNCB, the storage controller 10 can recognize where the journal data is stored in the journal volume by reading the JNCB and examining its contents. The data written at the position specified by VOL # 703 and LBA 704 and the data written at the position specified by pointer 706 are the same data (that is, the write data included in the journal creation command).

Next, the format of the journal creation command will be described with reference to FIG. The journal creation command 750 includes an operation code (Opcode) 751, Timestamp 752, VOL # 753, S-VOL # 753 ', LBA754, JLBA755, CBLBA756, Length757, and write data 758.

The operation code (Opcode) 751 is information indicating the type of command, and information indicating the journal creation command is stored in the Opcode 751 of the journal creation command. The SSD 21 performs journal creation processing when the Opcode 751 of the command received from the storage controller 10 is a code representing a journal creation command. The write data 758 is write data that the storage controller 10 requests to write, and the SSD 21 writes the data included in the write data 758 to the P-VOL and journal volume.

Timestamp 752 is the time when the storage apparatus 1a receives the write data from the host 2. In the computer system according to this embodiment, the write command issued from the host 2 to the storage apparatus 1 includes time (time when the write command is issued). The storage controller 10 creates a journal creation command in which the time included in the write command is stored in the Timestamp 752 and transmits it to the SSD 21. When the SSD 21 creates the JNCB, the SSD 21 stores the value included in the Timestamp 752 in the Timestamp 702.

VOL # 753 and LBA754 are information on the write data write destination position (position on the P-VOL). That is, VOL # 753 is the identifier of the physical volume to which the write data is written, and LBA754 is the LBA on the physical volume specified by VOL # 753. In accordance with this information, the SSD 21 stores the data stored in the write data 758 in the P-VOL. S-VOL # 753 'is an identifier of S-VOL (a pair volume of physical volumes specified by VOL # 753). Note that VOL # 753 also includes the identification number of the SSD 21 in which the P-VOL exists. Similarly, the S-VOL # 753 'includes the identification number of the SSD 21 in which the S-VOL exists.

On the other hand, JLBA 755 represents an address on the journal volume, and journal data is written in a continuous area starting from the address specified in JLBA 755. CBLBA 756 represents a write destination address of JNCB. This address is an address on the journal volume. When the SSD 21 creates the JNCB, the JNCB is stored at the address specified by the CBLBA 756. Length 757 is the data length of the write data (data included in the write data 758).

Much of the information included in the journal creation command 750 is information recorded in the JNCB. When the SSD 21 creates a JNCB, Timestamp 752, VOL # 753, S-VOL # 753', LBA 754, JLBA 755, and Length 757 are stored in Timestamp 702, VOL # 703, S-VOL # 703', LBA 704, pointer 706, and Length 705 of the JNCB, respectively.
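The relationship between the journal creation command 750 and the JNCB can be sketched in Python as follows. This is only an illustration under stated assumptions: the class names, the field types, and the function `build_jncb` are hypothetical, the sequence number is passed in because it is determined by the SSD 21 rather than carried in the command, and JLBA 755 is taken to correspond to pointer 706 because both denote the journal volume address at which the journal data is written.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JournalCreateCommand:
    """Hypothetical in-memory view of the journal creation command 750."""
    opcode: int        # Opcode 751 (the actual code value is not given in the text)
    timestamp: int     # Timestamp 752
    vol: str           # VOL # 753, e.g. "01:02"
    s_vol: str         # S-VOL # 753'
    lba: int           # LBA 754: write position on the P-VOL
    jlba: int          # JLBA 755: journal data position on the journal volume
    cblba: int         # CBLBA 756: JNCB position on the journal volume
    length: int        # Length 757
    write_data: bytes  # write data 758


@dataclass(frozen=True)
class Jncb:
    """Hypothetical in-memory view of the management information (JNCB)."""
    seq: int        # SEQ # 701, determined by the SSD
    timestamp: int  # Timestamp 702
    vol: str        # VOL # 703
    s_vol: str      # S-VOL # 703'
    lba: int        # LBA 704
    length: int     # Length 705
    pointer: int    # pointer 706


def build_jncb(cmd: JournalCreateCommand, seq: int) -> Jncb:
    """Copy the command fields into a JNCB; CBLBA only says where to store it."""
    return Jncb(seq=seq, timestamp=cmd.timestamp, vol=cmd.vol, s_vol=cmd.s_vol,
                lba=cmd.lba, length=cmd.length, pointer=cmd.jlba)
```

Note that Opcode 751, CBLBA 756, and write data 758 have no counterpart in the JNCB in this sketch: the first selects the processing, the second gives the storage location of the JNCB itself, and the third is stored as journal data at the position indicated by the pointer.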

FIG. 8 is a diagram showing the JVOL management table 800 in the present embodiment. The JVOL management table 800 is stored in the memory 13 of the primary storage apparatus 1a and secondary storage apparatus 1b.

The JVOL management table 800 is a table for managing information about the journal volume (JVOL) of the SSD 21. The JVOL management table 800 includes DEV # 801, a management information area head address 802, a write data area head address 803, a JNCB latest address 804, a JNCB oldest address 805, and a write data latest address 806. The identification number of the SSD 21 is stored in the DEV # 801, and each row of the JVOL management table 800 represents information on the JVOL in the SSD 21 specified by the identification number stored in the DEV # 801.

Management information area start address 802, write data area start address 803, JNCB latest address 804, JNCB oldest address 805, and write data latest address 806 represent addresses on the journal volume shown in FIG. That is, the management information area head address 802 stores the LBA (Logical Block Address) at the head of the management information area of the JVOL. The write data area start address 803 stores the start LBA of the JVOL write data area.

The JNCB latest address 804 stores the location (LBA) where the JNCB should be stored when a new journal is stored in the JVOL. When the storage controller 10 issues a journal creation command, it adds the JNCB size to the JNCB latest address 804. The latest write data address 806 holds the leading LBA used for storing write data (journal data) when a new journal is stored.

The JNCB oldest address 805 stores the first LBA of the area where the oldest (the smallest SEQ #) JNCB is stored among the JNCBs of the journals remaining in the journal volume. The meaning of the JNCB oldest address 805 will be described with reference to FIG.

The journal stored in the journal volume of the primary storage apparatus 1a is transmitted to the secondary storage apparatus 1b after a while. Therefore, the JNCB oldest address 805 of the JVOL management table 800 of the primary storage device 1a represents the address where the oldest journal JNCB is stored among the journals not yet transmitted to the secondary storage device 1b.

On the other hand, the journal stored in the journal volume of the secondary storage device 1b is reflected in the S-VOL by performing journal restore processing. Therefore, the JNCB oldest address 805 of the JVOL management table 800 of the secondary storage device 1b represents the address where the oldest JNCB of the journals that have not been restored is stored.

The area before the JNCB oldest address 805 is an area to which a JNCB may be written. In FIG. 6, the area from the management information area head address to the JNCB oldest address and the area from the JNCB latest address to the write data area head address are areas to which a JNCB can be written.

In the example of FIG. 8, the management information area of the journal volume included in the SSD 21 with the identification number 1 extends from the beginning (LBA 0) of the journal volume to LBA 1299, and the area for journal data (write data area) is the area starting from LBA 1300 of the journal volume.
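The ring-buffer style use of the management information area can be sketched as a small address calculation. The function name `allocate_in_ring` and the JNCB size of 4 blocks are hypothetical, and the rule that a record which does not fit before the end of the area is placed at the start of the area is an assumption; the text only states that writing resumes from the start address after the end of the area is reached. The sketch also deliberately omits the check against the JNCB oldest address 805 that would prevent overwriting journals not yet transmitted or restored.

```python
from typing import Tuple


def allocate_in_ring(next_addr: int, size: int,
                     area_start: int, area_end: int) -> Tuple[int, int]:
    """Choose where a record of `size` blocks goes in [area_start, area_end).

    Returns (write_addr, new_next_addr). Assumption: a record that would
    cross the end of the area is written from the start of the area instead.
    """
    if size > area_end - area_start:
        raise ValueError("record is larger than the area")
    if next_addr + size > area_end:
        next_addr = area_start  # wrap around to the head of the area
    return next_addr, next_addr + size


# Management information area of the FIG. 8 example: LBA 0 to LBA 1299.
MGMT_START, MGMT_END = 0, 1300
JNCB_SIZE = 4  # hypothetical fixed JNCB size in blocks
```

In terms of the JVOL management table 800, `next_addr` plays the role of the JNCB latest address 804, which the storage controller 10 advances by the JNCB size each time it issues a journal creation command.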

Subsequently, the journal management information will be described with reference to FIG. FIG. 9 shows an example of primary journal management information 600 and secondary journal management information 650. The primary journal management information 600 is a table for managing information about journals generated by the SSD 21 having P-VOL, while the secondary journal management information 650 is a table for managing information about journals stored in the SSD 21 having S-VOL. In the case of the computer system according to this embodiment, the primary storage apparatus 1a has only P-VOLs, and the secondary storage apparatus 1b has only S-VOLs. Therefore, the primary journal management information 600 is held only by the primary storage apparatus 1a (stored only in the memory 13 of the primary storage device 1a), and the secondary journal management information 650 is held only by the secondary storage device 1b (stored only in the memory 13 of the secondary storage device 1b).

Information managed by the primary journal management information 600 will be described. Each row (record) of the primary journal management information 600 stores information about the journal created by each SSD 21. DEV # 601 is an identification number of the SSD 21. For example, when the DEV # 601 in a row in the primary journal management information 600 is x (x is a numerical value such as 01), that row stores information about the journals created in the SSD 21 having the identification number x (SSD # x).

Latest SEQ # 604 is the newest (largest) sequence number among the sequence numbers assigned to the journal created by the SSD 21. This information is transmitted from the SSD 21 to the storage apparatus 1 when the SSD 21 creates a journal. Transferred SEQ # 605 stores the newest (largest) sequence number among the sequence numbers created in the primary SSD and assigned to the journal that has been transmitted to the secondary SSD. The primary storage apparatus 1a updates the transferred SEQ # 605 every time a journal is transmitted to the secondary storage apparatus 1b.

Hereinafter, the latest SEQ # 604 in the row where DEV # 601 is x is referred to as “latest SEQ # 604 of SSD # x”, and the transferred SEQ # 605 in the row where DEV # 601 is x is referred to as “transferred SEQ # 605 of SSD # x”.

Next, information managed by the secondary journal management information 650 will be described. Each row of the secondary journal management information 650 stores information about the journals received by the storage device 1 (in this embodiment, the secondary storage device 1b) having the SSD 21 with the identification number stored in the DEV # 651. Received SEQ # 654 is the newest (largest) sequence number among the sequence numbers assigned to the journals received by the secondary storage system 1b. The arrived T / S 657 stores the time stamp (Timestamp 702) included in the JNCB of the journal with the latest Timestamp 702 among the journals received by the secondary storage device 1b. This is the same as the time stamp (Timestamp 702) included in the JNCB of the journal with the received SEQ # 654.

Reflected SEQ # 656 is the newest (largest) sequence number among the sequence numbers of journals received by the secondary storage apparatus 1b and restored in the S-VOL of the secondary SSD. The restored T / S 658 stores the time stamp (Timestamp 702) included in the JNCB of the journal having the reflected SEQ # 656.

As in the case of the latest SEQ # 604 and the like, in the following, the received SEQ # 654, the reflected SEQ # 656, the arrived T / S 657, and the restored T / S 658 in the row where DEV # 651 is x are referred to as “SSD # x received SEQ # 654”, “SSD # x reflected SEQ # 656”, “SSD # x arrived T / S 657”, and “SSD # x restored T / S 658”, respectively.
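As an illustration of how these columns relate to one another, the following Python sketch models one row of each table. The class and method names are hypothetical, the time stamps are represented as plain integers for simplicity, and the derived ranges rely on the statement that sequence numbers are assigned consecutively.

```python
from dataclasses import dataclass


@dataclass
class PrimaryJournalInfo:
    """One row of the primary journal management information 600."""
    dev: int              # DEV # 601
    latest_seq: int       # latest SEQ # 604
    transferred_seq: int  # transferred SEQ # 605

    def untransferred(self) -> range:
        """Sequence numbers created but not yet sent to the secondary SSD."""
        return range(self.transferred_seq + 1, self.latest_seq + 1)


@dataclass
class SecondaryJournalInfo:
    """One row of the secondary journal management information 650."""
    dev: int            # DEV # 651
    received_seq: int   # received SEQ # 654
    reflected_seq: int  # reflected SEQ # 656
    arrived_ts: int     # arrived T / S 657
    restored_ts: int    # restored T / S 658

    def unrestored(self) -> range:
        """Sequence numbers received but not yet restored to the S-VOL."""
        return range(self.reflected_seq + 1, self.received_seq + 1)
```

For example, a row with latest SEQ # 10 and transferred SEQ # 7 implies that the journals with SEQ # 8 to 10 still remain in the journal volume of the primary SSD.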

Subsequently, the contents of the management information included in the SSD 21 will be described. FIG. 10 is a configuration example of the logical-physical conversion table 1400.

As described above, the SSD 21 forms one or more physical volumes and journal volumes and provides them to the storage controller 10. The SSD 21 employs a flash memory as a storage medium. As is well known, the minimum access (read / write) unit of the flash memory (FM chip 206) is a page (physical page). The size of the physical page is, for example, 8 KB. Therefore, the SSD 21 manages the storage area of the physical volume and the storage area of the journal volume that the SSD 21 provides to the storage controller 10 by dividing the storage area into the same size as the physical page. An area having the same size as the physical page is called a “logical page”. Then, the SSD 21 maps one physical page to one logical page. The logical / physical conversion table 1400 is a table for managing the mapping between logical pages and physical pages.

The SSD 21 according to the present embodiment manages each physical page in all the FM chips 206 included in the SSD 21 by assigning a unique number in the SSD 21. This number is called a physical page number (or physical page #). Further, the SSD 21 according to the present embodiment manages each logical page in the SSD 21 by assigning a unique identification number in the SSD. This identification number is called a logical page number (logical page #). The logical-physical conversion table 1400 stores block # and physical page # information of a physical page mapped to a certain logical page for each logical page.

The logical-physical conversion table 1400 has columns of VOL # 1401, LBA1402, logical page # 1403, status 1404, and physical page # 1405, as shown in FIG. Each record of the logical-physical conversion table 1400 stores information on the logical page specified by the logical page # 1403. VOL # 1401 is an identification number of a physical volume or journal volume, and LBA1402 is an address on the physical volume specified by VOL # 1401. In this embodiment, the identification number of the journal volume is defined as “00”, and a value other than “00” is used as the identifier of the physical volume other than the journal volume.

The logical page # 1403 stores the identification number (logical page number) of the logical page assigned to the address on the physical volume (or journal volume) specified by VOL # 1401 and LBA1402. Note that the correspondence between the set of VOL # 1401 and LBA1402 and the logical page # 1403 is determined in advance and does not change in the middle. The physical page # 1405 stores the identification number (physical page number) of the physical page mapped (allocated) to the logical page specified by the logical page # 1403.

The status 1404 stores information indicating whether a physical page is mapped to a logical page. No physical page is mapped to the logical page of the SSD 21 in the initial state. “Unallocated” is stored in the status 1404 of the logical page to which the physical page is not mapped (in this case, NULL (invalid value) is stored in the physical page # 1405 corresponding to the logical page).

For example, when the SSD 21 receives from the storage controller 10 a write command (or journal creation command) requesting data writing to the logical page with logical page number n, the SSD 21 first writes the write target data to an unused physical page (a physical page to which data has not yet been written). Let p be the page number of the physical page to which the write target data is written. The SSD 21 stores the page number p in the physical page # 1405 in the row where the logical page # 1403 is n. At this time, the SSD 21 stores “allocation” in the status 1404 of the row in which the logical page # 1403 is n. As a result, the physical page is mapped to the logical page to be written. In the following description, the expression “mapping a physical page to a logical page” means that the SSD 21 performs the processing described here.

For example, referring to the row 1412 in FIG. 10, VOL # 1401 is “00”, LBA1402 is “0x0010”, logical page # 1403 is 1, physical page # 1405 is 20, and status 1404 is “allocation”. Therefore, the logical-physical conversion table 1400 in FIG. 10 indicates that the physical page with the physical page number 20 is mapped to the logical page with the logical page number 1, and that the logical page with the logical page number 1 corresponds to the area of the journal volume starting from LBA 0x0010 (an area equal in size to the logical page).

As is well known, a physical page once written cannot be overwritten (if it is desired to overwrite the physical page, the entire block to which the physical page belongs needs to be erased once). For this reason, when the SSD 21 receives an update (overwrite) request for a certain logical page from the storage controller 10, the SSD 21 stores the update data in a physical page (referred to as the new physical page) different from the physical page in which the pre-update data is written (referred to as the old physical page). Then, the physical page number of the new physical page is stored in the physical page # 1405 corresponding to the logical page to be updated. The old physical page is then in a state in which the mapping is released (this state is managed by a physical page management table 1450 described later).

FIG. 11 is a diagram for explaining the configuration of the physical page management table. The physical page management table 1450 is a table for managing the state of the physical page, and each row (record) stores information about the physical page in the SSD 21. The physical page management table 1450 has columns of block # 1451, physical page # 1452, and status 1453.

The physical page # 1452 and the status 1453 are the same information as the physical page # 1405 and the status 1404 of the logical-physical conversion table 1400, respectively. That is, when a physical page is mapped to a logical page, the physical page number of the physical page is stored in the physical page # 1405 of the logical-physical conversion table 1400, and “assignment” is stored in the status 1404. At the same time, “assignment” is also stored in the status 1453 (in the physical page management table 1450) of the physical page mapped to the logical page. Block # 1451 is the identification number of the block to which the physical page belongs.

Also, after mapping to a logical page, “unallocated” is stored in the status 1453 of the physical page that has been unmapped. A physical page whose status 1453 is “unallocated” cannot be overwritten, and reclamation processing is performed at an appropriate timing. Then, “unused” is stored in the status 1453 of the physical page that has not yet been written. As the physical page mapped to the logical page, a physical page whose status 1453 is “unused” is used.
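The mapping and update behavior described above can be sketched with a minimal Python model. The class `SimpleFtl` and its members are hypothetical, block management and reclamation are omitted, and choosing the lowest-numbered unused physical page is an assumption made only to keep the example deterministic. The status strings follow those used in the text for the status 1453.

```python
from typing import Dict, List, Optional


class SimpleFtl:
    """Toy model of the logical-physical mapping kept by an SSD."""

    def __init__(self, num_physical_pages: int) -> None:
        # Counterpart of status 1453: every physical page starts "unused".
        self.page_status: List[str] = ["unused"] * num_physical_pages
        # Counterpart of physical page # 1405: logical page # -> physical page #.
        # A logical page absent from this dict is "unallocated".
        self.l2p: Dict[int, int] = {}
        self.flash: Dict[int, bytes] = {}  # physical page # -> stored data

    def write(self, logical_page: int, data: bytes) -> int:
        """Write data out of place and remap the logical page."""
        new_page = self.page_status.index("unused")  # ValueError if none is left
        self.flash[new_page] = data
        old_page = self.l2p.get(logical_page)
        if old_page is not None:
            # The old physical page is unmapped and waits for reclamation.
            self.page_status[old_page] = "unallocated"
        self.l2p[logical_page] = new_page
        self.page_status[new_page] = "assignment"
        return new_page

    def read(self, logical_page: int) -> Optional[bytes]:
        """Return the current data of a logical page, or None if unmapped."""
        page = self.l2p.get(logical_page)
        return None if page is None else self.flash[page]
```

Writing the same logical page twice in this model consumes two physical pages: the first ends up with status “unallocated” and the second holds the current data, which mirrors the old physical page and new physical page in the description.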

Next, the sequence number management information 1170 (FIG. 2) will be described. As described above, the SSD 21 according to this embodiment has a function of creating a journal, and the created journal includes a sequence number. The SSD 21 records the latest (maximum) sequence number among the sequence numbers assigned to the journal created by the SSD 21 in the memory 204. This information is called sequence number management information and is stored in the memory 204.

When creating the journal, the SSD 21 stores a value obtained by adding 1 to the value stored in the sequence number management information 1170 in the SEQ # 701 of the newly created journal. After creating the journal, the SSD 21 stores the value of SEQ # 701 of the created journal in the sequence number management information 1170, thereby updating the contents of the sequence number management information 1170.
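The sequence number handling can be sketched as a simple counter. The class name `SequenceNumberInfo` is hypothetical, and the initial value of -1 is an assumption chosen so that the first journal receives SEQ # 0 under the add-1 rule; the text does not state how the sequence number management information 1170 is initialized.

```python
class SequenceNumberInfo:
    """Toy counterpart of the sequence number management information 1170."""

    def __init__(self) -> None:
        # Assumed initial value so that the first journal gets SEQ # 0.
        self.latest = -1

    def next_seq(self) -> int:
        """Return the SEQ # for a new journal and record it as the latest."""
        seq = self.latest + 1  # value stored in SEQ # 701 of the new journal
        self.latest = seq      # update performed after the journal is created
        return seq
```

Because each SSD 21 holds its own counter, sequence numbers in this sketch are unique only within one SSD, which matches the per-SSD rows of the journal management information.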

From here, the flow of journal creation processing and journal restoration processing will be explained. FIG. 12 shows the flow of processing (write processing) performed when the primary storage apparatus 1a receives a write command from the host 2. In addition, in each drawing described below including FIG. 12, the alphabet “S” attached before the reference number means “step”.

In FIG. 12, when the CPU 12 of the storage controller 10 in the primary storage apparatus 1a executes the I / O program 101, Step 1001 to Step 1005 and Step 1011 to Step 1013 are executed. Steps 1051 to 1057 are processes executed by the CPU 201 of the SSD 21 in the primary storage apparatus 1a.

Step 1001: When the I / O program 101 receives a write command and write data from the host 2, it stores the write data in the cache.

Step 1002: The I / O program 101 performs address conversion processing on the write destination address (logical volume address) included in the received write command to obtain the physical volume to which the write data is to be written and the address on that physical volume.

Step 1003: The I / O program 101 refers to the pair management table 300 to determine whether the physical volume to which the write data is written, identified in step 1002, is a pair volume. If the physical volume to which the write data is written is not a pair volume (step 1003: N), the I / O program 101 performs a normal write process and ends the process. Since the processing in this case is publicly known, description thereof is omitted here. On the other hand, if the physical volume to which the write data is written is a pair volume (step 1003: Y), then step 1004 is performed.

Step 1004: The I / O program 101 creates redundant data (parity) using the write data. As is well known, in the RAID technique, it is sometimes necessary to read the pre-update data and the pre-update parity of the write data from the SSD 21 in order to create the parity. Therefore, if necessary, the I / O program 101 reads the pre-update data and the pre-update parity from the SSD 21 and then creates the parity.

Step 1005: The I / O program 101 issues a journal creation command shown in FIG. 7 to the SSD 21 having the write destination physical volume of write data and the SSD 21 having the parity write destination physical volume.
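The parity creation in step 1004 can be illustrated as follows. This is a minimal sketch that assumes an XOR-based (RAID 5 style) parity; the description does not state which RAID level or parity calculation is used, and the function name `updated_parity` is hypothetical. Under that assumption, the new parity follows from the pre-update data, the pre-update parity, and the new write data, which is why the pre-update values may have to be read from the SSD 21.

```python
def updated_parity(old_data: bytes, old_parity: bytes, new_data: bytes) -> bytes:
    """Read-modify-write parity update, assuming XOR parity."""
    if not (len(old_data) == len(old_parity) == len(new_data)):
        raise ValueError("blocks must have equal length")
    # new parity = old parity XOR old data XOR new data, byte by byte
    return bytes(p ^ o ^ n for p, o, n in zip(old_parity, old_data, new_data))


# Two-block stripe: `other` is the block that is not being rewritten.
other = bytes([0x0F, 0x0F])
old = bytes([0x01, 0x02])
new = bytes([0xFF, 0x00])
old_parity = bytes(a ^ b for a, b in zip(old, other))
new_parity = updated_parity(old, old_parity, new)
```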

In the following, the processing from step 1051 to step 1057 will be described. As described above, in step 1005, the I / O program 101 issues a journal creation command to a plurality of SSDs 21. However, since the processing performed in each SSD 21 that has received the journal creation command is the same, in the following description, the processing performed in one SSD 21 that has received the journal creation command will be described.

Step 1051: The SSD 21 that has received the journal creation command issued by the storage controller 10 in Step 1005 determines the sequence number of the journal to be created. As described above, the SSD 21 determines a value obtained by adding 1 to the value stored in the sequence number management information 1170 as the sequence number of a newly created journal.

Step 1053: The SSD 21 writes data to the physical volume (P-VOL) specified by the journal creation command. The SSD 21 obtains the logical page # 1403 of the logical page corresponding to the position specified by the VOL # 753 and LBA754 of the journal creation command by referring to the logical / physical conversion table 1400. Subsequently, the SSD 21 refers to the physical page management table 1450, selects an unused physical page, and writes data to the selected physical page. The SSD 21 then stores the physical page # of the physical page to which the data was written in the physical page # 1405 corresponding to the previously obtained logical page # 1403. As a result, the physical page is mapped to the logical page corresponding to the position specified by the VOL # 753 and LBA754 of the journal creation command.

Step 1054: The SSD 21 writes data to the journal volume. This process is the same as that described in step 1053. That is, the SSD 21 writes data to an unused physical page, and maps the physical page on which the data has been written to a logical page on the journal volume (a logical page corresponding to the position specified by JLBA 755).

Note that the data written to the physical volume (P-VOL) in step 1053 is the same as the data written to the journal volume in step 1054, and is the write data 758 included in the journal creation command. When the SSD 21 that has received the journal creation command is the SSD 21 that stores the write data received from the host 2, the write data 758 includes the write data received from the host 2. On the other hand, the SSD 21 that has received the journal creation command may be the SSD 21 that stores the parity created in step 1004. In that case, the write data 758 includes the parity created in step 1004.

Step 1055: The SSD 21 creates a JNCB. Since the contents included in JNCB have already been described with reference to FIG. 6 above, description thereof is omitted here.

Step 1056: The SSD 21 writes JNCB to the journal volume. This processing is also the same as that described in step 1054 and the like.

Step 1057: After writing the JNCB to the journal volume, the SSD 21 stores the value of SEQ # 701 of the created JNCB in the sequence number management information 1170, and then returns a response to the storage controller 10 indicating that the journal creation processing has been completed. The response returned to the storage controller 10 includes at least the sequence number determined in step 1051. For example, the SSD 21 may remove the write data 758 from the journal creation command received from the storage controller 10 in step 1051, add the sequence number, and return the resulting information to the storage controller 10 as the response.
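Steps 1051 to 1057 can be summarized in a minimal Python sketch. The class and field names are hypothetical, the volumes are modeled as plain dictionaries rather than mapped pages, the initial sequence number of 0 is an assumption, and the JNCB is reduced to the fields mentioned in this description (SEQ # 701, Timestamp 702, and a pointer to the journal data); its "length" entry is an illustrative addition.

```python
from dataclasses import dataclass


@dataclass
class JournalCreateCommand:   # hypothetical subset of the command fields
    timestamp: int            # Timestamp 752
    vol: int                  # VOL # 753
    lba: int                  # LBA 754
    jlba: int                 # JLBA 755
    write_data: bytes         # write data 758


class JournalingSsd:
    def __init__(self) -> None:
        self.latest_seq = 0     # sequence number management information 1170 (initial value assumed)
        self.volume = {}        # (vol, lba) -> data; stands in for the P-VOL
        self.journal_data = {}  # jlba -> data; data area of the journal volume
        self.jncbs = []         # JNCBs, kept in sequence-number order

    def create_journal(self, cmd: JournalCreateCommand) -> dict:
        seq = self.latest_seq + 1                         # step 1051
        self.volume[(cmd.vol, cmd.lba)] = cmd.write_data  # step 1053
        self.journal_data[cmd.jlba] = cmd.write_data      # step 1054
        jncb = {"seq": seq, "timestamp": cmd.timestamp,   # step 1055
                "pointer": cmd.jlba, "length": len(cmd.write_data)}
        self.jncbs.append(jncb)                           # step 1056
        self.latest_seq = seq                             # step 1057
        # Response: the command fields without the write data, plus the SEQ #.
        return {"seq": seq, "timestamp": cmd.timestamp,
                "vol": cmd.vol, "lba": cmd.lba, "jlba": cmd.jlba}


ssd = JournalingSsd()
r1 = ssd.create_journal(JournalCreateCommand(1500, 0, 16, 0, b"abcd"))
r2 = ssd.create_journal(JournalCreateCommand(1501, 0, 32, 4, b"efgh"))
```

The same write data is stored once in the volume and once in the journal volume, although the controller transmits it to the SSD only once.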

When the response indicating that the journal creation processing is completed is returned from the SSD 21 to the storage controller 10, the I / O program 101 resumes the processing from step 1011. The I / O program 101 executes the processing from step 1011 onward after receiving responses from all the SSDs 21 to which it issued the journal creation command in step 1005 (the SSD 21 that stores the write data received from the host 2 and the SSD 21 that stores the parity).

Step 1011: The I / O program 101 extracts the sequence number from the response received from the SSD 21, and stores the extracted sequence number in the latest SEQ # 604 of the primary journal management information 600.

Step 1013: The I / O program 101 returns a response to the effect that the write process is completed to the host 2, and ends the process.

Subsequently, the flow of processing (journal transmission processing) performed when a journal is transmitted from the primary storage apparatus 1a to the secondary storage apparatus 1b will be described. The secondary storage device 1b periodically issues a journal read command (sometimes abbreviated as RDJNL command) to the primary storage device 1a. This processing is performed by the CPU 12 of the secondary storage device 1b executing the journal read program 151. In the primary storage apparatus 1a that has received the journal read command, the journal transmission program 102 is executed, whereby the primary storage apparatus 1a reads the journal stored in the journal volume of the SSD 21 in the primary storage apparatus 1a and transmits it to the secondary storage device 1b. By repeating this process, the journal is periodically transmitted from the primary storage apparatus 1a to the secondary storage apparatus 1b.

FIG. 13 shows the flow of processing performed when the secondary storage apparatus 1b issues one journal read command. However, since a journal read command is created for each SSD 21, when the secondary storage device 1b has, for example, n secondary SSDs (n is an integer of 1 or more), the secondary storage device 1b issues n journal read commands to the primary storage apparatus 1a. When n journal read commands are issued, they need not be processed sequentially; the n instances of the processing in FIG. 13 may be executed in parallel.

Step 1101: The journal read program 151 creates a journal read command and sends it to the primary storage device 1a. The journal read command includes the identification number of the SSD 21 of the primary storage device 1a. The secondary storage device 1b (journal read program 151) refers to the pair management table 300 to identify the identification numbers of all the SSDs 21 that are paired with the secondary SSDs of the secondary storage device 1b, and issues a journal read command for each of these SSDs 21. The primary storage device 1a that has received the journal read command reads the journal from the journal volume of the SSD 21 with the identification number specified by the journal read command and returns it to the secondary storage device 1b (steps 1201 to 1210).

Step 1201: When the journal transmission program 102 receives a journal read command from the secondary storage device 1b, it identifies the SSD 21 specified by the journal read command. In the following description, the identification number of the SSD 21 specified here is “x” (x is a numerical value such as 01).

The journal transmission program 102 specifies the SEQ # of the journal to be read first from the journal volume by referring to the transferred SEQ # 605 of the SSD # x in the primary journal management information 600 (the value obtained by adding 1 to the value of the transferred SEQ # 605 is the SEQ # of the journal to be read). The journal transmission program 102 prepares a variable R and assigns the SEQ # specified here to the variable R.

Step 1203: The journal transmission program 102 compares the latest SEQ # 604 of SSD # x with the value of the variable R, and determines whether the variable R is less than or equal to the latest SEQ # 604. If the variable R is less than or equal to the latest SEQ # 604 (step 1203: Y), the journal transmission program 102 next executes step 1205. If the variable R is greater than the latest SEQ # 604 (step 1203: N), step 1210 is performed next.

Step 1205: The journal transmission program 102 obtains the address of the journal volume in which the JNCB of the journal to be read is stored. The size of the JNCB is a fixed length, and the JNCB is stored in order of the sequence number from the beginning of the management information area of the journal volume. Therefore, the address where the JNCB of the read target journal is stored can be obtained by simple calculation. Then, the journal transmission program 102 reads JNCB from the obtained journal volume address. At this time, the journal transmission program 102 reads the JNCB by issuing a read command to the SSD 21 (SSD # x) in which the journal to be read is stored. The JNCB read from the SSD 21 is temporarily stored in the cache.

Step 1207: The journal transmission program 102 analyzes the contents of the JNCB read in Step 1205 and refers to the JNCB pointer 706 to identify the address where the journal data is stored. Then, the journal transmission program 102 reads journal data from the journal volume by using the obtained journal data address. When reading, the journal transmission program 102 may issue a read command to the SSD #x as in step 1205. The read journal data is temporarily stored in the cache.

Step 1208, Step 1209: The journal transmission program 102 determines whether the total amount of journal data read from the journal volume exceeds a predetermined maximum transfer journal amount. The maximum transfer journal amount is a value that can be set by the user (the administrator of the storage apparatus 1). For example, the user can set the maximum transfer journal amount from the management host 5 to the storage apparatus 1.

If the total amount of journal data read from the journal volume exceeds the maximum transfer journal amount (step 1208: Y), step 1210 is performed next. On the other hand, when the total amount of journal data read from the journal volume is equal to or less than the maximum transfer journal amount (step 1208: N), the journal transmission program 102 adds 1 to the variable R (step 1209) and repeats the processing from step 1203.

Step 1210: The journal transmission program 102 transmits the journals (journal data and JNCB) read in the processing up to step 1209 to the secondary storage device 1b, updates the transferred SEQ # 605 of the SSD # x (the largest sequence number among the sequence numbers of the transmitted journals is stored in the transferred SEQ # 605 of the SSD # x), and ends the processing.
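The loop formed by steps 1201 to 1210 can be sketched as follows. This is a minimal Python illustration, not the actual journal transmission program 102. The function names, the callables `read_jncb` and `read_data` (standing in for read commands issued to SSD # x), the JNCB size of 64 bytes, the first sequence number of 1, and the "length" entry of the JNCB are all assumptions. The address calculation reflects the statement that JNCBs are fixed length and stored in sequence-number order.

```python
JNCB_SIZE = 64  # assumed fixed JNCB length in bytes


def jncb_address(seq, area_start, first_seq=1):
    # Fixed-length JNCBs stored in sequence-number order: the address is the
    # start of the management information area plus the offset of `seq`.
    return area_start + (seq - first_seq) * JNCB_SIZE


def collect_journals(read_jncb, read_data, transferred_seq, latest_seq,
                     max_transfer_bytes, area_start=0):
    """Return (journals to transmit, updated transferred SEQ #)."""
    batch, total = [], 0
    r = transferred_seq + 1                                # step 1201
    while r <= latest_seq:                                 # step 1203
        jncb = read_jncb(jncb_address(r, area_start))      # step 1205
        data = read_data(jncb["pointer"], jncb["length"])  # step 1207
        batch.append((jncb, data))
        total += len(data)
        if total > max_transfer_bytes:                     # step 1208: Y
            break
        r += 1                                             # step 1209
    if batch:                                              # step 1210
        transferred_seq = batch[-1][0]["seq"]
    return batch, transferred_seq


# Four journals of 4 bytes each, with a maximum transfer journal amount of 10 bytes.
jncbs = {jncb_address(s, 0): {"seq": s, "pointer": 4 * (s - 1), "length": 4}
         for s in (1, 2, 3, 4)}
blob = b"aaaabbbbccccdddd"
batch, transferred = collect_journals(
    jncbs.__getitem__, lambda p, n: blob[p:p + n], 0, 4, 10)
```

Because the amount check in step 1208 comes after the read, the journal that pushes the total over the limit is still part of the transmitted batch in this sketch.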

Step 1105: The journal read program 151 that has received the journal from the primary storage apparatus 1a stores the journal in the cache of the secondary storage apparatus 1b.

Step 1107, Step 1108: The journal read program 151 refers to the JNCB of each journal received in step 1105 and determines whether any Timestamp 702 included in these JNCBs is newer than the arrived T / S 657 of SSD # x (stored in the secondary journal management information 650) (step 1107). If there is a JNCB that includes a Timestamp 702 newer than the arrived T / S 657 of SSD # x (step 1107: Y), the journal read program 151 updates the arrived T / S 657 by storing the newest Timestamp 702 among those included in the JNCBs in the arrived T / S 657 of SSD # x (step 1108), and ends the process. If there is no JNCB that includes a Timestamp 702 newer than the arrived T / S 657 of SSD # x (step 1107: N), the journal read program 151 ends the process without performing step 1108.
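Steps 1107 and 1108 amount to keeping a running maximum per SSD. A minimal sketch, assuming time stamps are represented by comparable integers and using the hypothetical function name `update_arrived_ts`:

```python
def update_arrived_ts(arrived_ts, jncb_timestamps):
    """Return the new arrived T/S 657 for one SSD (steps 1107 and 1108)."""
    # Newest Timestamp 702 among the received JNCBs, if any were received.
    newest = max(jncb_timestamps, default=arrived_ts)
    # Step 1107: update only when a newer time stamp has arrived (step 1108).
    return newest if newest > arrived_ts else arrived_ts


after_new = update_arrived_ts(1500, [1458, 1502, 1501])
after_old = update_arrived_ts(1500, [1459])
```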

Subsequently, the flow of journal restore processing will be described with reference to FIG. 14. The journal restore processing is performed by the CPU 12 of the secondary storage device 1b executing the destage program 152. The secondary storage device 1b periodically executes journal restore processing. The journal restore processing is executed for each CTG. FIG. 14 shows the flow of processing when the destage program 152 executes journal restore processing for one specific CTG (for example, the CTG whose CTG # is “n”, hereinafter referred to as “CTG # n”).

Step 1301: The destage program 152 refers to the arrived T / S 657 of every row in which the CTG # 652 is n among the rows of the secondary journal management information 650, and identifies the oldest time among the times stored in these arrived T / S 657. The time identified here is called the “restorable time stamp”.

Subsequently, the processing from step 1302 to step 1305 is performed. Steps 1302 to 1305 are performed for each SSD 21 in CTG # n.

Step 1302: The destage program 152 selects one SSD 21 that has not yet undergone the processing from step 1302 among the SSDs 21 belonging to CTG # n. In the following description, an example in which the identification number of the SSD 21 selected here is “x” will be described. The destage program 152 specifies all the journals that should be reflected in the SSD # x by referring to the JNCB of the journal that has arrived at the cache of the secondary storage device 1b.

Step 1303: The destage program 152 determines whether, among the identified journals, there is a journal whose Timestamp 702 is at or before the restorable time stamp. If there is a journal that matches this condition (step 1303: Y), the destage program 152 next executes step 1304. If there is no journal that matches this condition (step 1303: N), the destage program 152 next executes step 1306.

Step 1304: The destage program 152 creates a journal restore command and issues it to the SSD #x in order to restore each journal specified in step 1303.

Figure 7 shows the format of the journal restore command. The journal restore command is used for the storage controller 10 to cause the SSD 21 to perform journal restore processing. The SSD 21 according to the present embodiment supports a journal restore command in addition to the journal creation command described above. The journal restore command 850 includes an operation code (Opcode) 851, VOL # 852, LBA853, JLBA854, CBLBA855, Length856, JNCB857, and journal data 858.

Opcode 851 is information indicating the type of command, similar to Opcode 751 of the journal creation command. VOL # 852 and LBA853 are information regarding the write destination position of the write data. Specifically, VOL # 852 is the identification number of the physical volume (S-VOL) to which the write data is written, and LBA853 is the LBA on that S-VOL (the physical volume whose identification number is VOL # 852).

JLBA 854 and CBLBA 855 represent addresses on the journal volume, JLBA 854 represents the journal data write destination address, and CBLBA 855 represents the JNCB write destination address. Length 856 represents the length of journal data. The JNCB 858 and journal data 859 store the JNCB and journal data, respectively.

Thus, only the information about one journal is stored in the journal restore command. Therefore, when there are a plurality of journals to be restored in step 1303, the destage program 152 creates a journal restore command for each journal and issues it to the SSD #x. Further, when issuing a plurality of journal restore commands, the destage program 152 creates a journal restore command in order from the journal with the smallest SEQ # 701 and issues it to the SSD #x. As a result, the order in which each data written in the P-VOL is reflected in the S-VOL is the same as the order in which the data is written in the P-VOL.
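The one-command-per-journal rule and the issuing order can be sketched as follows. This is a minimal illustration in which the function name, the dictionary layout, and the opcode string are hypothetical; only the JNCB and journal data fields of the command are modeled.

```python
def build_restore_commands(journals):
    """Create one journal restore command per journal, smallest SEQ # 701 first."""
    ordered = sorted(journals, key=lambda j: j["jncb"]["seq"])
    return [{"opcode": "JOURNAL_RESTORE",  # placeholder for the Opcode 851 value
             "jncb": j["jncb"],
             "journal_data": j["data"]}
            for j in ordered]


# Journals may sit in the cache in any order; commands come out in SEQ # order.
cached = [
    {"jncb": {"seq": 12, "timestamp": 1500}, "data": b"c"},
    {"jncb": {"seq": 10, "timestamp": 1458}, "data": b"a"},
    {"jncb": {"seq": 11, "timestamp": 1459}, "data": b"b"},
]
commands = build_restore_commands(cached)
```

Issuing the commands in this order is what keeps the order of reflection in the S-VOL the same as the order of writing in the P-VOL.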

Hereafter, processing performed by the SSD #x that has received the journal restore command will be described.

Step 1351, Step 1352: Upon receiving the journal restore command, SSD # x writes the JNCB 858 and journal data 859 in the journal restore command to the journal volume (step 1351). Subsequently, the SSD #x writes the journal data 859 to the S-VOL (step 1352).

Step 1353: The SSD #x returns response information to the effect that the restore processing is completed to the storage controller 10 of the secondary storage device 1b. The above is the description of the journal restore process performed by the SSD 21.

The format of the response information returned by SSD # x is shown in FIG. The response information includes SEQ # 881, Timestamp 882, and Length 883. SEQ # 881 and Timestamp 882 include SEQ # 701 and Timestamp 702 that were included in the JNCB of the journal that SSD # x has restored. Length 883 is the journal data length of the journal for which SSD # x has performed the restore process.

Return to the description of the destage program 152 again.

Step 1305: When the response information from the SSD #x arrives at the storage controller 10 of the secondary storage device 1b, the destage program 152 rewrites the value of the reflected SEQ # 656 of the secondary journal management information 650 to the value of SEQ # 881 included in the response information, and rewrites the value of the restored T / S 658 to the value of Timestamp 882 included in the response information.

Further, the destage program 152 updates the JNCB latest address 804 and the write data latest address 806 of the JVOL management table 800. Specifically, the destage program 152 updates the write data latest address 806 by adding the value of Length 883 included in the response information to it, and updates the JNCB latest address 804 by adding the JNCB size to it.

Step 1306: The destage program 152 ends the process when all the SSDs 21 belonging to the CTG # n have been processed up to the step 1305 (step 1306: Y). If there is an SSD 21 that has not been processed up to step 1305 yet, the destage program 152 performs the processing from step 1302 again.
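The bookkeeping performed in step 1305 for each piece of response information can be sketched as follows. The dictionaries and key names are hypothetical stand-ins for one row of the secondary journal management information 650 and of the JVOL management table 800, and the JNCB size is an assumed parameter.

```python
def apply_restore_response(journal_info, jvol_info, response, jncb_size):
    """Update the management information from one restore response (step 1305)."""
    journal_info["reflected_seq"] = response["seq"]            # SEQ # 881 -> reflected SEQ # 656
    journal_info["restored_ts"] = response["timestamp"]        # Timestamp 882 -> restored T/S 658
    jvol_info["write_data_latest_addr"] += response["length"]  # add Length 883 to address 806
    jvol_info["jncb_latest_addr"] += jncb_size                 # add the JNCB size to address 804


journal_info = {"reflected_seq": 9, "restored_ts": 1457}
jvol_info = {"write_data_latest_addr": 4096, "jncb_latest_addr": 640}
apply_restore_response(journal_info, jvol_info,
                       {"seq": 10, "timestamp": 1458, "length": 512}, jncb_size=64)
```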

In the journal restore process described above, only journals whose Timestamp 702 is at or before the restorable time stamp are targets of the restore process. The reason for this will be outlined with reference to FIG. 15.

In FIG. 15, elements 7001, 7002, 7003, and 7004 represent journal groups that are already stored in the cache of the secondary storage device 1b. Each of the boxes shown in FIG. 15 represents a journal, and a value such as “15:00” written in a journal represents the time stamp included in that journal.

The journal group in the element 7001 consists of journals to be stored in the SSD # 1. Similarly, the journal groups in the elements 7002, 7003, and 7004 consist of journals to be stored in the SSDs # 2, # 3, and # 4, respectively. All of SSD # 1 to SSD # 4 belong to the same CTG.

In the example of FIG. 15, the newest time stamp among the journals to be stored in the SSD # 1 is “15:00”, the newest among those to be stored in the SSD # 2 is “15:02”, the newest among those to be stored in the SSD # 3 is “15:03”, and the newest among those to be stored in the SSD # 4 is “15:04”. These time stamps are stored in the arrived T / S 657 of each SSD 21 in the secondary journal management information 650. As described above, the time stamp is the time when the write command was issued from the host 2 to the primary storage device 1a, which means that the journal data included in each journal was written to the P-VOL at the time indicated by its time stamp. When these journal data are written (reflected) in the S-VOL of the secondary storage device 1b, data consistency can be guaranteed if they are written in the order of the times indicated by the time stamps.

In the state shown in FIG. 15, the secondary storage device 1b holds journals with time stamps later than 15:00 among the journals to be stored in the SSDs # 2, # 3, and # 4, but holds no journal with a time stamp later than 15:00 among the journals to be stored in the SSD # 1. However, since journal transmission processing from the primary storage device 1a to the secondary storage device 1b is performed for each SSD 21, a journal to be stored in the SSD # 1 with a time stamp later than 15:00 may still arrive. For example, as shown in FIG. 15, a journal with a time stamp of 15:01 may exist in the SSD 21 of the primary storage device 1a without yet having been transmitted to the secondary storage device 1b (7001 ′).

Suppose that all the journals that have arrived at the secondary storage device 1b were restored at this point, and that a failure then occurred in the primary storage device 1a. The secondary storage device 1b would be left in a state in which the journal with a time stamp of 15:03 (the SSD # 3 journal) and the journal with a time stamp of 15:04 (the SSD # 4 journal) are reflected in the S-VOL, but the journal with a time stamp of 15:01 is not. The data is in a consistent state only if the journal with a time stamp of 15:01 is reflected in the S-VOL before the journals with time stamps of 15:03 and 15:04 are reflected. Therefore, if all the journals that have arrived at the secondary storage system 1b in the state of FIG. 15 were restored, data consistency could no longer be guaranteed.

In the secondary storage apparatus according to the present embodiment, in order to ensure data consistency, the process of step 1301 described above is performed to identify the arrived time stamp of the journals to be stored in each SSD 21 in the CTG. The oldest time among the identified arrived time stamps is then determined as the restorable time stamp. In the example of FIG. 15, the arrived T / S 657 of SSD # 1 is “15:00”, that of SSD # 2 is “15:02”, that of SSD # 3 is “15:03”, and that of SSD # 4 is “15:04”. Among these arrived T / S 657, the oldest time is that of SSD # 1 (that is, 15:00). For this reason, the secondary storage system 1b treats only journals with a time stamp of 15:00 or earlier as restore processing targets. As a result, even if a failure occurs after the restore process, each S-VOL belonging to the CTG is in a state where data consistency is guaranteed.
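The selection rule can be worked through with the arrived time stamps of FIG. 15. In the minimal Python sketch below, the function names are hypothetical, and the individual cached journal time stamps other than the newest one per SSD are invented for illustration, since the description gives only the newest value for each SSD.

```python
from datetime import time

# Arrived T/S 657 per SSD #, taken from the FIG. 15 example.
arrived_ts = {1: time(15, 0), 2: time(15, 2), 3: time(15, 3), 4: time(15, 4)}

# Time stamps of the journals in the cache per SSD # (illustrative values).
cached = {
    1: [time(14, 58), time(15, 0)],
    2: [time(14, 59), time(15, 2)],
    3: [time(15, 3)],
    4: [time(14, 57), time(15, 4)],
}


def restorable_timestamp(arrived):
    # Step 1301: the oldest of the arrived time stamps in the CTG.
    return min(arrived.values())


def select_restorable(cache, limit):
    # Step 1303: keep only journals stamped at or before the restorable time stamp.
    return {ssd: [ts for ts in stamps if ts <= limit]
            for ssd, stamps in cache.items()}


limit = restorable_timestamp(arrived_ts)
selected = select_restorable(cached, limit)
```

With these values the restorable time stamp is 15:00, so the 15:02, 15:03, and 15:04 journals stay in the cache until a later time stamp has arrived for SSD # 1.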

The above is the remote copy process flow in the storage system according to the first embodiment of the present invention. In the storage system according to the first embodiment of the present invention, the storage device (SSD) included in the primary storage apparatus performs journal creation processing, so the processing load on the storage controller can be reduced. If the storage controller itself were to create the journal, it would need to write the write data to the storage device (P-VOL) and also write the journal to the storage device (journal volume). In the storage apparatus according to the present embodiment, the journal creation command and the write data need only be transmitted once to the storage device, so that the data transfer amount between the storage controller and the storage device can be reduced.

Also, in the secondary storage apparatus, the storage device (SSD) performs journal restore processing, so the processing load on the storage controller of the secondary storage apparatus can be reduced.

Further, in the storage system according to the first embodiment, a time stamp is inserted into the journal, and when the secondary storage device causes each SSD to perform journal restore processing, the journal to be restored is selected based on the time stamp included in the journal. This makes it possible to guarantee data consistency of a plurality of S-VOLs.

Subsequently, a computer system according to the second embodiment will be described. Since the hardware configuration of the computer system according to the second embodiment is the same as that described in the first embodiment, illustration is omitted.

First, the copy manager 6 not described in the first embodiment will be described. The copy manager 6 is a general-purpose computer such as a personal computer, similar to the host 2, and is connected to the storage apparatus 1a via the SAN 3. The copy manager 6 does not issue a read command or a write command to the logical volume of the storage apparatus 1, but periodically issues a command called “Freeze command” to the storage apparatus 1. Details will be described later.

Further, in the computer system according to the first embodiment, the write command issued by the host 2 needs to include time information (time stamp). On the other hand, in the computer system according to the second embodiment, the time information may not be included in the write command issued by the host 2.

In the second embodiment, some programs executed in the storage apparatus differ from those in the first embodiment, and the management information and the like that are created differ slightly from those described in the first embodiment. However, many points are the same as in the first embodiment. Therefore, when hardware elements, management information, or programs that are the same as those in the first embodiment are described in the second embodiment, the same reference numbers as in the first embodiment are used. The following description focuses mainly on the differences from the first embodiment.

FIG. 16 is an example of a generation management table included in the primary storage device 1a according to the second embodiment. The primary storage device 1a according to the second embodiment manages time stamps in the storage device 1a. The time stamp does not necessarily need to be information representing time, such as hours and minutes. Information that can identify the write order of the plurality of write data received from the host 2 may be given to the journal. Therefore, in the primary storage system 1a according to the second embodiment, a kind of serial number is used as the time stamp instead of the time information. This serial number is referred to as a time stamp in the second embodiment. Alternatively, this serial number may be referred to as a “generation number”. The generation number may be abbreviated as “generation #”. However, as another embodiment, time information (information representing time, such as hours and minutes) may be used instead of the generation number.

In the generation management table 900, time stamps are managed for each CTG; therefore, each row of the generation management table 900 stores information about one CTG. The contents of each column of the generation management table 900 are as follows. CTG # 901 stores the CTG #, and DEV # 902 stores the identification numbers of the storage devices (SSDs 21) belonging to the CTG specified by CTG # 901.

The time stamp 903 stores the generation number of the CTG specified by CTG # 901. The initial value of the time stamp 903 is 0. As described above, the time stamp 903 may be written as “generation # 903”.

Similar to the first embodiment, in the storage system of the second embodiment, the time stamp (generation number) is stored in the JNCB when the primary storage device 1a creates a journal, and when the secondary storage device 1b performs the journal restore process. Referenced.

Time stamp (generation number) update processing will be described with reference to FIG. 17. In the primary storage device 1a according to the second embodiment, a generation update program is executed. In addition, as described above, the copy manager 6 executes a program for periodically issuing a Freeze command. In this embodiment, this program is called “RAID manager”. The time stamp is updated by executing the generation update program and the RAID manager. In FIG. 17, steps 2001 and 2002 are processes executed by the RAID manager, and steps 2101 to 2103 are executed by the generation update program. The RAID manager starts processing when, for example, a user (storage system administrator) instructs activation.

When the RAID manager is activated, the RAID manager issues a Freeze command to the primary storage apparatus 1a (step 2001). Subsequently, the RAID manager waits for a certain time (for example, 1 second or 0.1 second) (step 2002), and then executes step 2001 again. The RAID manager repeats this process.

On the other hand, the generation update program that has received the Freeze command issued by the RAID manager in step 2001 performs the following processing.

Step 2101: When the generation update program receives the Freeze command, the generation update program temporarily stops the I / O processing for all the logical volumes (and physical volumes constituting the logical volume) of the storage apparatus 1a. For example, in the storage device 1a, the write program or the like described in the first embodiment is executed, and the process shown in FIG. 12 is executed. However, the generation update program interrupts the process such as the write program. As a result, in the primary storage apparatus 1a, journal creation processing and the like are temporarily stopped.

Step 2102: The generation update program adds 1 to the value of the time stamp 903 in each row of the generation management table 900.

Step 2103: After step 2102, the generation update program restarts the I / O processing stopped in step 2101.

By performing the above processing by the RAID manager and the generation update program, the value of the time stamp 903 in the generation management table 900 is periodically increased by one. That is, the Freeze command issued from the copy manager 6 can be said to be a command for instructing update (addition) of the value of the time stamp 903.
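The interaction between the RAID manager and the generation update program can be sketched as follows. This is a minimal Python illustration under several assumptions: the class and function names are hypothetical, the temporary stop of I / O processing is modeled by a lock that the write path must take to read the time stamp, and the loop is bounded by a `rounds` parameter only so that the sketch terminates (the RAID manager described above repeats indefinitely).

```python
import threading
import time


class GenerationTable:
    """Hypothetical model of the generation management table 900."""

    def __init__(self, ctg_numbers):
        self.timestamp = {ctg: 0 for ctg in ctg_numbers}  # time stamp 903, initial value 0
        self._io_lock = threading.Lock()  # serializes time stamp reads against Freeze handling

    def current(self, ctg):
        # Called on the write path; blocks while a Freeze command is being handled.
        with self._io_lock:
            return self.timestamp[ctg]

    def on_freeze(self):
        with self._io_lock:             # step 2101: hold off I/O processing
            for ctg in self.timestamp:  # step 2102: add 1 in every row
                self.timestamp[ctg] += 1
        # step 2103: leaving the block lets I/O processing resume


def raid_manager(table, interval_s, rounds):
    for _ in range(rounds):
        table.on_freeze()       # step 2001: issue the Freeze command
        time.sleep(interval_s)  # step 2002: wait for a certain time


table = GenerationTable([0, 1])
raid_manager(table, interval_s=0.0, rounds=3)
generation = table.current(1)
```

Because no journal creation command can pick up a time stamp while the update is in progress, all writes accepted between two Freeze commands carry the same generation number within a CTG.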

In the example described above, when the storage apparatus 1a receives the Freeze command, the I / O processing of all logical volumes is temporarily stopped. In response to receipt of the Freeze command, the time stamps 903 of all the rows in the generation management table 900 are updated. However, as another embodiment, these processes may be performed for each CTG.

An example will be described. The RAID manager issues a Freeze command including CTG # to the storage apparatus 1a. An example in which CTG # included in the Freeze command is “n” will be described below. Upon receiving this command, the generation update program may add 1 to the value of the time stamp 903 in the row of the generation management table 900 where CTG # 901 is “n” in step 2102. In this case, the generation update program only has to interrupt access (I / O processing) to the physical volume belonging to CTG # n in step 2101.

In the following, the flow of journal creation processing and journal restoration processing performed in the computer system according to the second embodiment will be described, but the flow of these processing is almost the same as that described in the first embodiment. The flow of journal creation processing and journal restoration processing performed in the computer system according to the second embodiment will be described mainly with reference to the drawings (FIGS. 12 to 14) used in the first embodiment.

First, the flow of processing performed when the primary storage apparatus 1a according to the second embodiment receives a write command from the host 2 will be described. Since many of the processes are the same as those described in the first embodiment, the following description focuses on differences from the processes described in the first embodiment.

First, Step 1001 to Step 1004 are the same as the processing described in the first embodiment. As described above, in the computer system according to the second embodiment, the write command issued by the host 2 does not include a time stamp (the write command may include a time stamp, but the time stamp is not used in the storage apparatus 1 according to the second embodiment).

When the primary storage apparatus 1a creates a journal creation command in step 1005, the I / O program 101 uses the time stamp (generation #) 903 stored in the generation management table 900. For example, when the SSD 21 to be instructed to create a journal belongs to CTG # 1, the I / O program 101 creates a journal creation command in which the value stored in the time stamp 903 of the row where the CTG # 901 is “1” is set in the Timestamp 752, and issues the command to the SSD 21. In other respects, the I / O program 101 performs the same processing as that described in the first embodiment.
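As a rough sketch of step 1005 in the second embodiment, the generation # of the CTG to which the target SSD belongs is looked up and placed in the time stamp field of the journal creation command. The command layout (`JournalCreateCommand`), the SSD-to-CTG lookup dictionary, and the function name are assumptions made for illustration; only the use of the time stamp 903 as the Timestamp 752 comes from the description.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class JournalCreateCommand:
    """Hypothetical subset of the journal creation command fields."""
    lba: int
    data: bytes
    timestamp: int  # corresponds to the Timestamp 752


def build_journal_create_command(
    generation_table: Dict[int, int],  # CTG # 901 -> time stamp 903
    ssd_to_ctg: Dict[str, int],        # assumed lookup: SSD name -> CTG #
    ssd: str,
    lba: int,
    data: bytes,
) -> JournalCreateCommand:
    # Use the generation # of the CTG that the target SSD belongs to,
    # instead of a time stamp supplied by the host.
    ctg = ssd_to_ctg[ssd]
    return JournalCreateCommand(lba=lba, data=data,
                                timestamp=generation_table[ctg])


cmd = build_journal_create_command(
    generation_table={1: 42}, ssd_to_ctg={"SSD#0": 1},
    ssd="SSD#0", lba=128, data=b"payload",
)
print(cmd.timestamp)
```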

Since the processing performed by the SSD 21 that has received the journal creation command is not different between the first embodiment and the second embodiment, a description thereof will be omitted.

Subsequently, a flow of processing when a journal is transmitted from the primary storage apparatus 1a to the secondary storage apparatus 1b in the storage system according to the second embodiment will be described with reference to FIGS. The processing performed in the second embodiment is the same as that described in the first embodiment except that steps 1107 and 1108 in FIG. 13 are replaced with steps 1107 'and 1108' in FIG. Therefore, hereinafter, the processing (steps 1105 to 1108) performed in the secondary storage device 1b will be described with reference to FIG. As in the first embodiment, in the following description, an example in which the SSD 21 specified by the journal read command issued by the secondary storage apparatus 1b is SSD # x will be described.

Step 1105 is the same as the processing described in the first embodiment.

After step 1105, the journal read program 151 determines whether any of the received JNCBs has a Timestamp 702 whose value minus 1 is larger than the arrived T / S 657 (here, the arrived generation #) of SSD # x (step 1107 ′). If this determination is affirmative (step 1107 ′: Y), the journal read program 151 adds 1 to the arrived T / S 657 of SSD # x (step 1108 ′), and then ends the processing.

When the generation update program is executed in the primary storage apparatus 1a, the time stamp 903 of the generation management table 900 is periodically increased, but a plurality of journals in which the same value is stored in the JNCB Timestamp 702 may be created. For example, when the secondary storage apparatus 1b receives a journal whose Timestamp 702 is x (x is an integer value equal to or greater than 1), other journals whose Timestamp 702 is x may still remain in the primary storage apparatus 1a without having been transmitted. Conversely, when the secondary storage apparatus 1b receives a journal whose Timestamp 702 is (x + 1), it is guaranteed that all journals whose Timestamp 702 is x have arrived at the secondary storage apparatus 1b.

Therefore, the secondary storage device 1b executes steps 1107 ′ and 1108 ′, and when it is confirmed that all journals whose Timestamp 702 is x have arrived at the secondary storage device 1b, x is stored in the arrived T / S 657 of the secondary journal management information 650. Thereafter, when the secondary storage device 1b receives a journal whose Timestamp 702 is (x + 2), it means that all journals whose Timestamp 702 is (x + 1) have been received by the secondary storage device 1b. By executing steps 1107 ′ and 1108 ′, (x + 1) is stored in the arrived T / S 657.
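Steps 1107 ′ and 1108 ′ can be sketched as a single function that takes the current arrived generation # of one SSD and the Timestamp 702 values of the JNCBs just received. The function name and the initial value 0 for the arrived generation # are assumptions for illustration.

```python
from typing import Iterable


def update_arrived_generation(arrived: int,
                              jncb_timestamps: Iterable[int]) -> int:
    """Return the new arrived generation # for one SSD."""
    # Step 1107': is there a received JNCB whose Timestamp 702 minus 1
    # is larger than the arrived generation #?
    if any(ts - 1 > arrived for ts in jncb_timestamps):
        # Step 1108': add 1 to the arrived generation #.
        return arrived + 1
    return arrived


arrived = 0  # assumed initial value; no generation is complete yet
history = []
for batch in ([1, 1], [1, 2], [2, 2], [3]):
    arrived = update_arrived_generation(arrived, batch)
    history.append(arrived)
print(history)
```

In this trace the arrived generation # becomes 1 only when a journal of generation 2 is first seen, and becomes 2 only when a journal of generation 3 is seen, which matches the guarantee described above.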

Finally, the journal restoration process performed in the secondary storage system 1b according to the second embodiment will be described. As described in the first embodiment, in the journal restoration process, data consistency is guaranteed by selecting the journals to be restored using the arrived T / S 657 and the Timestamp 702 included in the journal (JNCB). The same processing is performed in the journal restoration processing in the second embodiment. The only difference is that the arrived T / S 657 and the Timestamp 702 included in the journal (JNCB) hold not time information (the time received together with the write command from the host 2) but the generation number managed by the primary storage system 1a. In other respects, the journal restoration process in the second embodiment is the same as the process described in the first embodiment.
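A hedged sketch of how journals might be selected for restoration with generation numbers is given below. It assumes, for illustration only, that a journal is restorable when its Timestamp 702 does not exceed the smallest arrived T / S 657 across all SSDs, and that restorable journals of each SSD are applied in ascending sequence-number order. The `Journal` type and the function name are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Journal:
    ssd: str
    seq: int        # sequence number assigned by the originating SSD
    timestamp: int  # Timestamp 702 (a generation # in the second embodiment)


def select_restorable(journals: List[Journal],
                      arrived: Dict[str, int]) -> List[Journal]:
    # Assumed rule: only generations that have fully arrived for every SSD
    # are safe to restore, so the limit is the minimum arrived generation #.
    limit = min(arrived.values())
    ready = [j for j in journals if j.timestamp <= limit]
    # Apply each SSD's journals in ascending sequence-number order.
    return sorted(ready, key=lambda j: (j.ssd, j.seq))


journals = [
    Journal("SSD#0", 2, 4), Journal("SSD#0", 1, 3),
    Journal("SSD#1", 1, 3), Journal("SSD#1", 2, 5),
]
result = select_restorable(journals, arrived={"SSD#0": 4, "SSD#1": 3})
print([(j.ssd, j.seq) for j in result])
```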

This is the end of the description of the second embodiment. In the storage system according to the second embodiment, the time stamp included in the write command issued by the host 2 becomes unnecessary, so data consistency can be guaranteed even if a host 2 that does not have the function of issuing a write command including a time stamp is used.

In the above description, an example has been described in which the storage apparatus 1a manages the time stamp (generation number), and the storage apparatus 1a adds 1 to the generation number managed by itself in response to receiving the Freeze command from the copy manager 6. However, as another embodiment, the copy manager 6 may be configured to manage time stamps.

For example, the copy manager 6 may be configured to include a time stamp (for example, a generation number) in the Freeze command, transmit the command to the storage device 1a, and add 1 to the generation number managed by itself when transmitting the Freeze command. When the storage apparatus 1a receives the Freeze command from the copy manager 6, the generation number included in the Freeze command may be stored in the generation management table 900. Note that the copy manager 6 may be configured to transmit time information as a time stamp. In that case, the copy manager 6 may include the time information (current time) acquired from the clock of the copy manager 6 in the Freeze command when the Freeze command is transmitted.
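The variant in which the copy manager 6 owns the generation number can be sketched as follows. The class names, the initial generation number 1, and the choice to place the current value in the command before incrementing it are assumptions; the description leaves the exact ordering open.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class FreezeCommand:
    generation: int  # time stamp carried in the Freeze command


@dataclass
class CopyManager:
    generation: int = 1  # assumed initial generation number

    def issue_freeze(self) -> FreezeCommand:
        # Put the current generation number in the command, then add 1
        # to the value managed by the copy manager (assumed ordering).
        cmd = FreezeCommand(self.generation)
        self.generation += 1
        return cmd


@dataclass
class PrimaryStorage:
    # Stand-in for the generation management table 900: CTG # -> time stamp.
    table: Dict[int, int] = field(default_factory=dict)

    def on_freeze(self, cmd: FreezeCommand) -> None:
        # Store the received generation number instead of incrementing locally.
        for ctg in self.table:
            self.table[ctg] = cmd.generation


manager = CopyManager()
storage = PrimaryStorage({0: 0, 1: 0})
storage.on_freeze(manager.issue_freeze())
storage.on_freeze(manager.issue_freeze())
print(storage.table, manager.generation)
```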

Subsequently, a computer system according to Embodiment 3 of the present invention will be described. The hardware configuration of the computer system according to the third embodiment is the same as that described in the first or second embodiment. The processing executed by the storage controller of the storage apparatus according to the third embodiment is also the same as that described in the first or second embodiment. For example, the storage controller according to the third embodiment performs the same processing as the write processing, journal transmission processing, and journal restoration processing (FIGS. 12 to 14) described in the first embodiment. Therefore, when describing various hardware elements, management information, or programs, the same reference numerals as those used in the first or second embodiment will be used.

In the storage device according to the third embodiment, when the SSD 21 receives a journal creation command or a journal restore command, the processing for writing data to a physical page is different from that described in the first or second embodiment. The difference between the process performed in the third embodiment and the process performed in the first or second embodiment will be outlined with reference to FIG.

FIG. 19A is a conceptual diagram showing the mapping state of logical pages and physical pages immediately after the SSD 21 according to the first embodiment (or the second embodiment) stores the data specified by the journal creation command in the physical volumes (P-VOL and JVOL). Here, with reference to FIG. 19, processing performed in the SSD 21, particularly processing in steps 1051 to 1054 in FIG. 12, will be described.

Suppose that the P-VOL data write destination location specified by the journal creation command is logical page a in FIG. 19A and the journal volume data (journal data) write destination location is logical page h. In order to avoid complicated description, a case will be described here in which the write target data size is equal to the logical page size, and the beginning and end of the data write destination area coincide with the logical page boundary.

First, in step 1053, the SSD 21 selects one unused physical page (a physical page to which data has not yet been written). Here, it is assumed that physical page C is selected. Subsequently, the SSD 21 writes data to the physical page C, and maps the physical page C to the logical page a.

Next, in step 1054, the SSD 21 selects one unused physical page again. Here, it is assumed that the physical page E is selected. Subsequently, the SSD 21 writes journal data to the physical page E, and maps the physical page E to the logical page h. In the first or second embodiment, data (journal) is written to the FM chip 206 (physical page) in this way.

Also in the SSD 21 according to the third embodiment, data is written to the P-VOL and the journal volume by executing the steps 1051 to 1054 (FIG. 12) described in the first embodiment. However, since the data writing process to the physical page is different from the process described in the first embodiment, the mapping between the logical page and the physical page after the execution of steps 1051 to 1054 differs from that described in the first embodiment (see FIG. 19 (a)).

FIG. 19B illustrates the mapping state between the logical page and the physical page immediately after Steps 1051 to 1054 are executed in the SSD 21 according to the third embodiment and data is written to the P-VOL and the journal volume. As in the example described above, assume that the data write destination position of the P-VOL specified by the journal creation command is the logical page a in FIG. 19B, and that the write destination position of the data (journal data) in the journal volume is logical page h.

The following processing is performed in the SSD 21 according to the third embodiment. First, the processing performed in step 1053 is the same as that described above. That is, data is written to an unused physical page C, and the physical page C is mapped to the logical page a.

However, in step 1054, the following processing is performed. In step 1054, the SSD 21 maps the physical page C written in step 1053 to the logical page h. No unused physical page is selected, and no data is written to a physical page. As described above, the data written in the P-VOL and the data written in the journal volume (journal data) by the SSD 21 that has received the journal creation command are the same data. Therefore, in the SSD 21 according to the third embodiment, the physical page (physical page C) to which data has been written in step 1053 is mapped to both a logical page (logical page a) on the P-VOL and a logical page (logical page h) on the journal volume, which makes it possible to suppress redundant data writing to the physical page.

Subsequently, the process performed when the SSD 21 receives a write (update) request for a logical page while the physical page (physical page C) is mapped to both the logical page (logical page a) on the P-VOL and the logical page (logical page h) on the journal volume is outlined. For example, when the SSD 21 receives a write (update) request for the logical page a, the SSD 21 writes update data to an unused physical page (hereinafter referred to as “physical page D”), and maps the physical page D to the logical page a.

In general, when an update (overwrite) is performed on a logical page, the physical page mapped to the logical page is unmapped and becomes unallocated (the status 1453 of the physical page management table is changed to “unallocated”). However, in the SSD 21 according to the third embodiment, when the physical page C is mapped to both the logical page a and the logical page h, the physical page C is not placed in an unallocated state. This is because when the physical page C is changed to an unallocated state, data written to the logical page h of the journal volume is substantially lost.

The same processing is performed when the SSD 21 receives a write request for the logical page h on the journal volume.
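The shared mapping of the third embodiment can be sketched with a small page mapper. Reference counting is used here as one possible way to keep a physical page allocated while any logical page still maps to it; the description only requires that the page not become unallocated, so the counter, the class `PageMapper`, and its method names are assumptions. Erase handling of flash memory is ignored.

```python
from typing import Dict, List


class PageMapper:
    """Hypothetical logical-to-physical mapper with shared physical pages."""

    def __init__(self, num_physical: int) -> None:
        self.free: List[int] = list(range(num_physical))
        self.l2p: Dict[str, int] = {}     # logical page -> physical page
        self.refs: Dict[int, int] = {}    # physical page -> number of mappings
        self.data: Dict[int, bytes] = {}  # physical page contents
        self.physical_writes = 0

    def _unmap(self, logical: str) -> None:
        old = self.l2p.pop(logical, None)
        if old is None:
            return
        self.refs[old] -= 1
        if self.refs[old] == 0:
            # No logical page maps here any more: the page becomes unallocated.
            del self.refs[old]
            del self.data[old]
            self.free.append(old)

    def _map(self, logical: str, physical: int) -> None:
        if self.l2p.get(logical) == physical:
            return
        self._unmap(logical)
        self.l2p[logical] = physical
        self.refs[physical] = self.refs.get(physical, 0) + 1

    def write(self, logical: str, data: bytes) -> int:
        # Ordinary write: select an unused physical page and map it.
        physical = self.free.pop(0)
        self.data[physical] = data
        self.physical_writes += 1
        self._map(logical, physical)
        return physical

    def journal_create(self, vol_page: str, jvol_page: str, data: bytes) -> None:
        physical = self.write(vol_page, data)  # step 1053: one physical write
        self._map(jvol_page, physical)         # step 1054: mapping only

    def read(self, logical: str) -> bytes:
        return self.data[self.l2p[logical]]


m = PageMapper(4)
m.journal_create("a", "h", b"v1")  # one physical page backs both a and h
m.write("a", b"v2")                # update of a goes to a new physical page
print(m.read("a"), m.read("h"), m.physical_writes)
```

After the update, logical page h still returns the original journal data because its physical page stays allocated, and only two physical writes have occurred in total.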

The processing of the SSD 21 according to the third embodiment when receiving the journal creation command has been described above. The SSD 21 according to the third embodiment performs the same kind of processing when receiving the journal restore command.

This will be described with reference to FIG. The SSD 21 that has received the journal restore command writes the journal data 859 in the journal restore command to the journal volume in step 1351. At this time, the SSD 21 writes the journal data 859 to an unused physical page (assumed to be physical page C). The physical page C is mapped to a logical page (assumed to be a logical page h). In step 1352, the journal data 859 is written to the S-VOL. At this time, the SSD 21 maps the physical page C in which the journal data 859 was written in step 1351 to a logical page (assumed to be logical page a). No unused physical page is selected, and no data is written to a physical page. Thereby, it is possible to prevent the data writing to the physical page from being performed redundantly.

Further, after step 1353 of FIG. 14 is performed, it is not necessary that journal data remain in the journal volume. Therefore, after step 1353, the SSD 21 releases the mapping between the logical page h and the physical page C of the journal volume. Specifically, the SSD 21 may change the status 1404 of the row corresponding to the logical page h to “unallocated” among the rows of the logical-physical conversion table 1400, and set the physical page # 1405 to NULL. However, the mapping between the S-VOL logical page a and the physical page C is not released.
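The restore-side handling can be sketched in the same spirit. The function below is an illustrative assumption: the mapping is a plain dictionary in which `None` stands for the “unallocated” status with a NULL physical page #, and the physical page numbers are hypothetical.

```python
from typing import Dict, List, Optional


def restore_journal(
    l2p: Dict[str, Optional[int]],  # logical page -> physical page (None = unallocated)
    free_pages: List[int],
    flash: Dict[int, bytes],
    jvol_page: str,
    svol_page: str,
    journal_data: bytes,
) -> None:
    # Step 1351: write the journal data to one unused physical page and
    # map that page to the journal volume's logical page.
    physical = free_pages.pop(0)
    flash[physical] = journal_data
    l2p[jvol_page] = physical
    # Step 1352: map the same physical page to the S-VOL logical page;
    # no second physical write takes place.
    l2p[svol_page] = physical
    # After step 1353: release only the journal volume mapping.
    l2p[jvol_page] = None


l2p: Dict[str, Optional[int]] = {"a": None, "h": None}
free_pages = [2, 3]  # hypothetical physical page numbers
flash: Dict[int, bytes] = {}
restore_journal(l2p, free_pages, flash, jvol_page="h", svol_page="a",
                journal_data=b"d")
print(l2p, flash)
```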

Subsequently, a computer system according to the fourth embodiment will be described with reference to FIG. The computer system according to the fourth embodiment has a configuration in which a primary computer 1a' and a secondary computer 1b' are connected via a WAN 4.

The primary computer 1a' and the secondary computer 1b' are general-purpose computers such as a server and a personal computer similar to the host 2 in the first embodiment. The primary computer 1a' includes a CPU 12' (12'-1, 12'-2), a memory 13', a network I / F 14', a device I / F 15', and a plurality of SSDs 21. The SSD 21 is the same as the SSD 21 described in the first, second, or third embodiment.

The network I / F 14' is an interface for connecting the primary computer 1a' to the WAN 4. The device I / F 15' is an interface for connecting a plurality of SSDs 21 and has the same function as the BE I / F 15 in the first embodiment.

The CPU 12' is similar to a microprocessor included in a known server or personal computer, and the memory 13' is a volatile storage medium, such as a DRAM, included in a known server or personal computer. In FIG. 20, two CPUs 12' (12'-1, 12'-2) are shown, but the number of CPUs 12' is not limited to two. Three or more CPUs 12' may be mounted on the primary computer 1a', or only one CPU 12' may be mounted. Alternatively, instead of the plurality of CPUs 12', one multi-core processor having a plurality of processor cores may be mounted on the primary computer 1a'.

Further, since the configuration of the secondary computer 1b' is the same as that of the primary computer 1a', the illustration and description of the configuration are omitted.

The primary computer 1a' operates as a device having the functions of the host 2 and the primary storage device 1a in the first embodiment. For example, the CPU 12' of the primary computer 1a' may be configured to form at least two VMs by executing a hypervisor that is a program for forming one or more virtual machines (VMs). The hypervisor is well-known software (program), and logically divides the physical resources (CPU 12', memory 13', etc.) of the primary computer 1a', and forms one or a plurality of VMs using the divided physical resources. The first VM formed on the primary computer 1a' is a VM on which a program executed on the host 2 in the first embodiment is executed. In the present embodiment, this VM is called a “host VM”. The second VM is a VM that realizes the function of the primary storage device 1a in the first embodiment. In the present embodiment, this VM is called a “storage VM”.

In the storage VM, a logical volume is provided to the host VM by executing a program equivalent to the I / O program 101 described in the first embodiment (referred to as an I / O program 101'). The host VM and the storage VM are configured to be communicable, and the host VM can issue an I / O request (write command or read command) to the logical volume. When the I / O program 101' receives a write command from the host VM, it stores the write data (or journal) in the physical volumes (P-VOL and journal volume) in the SSD 21, for example, by executing the processing described in the first embodiment (step 1001 to step 1013 in FIG. 12).

The secondary computer 1b' has the function of the secondary storage device 1b in the first embodiment. For example, similarly to the primary computer 1a', the CPU 12' of the secondary computer 1b' may execute the hypervisor to form the storage VM. In the storage VM of the secondary computer 1b', a program equivalent to the journal read program 151 described in the first embodiment (referred to as journal read program 151') and a program equivalent to the destage program 152 (referred to as destage program 152') are executed. As a result, the primary computer 1a' and the secondary computer 1b' perform the processing described in the first embodiment, that is, the journal transmission and journal restoration processing described in FIGS. That is, the computer system according to the fourth embodiment has the same functions as those described in the first embodiment, although the hardware configuration is different from that of the computer system described in the first embodiment.

Here, an example has been described in which the computer system according to the fourth embodiment has the same function as the function described in the first embodiment. However, the computer system according to the fourth embodiment may have the same function as the computer system according to the second embodiment. In this case, in the primary computer 1a', in addition to the host VM and the storage VM, a VM that executes the program (RAID manager) executed by the copy manager 6 described in the second embodiment (referred to as “manager VM”) may also be formed. The manager VM may periodically issue a Freeze command to the storage VM. Alternatively, the RAID manager may be executed by the host VM.

The embodiment of the present invention has been described above, but this is an example for explaining the present invention, and is not intended to limit the present invention to the embodiment described above. The present invention can be implemented in various other forms.

For example, in the embodiment described above, the example in which redundant data (parity) created in the primary storage device 1a is also stored in the journal volume and copied to the secondary storage device 1b has been described. However, the redundant data need not be copied from the primary storage device 1a to the secondary storage device 1b. Instead, the secondary storage device 1b may create the redundant data to be stored in the secondary storage device 1b on the basis of the data (journal) transmitted from the primary storage device 1a.

Also, it is not essential to generate redundant data. In addition, it is not essential that a volume (logical volume) provided to a requester such as a host is formed from a plurality of physical volumes. For example, the primary storage apparatus 1a and the primary computer 1a 'may provide one physical volume provided by the SSD 21 to the host (or host VM) as one logical volume.

1a: Primary storage device, 1b: Secondary storage device, 2: Host, 3: SAN, 4: WAN, 5: Management host, 6: Copy manager

Claims

An information processing system comprising: a primary system having a first processor, a first memory, and a plurality of primary storage devices for storing write data from a requester; and
a secondary system connected to the primary system and having a second processor, a second memory, and a plurality of secondary storage devices for storing a copy of data written to the primary storage devices, wherein
Each of the primary storage devices has one or more volumes and a journal volume for storing a journal to be transferred to the secondary storage device,
Each of the secondary storage devices has one or more volumes and a journal volume for storing a journal acquired from the primary system,
The first processor, in response to receiving a write command that instructs writing of the write data to the volume and that includes a time stamp, creates a journal creation command including the write data and the time stamp and sends it to the primary storage device,
The primary storage device that has received the journal creation command is:
Storing the write data in the volume;
Creating a journal including the write data, information on the write position of the write data, a sequence number managed by the primary storage device and the time stamp, and storing the journal in the journal volume;
When the second processor acquires a plurality of the journals stored in the journal volume of the plurality of primary storage devices from the primary system, the second processor stores the acquired journals in the second memory;
Determining the journal that can be sent to the secondary storage device based on the time stamp included in each of the journals;
Create a journal restore command based on the determined contents of the journal and issue it to the secondary storage device;
The secondary storage device that has received the journal restore command stores the write data included in the journal restore command in the volume of the secondary storage device.
Information processing system.
The journal created by the nth primary storage device (1 ≦ n ≦ N) among the plurality of primary storage devices is configured to be transmitted to the nth secondary storage device among the plurality of secondary storage devices,
The second processor
manages, for each secondary storage device, the nth maximum time stamp, which is the maximum value of the time stamps included in the journal group whose destination is the nth secondary storage device among the plurality of acquired journals,
when determining the journal that can be transmitted to the secondary storage device, specifies a restorable time stamp that is the minimum value among the first maximum time stamp to the Nth maximum time stamp, and
determines that, out of the plurality of acquired journals, a journal whose time stamp included in the journal is less than the restorable time stamp can be transmitted to the secondary storage device;
The information processing system according to claim 1.
The second processor, when there are a plurality of the journals determined to be transmittable,
Of the journals determined to be transmittable, create a journal restore command in order from the journal with the smallest sequence number included in the journal, and issue it to the secondary storage device.
The information processing system according to claim 2.
The primary storage device uses a flash memory as a storage medium, the flash memory has a plurality of physical pages,
The volume and journal volume of the primary storage device are configured as a set of logical pages that are areas of the same size as the physical pages,
When the primary storage device receives a data write request for a logical page on the volume or the journal volume, the primary storage device is configured to store the data in an unused physical page of the flash memory and to map the physical page storing the data to the logical page designated by the write request;
When the primary storage device receives the journal creation command,
Write data included in the journal creation command is stored in one of the plurality of physical pages;
Mapping the physical page storing the write data to a logical page on the journal volume and mapping to a logical page on the volume;
The information processing system according to claim 2.
The secondary storage device uses a flash memory as a storage medium, the flash memory has a plurality of physical pages,
The volume and journal volume of the secondary storage device are configured as a set of logical pages that are areas of the same size as the physical page,
When the secondary storage device receives a data write request for a logical page on the volume or the journal volume, the secondary storage device is configured to store the data in an unused physical page of the flash memory and to map the physical page storing the data to the logical page designated by the write request;
When the secondary storage device receives the journal restore command,
Write data included in the journal restore command is stored in one of the plurality of physical pages;
Mapping the physical page storing the write data to a logical page on the journal volume and mapping to a logical page on the volume;
The information processing system according to claim 2.
The primary system holds the generation number in the first memory;
When the first processor receives a command instructing to update the generation number from an external device connected to the primary system,
Suspending access to the primary storage device being executed by the first processor;
After increasing the generation number held in the first memory, resuming access to the suspended primary storage device,
When the first processor creates a journal creation command, the generation number is included in the journal creation command instead of the time stamp.
The information processing system according to claim 2.
The requester is a computer connected to the primary system via a storage area network (SAN).
The information processing system according to claim 1.
A primary system having a processor, a memory, and a plurality of storage devices for storing write data from a requester, and connected to a secondary system in which a copy of the write data is stored;
Each of the storage devices has one or more volumes and a journal volume for storing a journal to be transferred to the secondary system,
The processor, in response to receiving a write command that instructs writing of the write data to the volume and that includes a time stamp, creates a journal creation command including the write data and the time stamp and sends it to the storage device,
The storage device that has received the journal creation command
Storing the write data in the volume;
Creating a journal including the write data, information about the write position of the write data, a sequence number managed by the storage device, and the time stamp, and storing the journal in the journal volume;
Primary system.
The storage device uses a flash memory as a storage medium, the flash memory has a plurality of physical pages,
The volume and journal volume of the storage device are configured as a set of logical pages that are areas of the same size as the physical page,
When the storage device accepts a data write request to a logical page on the volume or the journal volume, the storage device is configured to store the data in an unused physical page of the flash memory and to map the physical page storing the data to the logical page specified by the write request;
When the storage device receives the journal creation command,
Write data included in the journal creation command is stored in one of the plurality of physical pages;
Mapping the physical page storing the write data to a logical page on the journal volume and mapping to a logical page on the volume;
The primary system of claim 8.
The primary system holds the generation number in the memory,
When the processor receives a command to update the generation number from an external device connected to the primary system,
Suspending access to the storage device being executed by the processor;
After increasing the generation number held in the memory, resume access to the suspended storage device;
When the processor creates a journal creation command, the generation number is included in the journal creation command instead of the time stamp.
The primary system of claim 8.